SPOT QUIZ: What is an SMB?
Hint: It consumes 14 W of power and there could be 32 of these in an 8-socket system.
FACTS:
Bottlenecks:
- The memory system accounts for 20-40% of total system power. Significant power is dissipated in DRAM chips, on-board buffer chips, and the memory controller.
- Single DRAM chip power (Micron power calculator): 0.5 W. On-board buffer chip power (Intel SMB datasheet): 14 W. Memory controller power (Intel SCC prototype): 19-69% of chip power.
- Future memory systems: 3D stacks, more on-board buffering, higher channel frequencies, higher refresh overheads.
- And ... we have an off-chip memory bandwidth problem! Pin counts have stagnated.
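To put the buffer chip numbers in perspective, here is a back-of-the-envelope sketch in Python. It uses only the figures quoted above (14 W per SMB, up to 32 SMBs in an 8-socket system, 0.5 W per DRAM chip). The even split across sockets and all constant and function names are assumptions made for illustration, not measurements.

```python
# Figures quoted in the text above.
SMB_POWER_W = 14.0        # one on-board buffer chip (Intel SMB datasheet figure)
SMBS_IN_SYSTEM = 32       # upper bound quoted for an 8-socket system
SOCKETS = 8
DRAM_CHIP_POWER_W = 0.5   # one DRAM chip (Micron power calculator figure)


def total_buffer_power_w(num_buffers=SMBS_IN_SYSTEM, per_chip_w=SMB_POWER_W):
    """Power drawn by the buffer chips alone, in watts."""
    return num_buffers * per_chip_w


def buffer_power_per_socket_w(total_w, sockets=SOCKETS):
    """Buffer chip power per socket, ASSUMING an even split across sockets."""
    return total_w / sockets


def dram_chips_per_buffer_equivalent(per_buffer_w=SMB_POWER_W,
                                     per_dram_w=DRAM_CHIP_POWER_W):
    """How many DRAM chips draw as much power as one buffer chip."""
    return per_buffer_w / per_dram_w


total = total_buffer_power_w()
print(f"Buffer chips, whole system: {total:.0f} W")                            # 448 W
print(f"Buffer chips, per socket:   {buffer_power_per_socket_w(total):.0f} W")  # 56 W
print(f"One SMB ~ {dram_chips_per_buffer_equivalent():.0f} DRAM chips")         # 28
```

In other words, with these figures the buffer chips alone can draw 448 W, and a single SMB draws as much as 28 DRAM chips.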
You Cannot Be Serious!!
- Number of papers on volatile memory systems: ~1 per conference.
- Number of papers on the processor-memory interconnect: ~1 per year.
- Number of papers that have to define the terms "rank" and "bank": all of them.
- Year of first processor paper to leverage DVFS: 2000. Year of first memory paper to leverage DVFS: 2011.
- Percentage of readers that have to look up the term "SMB": > 89%. (ok, I made up that fact :-) ... but I bet I'm right)
For every 1,000 papers written on the processor, 20 papers are written on the memory system, and 1 paper is written on the processor-memory interconnect. This is absurd, given that the processor and memory are the two fundamental elements of any computer system and that memory energy can exceed processor energy. While the routers in an NoC have been heavily optimized, the community understands very little about the off-chip memory channel. The memory system is an obviously fertile area for future research.
QUIZ 1:
Most ISCA attendees know what a virtual channel is, but most would be hard-pressed to answer 2 of the following 5 basic memory channel questions:
- What is FB-DIMM?
- What is an SMI?
- Why are buffer chips placed between the memory controller and DRAM chips?
- What is SERDES and why is it important?
- Why do the downstream and upstream SMI channels have asymmetric widths?
QUIZ 2:
Many ISCA attendees know the difference between PAp and GAg branch predictor configurations, but most will struggle to answer the following basic memory system questions:
- How many DRAM sub-arrays are activated to service one cache line request?
- What circuit implements the DRAM row buffer?
- Where is a row buffer placed?
- Why do DRAM chips not implement row buffer caches?
- What is overfetch?
- What is tFAW?
- Describe a basic algorithm to implement chipkill. (What is chipkill?)
- What is scrubbing?
ACTION ITEMS FOR THE RESEARCH COMMUNITY:
- Identify a priority list of bottlenecks. Step outside our comfort zone to learn about new system components. Increase memory system coverage in computer architecture classes.
- Find ways to address obvious sources of energy inefficiencies in the memory system: reduce overfetch, improve row buffer hit rates, reduce refresh power.
- Find ways to leverage 3D stacking of memory and logic. Exploit 3D to take our first steps in the area of DRAM chip modifications (an area that has traditionally been off-limits).
- Understand the considerations in designing memory channels and on-board buffer chips. Propose new channel architectures and new microarchitectures for buffer chips.
- Understand memory controller microarchitectures and design complexity-effective memory controllers.
- Design new architectures that integrate photonics and NVMs in the memory system.