Sunday, January 15, 2012

Waking Up to Bottleneck Realities!

NSF is organizing a workshop on Cross-Layer Power Optimization and Management (CPOM) next month.  The workshop will identify important future research areas to help guide funding agencies and the research community at large.  Below is my position statement for the workshop.  While the DRAM main memory system is a primary bottleneck in most platforms, the computer architecture community has been slow to react with innovations that look beyond the processor chip.

SPOT QUIZ:     What is an SMB?
Hint:  It consumes 14 W of power and there could be 32 of these in an 8-socket system.

  • The memory system accounts for 20-40% of total system power.  Significant power is dissipated in DRAM chips, on-board buffer chips, and the memory controller.
  • Single DRAM chip power (Micron power calculator): 0.5 W.  On-board buffer chip power (Intel SMB datasheet): 14 W.  Memory controller power (Intel SCC prototype): 19-69% of chip power.
  • Future memory systems: 3D stacks, more on-board buffering, higher channel frequencies, higher refresh overheads.
  • And ... we have an off-chip memory bandwidth problem!  Pin counts have stagnated.
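
A quick back-of-the-envelope check puts the spot-quiz hint in perspective. The sketch below uses only the figures quoted above (14 W per SMB, up to 32 SMBs in an 8-socket system, 0.5 W per DRAM chip); the function and constant names are my own, and the numbers are datasheet-style estimates rather than measurements.

```python
# Figures quoted in the post; nothing here is measured.
SMB_POWER_W = 14.0        # per on-board buffer chip (Intel SMB datasheet figure)
SMBS_PER_SYSTEM = 32      # upper count from the quiz hint (8-socket system)
DRAM_CHIP_POWER_W = 0.5   # single DRAM chip (Micron power calculator figure)


def total_buffer_power_w(n_buffers=SMBS_PER_SYSTEM, watts_each=SMB_POWER_W):
    """Aggregate power of all on-board buffer chips, in watts."""
    return n_buffers * watts_each


def equivalent_dram_chips(power_w, watts_per_chip=DRAM_CHIP_POWER_W):
    """Number of DRAM chips that would draw the same power."""
    return power_w / watts_per_chip


if __name__ == "__main__":
    total = total_buffer_power_w()
    print(f"Buffer chips alone: {total:.0f} W")                              # 448 W
    print(f"Same power as {equivalent_dram_chips(total):.0f} DRAM chips")    # 896
```

Under these assumptions, the buffer chips alone draw 448 W, as much as 896 DRAM chips, before counting the DRAM chips themselves or the memory controller.
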
You Cannot Be Serious!!
(Exaggerated) State of Current Research: 
  • Number of papers on volatile memory systems:  ~1 per conference.
  • Number of papers on the processor-memory interconnect:  ~1 per year.
  • Number of papers that have to define the terms "rank" and "bank":  all of them.
  • Year of first processor paper to leverage DVFS: 2000.  Year of first memory paper to leverage DVFS: 2011.
  • Percentage of readers who have to look up the term "SMB":  > 89%.  (OK, I made up that fact :-) ... but I bet I'm right)

For every 1,000 papers written on the processor, 20 papers are written on the memory system, and 1 paper is written on the processor-memory interconnect.  This is absurd given that the processor and memory are the two fundamental elements of any computer system and memory energy can exceed processor energy.  While the routers in an NoC have been heavily optimized, the community understands very little about the off-chip memory channel.  The memory system is an obviously fertile area for future research.

QUIZ 1:  
Most ISCA attendees know what a virtual channel is, but most would be hard-pressed to answer 2 of the following 5 basic memory channel questions:
  1. What is FB-DIMM?
  2. What is an SMI?
  3. Why are buffer chips placed between the memory controller and DRAM chips?
  4. What is SERDES and why is it important?
  5. Why do the downstream and upstream SMI channels have asymmetric widths?
QUIZ 2: 
Many ISCA attendees know the difference between PAp and GAg branch predictor configurations, but most will struggle to answer the following basic memory system questions:
  1. How many DRAM sub-arrays are activated to service one cache line request?
  2. What circuit implements the DRAM row buffer?
  3. Where is a row buffer placed?
  4. Why do DRAM chips not implement row buffer caches?
  5. What is overfetch?
  6. What is tFAW?
  7. Describe a basic algorithm to implement chipkill.  (What is chipkill?)
  8. What is scrubbing?

In early 2009 (before my foray into memory systems), I would have scored zero on both quizzes.  Such a level of ignorance is perhaps ok for esoteric topics... but unforgivable for a component that accounts for 25% of server power.

  • Identify a priority list of bottlenecks.  Step outside our comfort zone to learn about new system components.  Increase memory system coverage in computer architecture classes.
  • Find ways to address obvious sources of energy inefficiencies in the memory system:  reduce overfetch, improve row buffer hit rates, reduce refresh power.
  • Find ways to leverage 3D stacking of memory and logic.  Exploit 3D to take our first steps in the area of DRAM chip modifications (an area that has traditionally been off-limits).
  • Understand the considerations in designing memory channels and on-board buffer chips.  Propose new channel architectures and new microarchitectures for buffer chips.
  • Understand memory controller microarchitectures and design complexity-effective memory controllers.
  • Design new architectures that integrate photonics and NVMs in the memory system.