Wednesday, February 22, 2012


We've just released the simulation infrastructure for the Memory Scheduling Championship (MSC), to be held at ISCA this year.

The central piece in the infrastructure is USIMM, the Utah SImulated Memory Module.  It reads in application traces, models the progression of application instructions through the reorder buffers of a multi-core processor, and manages the memory controller read/write queues.  Every memory cycle, USIMM checks various DRAM timing parameters to figure out the set of memory commands that can be issued in that cycle.  It then hands control to a scheduler function that picks a command from this candidate set.  An MSC contestant only has to modify the scheduler function, i.e., all of your changes should be restricted to scheduler.c and scheduler.h.  This clean interface makes it very easy to produce basic schedulers.  Each of my students produced a simple scheduler in a matter of hours; these have been included in the code distribution as examples to help you get started.
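To make the division of labor concrete, here is a minimal sketch of the kind of decision a scheduler function makes each memory cycle.  It is written in Python purely for illustration; USIMM itself is C, and none of the names below (Request, issuable, schedule, drain_writes) are taken from the USIMM code.  The sketch assumes the simulator has already marked which queued requests satisfy the DRAM timing constraints, and it simply picks the oldest such request (first-come first-served).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    # All fields are hypothetical; they are not USIMM's actual data structures.
    arrival_cycle: int  # cycle at which the request entered its queue
    is_write: bool      # True for a write, False for a read
    issuable: bool      # assumed to be set by the simulator after its timing checks


def schedule(read_queue: List[Request],
             write_queue: List[Request],
             drain_writes: bool) -> Optional[Request]:
    """Pick at most one request to issue this cycle, oldest first (FCFS).

    drain_writes is a hypothetical flag selecting which queue is serviced.
    """
    queue = write_queue if drain_writes else read_queue
    candidates = [r for r in queue if r.issuable]
    if not candidates:
        return None  # nothing can legally issue this cycle
    return min(candidates, key=lambda r: r.arrival_cycle)


reads = [Request(5, False, False), Request(7, False, True), Request(9, False, True)]
writes = [Request(1, True, True)]
chosen = schedule(reads, writes, drain_writes=False)
print(chosen.arrival_cycle)
```

In this example the oldest read (arrival cycle 5) is not yet issuable, so the sketch falls through to the read that arrived at cycle 7.  A more interesting policy would replace only the selection rule in the last line of schedule.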

In the coming weeks, we'll release a number of traces that will be used for the competition.  The initial distribution includes five short single-thread traces from PARSEC that people can use for initial testing.

The competition will be judged in three different tracks: performance, energy-delay-product (EDP), and performance-fairness-product (PFP).  The final results will be based on the most current version of USIMM, as of June 1st 2012.
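As a quick illustration of why the EDP track can rank schedulers differently from an energy-only comparison, the sketch below uses the standard definition of energy-delay product (energy multiplied by execution time).  The two schedulers and their numbers are made up for illustration, and the exact formula the contest uses for its metrics is not reproduced here.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Standard EDP definition: energy consumed times time taken (lower is better)."""
    return energy_joules * delay_seconds


# Hypothetical numbers, not measurements from USIMM.
edp_a = energy_delay_product(10.0, 2.0)  # scheduler A: more energy, finishes sooner
edp_b = energy_delay_product(8.0, 3.0)   # scheduler B: less energy, finishes later
print(edp_a, edp_b)
```

With these made-up inputs, scheduler A has the lower EDP even though scheduler B consumes less energy, because B's longer run time outweighs its energy saving.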

We request that contestants focus on scheduling algorithms that are easily implementable, i.e., doable within a few processor cycles and within a 68 KB storage budget.  A program committee will evaluate the implementability of your algorithm, among other things.

We'll post updates and bug fixes in the comments section of this blog post as well as to the mailing list (sign up here).  Users are welcome to use the blog or mailing list to post their own suggestions, questions, or bug reports.  Email if you have a question for just the code developers.

Code Updated on 04/17/2012:

Code download:

Changes in Version 1.1:
Changes in Version 1.2:
Changes in Version 1.3:

USIMM Tech Report:

The contest website:

Users mailing list sign-up:

Wednesday, February 8, 2012

Trip Report -- NSF Workshop -- WETI

I was at an NSF-sponsored Workshop on Emerging Technologies for Interconnects (WETI) last week that was attempting to frame important interconnect research directions.  I encourage everyone to check out the talk slides; talk videos will also soon be posted.  In the coming months, a detailed report will be written to capture the discussion.  This post summarizes some personal take-home messages.

1. Applications of Photonics: An important conclusion in my view was that photonics offers little latency, energy, and bandwidth advantage for on-chip communication.  Its primary advantage is for off-chip communication.  It is also worthwhile to look at limited long-distance on-chip communication with photonics.  For example, if a photonic signal has entered a chip, you might as well take the signal to a point near the destination, thus reducing the cost of global wire traversal.  Nearly half the workshop focused on photonics; many of the challenges appeared to be at the device level.

2. Processing in Memory: Our group has some initial work on processing-in-memory (PIM) with 3D chip stacks.  It was reassuring to see that many people believe in PIM.  Because it reduces communication distance, it is viewed as a vital ingredient in the march towards energy-efficient exascale computing.  However, to distinguish such ideas from those in the 1990s, it is best to market them as "processing near memory". :-)

3. Micron HMC: The talk by Gurtej Sandhu of Micron had some great details on the Hybrid Memory Cube (HMC).  An HMC-based system sees significant energy contributions from the DRAM arrays, the logic layer on the 3D stack, and the host interface (the memory controller on the processor).  SerDes circuits account for 66% of the power in the logic layer.

4. Electrical Interconnect Scaling: Shekhar Borkar's talk was interesting as always.  He reiterated that mesh NoCs are overkill and hierarchical buses are the way forward.  The wire energy for a 16 mm traversal matches the energy cost per bit for a router; frequent routers therefore get in the way of energy efficiency.  He pointed out that the NoC in the Intel 80-core Polaris contributed 28% to chip power because the computational units were so simple.  The NoC in Intel's SCC chip consumes more power than the NoC in Polaris, but the overall contribution is lower (10%), because the cores are beefier and more realistic.  In moving from 45 nm to 7 nm, compute energy will reduce by 6x; in contrast, the electrical interconnect energy to travel a fixed length on-chip reduces by only 1.6x and the energy for off-chip interconnect reduces by less than 2x.  So the communication energy bottleneck will grow, unless we can reduce communication and communication distances.
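A back-of-the-envelope calculation shows how quickly the bottleneck grows under the scaling factors quoted above (6x for compute, 1.6x for fixed-length on-chip wires).  The starting energy split in this sketch is a hypothetical value chosen only for illustration; it is not a number from the talk.

```python
COMPUTE_SCALING = 6.0       # compute energy reduction, 45 nm to 7 nm (from the talk)
ONCHIP_WIRE_SCALING = 1.6   # fixed-length on-chip wire energy reduction (from the talk)


def relative_wire_cost_growth() -> float:
    """How much more expensive a wire traversal becomes relative to compute."""
    return COMPUTE_SCALING / ONCHIP_WIRE_SCALING


def interconnect_share_after_scaling(share_before: float) -> float:
    """Interconnect share of total energy at 7 nm, given its share at 45 nm.

    Assumes the workload's amount of compute and communication is unchanged.
    """
    compute_after = (1.0 - share_before) / COMPUTE_SCALING
    wire_after = share_before / ONCHIP_WIRE_SCALING
    return wire_after / (compute_after + wire_after)


print(round(relative_wire_cost_growth(), 2))
# 0.20 is a hypothetical starting share, used only as an example input.
print(round(interconnect_share_after_scaling(0.20), 3))
```

Relative to compute, moving a bit a fixed distance becomes about 3.75x more expensive.  Under the assumed 20% starting share, on-chip interconnect would account for roughly 48% of the energy at 7 nm if the amount of communication stayed the same.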

5. Miscellaneous: There was a buzz about near-threshold computing (NTC).  It appears to be one of the few big arrows left in the quiver for processor energy efficiency.  It was also one of many techniques that Patrick Chiang mentioned for energy-efficient communication.  He also talked about low-swing signaling, transmission lines, and wireless interconnects.  Pradip Bose's talk had lots of interesting power breakdowns, also showing trends for the IBM Power series.