Utah Arch: Memory Scheduling Championship Wrap-Up

The MSC Workshop was last week at ISCA, featuring a strong set of presentations. The MSC webpage has all the final results and papers. Congrats to the winners of each track! Thanks to Zeshan Chishti and Intel for sponsoring the trophies and award certificates (pictured below with the winners of each track). Numerically speaking, the workshop had about 30 attendees, we received 11 submissions, and the USIMM code has received 1254 downloads to date (261 downloads for version 1.3).

Ikeda et al., Winners of the Performance Track.

Fang et al., Winners of the Energy-Delay Track.

Ishii et al., Winners of the Performance-Fairness Track.

Some important take-home messages from the competition:

It is nearly impossible to win a scheduler competition with a scheduler that implements a single new idea. A good scheduler is a combination of multiple optimizations. All of the following must be carefully handled: read/write interference, row buffer locality, early precharge, early activates, refresh, power-up/down, fairness, etc. The talk by Ishii et al. did an especially good job of combining multiple techniques and breaking down the contribution of each one. During the workshop break, some of the audience suggested building a Franken-scheduler (along the lines of Gabe Loh's Frankenpredictor) that combined the best of all submitted schedulers. I think that idea has a lot of merit.
If I had to pick a single scheduling artifact that seemed to play a strong role in many submitted schedulers, it would be the smart handling of read/write interference. Because baseline memory write handling increases execution time by about 36%, it offers the biggest room for improvement.
Our initial experiments seemed to reveal that all three metrics (performance, energy-delay, and performance-fairness) were strongly correlated, i.e., we expected that a single scheduler would win on all three metrics. It was a surprise that we had different winners in each track. Apparently, each winning scheduler had a wrinkle that gave it the edge for one metric.
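To make the read/write interference point above concrete, here is a minimal sketch of one common way to handle it: buffer writes and drain them in bursts between a high and a low watermark, so reads are rarely interrupted by bus turnarounds. This is an illustration only, not the code of any submitted scheduler; the class name, the method names, and the watermark values are all assumptions made for the example.

```python
from collections import deque


class WriteDrainScheduler:
    """Toy read-priority scheduler with watermark-based write draining.

    Hypothetical illustration: names and thresholds are assumptions,
    not taken from USIMM or from any competition entry.
    """

    def __init__(self, hi_wm=40, lo_wm=20):
        if lo_wm >= hi_wm:
            raise ValueError("lo_wm must be below hi_wm")
        self.hi_wm = hi_wm
        self.lo_wm = lo_wm
        self.read_q = deque()
        self.write_q = deque()
        self.draining = False

    def enqueue(self, kind, addr):
        """Queue a request; kind is "R" for a read or "W" for a write."""
        (self.write_q if kind == "W" else self.read_q).append(addr)

    def pick(self):
        """Return the next (kind, addr) to issue, or None if idle."""
        # Enter drain mode when writes pile up; leave it once the
        # write queue has fallen back to the low watermark.
        if len(self.write_q) >= self.hi_wm:
            self.draining = True
        elif len(self.write_q) <= self.lo_wm:
            self.draining = False

        if self.draining and self.write_q:
            return ("W", self.write_q.popleft())
        if self.read_q:
            return ("R", self.read_q.popleft())
        if self.write_q:
            # No reads pending: issue a write opportunistically.
            return ("W", self.write_q.popleft())
        return None
```

The gap between the two watermarks sets how long each write burst lasts. A wider gap means fewer read/write switches but longer stretches during which reads must wait.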

The USIMM simulation infrastructure seemed to work well for the competition, but there are a few things we'd like to improve upon in the coming months:

The current processor model is simple; it does not model instruction dependences and assumes that all memory operations within a reorder buffer window can be issued as soon as they are fetched. Adding support to model dependences would make the traces larger and slow down the simulator (hence, it wasn't done for the competition).
We have already integrated USIMM with our version of Simics. This automatically takes care of the dependency problem in the previous bullet, but the simulations are much slower. In general, reviewers at top conferences will prefer the rigor of execution-driven simulation over the speed of trace-driven simulation. It would be worthwhile to understand how conclusions differ with the two simulation styles.
The DRAMSim2 tool has an excellent validation methodology. We'd like to re-create something similar for USIMM, if possible.
We'd like to augment the infrastructure to support prefetched data streams.
Any new scheduler would have to compare itself against other state-of-the-art schedulers. The USIMM infrastructure already includes a simple FCFS and an opportunistic close page policy, among others. All the code submitted to the competition is on-line as well. It would be good to also release a version of the TCM algorithm (MICRO'10) in the coming months.
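The limitation in the first item of this list (no instruction dependences, with memory operations issued as soon as they are fetched) can be illustrated with a toy reorder-buffer model. The sketch below is not USIMM code; the function name, the trace encoding, the fixed memory latency, and all default parameters are assumptions chosen to keep the example small.

```python
from collections import deque


def run_trace(trace, rob_size=8, fetch_width=2, retire_width=2, mem_latency=10):
    """Return the cycle count for a toy trace-driven core.

    trace is a list of "N" (non-memory instruction) or "R" (memory read).
    Hypothetical model: every read is issued in the cycle it is fetched
    and no dependences between instructions are tracked.
    """
    rob = deque()  # each entry holds the cycle at which that instruction completes
    pc = 0
    cycle = 0
    while pc < len(trace) or rob:
        # Retire completed instructions in order from the head of the ROB.
        retired = 0
        while rob and retired < retire_width and rob[0] <= cycle:
            rob.popleft()
            retired += 1
        # Fetch new instructions while the ROB has room.
        fetched = 0
        while pc < len(trace) and fetched < fetch_width and len(rob) < rob_size:
            if trace[pc] == "R":
                rob.append(cycle + mem_latency)  # issued at fetch, no dependence check
            else:
                rob.append(cycle + 1)
            pc += 1
            fetched += 1
        cycle += 1
    return cycle
```

In this model, run_trace(["R"]) and run_trace(["R", "R"]) take the same number of cycles, because the second read always overlaps with the first. If the second read actually depended on the first (a pointer chase, for example), a dependence-aware model would serialize them, which is the kind of difference the comparison with execution-driven simulation would expose.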

If you have an idea for a future research competition, please email the JWAC organizers, Alaa Alameldeen (Intel) and Eric Rotenberg (NCSU).

Tuesday, June 19, 2012
