- Ikeda et al., Winners of the Performance Track.
- Fang et al., Winners of the Energy-Delay Track.
- Ishii et al., Winners of the Performance-Fairness Track.
Some important take-home messages from the competition:
- It is nearly impossible to win a scheduler competition with a scheduler that implements just a single new idea. A good scheduler is a combination of multiple optimizations. All of the following must be carefully handled: read/write interference, row buffer locality, early precharges, early activates, refresh, power-up/down, fairness, etc. The talk by Ishii et al. did an especially good job of combining multiple techniques and breaking down the contribution of each one. During the workshop break, some of the audience suggested building a Franken-scheduler (along the lines of Gabe Loh's Frankenpredictor) that combined the best of all submitted schedulers. I think that idea has a lot of merit.
- If I had to pick a single scheduling artifact that seemed to play a strong role in many submitted schedulers, it would be the smart handling of read/write interference. Because baseline memory write handling increases execution time by about 36%, it offers the biggest room for improvement.
- Our initial experiments seemed to reveal that all three metrics (performance, energy-delay, and performance-fairness) were strongly correlated, i.e., we expected that a single scheduler would win on all three metrics. It was a surprise that we had different winners in each track. Apparently, each winning scheduler had a wrinkle that gave it the edge for one metric.
- The current processor model is simple; it does not model instruction dependences and assumes that all memory operations within a reorder buffer window can be issued as soon as they are fetched. Adding support to model dependences would make the traces larger and slow down the simulator (hence, it wasn't done for the competition).
- We have already integrated USIMM with our version of Simics. This automatically takes care of the dependence problem in the previous bullet, but the simulations are much slower. In general, reviewers at top conferences will prefer the rigor of execution-driven simulation over the speed of trace-driven simulation. It would be worthwhile to understand how conclusions differ between the two simulation styles.
- The DRAMSim2 tool has an excellent validation methodology. We would like to re-create something similar for USIMM.
- We'd like to augment the infrastructure to support prefetched data streams.
- Any new scheduler would have to compare itself against other state-of-the-art schedulers. The USIMM infrastructure already includes a simple FCFS scheduler and an opportunistic close-page policy, among others. All the code submitted to the competition is online as well. It would be good to also release a version of the TCM algorithm (MICRO'10) in the coming months.
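The read/write interference point above is usually addressed by batching writes: instead of interleaving reads and writes (paying a bus turnaround penalty on every switch), the scheduler drains writes in bursts between two watermarks. A minimal C sketch of that idea follows; the watermark values and the `channel_state` struct are illustrative assumptions, not USIMM's actual data structures.

```c
#include <stdbool.h>

/* Hypothetical watermarks -- real schedulers tune these per configuration. */
#define HI_WATERMARK 40
#define LO_WATERMARK 20

typedef struct {
    int write_queue_len;   /* pending writes on this channel */
    bool draining;         /* currently in write-drain mode? */
} channel_state;

/* Decide whether this cycle should issue a write (true) or a read (false).
 * Writes are batched: once the queue crosses HI_WATERMARK, keep draining
 * until it falls to LO_WATERMARK, amortizing the bus turnaround penalty
 * over many writes instead of paying it on every read/write switch. */
bool should_issue_write(channel_state *ch) {
    if (ch->draining) {
        if (ch->write_queue_len <= LO_WATERMARK)
            ch->draining = false;      /* drained enough, return to reads */
    } else if (ch->write_queue_len >= HI_WATERMARK) {
        ch->draining = true;           /* queue nearly full, start a drain */
    }
    return ch->draining;
}
```

The hysteresis between the two watermarks is the key design choice: a single threshold would cause the scheduler to flip between read and write mode on every request, reintroducing the turnaround overhead the drain is meant to hide.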
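On the three competition metrics: the sketch below shows one common formulation of delay, energy-delay, and a slowdown-based fairness measure. This is an assumption for illustration; the competition's exact metric definitions may differ, and the `program_result` struct is hypothetical.

```c
/* Hypothetical per-program results (assumption; the MSC's exact metric
 * formulas may differ from the ones sketched here). */
typedef struct {
    double exec_time;   /* execution time with the scheduler under test */
    double alone_time;  /* execution time when run alone (baseline) */
} program_result;

/* Sum of execution times: a simple delay/performance metric (lower is better). */
double total_delay(const program_result *r, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += r[i].exec_time;
    return sum;
}

/* Maximum slowdown across co-running programs: a common fairness measure. */
double max_slowdown(const program_result *r, int n) {
    double worst = 0.0;
    for (int i = 0; i < n; i++) {
        double s = r[i].exec_time / r[i].alone_time;
        if (s > worst)
            worst = s;
    }
    return worst;
}

/* Energy-delay product: rewards schedulers that save energy without
 * proportionally hurting performance. */
double edp(double energy, double delay) {
    return energy * delay;
}

/* Performance-fairness product: couples throughput with the worst-case
 * slowdown, so a scheduler cannot win by starving one program. */
double pfp(const program_result *r, int n) {
    return total_delay(r, n) * max_slowdown(r, n);
}
```

Because each metric weights the same underlying behaviors differently, a small scheduling "wrinkle" (e.g., slightly more aggressive power-down, or a starvation cap) can move one metric without moving the others, which is consistent with the three tracks having three different winners.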
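For readers unfamiliar with the FCFS baseline mentioned above, here is a minimal sketch of the idea: each cycle, pick the oldest queued request whose DRAM command can legally issue. The `mem_request` struct and `fcfs_pick` helper are simplified stand-ins, not USIMM's actual interface (the real scheduler operates on per-channel queues with full timing state).

```c
#include <stdbool.h>

/* Simplified request record; the real simulator's request structure
 * carries far more state (bank, row, timing constraints, etc.). */
typedef struct {
    long long arrival_time;  /* cycle the request entered the queue */
    bool command_issuable;   /* can a DRAM command for it issue this cycle? */
} mem_request;

/* First-Come-First-Served: scan the queue and pick the oldest request
 * that can issue a command right now.
 * Returns the index of the chosen request, or -1 if none can issue. */
int fcfs_pick(const mem_request *q, int n) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (!q[i].command_issuable)
            continue;
        if (best < 0 || q[i].arrival_time < q[best].arrival_time)
            best = i;
    }
    return best;
}
```

FCFS ignores row-buffer locality entirely, which is why even simple open-row-aware policies (e.g., FR-FCFS-style schedulers) beat it, and why it serves as the natural lower baseline for comparison.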