Friday, November 25, 2011

ISCA Deadline

I'm guessing everyone's digging themselves out after the ISCA deadline.  I had promised not to talk about energy drinks and deadlines, so let's just say that our deadline week looked a bit like this :-)... without the funky math of course.

I'm constantly hounding my students to get their writing done early.  Good writing doesn't magically show up 24 hours before the deadline.  While a Results section will undoubtedly see revisions in the last few days, there's no valid excuse for not finishing most other sections well before the deadline.

The graph below shows two examples of how our writing evolved this year... better than in previous years, but still not quite early enough!  We also labored more than usual to meet the 22-page budget... I guess we got a little spoilt after the 26-page HPCA'12 format.  I am amazed that producing a refined draft one week before the deadline seems an impossible task, while producing a refined 22-page document 3 minutes before the deadline is a virtual certainty.  I suppose I can blame Parkinson's Law, not my angelic students :-), for accelerating my aging process...


Tuesday, November 8, 2011

Memory Has an Eye on Disk's Space

We recently read a couple of SOSP papers in our reading group: RAMCloud and FAWN.  These are terrific papers with significant implications for architects and application developers.  Both papers target the design of energy-efficient and low-latency datacenter platforms for a new breed of data-intensive workloads.  FAWN uses many wimpy nodes and a Flash storage system; RAMCloud replaces disk with DRAM.  While the two papers share many arguments, I'll focus the rest of this post on RAMCloud because its conclusion is more surprising.

In RAMCloud, each individual server is effectively disk-less (disks are only used for back-up, not to service application reads and writes).  All data is placed in DRAM main memory.  Each server is configured with high memory capacity, and every processor has access via the network to the memory space of all servers in the RAMCloud.  It is easy to see that such a system should offer high performance because high-latency disk access (a few milliseconds) is replaced by low-latency DRAM+network access (microseconds).
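To put rough numbers on that gap, here is a tiny order-of-magnitude sketch.  The figures are illustrative assumptions of mine, not measurements from the paper, although 5-10 microseconds is in line with the remote access latency the RAMCloud authors target:

```python
# Order-of-magnitude latency comparison (illustrative numbers, not from the paper).
disk_access_us = 5_000     # ~5 ms for a random disk read (seek + rotation), assumed
ramcloud_access_us = 10    # ~5-10 us network round-trip + DRAM read, RAMCloud's stated target

print(f"latency advantage: ~{disk_access_us / ramcloud_access_us:.0f}x")  # roughly 500x
```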

An immediate architectural concern that comes to mind is cost.  DRAM has a dollar/GB purchase price that is 50-100X higher than that of disk.  A server with 10 TB of disk space costs $2K, while a server with 64 GB of DRAM and no disk costs $3K (2009 data from the FAWN paper).  It's a little trickier to compare the power consumption of DRAM and disk.  An individual access to DRAM consumes much less energy than an individual access to disk, but DRAM has a higher static energy overhead (the cost of refresh).  If the access rate is high enough, DRAM is more energy-efficient than disk.  For the same server example as above, the server with 10 TB of disk has a power rating of 250 W, whereas the server with 64 GB of DRAM has a power rating of 280 W (again, 2009 data from the FAWN paper).  This is not quite an apples-to-apples comparison because the DRAM-bound server services many more requests at 280 W than the disk-bound server does at 250 W.  But it is clear that in terms of operating (energy) cost per GB, DRAM is again much more expensive than disk.  Note that total cost of ownership (TCO) is the sum of capital expenditure (capex) and operational expenditure (opex).  The above data points make it appear that RAMCloud incurs a huge TCO penalty.
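To make the per-GB gap concrete, here is a quick back-of-the-envelope calculation using only the 2009 FAWN server numbers quoted above; the per-GB figures are my own derivation, not values taken from either paper:

```python
# Back-of-the-envelope $/GB and W/GB using the 2009 FAWN server numbers above.
disk_server = {"price_usd": 2000, "capacity_gb": 10 * 1024, "power_w": 250}
dram_server = {"price_usd": 3000, "capacity_gb": 64,        "power_w": 280}

for name, s in [("disk server", disk_server), ("DRAM server", dram_server)]:
    print(f"{name}: ${s['price_usd'] / s['capacity_gb']:.2f}/GB, "
          f"{s['power_w'] / s['capacity_gb']:.3f} W/GB")

# disk server: roughly $0.20/GB and 0.024 W/GB
# DRAM server: roughly $47/GB   and 4.4 W/GB
```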

However, at least to my initial surprise, the opposite is true for a certain large class of workloads.  Assume that an application has a fixed, high data bandwidth demand and that this is the key determinant of overall performance.  Each disk offers very low bandwidth because of the low rotational speed of the spindle, especially for random access.  In order to meet the high bandwidth demand of the application, you would need several disks and several of the 250 W, 10 TB servers.  If the data were instead placed in DRAM (as in RAMCloud), the same high rate of data demand could be fulfilled with just a few 280 W, 64 GB servers.  The difference in data bandwidth between DRAM and disk is over 600X.  So even though each DRAM server in the example above is more expensive in terms of capex and opex, RAMCloud needs roughly 600 times fewer servers.  This allows the overall TCO of RAMCloud to be lower than that of a traditional disk-based platform.
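Here is a sketch of that argument with hypothetical numbers.  The target bandwidth, per-server bandwidths, electricity price, and three-year horizon are all assumptions of mine; only the $2K/250 W and $3K/280 W server figures come from the FAWN data quoted above:

```python
import math

# Illustrative sketch: servers needed to sustain a fixed aggregate bandwidth.
# The per-server bandwidth figures are rough assumptions for random access,
# not numbers taken from the RAMCloud or FAWN papers.
TARGET_BW_GBPS = 100.0        # application's aggregate bandwidth demand (assumed)

disk_server_bw_gbps = 0.05    # ~50 MB/s of random-access bandwidth per disk server (assumed)
dram_server_bw_gbps = 30.0    # ~30 GB/s from a server's DRAM channels (assumed)

disk_servers = math.ceil(TARGET_BW_GBPS / disk_server_bw_gbps)   # 2000 servers
dram_servers = math.ceil(TARGET_BW_GBPS / dram_server_bw_gbps)   # 4 servers

def tco(n_servers, price_usd, power_w, years=3, usd_per_kwh=0.10):
    """Rough TCO: purchase price (capex) + electricity over 'years' (opex)."""
    energy_kwh = power_w / 1000.0 * 24 * 365 * years
    return n_servers * (price_usd + energy_kwh * usd_per_kwh)

print(f"disk-based TCO: ${tco(disk_servers, 2000, 250):,.0f}")   # ~$5.3M
print(f"RAMCloud TCO:   ${tco(dram_servers, 3000, 280):,.0f}")   # ~$15K
```

Of course, if the application also has a large capacity requirement, more DRAM servers are needed than the handful shown here; the figure discussed next captures exactly that trade-off between capacity and access rate.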

I really like Figure 2 in the RAMCloud CACM paper (derived from the FAWN paper and reproduced below).  It shows that, in terms of TCO, for a given capacity requirement DRAM is a compelling design point at high access rates.  In short, if data bandwidth is the bottleneck, it is cheaper to use the technology (DRAM) that has high bandwidth, even though it incurs much higher energy and purchase costs per byte.

Source: RAMCloud CACM paper

If architectures and arguments like RAMCloud's become popular in the coming years, they open up a slew of interesting problems for architects:

1.  Already, the DRAM main memory system is a huge energy bottleneck.  RAMCloud amplifies the contribution of the memory system to overall datacenter energy, making memory energy efficiency a top priority.

2.  Queuing delays at the memory controller are a major concern.  With RAMCloud, a single memory controller will service many requests from many servers, increasing the importance of the memory scheduling algorithm.

3.  With each new DDR generation, fewer main memory DIMMs can be attached to a single high-speed electrical channel.  To support high memory capacity per server, innovative channel designs are required.

4.  If the Achilles' heel of disks is their low bandwidth, are there ways to design disk and server architectures that prioritize disk bandwidth/dollar over other metrics?