Wednesday, November 27, 2013

A DRAM Refresh Tutorial

In a recent project, we dug into the details of the DRAM refresh process.  Unfortunately, there is no public document that explains this well.  Hopefully, the description below helps fill this gap.  It is possible that not all DRAM chips follow the procedure outlined below.

The well-known basics of DRAM refresh:

The charge on a DRAM cell weakens over time.  The DDR standard requires every cell to be refreshed within a 64 ms interval, referred to as the retention time.  At temperatures higher than 85 °C (referred to as the extended temperature range), the retention time is halved to 32 ms to account for the higher leakage rate.  The refresh of a memory rank is partitioned into 8,192 smaller refresh operations.  One such refresh operation has to be issued every 7.8 µs (64 ms / 8,192).  This 7.8 µs interval is referred to as the refresh interval, tREFI.  The DDR3 standard requires that eight refresh operations be issued within a time window equal to 8 x tREFI, giving the memory controller some flexibility when scheduling these refresh operations.  Refresh operations are issued at rank granularity in DDR3 and DDR4.  Before issuing a refresh operation, the memory controller precharges all banks in the rank.  It then issues a single refresh command to the rank.  DRAM chips maintain a row counter to keep track of the last row that was refreshed -- this row counter is used to determine the rows that must be refreshed next.
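
To make the arithmetic concrete, here is a small Python sketch that reproduces the numbers quoted above (the constants are the ones from this post, not taken from any particular datasheet):

    # Reproduce the refresh-interval arithmetic described above.
    RETENTION_MS = 64.0              # retention time, normal temperature range
    REFRESH_OPS_PER_WINDOW = 8192    # refresh operations per retention window

    tREFI_us = (RETENTION_MS * 1000.0) / REFRESH_OPS_PER_WINDOW
    print("tREFI = %.2f us" % tREFI_us)              # ~7.81 us, usually quoted as 7.8 us

    # In the extended temperature range (above 85 C), retention drops to 32 ms,
    # so refresh commands must be issued twice as often.
    tREFI_ext_us = (RETENTION_MS / 2.0 * 1000.0) / REFRESH_OPS_PER_WINDOW
    print("tREFI (extended temp) = %.2f us" % tREFI_ext_us)   # ~3.9 us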

tRFC and recovery time:

Upon receiving a refresh command, the DRAM chips enter a refresh mode that has been carefully designed to perform the maximum amount of cell refresh in as little time as possible.  During this time, the current-carrying capabilities of the power delivery network and the charge pumps are stretched to the limit.  The operation lasts for a time referred to as the refresh cycle time, tRFC.  Towards the end of this period, the refresh process starts to wind down and some recovery time is provisioned so that the banks can be precharged and charge is restored to the charge pumps.  Providing this recovery time at the end allows the memory controller to resume normal operation at the end of tRFC.  Without this recovery time, the memory controller would require a new set of timing constraints that allow it to gradually ramp up its operations in parallel with charge pump restoration.  Since such complexity can't be expected of every memory controller, the DDR standards include the recovery time in the tRFC specification.  As soon as the tRFC time elapses, the memory controller can issue four consecutive Activate commands to different banks in the rank.

Refresh penalty:

On average, in every tREFI window, the rank is unavailable for a time equal to tRFC.  So for a memory-bound application on a 1-rank memory system, the percentage of execution time that can be attributed to refresh (the refresh penalty) is tRFC/tREFI.  In reality, the refresh penalty can be a little higher because directly prior to the refresh operation, the memory controller wastes some time precharging all the banks.  Also, right after the refresh operation, since all rows are closed, the memory controller has to issue a few Activates to re-populate the row buffers.  These added delays can grow the refresh penalty from (say) 8% in a 32 Gb chip to 9%.  The refresh penalty can also be lower than the tRFC/tREFI ratio if the processors can continue to execute independent instructions in their reorder buffers while the memory system is unavailable.  In a multi-rank memory system, the refresh penalty depends on whether ranks are refreshed together or in a staggered manner.  If ranks are refreshed together, the refresh penalty, as above, is in the neighborhood of tRFC/tREFI.  If ranks are refreshed in a staggered manner, the refresh penalty can be greater.  Staggered refresh is frequently employed because it reduces the memory's peak power requirement.
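
As a back-of-the-envelope check of the ratio above, here is a minimal Python sketch of the refresh-penalty estimate for a memory-bound, 1-rank system; the tRFC value is an illustrative assumption for a high-density part, not a number from a specific datasheet:

    # Refresh penalty ~ tRFC / tREFI for a memory-bound, 1-rank system.
    tREFI_ns = 7800.0     # refresh interval (7.8 us)
    tRFC_ns = 640.0       # assumed refresh cycle time, e.g., a projected 32 Gb part

    penalty = tRFC_ns / tREFI_ns
    print("refresh penalty ~ %.1f%%" % (penalty * 100))   # ~8.2%, before the extra
                                                          # precharge/re-activation delays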

Some refresh misconceptions:

We next describe how a few rows in all banks are refreshed during the tRFC period.  As DRAM chip capacities increase, the number of rows on the chip also increases.  Since the retention time (64 ms) and refresh interval (7.8 µs) have remained constant, the number of rows that must be refreshed in every refresh interval has increased.  In modern 4Gb chips, eight rows must be refreshed in every bank in a single tRFC window.  Some prior works have assumed that a row refresh is equivalent to an Activate+Precharge sequence for that row.  Therefore, the refresh process was assumed to be equivalent to eight sequential Activate+Precharge commands per bank, with multiple banks performing these operations in parallel.  However, DRAM chip specifications reveal that the above model over-simplifies the refresh process.  First, eight sequential Activate+Precharge sequences would require a time of 8 x tRC.  For a 4Gb DRAM chip, this equates to 390 ns.  But tRFC is only 260 ns, i.e., there is no time to issue eight sequential Activate+Precharge sequences and allow recovery time at the end.  Also, parallel Activates in eight banks would draw far more current than is allowed by the tFAW constraint.  Second, the DRAM specifications provide the average current drawn during an Activate/Precharge (IDD0) and Refresh (IDD5).  If Refresh were performed with 64 Activate+Precharge sequences (64 = 8 banks x 8 rows per bank), we would require much more current than that afforded by IDD5.  Hence, the refresh process uses a method that has higher efficiency in terms of time and current than a sequence of Activate and Precharge commands.
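
The timing mismatch is easy to verify with a short sketch; the tRC value below is an assumption (about 48.75 ns) chosen to be consistent with the 390 ns figure quoted above:

    # The naive "8 sequential Activate+Precharge per bank" model does not fit within tRFC.
    tRC_ns = 48.75     # assumed Activate-to-Activate cycle time for a 4Gb chip
    tRFC_ns = 260.0    # refresh cycle time quoted above for a 4Gb chip

    naive_ns = 8 * tRC_ns
    print("8 x tRC = %.0f ns, but tRFC = %.0f ns" % (naive_ns, tRFC_ns))   # 390 ns vs. 260 ns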

The actual refresh process:

This process is based on the high number of subarrays provisioned in every bank.  For example, a bank may have 16 subarrays, of which only four are accessed during a regular Activate operation.  This observation also formed the basis for the recent subarray-level parallelism (SALP) idea of Kim et al.  During a refresh operation, the same row in all 16 subarrays undergoes an Activation and Precharge.  In this example, four rows' worth of data are being refreshed in parallel within a single bank.  Also, the current requirement for this is not 4x the current for a regular Activate; by sharing many of the circuits within the bank, the current does not increase linearly with the extent of subarray-level parallelism.  Thus, a single bank uses the maximum allowed current draw to perform a parallel refresh of a row in every subarray; each bank is handled sequentially (refreshes in two banks may overlap slightly based on current profiles), and there is a recovery time at the end.
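
The sketch below is a purely illustrative toy model of this sequential per-bank process; none of the per-bank, overlap, or recovery numbers come from a datasheet, they were picked only to land near the 260 ns tRFC of a 4Gb chip:

    # Toy model: banks refresh one after another (with a small overlap),
    # followed by a recovery period for the charge pumps and precharge.
    NUM_BANKS = 8
    PER_BANK_NS = 30.0      # assumed time to refresh a row in every subarray of one bank
    OVERLAP_NS = 5.0        # assumed overlap between consecutive banks
    RECOVERY_NS = 60.0      # assumed recovery time at the end

    tRFC_estimate = NUM_BANKS * PER_BANK_NS - (NUM_BANKS - 1) * OVERLAP_NS + RECOVERY_NS
    print("toy tRFC estimate ~ %.0f ns" % tRFC_estimate)   # ~265 ns, in the right ballpark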

Update added on 12/16/2013:
 
Refresh in DDR4:

One important change expected in DDR4 devices is a Fine Granularity Refresh (FGR) operation (this ISCA'13 paper from Cornell/IBM has good details on FGR).  FGR-1x can be viewed as a regular refresh operation, similar to that in DDR3.  FGR-2x partitions the regular refresh operation into 2 smaller "half-refresh" operations.  In essence, the tREFI is halved (half-refresh operations must be issued twice as often), and the tRFC also reduces (since each half-refresh operation does half the work).  FGR-4x partitions each regular refresh operation into 4 smaller "quarter-refresh" operations.  Since each FGR operation renders a rank unavailable for a shorter time, it has the potential to reduce overall queuing delays for processor reads.  But it does introduce one significant overhead.  A single FGR-2x operation has to refresh half the cells refreshed in an FGR-1x operation, thus potentially requiring half the time.  But an FGR-2x operation and an FGR-1x operation must both incur the same recovery cost at the end to handle depleted charge pumps.  DDR4 projections for 32 Gb chips show that tRFC for FGR-1x is 640 ns, but tRFC for FGR-2x is 480 ns.  The overheads of the recovery time are so significant that two FGR-2x operations take 50% longer than a single FGR-1x operation.  Similarly, going to FGR-4x mode results in a tRFC of 350 ns.  Therefore, four FGR-4x refresh operations would keep the rank unavailable for 1400 ns, while a single FGR-1x refresh operation would refresh the same number of cells, but keep the rank unavailable for only 640 ns.  The high refresh recovery overheads in FGR-2x and FGR-4x limit their effectiveness in reducing queuing delays.
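
The trade-off is easy to quantify with a short sketch using the projected 32 Gb tRFC values quoted above:

    # Rank unavailability needed to refresh the same number of cells in each FGR mode.
    tRFC_ns = {"1x": 640.0, "2x": 480.0, "4x": 350.0}   # projected 32 Gb values
    ops_needed = {"1x": 1, "2x": 2, "4x": 4}            # operations covering the same cells

    for mode in ("1x", "2x", "4x"):
        total = ops_needed[mode] * tRFC_ns[mode]
        print("FGR-%s: %d x %.0f ns = %.0f ns" % (mode, ops_needed[mode], tRFC_ns[mode], total))
    # FGR-1x:  640 ns
    # FGR-2x:  960 ns (50% longer than FGR-1x)
    # FGR-4x: 1400 ns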

Refresh in LPDDR2:


LPDDR2 also provides a form of fine granularity refresh.  It allows a single bank to be refreshed at a time with a REFpb command (per-bank refresh).  For an 8-bank LPDDR2 chip, eight per-bank refreshes handle as many cells as a single regular all-bank refresh (REFab command).  A single REFpb command takes far more than 1/8th the time taken by a REFab command -- REFab takes 210 ns in an 8Gb chip and REFpb takes 90 ns (see pages 75 and 141 in this datasheet).  Similar to DDR4's FGR, we see that breaking a refresh operation into smaller units imposes a significant overhead.  However, LPDDR2 adds one key feature.  While a REFpb command is being performed in one bank, regular DRAM operations can be serviced by other banks.  DDR3 and DDR4 do not allow refresh to be overlapped with other operations (although this appears to be the topic of two upcoming papers at HPCA 2014).  Page 54 of this datasheet indicates that a REFpb has a similar impact on tFAW as an Activate command.  Page 75 of the same datasheet indicates that an Activate and REFpb can't be viewed similarly by the memory scheduler.  We suspect that REFpb has a current profile that is somewhere between a single-bank Activate and a REFab.
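
The same arithmetic, using the 8Gb LPDDR2 numbers quoted above:

    # LPDDR2: eight per-bank refreshes cover the same cells as one all-bank refresh.
    REFab_ns = 210.0    # all-bank refresh
    REFpb_ns = 90.0     # per-bank refresh
    NUM_BANKS = 8

    total_pb_ns = NUM_BANKS * REFpb_ns
    print("8 x REFpb = %.0f ns vs. one REFab = %.0f ns" % (total_pb_ns, REFab_ns))  # 720 ns vs. 210 ns
    # The saving grace: the other seven banks can keep servicing requests during each REFpb.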

Jointly authored by Rajeev Balasubramonian (University of Utah), Manju Shevgoor (University of Utah), and Jung-Sik Kim (Samsung).

Monday, August 12, 2013

Student Reports for ISCA 2013

I recently handled student travel grants for ISCA 2013.  As is customary, I asked awardees to send me a brief trip report:

"... explain what you saw at the conference that had a high impact on you.  This could be a keynote or talk that you thought was especially impressive, it could be a commentary on research areas that deserve more/less attention, on especially effective presentation methods, on ways to improve our conference/reviewing system, etc.  Please feel free to be creative..."

37 of the 68 awardees responded.  By most accounts, this was one of the most memorable ISCAs ever.  Several students highlighted the talk on DNA Computing.  Many also wished there was a session at the start with 90-second paper summaries (as was done at MICRO 2012).

Some of the more interesting comments (lightly edited and anonymized) are summarized below.


"... we have to leave the dream of hardware generality if we still want to increase performances with reasonable energy budgets.  I noticed a lot of work sitting between hardware specialized units and general purpose architectures. ... I really enjoyed the work presented by Arkaprava Basu in the big data session called Efficient Virtual Memory for Big Memory Servers.  The introductory quote in the paper reads: 'Virtual memory was invented in a time of scarcity.  Is it still a good idea?'  Experiments on widely used server applications show that many virtual memory features are not needed."


"I strongly recommend using the lightning session in future conferences."


"... Breadth of topics has been increasing over the years.  The paper on DNA computing was really, really good.  A tutorial or panel on emerging technology would also be very cool.  Potential list of topics: DNA/protein computation, quantum, resistive element storage & computing, bio-inspired computing and circuit technology, optical, circuits for near/sub threshold computing.

The split sessions were a bit off-putting.  I would also like all sessions to be recorded.

During the business meeting the idea of turning CAN into something like WDDD was brought up. I really like this idea.

I found Uri's discussion of memory-intensive architectures particularly compelling. I rather enjoy keynotes that present interesting, largely undeveloped research directions and ideas.  One thing that I thought was missing from this year's ISCA was a session on program characterization or application analysis. Given the amount of application- or workload- specific stuff we are seeing, this topic seems increasingly important. The release of data sets, benchmarks, and thorough analyses of application domains and workloads should not be relegated to ASPLOS/ISPASS -- I'd like to see them here at ISCA and encourage more of them. Especially in data center research (and to a lesser extent mobile) it seems like large companies have far more intuition and data on their workloads than is generally available to the community. Perhaps making a top-tier conference like ISCA a venue for that information would make the release of some (admittedly proprietary) information possible or attractive."


"I really enjoyed the panel in the Monday workshop that discussed the state of computer architecture in 15 years.  Similarly, I liked the second keynote on future directions and specifically the memristor."

"The talk by Gadi Haber in the AMAS-BT workshop was memorable.  In the talk, Gadi states that binary translation can be seen as an extension to micro-architecture. Things that are difficult in hardware are sometimes much easier to do in software. In fact, many groups are co-designing hardware with binary translators."


"Many talks/keynotes encouraged the use of cost as a metric for future evaluations.  I really enjoyed the session of emerging technologies, especially the DNA computing talk.  I also enjoyed the talk by Andreas Moshovos that had an entirely different way to pass the insight of their idea.

The most useful session for me was the data centers session.  Specifically, the last talk by Jason Mars was excellent and I really liked the insight that was provided for a large company like Google. Knowing that the studies mentioned in this work are important for a key player in the data center industry was reassuring.

One minor suggestion is to end the last day right after lunch to facilitate travel."



"I especially enjoyed the talk on ZSim. I think simulators should be a more discussed area in computer architecture research.

One thing I would suggest is that the keynotes should be more technically informative. I thought the first keynote contained more personal opinions than technical reasons.

Another thing I would suggest is that all speakers should be required to present their paper in 2-3 mins at the very start of the conference."


"I especially enjoyed the 'Roadmaps in Computer Architecture' workshop."

"The thing that had the most impact on me was the chats I had with stalwarts of computing in the hallway. ... I think the idea of lightning talks like that in MICRO 2012 would have been really helpful."


"The two keynotes were very complementary.  One looked back at history and the other was very inspiring for future research directions.  The most impressive paper presentation was on the Zsim simulator.  The author ran the power point on their simulator with dynamically updated performance stats.  I would also suggest recording all presentations."


"I followed with interest the opening keynote by Dr. Dileep Bhandarkar.  For a newbie, it's really nice to listen to the history of computer evolution.  Another interesting presentation was on the ZSim simulator.  It was very funny to see the thousand core systems simulator up and running during the presentation itself.  The presenter precisely and clearly explained how choices were made to get the maximum performance."


"Among the talks I attended, the ideas that mostly intrigued me were Intel's Triggered Instructions work (Parashar et al.), and the introduction of palloc as an approach toward moving energy management control for datacenters to the software level, similar to the use of malloc for memory management (Wang et al.). I also found the thread scheduling work on asymmetric CMPs very interesting (Joao et al.).

... some presentations also had obvious flaws on the static part, i.e., the slides - full sentences, no smooth transitions between slides, overloaded content.  Maybe an improvement could be achieved by imposing some rules (the same way as rules are set for paper submissions), or by organizing a tutorial session during the conference where 'Giving a good presentation' would be taught.

I thought that the time dedicated for Q&A after each presentation was quite limited.  One thing I could think of is having (additional) Q&A time for each set of papers rather than each single paper, so that the dedicated time can be filled up according to the audience's interest for each of that set's papers."



"I'd like to see a poster session in future ISCA editions, e.g., including papers that didn't make it to the main conference."


"I have been interested in taking security research further down the hardware software stack, but it appears as though most security research at ISCA is focused on things such as side channel attacks. I think that one interesting area is to look at security accelerators or security mechanisms in hardware that increase either performance of common security solutions or improve security of said solutions. Security co-processors, as I observed in a few of the talks, do not solve primary security issues, and the problems need to be tackled at more fundamental levels."


"The most impressive talk for me was by Richard Muscat, 'DNA-based Molecular Architecture with Spatially Localized Components'. I was truly amazed when he reached a specific slide that explains how he managed to use DNA molecules as a wire to transmit the result of a computation and, therefore, enabling composition of many modules of DNA computation, while the previous approach to DNA computing is limited to doing a single computation in a soup of DNA. This is a huge step towards enabling intelligent drugs that implement some logic by using DNA molecules.  I also especially appreciated the last two talks about program debugging ('QuickRec: Prototyping an Intel Architecture Extension for Record and Replay of Multithreaded Programs' and 'Non-Race Concurrency Bug Detection Through Order-Sensitive Critical Sections'). They offer interesting insights on how to enable better debugging of parallel programs, which currently is very frustrating to do. I hope that in the near future we have better options to efficiently debug parallel software instead of having to stick to 'errorless programming' :) "


"I want to emphasize the 'Emerging Technologies' session (Session 3A) and especially the work about DNA-based circuits by Richard A.  Muscat et al. I have to admit that I was not really aware of the fact that there is that much research going on in the field of DNA, which might also be of interest for the computer architecture community. Nevertheless, especially in a time where we discuss whether Moore's law may not hold any more in the near future (as it was also a topic throughout the keynotes at ISCA'40), I think that investigating all kinds of alternative ways to construct "logic circuits" must be paid high attention. Assembling such circuits based on a modular approach using DNA structures may sound like a science fiction movie these days (at least for myself at the moment), but who imagined a few decades ago that we are going to run around in public, wearing camera- and other high-tech-equipped glasses? So although it does not fall into my research area at all, please keep up that great work!

One of the authors presenting a workshop paper was not able to attend.  Therefore, they prepared a screen-captured video presentation. Basically, I am not really a fan of such presentation methods, but then I was positively amazed because they really did a great job and presented their work very well. However, I think in general and especially for a large audience (like the main symposium of ISCA), physically present speakers should be favored in the future ('discussions' with a laptop and a projector are somehow difficult :)."



"The session with the highest impact on me was 'Emerging Technologies'.  The proposals regarding quantum computers and DNA-based molecular architecture provided an insight about how computing will be in the next years. Thus, in my opinion similar type of works should be supported."


"The most interesting part was the keynote given by Prof. Uri Weiser. He talked about heterogeneous and memory intensive architecture as the next frontier.  I think ISCA may need more such talks about future technology."


"There are three highlights that come to mind: the keynote by Dr. Dileep Bhandarkar, the presentation of 'Thin Servers with Smart Pipes: Designing SoC Accelerators for Memcached', and the Workshop on Energy Efficient Design.

Keynote: ... Dr. Bhandarkar's advice to 'wear many hats', 'you don't grow until you step out of your comfort zone', 'don't be encumbered by past history', and statement that 'strategy without execution is doomed' are particularly noteworthy. Dr. Bhandarkar's anecdote concerning the development of Itanium was also illustrative; I was previously aware of the external controversy over the architecture but did not know of the degree to which Intel sought to protect Itanium from internal competitors. Additionally, Dr. Bhandarkar's assertion that destabilizing technologies come from below (cheaper, lower powered, efficiency vs. performance) was certainly thought-provoking. Finally, Dr. Bhandarkar's demonstration of the complexities of Qualcomm's Snapdragon system architecture and assertion that DSPs will require new levels of programmability during the question session was intriguing.

Thin Servers: I enjoyed the presentation and had higher-than-average insight into this topic as my background is in networking. Briefly, I was intrigued with the custom platform developed to speed up lookups, and I thought the performance analysis was well done. A drawback of the work was the lack of evaluation vs. NetFPGA solutions, but the presenter claimed that their SoC solution was more compliant with the existing memcached protocol. At a high level, I think it is an interesting counterpoint to Dr. Bhandarkar's assertion that increasing programmability is necessary. It would seem that flexible, low-cost development and fabrication platforms are also extremely important to developing heterogeneous systems.

WEED: I found the discussion session led by Karthick Rajamani at the end of the workshop to be thought-provoking. His comments pointing to increasing interest in the power consumption and control of memory systems stood out.  Additionally, I appreciated his efforts to show the impact that past workshop papers had via follow-up papers at conferences."


"The opening keynote by Dr. Bhandarkar was very interesting. ... I really liked the fact that the security session and the reliability session were scheduled back to back as they often share similar audiences. I would recommend such scheduling in the future."

"I would like to comment on the Emerging Technologies Session. I feel it was too much introduction and too little presentation of what was done in the research. I propose to double time for this session. First part should be introduction, for those who are not aware of those technologies, and second part should be deep analysis of the work done by the authors."


"I thoroughly enjoyed the Accelerators and Emerging Architectures session for their creativities in facing the dark silicon and utilization wall problems head on. I also was particularly interested in the Big Data session as this is my research direction and I believe the architecture community is and should be focusing on this area. I regret that I was not able to attend the Power and Energy session as it was put in the session parallel with the Big Data session; I believe solving power and energy problems is imperative in all aspects of hardware architecture/design.  I enjoyed Uri's keynote on heterogeneous and memory intensive architecture.  I generally agree with his take on the future of computing being heterogeneous and memory intensive, however, I am not sold on the applicability and feasibility of the proposed MultiAmdahl law just yet.  I think more research on heterogeneous and memory intensive architectures would help the community."

"The most impressive talk was DNA-based Molecular Architecture with Spatially Localized Components. It is amazing how computer architecture evolved in the last 40 years, but the ability of computing using DNA sequences is something beyond extraordinary. New secure hardware methods to prevent rootkit evasion and infection were also pretty interesting, I would like to see more talks on security in the future.  Besides the technical part, the fact the conference was held in Tel Aviv gave an exotic personality to the event. The dinner with the 'Jerusalem Old City Wall' sightseeing, followed by an exciting drum experience, really promoted a smooth integration between the participants."


"I found two things to be quite inspiring at ISCA.  The first was the initial keynote at the conference.  Dr. Bhandarkar's talk drove home the fact that a lot of innovation in our field is indeed driven by these disruptive shifts to smaller, more pervasive form factors.  I have always been a fan of the history of computers, but it was great to see how one person could touch on so many significant paths through that trajectory over a career.  The second was a paper from the University of Washington on DNA-based computing.  While it may not be the next disruptive change, it's always important to keep perspective on how we can apply technology from our field to other areas, as it opens up doors that we never even thought of before.  I hope that future conferences continue to have such diverse publications, in order to encourage others in our field to also think outside the box."


"When I talked with fellow students at the conference, the ideas that amazed me most were actually from the session parallel with the one in which I presented: I really like the ideas in online malware detection with performance counters, as well as the notion of a provable totally secure memory system.  Now that we have parallel sessions, ISCA could also do something like 3-min highlights for each paper, or a poster session for all papers.  It's really a pity to miss something in the parallel session!"


"I really enjoyed ISCA this year, particularly because of the broader range of research areas represented. I found the sessions on Big Data and Datacenters a great addition to the more traditional ISCA material.  I also liked Power and Energy and Heterogeneity.  I would love to see ISCA continue to take a broader definition of computer architecture research in the upcoming conferences.  Additionally, the presentations themselves were extremely high quality this year.

However, I think the 25-minute talk slot was not long enough.  Most talks had time for only one question, which is ridiculous. Part of the value of attending the conference rather than just reading the paper is to interact with the other researchers.  Often, an industry person (such as someone from Google or Microsoft) with some useful insight to add did not get a chance to speak.  Either the sessions should be lengthened or the talks should be shortened, but there should definitely be more Q&A time."



"Firstly, this was the nicest conference I've attended thus far in my graduate studies (out of previous ISCAs and others). The venue was in a very beautiful location, the excursion was educational and quite fun, the conference itself was very well organized, and I felt that the quality of papers this year was strong. Although I probably can't say that I fully understood each paper in this session, I thought that the Emerging Technologies session this year was very interesting; especially the paper 'DNA-based Molecular Architecture with Spatially Localized Components'.  This area of research is quite different than my current focus on GPGPU architectures, but I found it very intriguing to see how they were using DNA strands to create very basic digital logic structures. As this research seems to be in its infancy, I'd be interested to see where this research goes in the future and how it's applicability to the human body evolves.  Commenting on one of the especially effective presentation methods, I thought that "ZSim: Fast and Accurate Microarchitectural Simulation of Thousand-Core Systems" was presented very well. The presenter, Daniel Sanchez, was actually running the entire power point presentation through their simulation framework, showing various statistics at the bottom of the screen during the talk. I thought that this was a very cool way of highlighting the speed and applicability of such a simulator while presenting the detailed implementation in the talk."


"I'll focus my comments on the Roadmap Workshop, which was the highlight of the conference for me.  The talks I attended focused on where technology would be a decade from now. Doug Burger was of the opinion that the cell phone would become the primary device for all personal computation. Devices will collect large amounts of data about users and use that data to predict things like the user's health. The cloud would be split into two parts, a personal cloud running out of the user's home and the cloud that lives in large data centers.  Privacy would be a major issue, hence all of the users' personal information would lie in the personal cloud, while anonymized data would be uploaded into the cloud. The large hordes of data in the data center cloud in combination with the personal cloud would be used to identify patterns and predict events (health related) in the life of the user.  Devices would rely more on non-standard modes of input like gestures (kinect) and touch.  Personal assistant applications would become more intelligent and be able to do more that just maintain calendars. In conclusion, devices would become smarter than humans in a large variety of tasks, helped by their ability to parse through huge amounts of data.  I thought this was one of the best talks at this year's ISCA.

Other talks focused more on the end of Dennard scaling. The speakers were of the opinion that Moore's law would continue for a few more generations, but Dennard scaling was at an end because voltage no longer scales with smaller technology nodes. More exotic, one-off technologies would be used in the future. Though many believe that the only way to scale going forward is with the introduction of dark silicon, most speakers believed that dark silicon was economically infeasible. Instead, dim silicon was believed to be the solution."



"I found ISCA to be well organized and with a solid technical program.  Almost all presentations I attended were interesting contributions, although I was not particularly shocked by any of them as to highlight it. So, I want to focus on the issue of where to organize ISCA abroad.  Personally, I enjoyed ISCA being in Israel probably more than if it had been in any other place in the world, in good part due to the possibility of touring Jerusalem. However, I found it troubling that many people from US companies and universities that usually attend ISCA did not make it to Tel-Aviv. Cost may have been an issue (especially for students, even after factoring in the travel grant), distance/time zones may have also been an issue. Maybe even the perceived safety risks of being in Israel? Maybe this was my own personal perception and I may be wrong about it (I had not attended ISCA since 2009). I don't know how attendance numbers compare to previous ISCAs, but it would be interesting to poll regular ISCA attendees in our community to ask them why they did not go, and then consider that input for the decision about where to organize future ISCAs. Anyway, I understand the trade-offs of organizing ISCA abroad and I know it's hard to pick a place that is both interesting and easy to travel to from the US and from Europe, and that also has a team of local people in the community who can do a good job organizing it."


"I think my experience at ISCA was different from that of most of the attendees because I'm a fresh grad student in the area of quantum computing.  A talk that really impressed me was the DNA Computing talk. One of the challenges in presenting an emerging architecture to a general audience is providing enough background to make the topic accessible. The speaker was able to present the material in a way that gave me a good understanding of the challenges and innovations in DNA computing in 20 minutes without getting bogged down in details."

"I particularly liked the emerging technology session. Those are wild yet reasonable and well-developed ideas. Another paper I liked is the convolution engine one from Stanford. It has a clear motivation and convincing solution, as well as rich data that not only supports their own work, but also gives me good intuition on energy consumption distribution in modern processors. I also benefited a lot from the DRAM session."


"For future ISCAs, I would like to see the conference scope slightly extended such that application specialists can find their place in the conference.  Application are a driving factor in the development of new systems.  While PPoPP and HPCA are co-located, there is considerably little interaction between these two communities even though their interests overlap."

Monday, January 7, 2013

Observations on the Flipped Classroom

I engaged in a "flipped classroom" experiment for my graduate computer architecture class last semester.  In an ideal flipped classroom, the lecturing happens outside class (via on-line videos), and class time is used entirely for problem-solving and discussions.  For a number of reasons, I couldn't bring myself to completely eliminate in-class lecturing.  So in a typical lecture, I'd spend 80% of the time doing a quick recap of the videos, clarifying doubts, covering advanced topics, etc.  The remaining 20% of the time was used to solve example problems.  My observations below pertain to this modified flipped classroom model:
  1. My estimate is that 80-90% of the students did watch videos beforehand, which was a pleasant surprise.  The percentage dropped a bit as the semester progressed because (among other things) students realized that I did fairly detailed recaps in class.
  2. I felt that students did learn more this year than in previous years.  Students came to exams better prepared and the exam scores seemed to be higher.  In their teaching evaluations, students commented that they liked this model.  FWIW, the teaching evaluation scores were slightly higher -- an effective course rating of 5.64/6 this year (my previous offerings of this class have averaged a score of 5.54).
  3. In my view, the course was more effective this year primarily because students had access to videos.  Based on YouTube view counts, it was clear that students were re-visiting the pertinent videos while working on their assignments and while studying for exams.  The in-class problem-solving was perhaps not as much of a game-changer.
  4. Making a video is not as much fun for me as lecturing in class.  From a teacher's perspective, this model is more work and less fun.  But it's gratifying to have impact beyond the physical campus.
  5. Miscellaneous observations about the recording process: I used Camtasia Studio to create screencasts and I thought it worked great;  I didn't spend much time on editing and only chopped out about 10 seconds worth of audio blemishes per video; a headset mic is so much better than a laptop's in-built mic (too bad it took me 7 videos to realize this).
Next year, I'll try to make the course more effective by doing more problem-solving and less lecturing in class.  Next year will also be a lot less stressful since the videos are already in place :-).