Saturday, March 19, 2011

Grad School Ranking Methodology

We are in the midst of the graduate admissions process and faculty candidate interviews.  A large part of a candidate's decision is often based on a university's rank in computer science.  This is especially true for prospective graduate students.  Unfortunately, the university rankings commonly in use are flawed and inappropriate.

The US News rankings seem to be the most visible and hence the most popular.  However, they are a blatant popularity contest: they are based on an average of the 1-5 ratings that each CS department chair gives every other CS department.  While I'm sure that some department chairs take their rating surveys very seriously, many likely go off of past history (a vicious cycle in which a department is considered strong because of its previous lofty rank), recent news, or reputation within the chair's own research area.  In spite of several positive changes within our department, our reputation score hasn't budged in recent years and our ranking tends to fluctuate randomly in the 30s and 40s.  But no, this post is not about the impact of these rankings on our department; it is about what I feel is the right way for a prospective grad student to rank schools.

The NRC released a CS ranking way back in 1993.  After several years of work, it released a new ranking a few months ago.  However, the new ranking seems to be based on a complex equation, and there have been several concerns about how the data was collected.  We have ourselves seen several incorrect data points.  The CRA has a statement on this.

Even if the NRC gets its act together, I strongly feel that prospective grad students should not be looking closely at overall CS rankings.  What they should be focusing on is a ranking for their specific research area.  Such a ranking should be only one of several factors they consider, although, in my view, it should be the dominant factor.

What should such a research area ranking take into account?  It is important to consider faculty count and funding levels in one's research area.  But primarily, one should consider the end result: research productivity.  This is measured most reliably by counting recent publications by a department in top-tier venues.  Ideally, one would measure impact and not engage in bean-counting.  But by focusing the bean-counting on top-tier venues, impact and quality are strongly factored in.  This is perhaps the closest we can get to an objective measure of impact.  Ultimately, when a student graduates with a Ph.D., his/her subsequent job depends most strongly on his/her publication record.  By measuring the top-tier publications produced by a research group, students can estimate their own likely productivity if they were to join that group.

I can imagine a few reasonable variations.  If we want to be more selective about quality and impact, we could measure best-paper awards or IEEE Micro Top Picks papers.  However, those counts are small enough that they are likely not statistically significant in many cases.  One could also derive a stronger measure of impact by looking at citation counts for papers at top-tier venues.  Again, for recent papers, citation counts tend to be noisy and often dominated by self-citations.  Besides, it's more work for whoever is computing the rankings :-).  It may also be reasonable to divide the pub-count by the number of faculty, but counting the number of faculty in an area can sometimes be tricky.  Besides, I feel a department should get credit for having more faculty in a given area; a grad student should want to join a department where he/she has many options and a large peer group.  One could make an argument for different measurement windows -- a short window adapts quickly to new faculty hires, faculty departures, etc.  The window also needs to be long enough to absorb noise from sabbaticals, funding cycles, infrastructure building efforts, etc.  Perhaps five years (the average length of a Ph.D.) is a sweet spot.

So here is my proposed ranking metric for computer architecture: the number of papers by an institution at ISCA, MICRO, ASPLOS, and HPCA in the last five years.
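To make the metric concrete, here is a minimal sketch (in Python) of how such a ranking could be computed.  The input record format and the function name are hypothetical placeholders; only the venue list, the five-year window, and the "one author is enough for credit" rule come from the metric described in this post.

    from collections import Counter

    # Top-tier computer architecture venues used by the proposed metric.
    TOP_TIER = {"ISCA", "MICRO", "ASPLOS", "HPCA"}

    def rank_institutions(papers, start_year=2006, end_year=2010):
        """Count top-tier papers per institution over a window.

        `papers` is a hypothetical list of records of the form
        (year, venue, [author institutions]).  An institution gets
        credit if at least one author is affiliated with it, so we
        de-duplicate affiliations within a single paper.
        """
        counts = Counter()
        for year, venue, institutions in papers:
            if venue in TOP_TIER and start_year <= year <= end_year:
                for inst in set(institutions):  # one credit per paper per institution
                    counts[inst] += 1
        # Sorted by paper count, highest first.
        return counts.most_common()

    # Example with made-up records:
    papers = [
        (2008, "ISCA",  ["Univ A", "Univ B"]),
        (2009, "MICRO", ["Univ A"]),
        (2010, "HPCA",  ["Univ C", "Univ A"]),
    ]
    for inst, n in rank_institutions(papers):
        print(inst, n)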

Indeed, I have computed the computer architecture ranking for 2010.  It is based on top-tier papers in the 2006-2010 time-frame for all academic institutions world-wide.  I have not differentiated between CS and ECE departments.  An institution gets credit even if only a single author on a paper is from that institution.  If you are considering grad school for computer architecture research (or are simply curious), email me for a link to this ranking.  I have decided not to link the ranking here because I feel that only prospective grads need to see it.  A public ranking might only foster a sense of competition that is likely not healthy for the community.

If you are shy about emailing me (why?! :-) or are curious about where your institution might stand in such a ranking, here's some data: each of the top-5 schools had 44+ papers in five years; 20+ papers equates to a top-10 rank; 15+ papers = top-15; 12+ papers = top-20; 9+ papers = top-25.  There were 44 institutions with 4+ papers and 69 institutions with 2+ papers.
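For the curious, here is a tiny helper that simply restates the 2010 cut-offs quoted above as a lookup.  The function name is hypothetical, and the tiers it returns are only approximate, since the cut-offs themselves are coarse summaries of the full ranking.

    def approximate_tier(paper_count):
        """Map a 5-year (2006-2010) top-tier paper count to the
        rough tier implied by the cut-offs quoted above."""
        if paper_count >= 44:
            return "top-5"
        if paper_count >= 20:
            return "top-10"
        if paper_count >= 15:
            return "top-15"
        if paper_count >= 12:
            return "top-20"
        if paper_count >= 9:
            return "top-25"
        return "below top-25"

    print(approximate_tier(17))   # -> "top-15"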

Please chime in with comments and suggestions.  I'd be happy to use a better metric for the 2011 ranking.

Friday, March 11, 2011

A Case for Archer

Disclaimer: The opinions expressed in this post are my own, based on personal experiences.

Almost every computer architecture researcher realizes the importance of simulators to the community. Architecture-level simulators allow us to model and evaluate new proposals in a timely fashion, without having to go through the pain of fabricating and testing a chip.

So, what makes for a good simulator? I started grad school in the good old days when SimpleScalar was the norm. It had a detailed, cycle-accurate pipeline model and, most importantly, it was pretty fast. Once you got over the learning curve, additions to the model were fairly easy. It did have a few drawbacks: there was little support for simulating CMP architectures, there was no coherence protocol in place, and a detailed DRAM model was missing. Moreover, interference of the operating system with an application's behavior was not considered. But these issues were less important back then.

Soon, it was apparent that CMPs (chip multi-processors) were here to stay. The face of applications changed as well. Web 2.0-based, data-intensive applications came into existence, and to support them, a number of server-side applications needed to run in backend datacenters. This made the performance of multi-threaded applications, and that of main memory, of paramount importance.

To keep up with these requirements, the focus of the architecture community changed as well. Gone were the days of trying to optimize the pipeline and extract ILP. TLP, MLP, and memory system performance (caches, DRAM) became important. One could also no longer ignore the importance of interference from the operating system when making design decisions. The community was now on the lookout for a simulation platform that could take all of these factors into account.

It was around this time that a number of full-system simulators came into being. Off the top of my head, I can count several that are popular with the community and have a fairly large user base: Wind River's Simics, M5, Zesto, Simflex, SESC, to name a few. For community-wide adoption, a simulator platform needed to be fast, have a modular code base, and have a good support system (being cycle-accurate was a given). Simics was one of the first platforms I tried personally, and I found its support to be extremely responsive, which helped it garner large participation from the academic community. Also, with the release of the GEMS framework from Wisconsin, I didn't need a reason to look any further.

In spite of all the options out there, getting the infrastructure (simulator, benchmark binaries, workloads, and checkpoints) in place is a time-consuming and arduous process. As a result, groups seem to have gravitated towards the simulators that best suit their needs in terms of features and ease of use. The large number of options today also implies that different proposals on the same topic inevitably use different simulation platforms. Consequently, it is often difficult to compare results across papers and to exactly reproduce the results of prior work. This was not as significant a problem back when nearly everyone used SimpleScalar.

In some sub-areas, it is common practice to compare an innovation against other state-of-the-art innovations (cache policies, DRAM scheduling, etc.). Faithfully modeling the innovations of others can be especially troublesome for new grad students learning the ropes. I believe a large part of this effort could be eliminated if these models (and by model I mean code :-) were already publicly available as part of a common simulator framework.

The Archer project, as some of you might know, is a recent effort in the direction of collaborative research in computer architecture. From the project's website, it strives for a noble goal:
"To thoroughly evaluate a new computer architecture idea, researchers and students need access to high-performance computers, simulation tools, benchmarks, and datasets - which are not often readily available to many in the community. Archer is a project funded by the National Science Foundation CRI program to address this need by providing researchers, educators and students with a cyberinfrastructure where users in the community benefit from sharing hardware, software, tools, data, documentation, and educational material."

In its current form, Archer provides a large pool of batch-scheduled computing resources, a set of commonly used tools and simulators, and some benchmarks and datasets. It also has support for sharing files via NFS and a wiki-based infrastructure for aggregating shared knowledge and experiences.

If widely adopted, it will provide a solution to many of the issues I listed above. It can help push the academic community towards a common infrastructure while at the same time reducing the effort needed to set up simulation infrastructure and to reproduce prior work.

Although Archer is a great initial step, I believe it can still be improved upon. If I had my wishes, I would like a SourceForge-like platform, where the model for a particular optimization is owned by a group of people (say, the authors of the research paper) and is available to be checked out from a version-control system. Anyone whose use of the Archer platform results in a peer-reviewed publication would be obliged to release their model publicly under a GPL. Bug reports would be sent to the owners, who would in turn release revised versions of the model(s).

In recent years, collaborative research efforts in computer science have been very successful. Emulab is a resource widely used by the networking community. The TCS community, too, has been involved in successful collaborative research, e.g., the Polymath project and the recent collaborative review of Deolalikar's paper. I believe there is certainly room for larger collaborative efforts within the computer architecture community.