Saturday, March 19, 2011

Grad School Ranking Methodology

We are in the midst of the graduate admissions process and faculty candidate interviews.  A large part of a candidate's decision is often based on a university's rank in computer science.  This is especially true for prospective graduate students.  Unfortunately, the university rankings commonly in use are flawed and inappropriate.

The US News rankings seem to be the most visible and hence the most popular.  However, they are a blatant popularity contest: they are based on an average of the 1-5 ratings that each CS department chair gives every other CS department.  While I'm sure that some department chairs take their rating surveys very seriously, many likely go by past history (a vicious cycle in which a department is well regarded because of its previous lofty rank), recent news, or reputation within the chair's own research area.  In spite of several positive changes within our department, our reputation score hasn't budged in recent years and our ranking fluctuates randomly in the 30s and 40s.  But no, this post is not about the impact of these rankings on our department; it is about what I feel is the right way for a prospective grad student to rank schools.

The NRC released a CS ranking way back in 1993.  After several years of work, they released a new ranking a few months back.  However, it appears to be based on a complex equation, and there have been several concerns about how the data was collected.  We have ourselves seen several incorrect data points.  The CRA has a statement on this.

Even if the NRC gets its act together, I strongly feel that prospective grad students should not be looking closely at overall CS rankings.  What they should focus on is a ranking for their specific research area.  Such a ranking should be only one of several factors in their decision, although, in my view, it should be a dominant one.

What should such a research area ranking take into account?  It is important to consider faculty count and funding levels in one's research area.  But primarily, one should consider the end result: research productivity.  This is measured most reliably by counting a department's recent publications in top-tier venues.  Ideally, one would measure impact and not engage in bean-counting.  But by restricting the bean-counting to top-tier venues, impact and quality are strongly factored in.  This is perhaps the closest we can get to an objective measure of impact.  Ultimately, when a student graduates with a Ph.D., his/her subsequent job depends most strongly on his/her publication record.  By measuring the top-tier publications produced by a research group, students can estimate their own likely productivity if they were to join that group.

I can imagine a few reasonable variations.  If we want to be more selective about quality and impact, we could measure best-paper awards or IEEE Micro Top Picks papers.  However, those counts are small enough that they are likely not statistically significant in many cases.  One could also derive a stronger measure of impact by looking at citation counts for papers at top-tier venues.  Again, for recent papers, these citation counts tend to be noisy and are often dominated by self-citations.  Besides, it's more work for whoever is computing the rankings :-).  It may also be reasonable to divide the pub-count by the number of faculty, but counting the number of faculty in an area can sometimes be tricky.  Besides, I feel a department should get credit for having more faculty in a given area; a grad student should want to join a department where he/she has many options and a large peer group.  One could make an argument for different measurement windows -- a short window adapts quickly to new faculty hires, faculty departures, etc., but the window also needs to be long enough to absorb noise from sabbaticals, funding cycles, infrastructure-building efforts, etc.  Perhaps five years (the average length of a Ph.D.) is a sweet spot.

So here is my proposed ranking metric for computer architecture: number of papers by an institution at ISCA, MICRO, ASPLOS, HPCA in the last five years.
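
To make this concrete, below is a minimal sketch (in Python) of how one might compute such a metric.  The input format, the institution names, and the rank_institutions helper are all made up for illustration; the real data would have to come from the conference proceedings or a bibliography database such as DBLP.

    from collections import Counter

    # Venues and window taken from the metric above; the window shifts each
    # year (here: the 2006-2010 window used for the 2010 ranking).
    TOP_TIER = {"ISCA", "MICRO", "ASPLOS", "HPCA"}
    WINDOW = range(2006, 2011)  # five years: 2006 through 2010

    # Hypothetical input format: one record per paper, listing the venue,
    # the year, and the set of institutions on the author list.
    papers = [
        {"venue": "ISCA", "year": 2008, "institutions": {"Univ A", "Univ B"}},
        {"venue": "HPCA", "year": 2010, "institutions": {"Univ A"}},
        # ... in practice, the full list scraped from the proceedings
    ]

    def rank_institutions(papers, venues=TOP_TIER, window=WINDOW):
        """Count top-tier papers per institution.  A paper counts once for
        every institution on its author list, matching the 'credit even for
        a single author' rule described below."""
        counts = Counter()
        for paper in papers:
            if paper["venue"] in venues and paper["year"] in window:
                for inst in paper["institutions"]:
                    counts[inst] += 1
        return counts.most_common()  # sorted by paper count, highest first

    for inst, count in rank_institutions(papers):
        print(inst, count)

The per-faculty variation discussed above would simply divide each count by a faculty headcount before sorting; as argued earlier, I prefer the raw count.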

Indeed, I have computed the computer architecture ranking for 2010.  It is based on top-tier papers in the 2006-2010 time-frame for all academic institutions world-wide.  I have not differentiated between CS and ECE departments.  An institution gets credit even if only a single author on a paper is from that institution.  If you are considering grad school for computer architecture research (or are simply curious), email me for a link to this ranking.  I have decided not to link the ranking here because I feel that only prospective grads need to see it; a public ranking might only foster a sense of competition that is likely not healthy for the community.

If you are shy about emailing me (why?! :-) or are curious about where your institution may stand in such a ranking, here's some data:  All top-5 schools had 44+ papers in 5 years; 20+ papers equates to a top-10 rank; 15+ papers = top-15; 12+ papers = top-20; 9+ papers = top-25.  There were 44 institutions with 4+ papers and 69 institutions with 2+ papers.
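
For anyone mapping their own institution's count onto the numbers above, here is a trivial helper (same hypothetical Python sketch style as above) that encodes exactly those thresholds; the boundaries are specific to the 2006-2010 window and would shift for any other window.

    def approximate_tier(paper_count):
        """Map a 2006-2010 top-tier paper count (ISCA/MICRO/ASPLOS/HPCA)
        to the approximate rank tier implied by the thresholds above."""
        tiers = [
            (44, "roughly top-5"),
            (20, "roughly top-10"),
            (15, "roughly top-15"),
            (12, "roughly top-20"),
            (9, "roughly top-25"),
            (4, "roughly top-44"),
            (2, "roughly top-69"),
        ]
        for threshold, tier in tiers:
            if paper_count >= threshold:
                return tier
        return "outside the top 69"

    print(approximate_tier(16))  # e.g., 16 papers -> "roughly top-15"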

Please chime in with comments and suggestions.  I'd be happy to use a better metric for the 2011 ranking.

7 comments:

  1. Perhaps you could share your views about the top-tier venues themselves? Are they all essentially equivalent, or are there distinguishing factors?

  2. The only good measure of quality for a PhD program is the success of its graduates. Specifically: How many and what fraction of your PhD alumni have the jobs they want? (The cruder "How many PhDs are in top-tier tenure-track faculty positions?" ignores the fact that many PhDs go into industry research or teaching positions by choice.) How many of your PhDs have tenure? How many top-tier papers do they publish after graduation? How many have PhD students of their own? How many have won awards for their research (eg, CAREER) or teaching?

    The number of publications in top venues conflates the success of the students with the success of the faculty, and measures student success only in the context of their advisor's research group. PhD programs exist to grow students into strong independent scholars and researchers; faculty success is of strictly secondary importance.

  3. Anon: Among the top-4 venues, ISCA is considered the most prestigious, MICRO and ASPLOS are a tad less prestigious, and HPCA is a clear fourth. They all have pretty much the same acceptance rates (15-25%), similar PCs, and 4-6 high-quality reviews per paper. HPCA is younger and a little smaller, but has no shortage of high-impact papers. In terms of size, ISCA typically has an attendance of 400, MICRO and ASPLOS have about 300, and HPCA has 250. HPCA's attendance is nearly twice that of our next best conference (PACT?). In terms of topics, ASPLOS papers are typically required to have at least 2 of the following 3 components: architecture, programming languages, operating systems.

    Jeffe: You make some great points. I agree that prospective grads should gather as much info as they can about a group's graduates. Collecting your suggested data for every school is well beyond my amateur efforts :-). I feel that the effect of one's grad school shows up more prominently in metrics measured during grad school than in post-graduation metrics. At some point, personal talent starts to have a bigger effect than the great or ordinary education you received in grad school.

  4. Reposting a comment I've placed elsewhere since I think it's relevant to (perceived) ranking quality. Here is another metric:

    http://citeseerx.ist.psu.edu/stats/venues

    Rankings are generated using Garfield's impact factor:
    http://en.wikipedia.org/wiki/Impact_factor

    For arch research, the generally accepted 'top tier' venues are ISCA, MICRO, ASPLOS, HPCA (probably in that order for most people).

    According to CiteSeer, the rankings for impact are:
    ASPLOS (8), HPCA (38), ICS (39), ISCA (81), MICRO (115), PACT (217), HiPC (444), SC (566), ISPASS (unranked).

  5. The discussion of problems in ranking universities really only _begins_ with two points: first, almost any ranking is probably incapable of giving users all the relevant information they need to make an informed decision; and second, such rankings tend to become unduly canonized. I'd bet this is more the case the simpler the ranking is.

    And indeed, you're not the first to raise issue with US News specifically. In 1996 then-president of Stanford Gerhard Casper wrote a letter to the editor [http://www.stanford.edu/dept/pres-provost/president/speeches/961206gcfallow.html] condemning the rankings for perpetuating the functionally illiterate stereotype that public schools like UNC, Michigan, and Berkeley are not comparable to the name-brand schools like Stanford. It's hard to imagine that this has _no_ salience when the criticism comes from the President of such a name-brand school.

    I think that rankings will probably always be a leaky abstraction, but that doesn't mean you shouldn't at least try to make your ranking less stupid. I would point out, though, that the people your ranking appeals to are generally the sorts of people who already know that the US News rankings are not that great.

  6. Alex: My goal here was not to produce a look-up table that directly tells people which school to join. When students come to me for grad school advice, I tell them to focus on 2-3 prominent factors. Others might focus on a different set of factors... hence the futility of computing a universal rank that works for everyone. However, "research reputation" is probably an important factor for all, and by default, the US News ranking is used as a proxy. I am simply attempting to codify a better metric for research reputation. In my view, every metric has its pros and cons. By combining multiple metrics, you inherit not only each metric's positive attributes but also its negative ones. So it's not clear that a complex ranking measure is necessarily better. In my post, I do mention a few other metrics and my rationale for leaving them out.

  7. You may want to add this: contributions to open-source tools and simulators, e.g., gem5, macsim, and usim.
