Big Pitch or Big Lottery? The unenviable task of evaluating the grant review system

This week’s issue of Science has an interesting article on The Big Pitch–a pilot NSF initiative to determine whether anonymizing proposals and dramatically cutting down their length (from 15 pages to 2) has a substantial impact on the results of the review process. The answer appears to be an unequivocal yes. From the article:

What happens is a lot, according to the first two rounds of the Big Pitch. NSF’s grant reviewers who evaluated short, anonymized proposals picked a largely different set of projects to fund compared with those chosen by reviewers presented with standard, full-length versions of the same proposals.

Not surprisingly, the researchers who did well under the abbreviated format are pretty pleased:

Shirley Taylor, an awardee during the evolution round of the Big Pitch, says a comparison of the reviews she got on the two versions of her proposal convinced her that anonymity had worked in her favor. An associate professor of microbiology at Virginia Commonwealth University in Richmond, Taylor had failed twice to win funding from the National Institutes of Health to study the role of an enzyme in modifying mitochondrial DNA.

Both times, she says, reviewers questioned the validity of her preliminary results because she had few publications to her credit. Some reviews of her full proposal to NSF expressed the same concern. Without a biographical sketch, Taylor says, reviewers of the anonymous proposal could “focus on the novelty of the science, and this is what allowed my proposal to be funded.”

Broadly speaking, there are two ways to interpret the divergent results of the standard and abbreviated review. The charitable interpretation is that the change in format is, in fact, beneficial, inasmuch as it eliminates prior reputation as one source of bias and forces reviewers to focus on the big picture rather than on small methodological details. Of course, as Prof-Like Substance points out in an excellent post, one could mount a pretty reasonable argument that this isn’t necessarily a good thing. After all, a scientist’s past publication record is likely to be a good predictor of their future success, so it’s not clear that proposals should be anonymous when large amounts of money are on the line (and there are other ways to counteract the bias against newbies–e.g., NIH’s approach of explicitly giving New Investigators a payline boost until they get their first R01). And similarly, some scientists might be good at coming up with big ideas that sound plausible at first blush and not so good at actually carrying out the research program required to bring those big ideas to fruition. Still, at the very least, if we’re being charitable, The Big Pitch certainly does seem like a very different kind of approach to review.

The less charitable interpretation is that the reason the ratings of the standard and abbreviated proposals showed very little correlation is that the latter approach is just fundamentally unreliable. If you suppose that it’s just not possible to reliably distinguish a very good proposal from a somewhat good one on the basis of just 2 pages, it makes perfect sense that 2-page and 15-page proposal ratings don’t correlate much–since you’re basically selecting at random in the 2-page case. Understandably, researchers who happen to fare well under the 2-page format are unlikely to see it that way; they’ll probably come up with many plausible-sounding reasons why a shorter format just makes more sense (just like most researchers who tend to do well with the 15-page format probably think it’s the only sensible way for NSF to conduct its business). We humans are all very good at finding self-serving rationalizations for things, after all.

Personally I don’t have very strong feelings about the substantive merits of short versus long-format review–though I guess I do find it hard to believe that 2-page proposals could be ranked very reliably given that some very strange things seem to happen with alarming frequency even with 12- and 15-page proposals. But it’s an empirical question, and I’d love to see relevant data. In principle, the NSF could have obtained that data by having two parallel review panels rate all of the 2-page proposals (or even 4 panels, since one would also like to know how reliable the normal review process is). That would allow the agency to directly quantify the reliability of the ratings by looking at their cross-panel consistency. Absent that kind of data, it’s very hard to know whether the results Science reports on are different because 2-page review emphasizes different (but important) things, or because a rating process based on an extended 2-page abstract just amounts to a glorified lottery.
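
Just to make that idea concrete, here's a rough sketch of what such a reliability check might look like. Everything below is made up for illustration (the ratings, the panel size, and the choice of metrics are all assumptions, not NSF data), but the basic logic is simple: have two independent panels rate the same set of proposals, then ask how well their ratings, and more importantly their funding picks, line up.

```python
# Illustrative only: quantify cross-panel consistency for the same set of
# 2-page proposals, given two independent panels' ratings (made-up numbers).

from statistics import mean

# Hypothetical ratings (1-5 scale) for the same 20 proposals from two panels.
panel_a = [4.2, 3.1, 2.8, 4.7, 3.9, 2.2, 3.5, 4.0, 1.9, 3.3,
           2.7, 4.4, 3.0, 2.5, 3.8, 4.1, 2.9, 3.6, 2.3, 3.2]
panel_b = [3.6, 2.9, 3.4, 4.1, 2.8, 2.6, 4.2, 3.1, 2.4, 3.9,
           3.0, 3.7, 2.2, 3.3, 4.0, 3.5, 2.7, 2.5, 3.8, 2.1]

def pearson(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def top_k_overlap(x, y, k):
    """Fraction of the k top-rated proposals shared by both panels.
    Closer to what actually matters, since only the top few get funded."""
    top_x = set(sorted(range(len(x)), key=lambda i: x[i], reverse=True)[:k])
    top_y = set(sorted(range(len(y)), key=lambda i: y[i], reverse=True)[:k])
    return len(top_x & top_y) / k

print(f"cross-panel correlation: {pearson(panel_a, panel_b):.2f}")
print(f"overlap in top 5 picks:  {top_k_overlap(panel_a, panel_b, 5):.2f}")
```

A cross-panel correlation near zero (or little overlap in the top picks) would be strong evidence that 2-page ratings really are close to a lottery; a high correlation would suggest the short format is measuring something reliably, even if that something differs from what the 15-page process measures.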

Alternatively, and perhaps more pragmatically, NSF could just wait a few years to see how the projects funded under the pilot program turn out (and I’m guessing this is part of their plan). That is, do the researchers who do well under the 2-page format end up producing science as good as (or better than) the science produced by researchers who do well under the current system? This sounds like a reasonable approach in principle, but the major problem is that we’re only talking about a total of ~25 funded proposals (across two different review panels), so it’s unclear that there will be enough data to draw any firm conclusions. Certainly many scientists (including me) are likely to feel a bit uneasy at the thought that NSF might end up making major decisions about how to allocate billions of dollars on the basis of two dozen grants.
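
To put a rough number on that unease, here's a quick back-of-the-envelope simulation. The outcome measure, the effect size, and everything else in it are pure assumptions on my part, but it gives a sense of how little statistical power a comparison based on roughly 25 grants per format would actually have:

```python
# Back-of-envelope check (purely illustrative): with only ~25 funded grants
# per review format, how often would we detect even a moderate difference in
# eventual outcomes (a standardized effect of 0.5 on some hypothetical
# productivity measure)? All numbers here are assumptions, not NSF data.

import random
from scipy.stats import ttest_ind

random.seed(1)
N_GRANTS, EFFECT_SIZE, N_SIMS = 25, 0.5, 10_000

detections = 0
for _ in range(N_SIMS):
    standard = [random.gauss(0.0, 1.0) for _ in range(N_GRANTS)]
    big_pitch = [random.gauss(EFFECT_SIZE, 1.0) for _ in range(N_GRANTS)]
    _, p = ttest_ind(big_pitch, standard)
    detections += p < 0.05

print(f"power with n=25 per group: {detections / N_SIMS:.2f}")  # roughly 0.4
```

Under those (admittedly made-up) assumptions, you'd miss a genuinely moderate difference between the two formats more often than you'd detect it, which is exactly the sense in which two dozen grants seems like a thin basis for major policy decisions.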

Anyway, skepticism aside, this isn’t really meant as a criticism of NSF so much as an acknowledgment of the fact that the problem in question is a really, really difficult one. The task of continually evaluating and improving the grant review process is not one to take on lightly. If time and money were no object, every proposed change (like dramatically shortened proposals) would be extensively tested on a large scale and directly compared to the current approach before being implemented. Unfortunately, flying thousands of scientists to Washington, D.C. is a very expensive business (to say nothing of all the surrounding costs), and I imagine that testing out a substantively different kind of review process on a large scale could easily run into the tens of millions of dollars. In a sense, the funding agencies can’t really win. On the one hand, if they only ever pilot new approaches on a small scale, they never get enough empirical data to confidently back major changes in policy. On the other hand, if they pilot new approaches on a large scale and those approaches end up failing to improve on the current system (as is the fate of most innovative new ideas), the funding agencies get hammered by politicians and scientists alike for wasting taxpayer money in an already-harsh funding climate.

I don’t know what the solution is (or if there is one), but if nothing else, I do think it’s a good thing that NSF and NIH continue to actively tinker with their various processes. After all, if there’s anything most researchers can agree on, it’s that the current system is very far from perfect.

5 thoughts on “Big Pitch or Big Lottery? The unenviable task of evaluating the grant review system”

  1. So basically, it’s like writing a Letter of Intent, but then getting funding based entirely on that? Weird.

    I have to say I like Letters of Intent because they avoid wasting a huge amount of time on a full application that was never going to get up. Making that initial process anonymous seems a good way of making sure that it’s not just the same old people doing the same old things who always get funded. But I’d seriously worry if that was the *only* thing that mattered.

  2. I’m now somewhat curious whether the current NSF proposal process actually gives better-than-chance results over the long term. It certainly seems to have a lot of administrative cost associated with it. How many bogus proposals are not submitted because of the expectation of rigorous review? (How many worthwhile proposals are not submitted because the researcher still remembers Sen. William Proxmire?)

  3. During a pre-proposal panel I was on recently, the POs were interested in whether there were any proposals we felt we would fund now, without getting the full proposal. The general consensus was that we could likely fund the top couple of proposals outright, with only the 4-pager as assessment. However, I don’t think anyone would want to go to only 4-pagers, let alone 2. Part of the comfort people had in saying they would fund something after only 4 pages came from the fact that those proposals were written by people who could clearly complete the work (not necessarily senior people, BTW).

  4. I can’t speak for other fields, but I would be concerned about the “less is more” approach for fMRI proposals. As it is, I find it nearly impossible to evaluate a paper in the “best” journals because of their habit of relegating the methods to supplementary online material. In other words, the shorter article is a sleight of hand (imho). I think I would feel similarly about an anonymous, short proposal. If the proposal were short but identified, then I could use the PI’s prior literature as a proxy for suitable methods. That’s possibly okay, but possibly not, because the proposal could, of course, depart significantly from prior experiments. In sum, then, I’d have to say that I need a lot of convincing before a short proposal could pass muster for most fMRI studies.

  5. Thanks for the comments!

    Jon, agreed. I think it makes sense to weed out a certain proportion of proposals at the front end based on a very short overview/abstract (but probably no more than half), but beyond that I don’t want my fate decided on the basis of 2 pages–no matter how good I think I might be at conveying the big picture in that amount of space.

    Garrett, I’d like to know that too. I would assume NSF and NIH have data on inter-reviewer reliability; they certainly have the raw data to support those kinds of analyses. I vaguely recall a paper a while back based on NIH panel data that I thought suggested reliability was not so great, but maybe I’m confabulating. Ring any bells for anyone?

    PLS, makes sense. And even if reviewers did feel comfortable funding based on 4-pagers, I’d still want to see some estimates of reliability–I don’t think confidence in one’s judgment bears much of a relation to quality of one’s judgment, unfortunately (though when reviewers start hemming and hawing about their own opinions, you’re probably in real trouble).

    practiCal, I agree up to a point, although one key difference is that, unlike with a paper, no one really expects you to do exactly what you say you will in a grant proposal. So in that sense, I think it’s kind of a waste of space to have PIs go into excruciating detail about what they’re going to do when everyone knows the eventual studies will probably look quite different. I think there’s a sweet spot between an excessively long proposal that wastes the PI’s and reviewers’ time on trivial details (e.g., the 25-page proposals NIH used to require) and an overly short proposal that can’t possibly communicate enough information to support reliable evaluation (e.g., 2 pages). Where exactly that sweet spot is, I don’t really know–though I think it’s probably closer to 12 pages than to 2. Unless of course NIH/NSF were to start funding investigators instead of projects (like NSERC’s Discovery Grants Program), in which case shorter proposals would make perfect sense.
