To increase sustainability, NIH should yoke success rates to budgets

There’s a general consensus among biomedical scientists working in the United States that the NIH funding system is in a state of serious debilitation, if not yet on life support. After years of flat budgets and an ever-increasing number of PIs, success rates for R01s (the primary research grant mechanism at NIH) are at an all-time low, even as the average annual budget of awards has decreased in real dollars. Unfortunately, there doesn’t appear to be an easy way to fix the problem. As many commentators have noted, there are some very deeply-rooted and systematic incentives that favor a perpetuation, and even exacerbation, of the current problems.

Last month, NIH released an RFI asking for suggestions for strategies to improve the impact and sustainability of biomedical research. This isn’t a formal program announcement, and doesn’t carry any real force at the moment, but it does at least signal some interest in making policy changes that could help prevent serious problems from getting worse.

Here’s my suggestion, which I’m also dutifully sending in to NIH in much-abridged form. The basic idea I’ll explore in this post is very simple: NIH should start yoking the success rates of proposals to the amount of money they request. The proposal is not meant to be a long-term solution, and is in some ways just a stopgap measure until more serious policy changes take place. But it’s a stopgap measure that could conceivably increase success rates by a few points for at least a few years, with relatively little implementation cost and few obvious downsides. So I think it’s at least worth considering.

The problem

At the moment, the NIH funding system arguably incentivizes PIs to ask for as much money as they think they can responsibly handle. To see why, let’s forget about NIH for the moment and consider, in day-to-day life, the typical relationship between investment cost and probability of investment (holding constant expected returns, which I’ll address later). Generally speaking, the two are inversely related. If a friend asks you to lend them $10, you might lend it without even asking them what they need it for. If, instead, your friend asks you for $100, you might want to know what it’s for, and you might also ask for some indication of how soon you’ll be paid back. But if your friend asks you for $10,000… well, you’re probably going to want to see a business plan and a legally-binding contract laying out a repayment schedule. There is a general understanding in most walks of life that if someone asks you to invest in them more heavily, you expect to see more evidence that they can deliver on whatever it is that they’re promising to do.

At NIH, things don’t work exactly that way. In many ways, there’s actually a positive incentive to ask for more money when writing a grant application. The perverse incentives play out at multiple levels–both across different grant mechanisms, and within the workhorse R01 mechanism. In the former case, a glance at the success rates for different R mechanisms reveals something that many PIs are, in my experience, completely unaware of: “small”-grant mechanisms like the R03 and R21 have lower–in some cases much lower–success rates than R01s at nearly all NIH institutes. This despite the fact that R21s and R03s are advertised as requiring little or no pilot data, and have low budget caps and short award durations (e.g., a maximum of $275,000 over two years for the R21).

Now you might say: well, sure, if you have a grant program expressly designed for exploratory projects, it’s not surprising if the funding rate is much lower, because you’re probably getting an obscene number of applications from people who aren’t in a position to compete for a full-blown R01. But that’s not really it, because the number of R21 and R03 submissions is also much lower than the number of R01 submissions (e.g., in 2013, NCI funded 14.7% of 4,170 R01 applications, but only 10.6% of 2,557 R21 applications). In the grand scheme of things, the amount of money allocated to “small” grants at NIH pales in comparison to the amount allocated to R01s.

The reason that R21s and R03s aren’t much more common is… well, I actually don’t know. But the point is that the data suggest that, in general (though there are of course exceptions), it’s empirically a pretty bad idea to submit R03s and R21s (particularly if you’re an Early Stage Investigator). The success rates for R01s are higher, you can ask for a lot more money, the project periods are longer, and the amount of work involved in writing the proposal is not dramatically higher. When you look at it that way, it’s not so surprising that PIs don’t submit that many R21/R03 applications: on average, they’re a bad time investment.

The same perverse incentives apply even if you focus on only R01 submissions. You might think that, other things being equal, NIH would prioritize proposals that ask for less money. That may well be true from an administrative standpoint, in the sense that, if two applications receive exactly the same score from a review panel, and are pretty similar in most respects, one imagines that most program officers would prefer to fund the proposal with the smaller budget. But the problem is that, in the grand scheme of things, discretionary awards (i.e., where the PO has the power to choose which award to fund) are a relatively small proportion of the total budget. The majority of proposals get funded because they receive very good scores at review. And it turns out that, at review, asking for more money can actually work in a PI’s favor.

To see why, consider the official NIH guidelines for reviewing budgets. Reviewers are explicitly instructed not to judge a proposal’s merit based on its budget:

Unless specified otherwise in the Funding Opportunity Announcement, consideration of the budget and project period should not affect the overall impact score.

What should the reviewer do with regard to the budget? Well, not much:

The reviewer should determine whether the requested budget is realistic for the conduct of the project proposed.

The explicit decoupling of budget from merit sets up a very serious problem, because if you allow yourself to ask for more money, you can also propose correspondingly grander work. By the time reviewers see your proposal, they have no real way of knowing whether you first decided on the minimum viable research program you want to run and then came up with an appropriate budget, or if you instead picked a largish number out of a hat and then proposed a perfectly reasonable (but large) amount of science you could do in order to fit that budget.

At the risk of making my own life a little bit more difficult, I’m willing to put my money where my mouth is on this point. For just about every proposal I’ve sent to NIH so far, I’ve asked for more money than I strictly need. Now, “need” is a tricky word in this context. I emphatically am not suggesting that I routinely ask NIH for more money just for the sake of having more money. I can honestly say that I’ve never asked for any funds that I didn’t think I could use responsibly in the pursuit of what I consider to be good science. But the trouble is, virtually every PI who’s ever applied for government funding will happily tell you that they could always do more good science if they just had more money. And, to a first order of approximation, they’re right. Unless a PI already has multiple major grants (which is a very small proportion of PIs at NIH), she or he probably could do more good work if given more money. There might be diminishing returns at some point, but for the most part it should not be terribly surprising if the average PI could increase her or his productivity level somewhat if given the money to hire more personnel, buy better equipment, run more experiments, and so on.

Unfortunately, the NIH budget is a zero-sum game. Every grant dollar I get is a grant dollar some other PI doesn’t get. So, when I go out and ask for a large-but-not-unreasonable amount of money, knowing full well that I could still run a research lab and get at least some good science done with less money, I am, in a sense, screwing everyone else over. Except that I’m not really screwing everyone else over, because everyone else is doing exactly the same thing I am. And the result is that we end up with a lot of PIs proposing a lot of very large projects. The PIs who win the grant lottery (because, increasingly, that’s what it is) will, generally, do a lot of good science with it. So it’s not so much that money is wasted; it’s more that it’s not distributed optimally, because the current system incentivizes people to ask for as much money as they think they can responsibly manage, rather than asking for the minimum amount they need to actually sustain a viable research enterprise.

The fix

The solution to this problem is, on paper, quite simple (which is probably why it’s only on paper). The way to induce PIs to ask for the minimum amount they think they can do their research with–thereby freeing up money for everyone else–is to explicitly yoke risk to reward, so that there’s a clearly discernible cost to asking for every increment in funding. You want $50,000 a year? Okay, that’s pretty easy to fund, so we’re not going to ask you a lot of questions. You want $500k/year? Well, hey, look, there are 10 people out in the hallway who each claim they can produce two papers a year on just $50k. So you’re going to have to explain why we should fund one of you instead of ten of them.

How would this proposal be implemented? There are many ways one could go about it, but here’s one that makes sense to me. First, we get rid of all of the research grant (R-type) mechanisms–except maybe for those that have some clearly differentiated purpose (e.g., R25s for training courses). Second, we introduce new R grant programs defined only by their budget caps and durations. For example, we might have R50s (max 50k/year for 2 years), R150s (max 150k/year for 3 years), R300s (max 300k/year for 5 years), and so on. The top tier would have no explicit cap, just like the current R01s. Third, we explicitly tie success rates to budget caps by deciding (and publicly disclosing) how much money we’re allocating to each tier. Each NIH institute would have to decide approximately what its payline for each tier would be for the next year–with the general constraint that the money would be allocated in such a way as to produce a strong inverse correlation between success rate and budget amount. So we might see, for instance, NIMH funding R50s at 50%, R150s at 28%, R300s at 22%, and R1000s at 8%. There would presumably be an initial period of fine-tuning, but over four or five award cycles, the system would almost certainly settle into a fairly stable equilibrium. Paylines would necessarily rise, because PIs would be incentivized to ask for only as much money as they truly need.
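To make the allocation step a bit more concrete, here’s a rough Python sketch of how publicly disclosed per-tier allocations would translate into success rates. Everything numeric in it is hypothetical: the dollar allocations and application counts are invented for illustration, the names (Tier, success_rates, is_inversely_ordered) are mine, and it makes the simplifying assumptions that every award is made at the tier cap for the full duration and that the whole multi-year cost is charged against the allocation. The uncapped top tier is left out because it has no fixed award cost.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    cap_per_year: int   # maximum annual budget, in dollars
    years: int          # award duration
    applications: int   # hypothetical number of applications received
    allocated: int      # hypothetical dollars the institute sets aside for this tier


def success_rates(tiers):
    """Return {tier name: success rate}, assuming each award costs cap * years."""
    rates = {}
    for t in tiers:
        award_cost = t.cap_per_year * t.years
        # Fund as many full awards as the allocation covers, up to the applicant count.
        n_awards = min(t.allocated // award_cost, t.applications)
        rates[t.name] = n_awards / t.applications
    return rates


def is_inversely_ordered(tiers, rates):
    """True if success rates strictly fall as the budget cap rises."""
    ordered = sorted(tiers, key=lambda t: t.cap_per_year)
    values = [rates[t.name] for t in ordered]
    return all(a > b for a, b in zip(values, values[1:]))


# Invented numbers, chosen only so the arithmetic is easy to follow.
tiers = [
    Tier("R50", 50_000, 2, applications=200, allocated=10_000_000),
    Tier("R150", 150_000, 3, applications=400, allocated=45_000_000),
    Tier("R300", 300_000, 5, applications=500, allocated=150_000_000),
]
rates = success_rates(tiers)
print(rates)
print(is_inversely_ordered(tiers, rates))
```

The point of the second function is the constraint described above: an institute could set its allocations however it likes, as long as the resulting success rates fall as the cap rises.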

The objection(s)

Are there objections to the approach I’ve suggested above? Sure. Perhaps the most obvious concern will come from people who do genuinely “big” science–i.e., who work in fields where simply keeping a small lab running can cost hundreds of thousands of dollars a year. Researchers in such fields might complain that yoking success rates to budgets would mean that their colleagues who work on less expensive scientific problems have a major advantage when it comes to securing funding, and that Big Science types would consequently find it harder to survive.

There are several things to note about this objection. First, there’s actually no necessary reason why yoking success rates to budgets has to hurt larger applications. The only assumption this proposal depends on is that, at the moment, some proportion of budgets are inflated–i.e., there are many researchers who could operate successfully (if less comfortably) on smaller budgets than they currently do. The fact that many other investigators couldn’t operate on smaller budgets is immaterial. If 25% of NIH PIs voluntarily opt into a research grant program that guarantees higher success rates in return for smaller budgets, the other 75% of PIs could potentially benefit even if they do nothing at all (depending on how success rates are set). So if you currently run a lab that can’t possibly run on less than $500k/year, you don’t necessarily lose anything if one of your colleagues who was previously submitting grants with $250k annual budgets decides to start writing grants with $125k caps in return for, say, a 10% increase in funding likelihood. On the contrary, it could actually mean that there’s more money left over at the end of the day to fund your own big grants.
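The arithmetic behind this claim can be sketched in a few lines of Python. The numbers are made up: a fixed pool of $37.5M, 1,000 applicants who would otherwise each ask for $250k, and 250 of them (25%) opting into a $125k tier. For simplicity the sketch treats each award as a single year of funding, and the function name is hypothetical.

```python
def large_grant_success_rate(pool, n_applicants, full_budget,
                             n_small_applicants, small_budget, n_small_awards):
    """Success rate for full-budget applicants after the small awards are paid for."""
    remaining = pool - n_small_awards * small_budget
    n_large_applicants = n_applicants - n_small_applicants
    # Whatever money is left funds as many full-budget awards as it can cover.
    n_large_awards = min(remaining // full_budget, n_large_applicants)
    return n_large_awards / n_large_applicants


POOL = 37_500_000        # hypothetical fixed pool, in dollars
N_APPLICANTS = 1_000     # hypothetical applicant count
FULL, SMALL = 250_000, 125_000

# Baseline: everyone asks for the full budget.
baseline = (POOL // FULL) / N_APPLICANTS

# 250 applicants opt into the small tier; 60 of them (24%) are funded.
modest = large_grant_success_rate(POOL, N_APPLICANTS, FULL, 250, SMALL, 60)

# Same opt-in, but the small tier is funded very generously: 150 of 250 (60%).
generous = large_grant_success_rate(POOL, N_APPLICANTS, FULL, 250, SMALL, 150)

print(baseline, modest, generous)
```

Under these assumptions the full-budget applicants do slightly better than baseline when the small tier gets a moderate boost, and worse than baseline when the small tier is funded very generously, which is the “depending on how success rates are set” caveat.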

Now, it’s certainly true that NIH PIs who work in cheaper domains would have an easier time staying afloat than ones who work in expensive domains. And it’s also true that NIH could explicitly bias in favor of small grants by raising the success rates for small grants disproportionately. But that isn’t necessarily a problem. Personally, I would argue that a moderate bias towards small grants is actually a very good thing. Remember: funding is a zero-sum game. It may seem egalitarian to make success rates independent of operating costs, because it feels like we’re giving everyone a roughly equal shot at a career in biomedical science, no matter what science they like to do. But in another sense, we aren’t being egalitarian at all, because what we’re actually saying is that a scientist who likes to work on $500k problems is worth five times as much to the taxpayer as one who likes to work on $100k problems. That seems unlikely to be true in the general case (though it may certainly be true in a minority of cases), because it’s hard to believe that the cost of doing scientific research is very closely linked to the potential benefits to people’s health (i.e., there are almost certainly many very expensive scientific disciplines that don’t necessarily produce very big benefits to taxpayers). Personally, I don’t see anything wrong with setting a higher bar for research programs that cost more taxpayer money to fund. And note that I’m arguing against my own self-interest here, because my own research is relatively expensive (most of it involves software development, and the average developer salary is roughly double the average postdoc salary).

Lastly, it’s important to keep in mind that this proposal doesn’t in any way preclude the use of other, complementary, funding mechanisms. At present, NIH already routinely issues PAs and RFAs for proposals in areas of particular interest, or which for various reasons (including budget-related considerations) need to be considered separately from other applications. This wouldn’t change in any way under the proposed system. So, for example, if NIH officials decided that it was in the nation’s best interest to fund a round of $10 million grants to develop new heart transplant techniques, they could still issue a special call for such proposals. The plan I’ve sketched above would apply only to “normal” grants.

Okay, so that’s all I have. I was initially going to list a few other potential objections (and rebuttals), but decided to leave that for discussion. Please use the comments to tell me (and perhaps NIH) why this proposal would or wouldn’t work.

whether or not you should pursue a career in science still depends mostly on that thing that is you

I took the plunge a couple of days ago and answered my first question on Quora. Since Brad Voytek won’t shut up about how great Quora is, I figured I should give it a whirl. So far, Brad is not wrong.

The question in question is: “How much do you agree with Johnathan Katz’s advice on (not) choosing science as a career? Or how realistic is it today (the article was written in 1999)?” The Katz piece referred to is here. The gist of it should be familiar to many academics; the argument boils down to the observation that relatively few people who start graduate programs in science actually end up with permanent research positions, and even then, the need to obtain funding often crowds out the time one has to do actual science. Katz’s advice is basically: don’t pursue a career in science. It’s not an optimistic piece.

My answer is, I think, somewhat more optimistic. Here’s the full text:

The real question is what you think it means to be a scientist. Science differs from many other professions in that the typical process of training as a scientist–i.e., getting a Ph.D. in a scientific field from a major research university–doesn’t guarantee you a position among the ranks of the people who are training you. In fact, it doesn’t come close to guaranteeing it; the proportion of PhD graduates in science who go on to obtain tenure-track positions at research-intensive universities is very small–around 10% in most recent estimates. So there is a very real sense in which modern academic science is a bit of a pyramid scheme: there are a relatively small number of people at the top, and a lot of people on the rungs below laboring to get up to the top–most of whom will, by definition, fail to get there.

If you equate a career in science solely with a tenure-track position at a major research university, and are considering the prospect of a Ph.D. in science solely as an investment intended to secure that kind of position, then Katz’s conclusion is difficult to escape. He is, in most respects, correct: in most biomedical, social, and natural science fields, science is now an extremely competitive enterprise. Not everyone makes it through the PhD; of those who do, not everyone makes it into–and then through–one or more postdocs; and of those who do that, relatively few secure tenure-track positions. Then, of those few “lucky” ones, some will fail to get tenure, and many others will find themselves spending much or most of their time writing grants and managing people instead of actually doing science. So from that perspective, Katz is probably right: if what you mean when you say you want to become a scientist is that you want to run your own lab at a major research university, then your odds of achieving that at the outset are probably not very good (though, to be clear, they’re still undoubtedly better than your odds of becoming a successful artist, musician, or professional athlete). Unless you have really, really good reasons to think that you’re particularly brilliant, hard-working, and creative (note: undergraduate grades, casual feedback from family and friends, and your own internal gut sense do not qualify as really, really good reasons), you probably should not pursue a career in science.

But that’s only true given a rather narrow conception where your pursuit of a scientific career is motivated entirely by the end goal rather than by the process, and where failure is anything other than ending up with a permanent tenure-track position. By contrast, if what you’re really after is an environment in which you can pursue interesting questions in a rigorous way, surrounded by brilliant minds who share your interests, and with more freedom than you might find at a typical 9 to 5 job, the dream of being a scientist is certainly still alive, and is worth pursuing. The trivial demonstration of this is that if you’re one of the many people who actually enjoy the graduate school environment (yes, they do exist!), it may not even matter to you that much whether or not you have a good shot of getting a tenure-track position when you graduate.

To see this, imagine that you’ve just graduated with an undergraduate degree in science, and someone offers you a choice between two positions for the next six years. One position is (relatively) financially secure, but involves rather boring work of questionable utility to society, an inflexible schedule, and colleagues who are mostly only there for a paycheck. The other position has terrible pay, but offers fascinating and potentially important work, a flexible lifestyle, and colleagues who are there because they share your interests and want to do scientific research.

Admittedly, real-world choices are rarely this stark. Many non-academic jobs offer many of the same perceived benefits of academia (e.g., many tech jobs offer excellent working conditions, flexible schedules, and important work). Conversely, many academic environments don’t quite live up to the ideal of a place where you can go to pursue your intellectual passion unfettered by the annoyances of “real” jobs–there’s often just as much in the way of political intrigue, personality dysfunction, and menial dues-paying duties. But to a first approximation, this is basically the choice you have when considering whether to go to graduate school in science or pursue some other career: you’re trading financial security and a fixed 40-hour work week against intellectual engagement and a flexible lifestyle. And the point to note is that, even if we completely ignore what happens after the six years of grad school are up, there is clearly a non-negligible segment of the population who would quite happily opt for the second choice–even recognizing full well that at the end of six years they may have to leave and move on to something else, with little to show for their effort. (Of course, in reality we don’t need to ignore what happens after six years, because many PhDs who don’t get tenure-track positions find rewarding careers in other fields–many of them scientific in nature. And, even though it may not be a great economic investment, having a Ph.D. in science is a great thing to be able to put on one’s resume when applying for a very broad range of non-academic positions.)

The bottom line is that whether or not you should pursue a career in science has as much or more to do with your goals and personality as it does with the current environment within or outside of (academic) science. In an ideal world (which is certainly what the 1970s as described by Katz sound like, though I wasn’t around then), it wouldn’t matter: if you had any inkling that you wanted to do science for a living, you would simply go to grad school in science, and everything would probably work itself out. But given real-world constraints, it’s absolutely essential that you think very carefully about what kind of environment makes you happy and what your expectations and goals for the future are. You have to ask yourself: Am I the kind of person who values intellectual freedom more than financial security? Do I really love the process of actually doing science–not some idealized movie version of it, but the actual messy process–enough to warrant investing a huge amount of my time and energy over the next few years? Can I deal with perpetual uncertainty about my future? And ultimately, would I be okay doing something that I really enjoy for six years if at the end of that time I have to walk away and do something very different?

If the answer to all of these questions is yes–and for many people it is!–then pursuing a career in science is still a very good thing to do (and hey, you can always quit early if you don’t like it–then you’ve lost very little time!). If the answer to any of them is no, then Katz may be right. A prospective career in science may or may not be for you, but at the very least, you should carefully consider alternative prospects. There’s absolutely no shame in going either route; the important thing is just to make an honest decision that takes the facts as they are and not as you wish that they were.

A couple of other thoughts I’ll add belatedly:

  • Calling academia a pyramid scheme is admittedly a bit hyperbolic. It’s true that the personnel structure in academia broadly has the shape of a pyramid, but that’s true of most organizations in most other domains too. Pyramid schemes are typically built on promises and lies that (almost by definition) can’t be realized, and I don’t think many people who enter a Ph.D. program in science can claim with a straight face that they were guaranteed a permanent research position at the end of the road (or that it’s impossible to get such a position). As I suggested in this post, it’s much more likely that everyone involved is simply guilty of minor (self-)deception: faculty don’t go out of their way to tell prospective students what the odds are of actually getting a tenure-track position, and prospective grad students don’t work very hard to find out the painful truth, or to tell faculty what their real intentions are after they graduate. And it may actually be better for everyone that way.
  • Just in case it’s not clear from the above, I’m not in any way condoning the historically low levels of science funding, or the fact that very few science PhDs go on to careers in academic research. I would love for NIH and NSF budgets (or whatever your local agency is) to grow substantially–and for everyone to get exactly the kind of job they want, academic or not. But that’s not the world we live in, so we may as well be pragmatic about it and try to identify the conditions under which it does or doesn’t make sense to pursue a career in science right now.
  • I briefly mention this above, but it’s probably worth stressing that there are many jobs outside of academia that still allow one to do scientific research, albeit typically with less freedom (but often for better hours and pay). In particular, the market for data scientists is booming right now, and many of the hires are coming directly from academia. One lesson to take away from this is: if you’re in a science Ph.D. program right now, you should really spend as much time as you can building up your quantitative and technical skills, because they could very well be the difference between a job that involves scientific research and one that doesn’t in the event you leave academia. And those skills will still serve you well in your research career even if you end up staying in academia.


Big Pitch or Big Lottery? The unenviable task of evaluating the grant review system

This week’s issue of Science has an interesting article on The Big Pitch–a pilot NSF initiative to determine whether anonymizing proposals and dramatically cutting down their length (from 15 pages to 2) has a substantial impact on the results of the review process. The answer appears to be an unequivocal yes. From the article:

What happens is a lot, according to the first two rounds of the Big Pitch. NSF’s grant reviewers who evaluated short, anonymized proposals picked a largely different set of projects to fund compared with those chosen by reviewers presented with standard, full-length versions of the same proposals.

Not surprisingly, the researchers who did well under the abbreviated format are pretty pleased:

Shirley Taylor, an awardee during the evolution round of the Big Pitch, says a comparison of the reviews she got on the two versions of her proposal convinced her that anonymity had worked in her favor. An associate professor of microbiology at Virginia Commonwealth University in Richmond, Taylor had failed twice to win funding from the National Institutes of Health to study the role of an enzyme in modifying mitochondrial DNA.

Both times, she says, reviewers questioned the validity of her preliminary results because she had few publications to her credit. Some reviews of her full proposal to NSF expressed the same concern. Without a biographical sketch, Taylor says, reviewers of the anonymous proposal could “focus on the novelty of the science, and this is what allowed my proposal to be funded.”

Broadly speaking, there are two ways to interpret the divergent results of the standard and abbreviated review. The charitable interpretation is that the change in format is, in fact, beneficial, inasmuch as it eliminates prior reputation as one source of bias and forces reviewers to focus on the big picture rather than on small methodological details. Of course, as Prof-Like Substance points out in an excellent post, one could mount a pretty reasonable argument that this isn’t necessarily a good thing. After all, a scientist’s past publication record is likely to be a good predictor of their future success, so it’s not clear that proposals should be anonymous when large amounts of money are on the line (and there are other ways to counteract the bias against newbies–e.g., NIH’s approach of explicitly giving New Investigators a payline boost until they get their first R01). And similarly, some scientists might be good at coming up with big ideas that sound plausible at first blush and not so good at actually carrying out the research program required to bring those big ideas to fruition. Still, at the very least, if we’re being charitable, The Big Pitch certainly does seem like a very different kind of approach to review.

The less charitable interpretation is that the reason the ratings of the standard and abbreviated proposals showed very little correlation is that the latter approach is just fundamentally unreliable. If you suppose that it’s just not possible to reliably distinguish a very good proposal from a somewhat good one on the basis of just 2 pages, it makes perfect sense that 2-page and 15-page proposal ratings don’t correlate much–since you’re basically selecting at random in the 2-page case. Understandably, researchers who happen to fare well under the 2-page format are unlikely to see it that way; they’ll probably come up with many plausible-sounding reasons why a shorter format just makes more sense (just like most researchers who tend to do well with the 15-page format probably think it’s the only sensible way for NSF to conduct its business). We humans are all very good at finding self-serving rationalizations for things, after all.

Personally I don’t have very strong feelings about the substantive merits of short versus long-format review–though I guess I do find it hard to believe that 2-page proposals could be ranked very reliably given that some very strange things seem to happen with alarming frequency even with 12- and 15-page proposals. But it’s an empirical question, and I’d love to see relevant data. In principle, the NSF could have obtained that data by having two parallel review panels rate all of the 2-page proposals (or even 4 panels, since one would also like to know how reliable the normal review process is). That would allow the agency to directly quantify the reliability of the ratings by looking at their cross-panel consistency. Absent that kind of data, it’s very hard to know whether the results Science reports on are different because 2-page review emphasizes different (but important) things, or because a rating process based on an extended 2-page abstract just amounts to a glorified lottery.
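For what it’s worth, here’s a toy Python simulation of what “cross-panel consistency” would measure. It is not NSF’s procedure, just a sketch under stated assumptions: each proposal has a latent quality drawn from a standard normal distribution, and each panel’s rating is that quality plus independent Gaussian noise. The function names and the noise levels are hypothetical, and the simulated numbers say nothing about how noisy real 2-page or 15-page review actually is.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def cross_panel_reliability(n_proposals, noise_sd, seed=0):
    """Correlation between two independent panels rating the same proposals."""
    rng = random.Random(seed)
    # Latent quality of each proposal (an assumption of this toy model).
    quality = [rng.gauss(0, 1) for _ in range(n_proposals)]
    # Each panel sees the same quality plus its own independent rating noise.
    panel_a = [q + rng.gauss(0, noise_sd) for q in quality]
    panel_b = [q + rng.gauss(0, noise_sd) for q in quality]
    return pearson(panel_a, panel_b)


# Hypothetical noise levels: a fairly reliable format versus a very noisy one.
print(cross_panel_reliability(2000, noise_sd=0.5))
print(cross_panel_reliability(2000, noise_sd=3.0))
```

If two panels rating the same 2-page proposals agreed about as well as the low-noise case, the divergence from the 15-page ratings would look like a real difference in emphasis; if they agreed about as poorly as the high-noise case, the lottery interpretation would be hard to avoid.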

Alternatively, and perhaps more pragmatically, NSF could just wait a few years to see how the projects funded under the pilot program turn out (and I’m guessing this is part of their plan). I.e., do the researchers who do well under the 2-page format end up producing science as good as (or better than) the researchers who do well under the current system? This sounds like a reasonable approach in principle, but the major problem is that we’re only talking about a total of ~25 funded proposals (across two different review panels), so it’s unclear that there will be enough data to draw any firm conclusions. Certainly many scientists (including me) are likely to feel a bit uneasy at the thought that NSF might end up making major decisions about how to allocate billions of dollars on the basis of two dozen grants.

Anyway, skepticism aside, this isn’t really meant as a criticism of NSF so much as an acknowledgment of the fact that the problem in question is a really, really difficult one. The task of continually evaluating and improving the grant review process is not one anyone should want to take on lightly. If time and money were no object, every proposed change (like dramatically shortened proposals) would be extensively tested on a large scale and directly compared to the current approach before being implemented. Unfortunately, flying thousands of scientists to Washington D.C. is a very expensive business (to say nothing of all the surrounding costs), and I imagine that testing out a substantively different kind of review process on a large scale could easily run into the tens of millions of dollars. In a sense, the funding agencies can’t really win. On the one hand, if they only ever pilot new approaches on a small scale, they never get enough empirical data to confidently back major changes in policy. On the other hand, if they pilot new approaches on a large scale and those approaches end up failing to improve on the current system (as is the fate of most innovative new ideas), the funding agencies get hammered by politicians and scientists alike for wasting taxpayer money in an already-harsh funding climate.

I don’t know what the solution is (or if there is one), but if nothing else, I do think it’s a good thing that NSF and NIH continue to actively tinker with their various processes. After all, if there’s anything most researchers can agree on, it’s that the current system is very far from perfect.