how many Cortex publications in the hand is a Nature publication in the bush worth?

A provocative and very short Opinion piece by Julien Mayor (Are scientists nearsighted gamblers? The misleading nature of impact factors) was recently posted on the Frontiers in Psychology website (open access! yay!). Mayor’s argument is summed up nicely in this figure:

The left panel plots the mean versus median number of citations per article in a given year (each year is a separate point) for 3 journals: Nature (solid circles), Psych Review (squares), and Psych Science (triangles). The right panel plots the number of citations each paper receives in each of the first 15 years following its publication. What you can clearly see is that (a) the mean and median are very strongly related for the psychology journals, but completely unrelated for Nature, implying that a very small number of articles account for the vast majority of Nature citations (Mayor cites data indicating that up to 40% of Nature papers are never cited); and (b) Nature papers tend to get cited heavily for a year or two, and then disappear, whereas Psych Science, and particularly Psych Review, tend to have much longer shelf lives. Based on these trends, Mayor concludes that:

From this perspective, the IF, commonly accepted as golden standard for performance metrics seems to reward high-risk strategies (after all your Nature article has only slightly over 50% chance of being ever cited!), and short-lived outbursts. Are scientists then nearsighted gamblers?

I’d very much like to believe this, in that I think the massive emphasis scientists collectively place on publishing work in broad-interest, short-format journals like Nature and Science is often quite detrimental to the scientific enterprise as a whole. But I don’t actually believe it, because I think that, for any individual paper, researchers generally do have good incentives to try to publish in the glamor mags rather than in more specialized journals. Mayor’s figure, while informative, doesn’t take a number of factors into account:

  • The type of papers that gets published in Psych Review and Nature are very different. Review papers, in general, tend to get cited more often, and for a longer time. A better comparison would be between Psych Review papers and only review papers in Nature (there’s not many of them, unfortunately). My guess is that that difference alone probably explains much of the difference in citation rates later on in an article’s life. That would also explain why the temporal profile of Psych Science articles (which are also overwhelmingly short empirical reports) is similar to that of Nature. Major theoretical syntheses stay relevant for decades; individual empirical papers, no matter how exciting, tend to stop being cited as frequently once (a) the finding fails to replicate, or (b) a literature builds up around the original report, and researchers stop citing individual studies and start citing review articles (e.g., in Psych Review).
  • Scientists don’t just care about citation counts, they also care about reputation. The reality is that much of the appeal of having a Nature or Science publication isn’t necessarily that you expect the work to be cited much more heavily, but that you get to tell everyone else how great you must be because you have a publication in Nature. Now, on some level, we know that it’s silly to hold glamor mags in such high esteem, and Mayor’s data are consistent with that idea. In an ideal world, we’d read all papers ultra-carefully before making judgments about their quality, rather than using simple but flawed heuristics like what journal those papers happen to be published in. But this isn’t an ideal world, and the reality is that people do use such heuristics. So it’s to each scientist’s individual advantage (but to the field’s detriment) to take advantage of that knowledge.
  • Different fields have very different citation rates. And articles in different fields have very different shelf lives. For instance, I’ve heard that in many areas of physics, the field moves so fast that articles are basically out of date within a year or two (I have no way to verify if this is true or not). That’s certainly not true of most areas of psychology. For instance, in cognitive neuroscience, the current state of the field in many areas is still reasonably well captured by highly-cited publications that are 5 – 10 years old. Most behavioral areas of psychology seem to advance even more slowly. So one might well expect articles in psychology journals to peak later in time than the average Nature article, because Nature contains a high proportion of articles in the natural sciences.
  • Articles are probably selected for publication in Nature, Psych Science, and Psych Review for different reasons. In particular, there’s no denying the fact that Nature selects articles in large part based on the perceived novelty and unexpectedness of the result. That’s not to say that methodological rigor doesn’t play a role, just that, other things being equal, unexpected findings are less likely to be replicated. Since Nature and Science overwhelmingly publish articles with new and surprising findings, it shouldn’t be surprising if the articles in these journals have a lower rate of replication several years on (and hence, stop being cited). That’s presumably going to be less true of articles in specialist journals, where novelty factor and appeal to a broad audience are usually less important criteria.

Addressing these points would probably go a long way towards closing, and perhaps even reversing, the gap implied  by Mayor’s figure. I suspect that if you could do a controlled experiment and publish the exact same article in Nature and Psych Science, it would tend to get cited more heavily in Nature over the long run. So in that sense, if citations were all anyone cared about, I think it would be perfectly reasonable for scientists to try to publish in the most prestigious journals–even though, again, I think the pressure to publish in such journals actually hurts the field as a whole.

Of course, in reality, we don’t just care about citation counts anyway; lots of other things matter. For one thing, we also need to factor in the opportunity cost associated with writing a paper up in a very specific format for submission to Nature or Science, knowing that we’ll probably have to rewrite much or all of it before it gets published. All that effort could probably have been spent on other projects, so one way to put the question is: how many lower-tier publications in the hand is a top-tier publication in the bush worth?

Ultimately, it’s an empirical matter; I imagine if you were willing to make some strong assumptions, and collect the right kind of data, you could come up with a meaningful estimate of the actual value of a Nature publication, as a function of important variables like the number of other publications the authors had, the amount of work invested in rewriting the paper after rejection, the authors’ career stage, etc. But I don’t know of any published work to that effect; it seems like it would probably be more trouble than it was worth (or, to get meta: how many Nature manuscripts can you write in the time it takes you to write a manuscript about how many Nature manuscripts you should write?). And, to be honest, I suspect that any estimate you obtained that way would have little or no impact on the actual decisions scientists make about where to submit their manuscripts anyway, because, in practice, such decisions are driven as much by guesswork and wishful thinking as by any well-reasoned analysis. And on that last point, I speak from extensive personal experience…

no one really cares about anything-but-zero

Tangentially related to the last post, Games With Words has a post up soliciting opinions about the merit of effect sizes. The impetus is a discussion we had in the comments on his last post about Jonah Lehrer’s New Yorker article. It started with an obnoxious comment (mine, of course) and then rapidly devolved into a  murderous duel civil debate about the importance (or lack thereof) of effect sizes in psychology. What I argued is that consideration of effect sizes is absolutely central to most everything psychologists do, even if that consideration is usually implicit rather than explicit. GWW thinks effect sizes aren’t that important, or at least, don’t have to be.

The basic observation in support of thinking in terms of effect sizes rather than (or in addition to) p values is simply that the null hypothesis is nearly always false. (I think I said “always” in the comments, but I can live with “nearly always”). There are exceedingly few testable associations between two or more variables that could plausibly have an effect size of exactly zero. Which means that if all you care about is rejecting the null hypothesis by reaching p < .05, all you really need to do is keep collecting data–you will get there eventually.

I don’t think this is a controversial point, and my sense is that it’s the received wisdom among (most) statisticians. That doesn’t mean that the hypothesis testing framework isn’t useful, just that it’s fundamentally rooted in ideas that turn out to be kind of silly upon examination. (For the record, I use significance tests all the time in my own work, and do all sorts of other things I know on some level to be silly, so I’m not saying that we should abandon hypothesis testing wholesale).

Anyway, GWW’s argument is that, at least in some areas of psychology, people don’t really care about effect sizes, and simply want to know if there’s a real effect or not. I disagree for at least two reasons. First, when people say they don’t care about effect sizes, I think what they really mean is that they don’t feel a need to explicitly think about effect sizes, because they can just rely on a decision criterion of p < .05 to determine whether or not an effect is ‘real’. The problem is that, since the null hypothesis is always false (i.e., effects are never exactly zero in the population), if we just keep collecting data, eventually all effects become statistically significant, rendering the decision criterion completely useless. At that point, we’d presumably have to rely on effect sizes to decide what’s important. So it may look like you can get away without considering effect sizes, but that’s only because, for the kind of sample sizes we usually work with, p values basically end up being (poor) proxies for effect sizes.

Second, I think it’s simply not true that we care about any effect at all. GWW makes a seemingly reasonable suggestion that even if it’s not sensible to care about a null of exactly zero, it’s quite sensible to care about nothing but the direction of an effect. But I don’t think that really works either. The problem is that, in practice, we don’t really just care about the direction of the effect; we also want to know that it’s meaningfully large (where ‘meaningfully’ is intentionally vague, and can vary from person to person or question to question). GWW gives a priming example: if a theoretical model predicts the presence of a priming effect, isn’t it enough just to demonstrate a statistically significant priming effect in the predicted direction? Does it really matter how big the effect is?

Yes. To see this, suppose that I go out and collect priming data online from 100,000 subjects, and happily reject the null at p < .05 based on a priming effect of a quarter of a millisecond (where the mean response time is, say, on the order of a second). Does that result really provide any useful support for my theory, just because I was able to reject the null? Surely not. For one thing, a quarter of a millisecond is so tiny that any reviewer worth his or her salt is going to point out that any number of confounding factors could be responsible for that tiny association. An effect that small is essentially uninterpretable. But there is, presumably, some minimum size for every putative effect which would lead us to say: “okay, that’s interesting. It’s a pretty small effect, but I can’t just dismiss it out of hand, because it’s big enough that it can’t be attributed to utterly trivial confounds.” So yes, we do care about effect sizes.

The problem, of course, is that what constitutes a ‘meaningful’ effect is largely subjective. No doubt that’s why null hypothesis testing holds such an appeal for most of us (myself included)–it may be silly, but it’s at least objectively silly. It doesn’t require you to put your subjective beliefs down on paper. Still, at the end of the day, that apprehensiveness we feel about it doesn’t change the fact that you can’t get away from consideration of effect sizes, whether explicitly or implicitly. Saying that you don’t care about effect sizes doesn’t actually make it so; it just means that you’re implicitly saying that you literally care about any effect that isn’t exactly zero–which is, on its face, absurd. Had you picked any other null to test against (e.g., a range of standardized effect sizes between -0.1 and 0.1), you wouldn’t have that problem.

To reiterate, I’m emphatically not saying that anyone who doesn’t explicitly report, or even think about, effect sizes when running a study should be lined up against a wall and fired upon at will is doing something terribly wrong. I think it’s a very good idea to (a) run power calculations before starting a study, (b) frequently pause to reflect on what kinds of effects one considers big enough to be worth pursuing; and (c) report effect size measures and confidence intervals for all key tests in one’s papers. But I’m certainly not suggesting that if you don’t do these things, you’re a bad person, or even a bad researcher. All I’m saying is that the importance of effect sizes doesn’t go away just because you’re not thinking about them. A decision about what constitutes a meaningful effect size is made every single time you test your data against the null hypothesis; so you may as well be the one making that decision explicitly, instead of having it done for you implicitly in a silly way. No one really cares about anything-but-zero.

the ‘decline effect’ doesn’t work that way

Over the last four or five years, there’s been a growing awareness in the scientific community that science is an imperfect process. Not that everyone used to think science was a crystal ball with a direct line to the universe or anything, but there does seem to be a growing recognition that scientists are human beings with human flaws, and are susceptible to common biases that can make it more difficult to fully trust any single finding reported in the literature. For instance, scientists like interesting results more than boring results; we’d rather keep our jobs than lose them; and we have a tendency to see what we want to see, even when it’s only sort-of-kind-of there, and sometimes not there at all. All of these things contrive to produce systematic biases in the kinds of findings that get reported.

The single biggest contributor to the zeitgeist shift nudge is undoubtedly John Ioannidis (recently profiled in an excellent Atlantic article), whose work I can’t say enough good things about (though I’ve tried). But lots of other people have had a hand in popularizing the same or similar ideas–many of which actually go back several decades. I’ve written a bit about these issues myself in a number of papers (1, 2, 3) and blog posts (1, 2, 3, 4, 5), so I’m partial to such concerns. Still, important as the role of the various selection and publication biases is in charting the course of science, virtually all of the discussions of these issues have had a relatively limited audience. Even Ioannidis’ work, influential as it’s been, has probably been read by no more than a few thousand scientists.

Last week, the debate hit the mainstream when the New Yorker (circulation: ~ 1 million) published an article by Jonah Lehrer suggesting–or at least strongly raising the possibility–that something might be wrong with the scientific method. The full article is behind a paywall, but I can helpfully tell you that some people seem to have un-paywalled it against the New Yorker’s wishes, so if you search for it online, you will find it.

The crux of Lehrer’s argument is that many, and perhaps most, scientific findings fall prey to something called the “decline effect”: initial positive reports of relatively large effects are subsequently followed by gradually decreasing effect sizes, in some cases culminating in a complete absence of an effect in the largest, most recent studies. Lehrer gives a number of colorful anecdotes illustrating this process, and ends on a decidedly skeptical (and frankly, terribly misleading) note:

The decline effect is troubling because it reminds us how difficult it is to prove anything. We like to pretend that our experiments define the truth for us. But that’s often not the case. Just because an idea is true doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true. When the experiments are done, we still have to choose what to believe.

While Lehrer’s article received pretty positive reviews from many non-scientist bloggers (many of whom, dismayingly, seemed to think the take-home message was that since scientists always change their minds, we shouldn’t trust anything they say), science bloggers were generally not very happy with it. Within days, angry mobs of Scientopians and Nature Networkers started murdering unicorns; by the end of the week, the New Yorker offices were reduced to rubble, and the scientists and statisticians who’d given Lehrer quotes were all rumored to be in hiding.

Okay, none of that happened. I’m just trying to keep things interesting. Anyway, because I’ve been characteristically lazy slow on the uptake, by the time I got around to writing this post you’re now reading, about eighty hundred and sixty thousand bloggers had already weighed in on Lehrer’s article. That’s good, because it means I can just direct you to other people’s blogs instead of having to do any thinking myself. So here you go: good posts by Games With Words (whose post tipped me off to the article), Jerry Coyne, Steven Novella, Charlie Petit, and Andrew Gelman, among many others.

Since I’ve blogged about these issues before, and agree with most of what’s been said elsewhere, I’ll only make one point about the article. Which is that about half of the examples Lehrer talks about don’t actually seem to me to qualify as instances of the decline effect–at least as Lehrer defines it. The best example of this comes when Lehrer discusses Jonathan Schooler’s attempt to demonstrate the existence of the decline effect by running a series of ESP experiments:

In 2004, Schooler embarked on an ironic imitation of Rhine’s research: he tried to replicate this failure to replicate. In homage to Rhirie’s interests, he decided to test for a parapsychological phenomenon known as precognition. The experiment itself was straightforward: he flashed a set of images to a subject and asked him or her to identify each one. Most of the time, the response was negative—-the images were displayed too quickly to register. Then Schooler randomly selected half of the images to be shown again. What he wanted to know was whether the images that got a second showing were more likely to have been identified the first time around. Could subsequent exposure have somehow influenced the initial results? Could the effect become the cause?

The craziness of the hypothesis was the point: Schooler knows that precognition lacks a scientific explanation. But he wasn’t testing extrasensory powers; he was testing the decline effect. “At first, the data looked amazing, just as we’d expected,” Schooler says. “I couldn’t believe the amount of precognition we were finding. But then, as we kept on running subjects, the effect size”–a standard statistical measure–“kept on getting smaller and smaller.” The scientists eventually tested more than two thousand undergraduates. “In the end, our results looked just like Rhinos,” Schooler said. “We found this strong paranormal effect, but it disappeared on us.”

This is a pretty bad way to describe what’s going on, because it makes it sound like it’s a general principle of data collection that effects systematically get smaller. It isn’t. The variance around the point estimate of effect size certainly gets smaller as samples get larger, but the likelihood of an effect increasing is just as high as the likelihood of it decreasing. The absolutely critical point Lehrer left out is that you would only get the decline effect to show up if you intervened in the data collection or reporting process based on the results you were getting. Instead, most of Lehrer’s article presents the decline effect as if it’s some sort of mystery, rather than the well-understood process that it is. It’s as though Lehrer believes that scientific data has the magical property of telling you less about the world the more of it you have. Which isn’t true, of course; the problem isn’t that science is malfunctioning, it’s that scientists are still (kind of!) human, and are susceptible to typical human biases. The unfortunate net effect is that Lehrer’s article, while tremendously entertaining, achieves exactly the opposite of what good science journalism should do: it sows confusion about the scientific process and makes it easier for people to dismiss the results of good scientific work, instead of helping people develop a critical appreciation for the amazing power science has to tell us about the world.

repost: narrative tips from a grad school applicant

Since it’s grad school application season for undergraduates, I thought I’d repost some narrative tips about how to go about writing a personal statement for graduate programs in psychology. This is an old, old post from a long-deceased blog; it’s from way back in 2002 when I was applying to grad school. It’s kind of a serious piece; if I were to rewrite it today, the tone would be substantially lighter. I can’t guarantee that following these tips will get you into grad school, but I can promise that you’ll be amazed at the results.

The first draft of my personal statement was an effortful attempt to succinctly sum up my motivation for attending graduate school. I wanted to make my rationale for applying absolutely clear, so I slaved over the statement for three or four days, stopping only for the occasional bite of food and hour or two of sleep every night. I was pretty pleased with the result. For a first draft, I thought it showed great promise. Here’s how it started:

I want to go to,o grajit skool cuz my frend steve is in grajit and he says its ez and im good at ez stuff

When I showed this to my advisor he said, “I don’t know if humor is the way to go for this thing.”

I said, “What do you mean, humor?”

After that I took a three month break from writing my personal statement while I completed a grade 12 English equivalency exam and read a few of the classics to build up my vocabulary. My advisor said that even clever people like me needed help sometimes. I read Ulysses, The Odyssey, and a few other Greek sounding books, and a book called The Cat in the Hat which was by the same author as the others, but published posthumously. Satisfied that I was able to write a letter that would impress every graduate admissions committee in the world, I set about writing a second version of my personal statement. Here’s how that went:

Dear Dirty Admissions Committee,
Solemn I came forward and mounted the round gunrest. I faced about and blessed gravely thrice the Ivory Tower, the surrounding country, and all the Profs. Then catching sight of the fMRI machine, I bent towards it and made rapid crosses in the air, gurgling in my throat and shaking my head.

“Too literary,” said my advisor when I showed him.

“Mud,” I said, and went back to the drawing board.

The third effort was much better. I had weaned myself off the classics and resolved to write a personal statement that fully expressed what a unique human being I was and why I would be an asset to the program. I talked about how I could juggle three bean bags and almost four, I was working on four, and how I’d stopped biting my fingernails last year so I had lots of free time to do psychology now. To show that I was good at following through on things that I started, I said,

p.s. when I can juggle four bean bags ( any day now) I will write you to let you know so you can update your file.

Satisfied that I had written the final copy of my statement, I showed it to my advisor. He was wild-eyed about it.

“You just don’t get it, do you,” he said, ripping my statement in two and throwing it into the wastepaper basket. “Tell you what. Why don’t I write a statement for you. And then you can go through it and make small changes to personalize it. Ok?”

“Sure,” I said. So the next day my advisor gave me a two-page personal statement he had written for me. Now I won’t bore you with all of the details, but I have to say, it was pretty bad. Here’s how it started:

After studying psychology for nearly four years at the undergraduate level, I have decided to pursue a Ph.D. in the field. I have developed a keen interest in [list your areas of interest here] and believe [university name here] will offer me outstanding opportunities.

“Now go make minor changes,” said my advisor.

“Mud,” I said, and went to make minor changes.

I came back with the final version a week later. It was truly a masterpiece; co-operating with my advisor had really helped. At first I had been skeptical because what he wrote was so bad the way he gave it to me, but with a judicious sprinkling of helpful clarifications, it turned into something really good. It was sort of like an ugly cocoon (his draft) bursting into a beautiful rainbow (my version). It went like this:

After studying psychology (and juggling!) for nearly four years at the undergraduate level (of university), I have decided to pursue a Ph.D. in the field. Cause I need it to become a Prof. I have developed a keen interest in [list your areas of interest here Vision, Language, Memory, Brain] and believe [university name hereStanford Princeton Mishigan] will offer me outstanding opportunities in psychology and for the juggling society.

“Brilliant,” said my advisor when I showed it to him. “You’ve truly outdone yourself.”

“Mud,” I said, and went to print six more copies.

what the arsenic effect means for scientific publishing

I don’t know very much about DNA (and by ‘not very much’ I sadly mean ‘next to nothing’), so when someone tells me that life as we know it generally doesn’t use arsenic to make DNA, and that it’s a big deal to find a bacterium that does, I’m willing to believe them. So too, apparently, are at least two or three reviewers for Science, which published a paper last week by a NASA group purporting to demonstrate exactly that.

Turns out the paper might have a few holes. In the last few days, the blogosphere has reached fever delirium pitch as critiques of the article have emerged from every corner; it seems like pretty much everyone with some knowledge of the science in question is unhappy about the paper. Since I’m not in any position to critique the article myself, I’ll take Carl Zimmer’s word for it in Slate yesterday:

Was this merely a case of a few isolated cranks? To find out, I reached out to a dozen experts on Monday. Almost unanimously, they think the NASA scientists have failed to make their case.  “It would be really cool if such a bug existed,” said San Diego State University’s Forest Rohwer, a microbiologist who looks for new species of bacteria and viruses in coral reefs. But, he added, “none of the arguments are very convincing on their own.” That was about as positive as the critics could get. “This paper should not have been published,” said Shelley Copley of the University of Colorado.

Zimmer then follows his Slate piece up with a blog post today in which he provides 13 experts’ unadulterated comments. While there are one or two (somewhat) positive reviews, the consensus clearly seems to be that the Science paper is (very) bad science.

Of course, scientists (yes, even Science reviewers) do occasionally make mistakes, so if we’re being charitable about it, we might chalk it up to human error (though some of the critiques suggest that these are elementary problems that could have been very easily addressed, so it’s possible there’s some disingenuousness involved). But what many bloggers (1, 2, 3, etc.) have found particularly inexcusable is the way NASA and the research team have handled the criticism. Zimmer again, in Slate:

I asked two of the authors of the study if they wanted to respond to the criticism of their paper. Both politely declined by email.

“We cannot indiscriminately wade into a media forum for debate at this time,” declared senior author Ronald Oremland of the U.S. Geological Survey. “If we are wrong, then other scientists should be motivated to reproduce our findings. If we are right (and I am strongly convinced that we are) our competitors will agree and help to advance our understanding of this phenomenon. I am eager for them to do so.”

“Any discourse will have to be peer-reviewed in the same manner as our paper was, and go through a vetting process so that all discussion is properly moderated,” wrote Felisa Wolfe-Simon of the NASA Astrobiology Institute. “The items you are presenting do not represent the proper way to engage in a scientific discourse and we will not respond in this manner.”

A NASA spokesperson basically reiterated this point of view, indicating that NASA scientists weren’t going to respond to criticism of their work unless that criticism appeared in, you know, a respectable, peer-reviewed outlet. (Fortunately, at least one of the critics already has a draft letter to Science up on her blog.)

I don’t think it’s surprising that people who spend much of their free time blogging about science, and think it’s important to discuss scientific issues in a public venue, generally aren’t going to like being told that science blogging isn’t a legitimate form of scientific discourse. Especially considering that the critics here aren’t laypeople without scientific training; they’re well-respected scientists with areas of expertise that are directly relevant to the paper. In this case, dismissing trenchant criticism because it’s on the web rather than in a peer-reviewed journal seems kind of like telling someone who’s screaming at you that your house is on fire that you’re not going to listen to them until they adopt a more polite tone. It just seems counterproductive.

That said, I personally don’t think we should take the NASA team’s statements at face value. I very much doubt that what the NASA researchers are saying really reflect any deep philosophical view about the role of blogs in scientific discourse; it’s much more likely that they’re simply trying to buy some time while they figure out how to respond. On the face of it, they have a choice between two lousy options: either ignore the criticism entirely, which would be antithetical to the scientific process and would look very bad, or address it head-on–which, judging by the vociferousness and near-unanimity of the commentators, is probably going to be a losing battle. Shifting the terms of the debate by insisting on responding only in a peer-reviewed venue doesn’t really change anything, but it does buy the authors two or three weeks. And two or three weeks is worth like, forty attentional cycles in the blogosphere.

Mind you, I’m not saying we should sympathize with the NASA researchers just because they’re in a tough position. I think one of the main reasons the story’s attracted so much attention is precisely because people see it as a case of justice being served. The NASA team called a major press conference ahead of the paper’s publication, published its results in one of the world’s most prestigious science journals, and yet apparently failed to run relatively basic experimental controls in support of its conclusions. If the critics are to be believed, the NASA researchers are either disingenuous or incompetent; either way, we shouldn’t feel sorry for them.

What I do think this episode shows is that the rules of scientific publishing have fundamentally changed in the last few years–and largely for the better. I haven’t been doing science for very long, but even in the halcyon days of 2003, when I started graduate school, science blogging was practically nonexistent, and the main way you’d find out what other people thought about an influential new paper was by talking to people you knew at conferences (which could take several months) or waiting for critiques or replication failures to emerge in other peer-reviewed journals (which could take years). That kind of delay between publication and evaluation is disastrous for science, because in the time it takes for a consensus to emerge that a paper is no good, several research teams might have already started trying to replicate and extend the reported findings, and several dozen other researchers might have uncritically cited their paper peripherally in their own work. This delay is probably why, as John Ioannidis’ work so elegantly demonstrates, major studies published in high-impact journals tend to exert a disproportionate influence on the literature long after they’ve been resoundingly discredited.

The Arsenic Effect, if we can call it that, provides a nice illustration of the impact of new media on scientific communication. It’s a safe bet that there are now very few people who do anything even vaguely related to the NASA team’s research who haven’t been made aware that the reported findings are controversial. Which means that the process of attempting to replicate (or falsify) the findings will proceed much more quickly than it might have ten or twenty years ago, and there probably won’t be very many people who cite the Science paper as compelling evidence of terrestrial arsenic-based life. Perhaps more importantly, as researchers get used to the idea that their high-profile work is going to be instantly evaluated by thousands of pairs of highly trained eyes, any of which might be attached to a highly prolific pair of typing hands, there will be an increasingly strong disincentive to avoid being careless. That isn’t to say that bad science will disappear, of course; just that, in cases where the badness reflects a pressure to tell a good story at all costs, we’ll probably see less of it.