Category Archives: science journalism

aftermath of the NYT / Lindstrom debacle

Over the last few days the commotion over Martin Lindstrom’s terrible New York Times iPhone loving Op-Ed, which I wrote about in my last post, seems to have spread far and wide. Highlights include excellent posts by David Dobbs and the Neurocritic, but really there are too many to list at this point. And the verdict is overwhelmingly negative; I don’t think I’ve seen a single post in defense of Lindstrom, which is probably not a good sign (for him).

In the meantime, Russ Poldrack and over 40 other neuroscientists and psychologists (including me) wrote a letter to the NYT complaining about the Lindstrom Op-Ed, which the NYT has now published. As per usual, they edited down the letter till it almost disappeared. But the original, along with a list of signees, is on Russ’s blog.

Anyway, the fact that the Times published the rebuttal letter is all well and good, but as I mentioned in my last post, the bigger problem is that since the Times doesn’t include links to related content on their articles, people who stumble across the Op-Ed aren’t going to have any way of knowing that it’s been roundly discredited by pretty much the entire web. Lindstrom’s piece was the most emailed article on the Times website for a day or two, but only a tiny fraction of those readers will ever see (or even hear about) the critical response. As far as I know, the NYT hasn’t issued an explanation or apology for publishing the Op-Ed; they’ve simply published the letter and gone on about their business (I guess I can’t fault them for this–if they had to issue a formal apology for every mistake that gets published, they’d have no time for anything else; the trick is really to catch this type of screw-up at the front end). Adding links from each article to related content wouldn’t solve the problem entirely, of course, but it would be something. The fact that Times’ platform currently doesn’t have this capacity is kind of perplexing.

The other point worth mentioning is that, in the aftermath of the tsunami of criticism he received, Lindstrom left a comment on several blogs (Russ Poldrack and David Dobbs were lucky recipients; sadly, I wasn’t on the guest list). Here’s the full text of the comment:

My first foray into neuro-marketing research was for my New York Times bestseller Buyology: Truth and Lies about Why We Buy. For that book I teamed up with Neurosense, a leading independent neuro-marketing company that specializes in consumer research using functional magnetic resonance imaging (fMRI) headed by Oxford University trained Gemma Calvert, BSc DPhil CPsychol FRSA and Neuro-Insight, a market research company that uses unique brain-imaging technology, called Steady-State Topography (SST), to measure how the brain responds to communications which is lead by Dr. Richard Silberstein, PhD. This was the single largest neuro-marketing study ever conducted—25x larger than any such study to date and cost more than seven million dollars to run.

In the three-year effort scientists scanned the brains of over 2,000 people from all over the world as they were exposed to various marketing and advertising strategies including clever product placements, sneaky subliminal messages, iconic brand logos, shocking health and safety warnings, and provocative product packages. The purpose of all of this was to understand, quite successfully I may add, the key drivers behind why we make the purchasing decisions that we do.

For the research that my recent Op-Ed column in the New York Times was based on I turned to Dr. David Hubbard, a board-certified neurologist and his company MindSign Neuro Marketing, an independently owned fMRI neuro-marketing company. I asked Dr. Hubbard and his team a simple question, “Are we addicted to our iPhones?” After analyzing the brains of 8 men and 8 women between the ages of 18-25 using fMRI technology, MindSign answered my question using standardized answering methods and completely reproducible results. The conclusion was that we are not addicted to our iPhones, we are in love with them.

The thought provoking dialogue that has been generated from the article has been overwhelmingly positive and I look forward to the continued comments from professionals in the field, readers and fans.

Respectfully,

Martin Lindstrom

As evasive responses go, this is a masterpiece; at no point does Lindstrom ever actually address any of the substantive criticisms leveled at him. He spends most of his response name dropping (the list of credentials is almost long enough to make you forget that the rebuttal letter to his Op-Ed was signed by over 40 PhDs) and rambling about previous unrelated neuromarketing work (which may as well not exist, since none of it has ever been made public), and then closes by shifting the responsibility for the study to MindSign, the company he paid to run the iPhone study. The claim that MindSign “answered [his] question using standardized answering methods and completely reproducible results” is particularly ludicrous; as I explained in my last post, there currently aren’t any standardized methods for reading addiction or love off of brain images. And ‘completely reproducible results’ implies that one has, you know, successfully reproduced the results, which is simply false unless Lindstrom is suggesting that MindSign did the same experiment twice. It’s hard to see any “thought provoking dialogue” taking place here, and the neuroimaging community’s response to the Op-Ed column has been, virtually without exception, overwhelmingly negative, not positive (as Lindstrom claims).

That all said, I do think there’s one very positive aspect to this entire saga, and that’s the amazing speed and effectiveness of the response from scientists, science journalists, and other scientifically literate folks. Ten years ago, Lindstrom’s piece might have gone completely unchallenged–and even if someone like Russ Poldrack had written a response, it would probably have appeared much later, been signed by fewer scientists (because coordination would have been much more difficult), and received much less attention. But with 48 hours of Lindstrom’s Op-Ed being published, dozens of critical blog posts had appeared, and hundreds, if not thousands, of people all over the world had tweeted or posted links to these critiques (my last post alone received over 12,000 hits). Scientific discourse, which used to be confined largely to peer-reviewed print journals and annual conferences, now takes place at a remarkable pace online, and it’s fantastic to see social media used in this way. The hope is that as these technologies develop further and scientists take on a more active role in communicating with the public (something that platforms like Twitter and Google+ seem to be facilitating amazingly well), it’ll become increasingly difficult for people like Lindstrom to make crazy pseudoscientific claims without being immediately and visibly called out on it–even in those rare cases when the NYT makes the mistake of leaving one the biggest microphones on earth open and unmonitored.

the New York Times blows it big time on brain imaging

The New York Times has a terrible, terrible Op-Ed piece today by Martin Lindstrom (who I’m not going to link to, because I don’t want to throw any more bones his way). If you believe Lindstrom, you don’t just like your iPhone a lot; you love it. Literally. And the reason you love it, shockingly, is your brain:

Earlier this year, I carried out an fMRI experiment to find out whether iPhones were really, truly addictive, no less so than alcohol, cocaine, shopping or video games. In conjunction with the San Diego-based firm MindSign Neuromarketing, I enlisted eight men and eight women between the ages of 18 and 25. Our 16 subjects were exposed separately to audio and to video of a ringing and vibrating iPhone.

But most striking of all was the flurry of activation in the insular cortex of the brain, which is associated with feelings of love and compassion. The subjects’ brains responded to the sound of their phones as they would respond to the presence or proximity of a girlfriend, boyfriend or family member.

In short, the subjects didn’t demonstrate the classic brain-based signs of addiction. Instead, they loved their iPhones.

There’s so much wrong with just these three short paragraphs (to say nothing of the rest of the article, which features plenty of other whoppers) that it’s hard to know where to begin. But let’s try. Take first the central premise–that an fMRI experiment could help determine whether iPhones are no less addictive than alcohol or cocaine. The tacit assumption here is that all the behavioral evidence you could muster–say, from people’s reports about how they use their iPhones, or clinicians’ observations about how iPhones affect their users–isn’t sufficient to make that determination; to “really, truly” know if something’s addictive, you need to look at what the brain is doing when people think about their iPhones. This idea is absurd inasmuch as addiction is defined on the basis of its behavioral consequences, not (right now, anyway) by the presence or absence of some biomarker. What makes someone an alcoholic is the fact that they’re dependent on alcohol, have trouble going without it, find that their alcohol use interferes with multiple aspects of their day-to-day life, and generally suffer functional impairment because of it–not the fact that their brain lights up when they look at pictures of Johnny Walker red. If someone couldn’t stop drinking–to the point where they lost their job, family, and friends–but their brain failed to display a putative biomarker for addiction, it would be strange indeed to say “well, you show all the signs, but I guess you’re not really addicted to alcohol after all.”

Now, there may come a day (and it will be a great one) when we have biomarkers sufficiently accurate that they can stand in for the much more tedious process of diagnosing someone’s addiction the conventional way. But that day is, to put it gently, a long way off. Right now, if you want to know if iPhones are addictive, the best way to do that is to, well, spend some time observing and interviewing iPhone users (and some quantitative analysis would be helpful).

Of course, it’s not clear what Lindstrom thinks an appropriate biomarker for addiction would be in any case. Presumably it would have something to do with the reward system; but what? Suppose Lindstrom had seen robust activation in the ventral striatum–a critical component of the brain’s reward system–when participants gazed upon the iPhone: what then? Would this have implied people are addicted to iPhones? But people also show striatal activity when gazing on food, money, beautiful faces, and any number of other stimuli. Does that mean the average person is addicted to all of the above? A marker of pleasure or reward, maybe (though even that’s not certain), but addiction? How could a single fMRI experiment with 16 subjects viewing pictures of iPhones confirm or disconfirm the presence of addiction? Lindstrom doesn’t say. I suppose he has good reason not to say: if he really did have access to an accurate fMRI-based biomarker for addiction, he’d be in a position to make millions (billions?) off the technology. To date, no one else has come close to identifying a clinically accurate fMRI biomarker for any kind of addiction (for more technical readers, I’m talking here about cross-validated methods that have both sensitivity and specificity comparable to traditional approaches when applied to new subjects–not individual studies that claim 90% with-sample classification accuracy based on simple regression models). So we should, to put it mildly, be very skeptical that Lindstrom’s study was ever in a position to do what he says it was designed to do.

We should also ask all sorts of salient and important questions about who the people are who are supposedly in love with their iPhones. Who’s the “You” in the “You Love Your iPhone” of the title? We don’t know, because we don’t know who the participants in Lindstrom’s sample, were, aside from the fact that they were eight men and eight women aged 18 to 25. But we’d like to know some other important things. For instance, were they selected for specific characteristics? Were they, say, already avid iPhone users? Did they report loving, or being addicted to their iPhones? If so, would it surprise us that people chosen for their close attachment to their iPhones also showed brain activity patterns typical of close attachment? (Which, incidentally, they actually don’t–but more on that below.) And if not, are we to believe that the average person pulled off the street–who probably has limited experience with iPhones–really responds to the sound of their phones “as they would respond to the presence or proximity of a girlfriend, boyfriend or family member”? Is the takeaway message of Lindstrom’s Op-Ed that iPhones are actually people, as far as our brains are concerned?

In fairness, space in the Times is limited, so maybe it’s not fair to demand this level of detail in the Op-Ed iteslf. But the bigger problem is that we have no way of evaluating Lindstrom’s claims, period, because (as far as I can tell), his study hasn’t been published or peer-reviewed anywhere. Presumably, it’s proprietary information that belongs to the neuromarketing firm in question. Which is to say, the NYT is basically giving Lindstrom license to talk freely about scientific-sounding findings that can’t actually be independently confirmed, disputed, or critiqued by members of the scientific community with expertise in the very methods Lindstrom is applying (expertise which, one might add, he himself lacks). For all we know, he could have made everything up. To be clear, I don’t really think he did make everything up–but surely, somewhere in the editorial process someone at the NYT should have stepped in and said, “hey, these are pretty strong scientific claims; is there any way we can make your results–on which your whole article hangs–available for other experts to examine?”

This brings us to what might be the biggest whopper of all, and the real driver of the article title: the claim that “most striking of all was the flurry of activation in the insular cortex of the brain, which is associated with feelings of love and compassion“. Russ Poldrack already tore this statement to shreds earlier this morning:

Insular cortex may well be associated with feelings of love and compassion, but this hardly proves that we are in love with our iPhones.  In Tal Yarkoni’s recent paper in Nature Methods, we found that the anterior insula was one of the most highly activated part of the brain, showing activation in nearly 1/3 of all imaging studies!  Further, the well-known studies of love by Helen Fisher and colleagues don’t even show activation in the insula related to love, but instead in classic reward system areas.  So far as I can tell, this particular reverse inference was simply fabricated from whole cloth.  I would have hoped that the NY Times would have learned its lesson from the last episode.

But you don’t have to take Russ’s word for it; if you surf for a few terms on our Neurosynth website, making sure to select “forward inference” under image type, you’ll notice that the insula shows up for almost everything. That’s not an accident; it’s because the insula (or at least the anterior part of the insula) plays a very broad role in goal-directed cognition. It really is activated when you’re doing almost anything that involves, say, following instructions an experimenter gave you, or attending to external stimuli, or mulling over something salient in the environment. You can see this pretty clearly in this modified figure from our Nature Methods paper (I’ve circled the right insula):

Proportion of studies reporting activation at each voxel

The insula is one of a few ‘hotspots’ where activation is reported very frequently in neuroimaging articles (the other major one being the dorsal medial frontal cortex). So, by definition, there can’t be all that much specificity to what the insula is doing, since it pops up so often. To put it differently, as Russ and others have repeatedly pointed out, the fact that a given region activates when people are in a particular psychological state (e.g., love) doesn’t give you license to conclude that that state is present just because you see activity in the region in question. If language, working memory, physical pain, anger, visual perception, motor sequencing, and memory retrieval all activate the insula, then knowing that the insula is active is of very little diagnostic value. That’s not to say that some psychological states might not be more strongly associated with insula activity (again, you can see this on Neurosynth if you switch the image type to ‘reverse inference’ and browse around); it’s just that, probabilistically speaking, the mere fact that the insula is active gives you very little basis for saying anything concrete about what people are experiencing.

In fact, to account for Lindstrom’s findings, you don’t have to appeal to love or addiction at all. There’s a much simpler way to explain why seeing or hearing an iPhone might elicit insula activation. For most people, the onset of visual or auditory stimulation is a salient event that causes redirection of attention to the stimulated channel. I’d be pretty surprised, actually, if you could present any picture or sound to participants in an fMRI scanner and not elicit robust insula activity. Orienting and sustaining attention to salient things seems to be a big part of what the anterior insula is doing (whether or not that’s ultimately its ‘core’ function). So the most appropriate conclusion to draw from the fact that viewing iPhone pictures produces increased insula activity is something vague like “people are paying more attention to iPhones”, or “iPhones are particularly salient and interesting objects to humans living in 2011.” Not something like “no, really, you love your iPhone!”

In sum, the NYT screwed up. Lindstrom appears to have a habit of making overblown claims about neuroimaging evidence, so it’s not surprising he would write this type of piece; but the NYT editorial staff is supposedly there to filter out precisely this kind of pseudoscientific advertorial. And they screwed up. It’s a particularly big screw-up given that (a) as of right now, Lindstrom’s Op-Ed is the single most emailed article on the NYT site, and (b) this incident almost perfectly recapitulates another NYT article 4 years ago in which some neuroscientists and neuromarketers wrote a grossly overblown Op-Ed claiming to be able to infer, in detail, people’s opinions about presidential candidates. That time, Russ Poldrack and a bunch of other big names in cognitive neuroscience wrote a concise rebuttal that appeared in the NYT (but unfortunately, isn’t linked to from the original Op-Ed, so anyone who stumbles across the original now has no way of knowing how ridiculous it is). One hopes the NYT follows up in similar fashion this time around. They certainly owe it to their readers–some of whom, if you believe Lindstrom, are now in danger of dumping their current partners for their iPhones.

h/t: Molly Crockett

the ‘decline effect’ doesn’t work that way

Over the last four or five years, there’s been a growing awareness in the scientific community that science is an imperfect process. Not that everyone used to think science was a crystal ball with a direct line to the universe or anything, but there does seem to be a growing recognition that scientists are human beings with human flaws, and are susceptible to common biases that can make it more difficult to fully trust any single finding reported in the literature. For instance, scientists like interesting results more than boring results; we’d rather keep our jobs than lose them; and we have a tendency to see what we want to see, even when it’s only sort-of-kind-of there, and sometimes not there at all. All of these things contrive to produce systematic biases in the kinds of findings that get reported.

The single biggest contributor to the zeitgeist shift nudge is undoubtedly John Ioannidis (recently profiled in an excellent Atlantic article), whose work I can’t say enough good things about (though I’ve tried). But lots of other people have had a hand in popularizing the same or similar ideas–many of which actually go back several decades. I’ve written a bit about these issues myself in a number of papers (1, 2, 3) and blog posts (1, 2, 3, 4, 5), so I’m partial to such concerns. Still, important as the role of the various selection and publication biases is in charting the course of science, virtually all of the discussions of these issues have had a relatively limited audience. Even Ioannidis’ work, influential as it’s been, has probably been read by no more than a few thousand scientists.

Last week, the debate hit the mainstream when the New Yorker (circulation: ~ 1 million) published an article by Jonah Lehrer suggesting–or at least strongly raising the possibility–that something might be wrong with the scientific method. The full article is behind a paywall, but I can helpfully tell you that some people seem to have un-paywalled it against the New Yorker’s wishes, so if you search for it online, you will find it.

The crux of Lehrer’s argument is that many, and perhaps most, scientific findings fall prey to something called the “decline effect”: initial positive reports of relatively large effects are subsequently followed by gradually decreasing effect sizes, in some cases culminating in a complete absence of an effect in the largest, most recent studies. Lehrer gives a number of colorful anecdotes illustrating this process, and ends on a decidedly skeptical (and frankly, terribly misleading) note:

The decline effect is troubling because it reminds us how difficult it is to prove anything. We like to pretend that our experiments define the truth for us. But that’s often not the case. Just because an idea is true doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true. When the experiments are done, we still have to choose what to believe.

While Lehrer’s article received pretty positive reviews from many non-scientist bloggers (many of whom, dismayingly, seemed to think the take-home message was that since scientists always change their minds, we shouldn’t trust anything they say), science bloggers were generally not very happy with it. Within days, angry mobs of Scientopians and Nature Networkers started murdering unicorns; by the end of the week, the New Yorker offices were reduced to rubble, and the scientists and statisticians who’d given Lehrer quotes were all rumored to be in hiding.

Okay, none of that happened. I’m just trying to keep things interesting. Anyway, because I’ve been characteristically lazy slow on the uptake, by the time I got around to writing this post you’re now reading, about eighty hundred and sixty thousand bloggers had already weighed in on Lehrer’s article. That’s good, because it means I can just direct you to other people’s blogs instead of having to do any thinking myself. So here you go: good posts by Games With Words (whose post tipped me off to the article), Jerry Coyne, Steven Novella, Charlie Petit, and Andrew Gelman, among many others.

Since I’ve blogged about these issues before, and agree with most of what’s been said elsewhere, I’ll only make one point about the article. Which is that about half of the examples Lehrer talks about don’t actually seem to me to qualify as instances of the decline effect–at least as Lehrer defines it. The best example of this comes when Lehrer discusses Jonathan Schooler’s attempt to demonstrate the existence of the decline effect by running a series of ESP experiments:

In 2004, Schooler embarked on an ironic imitation of Rhine’s research: he tried to replicate this failure to replicate. In homage to Rhirie’s interests, he decided to test for a parapsychological phenomenon known as precognition. The experiment itself was straightforward: he flashed a set of images to a subject and asked him or her to identify each one. Most of the time, the response was negative—-the images were displayed too quickly to register. Then Schooler randomly selected half of the images to be shown again. What he wanted to know was whether the images that got a second showing were more likely to have been identified the first time around. Could subsequent exposure have somehow influenced the initial results? Could the effect become the cause?

The craziness of the hypothesis was the point: Schooler knows that precognition lacks a scientific explanation. But he wasn’t testing extrasensory powers; he was testing the decline effect. “At first, the data looked amazing, just as we’d expected,” Schooler says. “I couldn’t believe the amount of precognition we were finding. But then, as we kept on running subjects, the effect size”–a standard statistical measure–“kept on getting smaller and smaller.” The scientists eventually tested more than two thousand undergraduates. “In the end, our results looked just like Rhinos,” Schooler said. “We found this strong paranormal effect, but it disappeared on us.”

This is a pretty bad way to describe what’s going on, because it makes it sound like it’s a general principle of data collection that effects systematically get smaller. It isn’t. The variance around the point estimate of effect size certainly gets smaller as samples get larger, but the likelihood of an effect increasing is just as high as the likelihood of it decreasing. The absolutely critical point Lehrer left out is that you would only get the decline effect to show up if you intervened in the data collection or reporting process based on the results you were getting. Instead, most of Lehrer’s article presents the decline effect as if it’s some sort of mystery, rather than the well-understood process that it is. It’s as though Lehrer believes that scientific data has the magical property of telling you less about the world the more of it you have. Which isn’t true, of course; the problem isn’t that science is malfunctioning, it’s that scientists are still (kind of!) human, and are susceptible to typical human biases. The unfortunate net effect is that Lehrer’s article, while tremendously entertaining, achieves exactly the opposite of what good science journalism should do: it sows confusion about the scientific process and makes it easier for people to dismiss the results of good scientific work, instead of helping people develop a critical appreciation for the amazing power science has to tell us about the world.

what the arsenic effect means for scientific publishing

I don’t know very much about DNA (and by ‘not very much’ I sadly mean ‘next to nothing’), so when someone tells me that life as we know it generally doesn’t use arsenic to make DNA, and that it’s a big deal to find a bacterium that does, I’m willing to believe them. So too, apparently, are at least two or three reviewers for Science, which published a paper last week by a NASA group purporting to demonstrate exactly that.

Turns out the paper might have a few holes. In the last few days, the blogosphere has reached fever delirium pitch as critiques of the article have emerged from every corner; it seems like pretty much everyone with some knowledge of the science in question is unhappy about the paper. Since I’m not in any position to critique the article myself, I’ll take Carl Zimmer’s word for it in Slate yesterday:

Was this merely a case of a few isolated cranks? To find out, I reached out to a dozen experts on Monday. Almost unanimously, they think the NASA scientists have failed to make their case.  “It would be really cool if such a bug existed,” said San Diego State University’s Forest Rohwer, a microbiologist who looks for new species of bacteria and viruses in coral reefs. But, he added, “none of the arguments are very convincing on their own.” That was about as positive as the critics could get. “This paper should not have been published,” said Shelley Copley of the University of Colorado.

Zimmer then follows his Slate piece up with a blog post today in which he provides 13 experts’ unadulterated comments. While there are one or two (somewhat) positive reviews, the consensus clearly seems to be that the Science paper is (very) bad science.

Of course, scientists (yes, even Science reviewers) do occasionally make mistakes, so if we’re being charitable about it, we might chalk it up to human error (though some of the critiques suggest that these are elementary problems that could have been very easily addressed, so it’s possible there’s some disingenuousness involved). But what many bloggers (1, 2, 3, etc.) have found particularly inexcusable is the way NASA and the research team have handled the criticism. Zimmer again, in Slate:

I asked two of the authors of the study if they wanted to respond to the criticism of their paper. Both politely declined by email.

“We cannot indiscriminately wade into a media forum for debate at this time,” declared senior author Ronald Oremland of the U.S. Geological Survey. “If we are wrong, then other scientists should be motivated to reproduce our findings. If we are right (and I am strongly convinced that we are) our competitors will agree and help to advance our understanding of this phenomenon. I am eager for them to do so.”

“Any discourse will have to be peer-reviewed in the same manner as our paper was, and go through a vetting process so that all discussion is properly moderated,” wrote Felisa Wolfe-Simon of the NASA Astrobiology Institute. “The items you are presenting do not represent the proper way to engage in a scientific discourse and we will not respond in this manner.”

A NASA spokesperson basically reiterated this point of view, indicating that NASA scientists weren’t going to respond to criticism of their work unless that criticism appeared in, you know, a respectable, peer-reviewed outlet. (Fortunately, at least one of the critics already has a draft letter to Science up on her blog.)

I don’t think it’s surprising that people who spend much of their free time blogging about science, and think it’s important to discuss scientific issues in a public venue, generally aren’t going to like being told that science blogging isn’t a legitimate form of scientific discourse. Especially considering that the critics here aren’t laypeople without scientific training; they’re well-respected scientists with areas of expertise that are directly relevant to the paper. In this case, dismissing trenchant criticism because it’s on the web rather than in a peer-reviewed journal seems kind of like telling someone who’s screaming at you that your house is on fire that you’re not going to listen to them until they adopt a more polite tone. It just seems counterproductive.

That said, I personally don’t think we should take the NASA team’s statements at face value. I very much doubt that what the NASA researchers are saying really reflect any deep philosophical view about the role of blogs in scientific discourse; it’s much more likely that they’re simply trying to buy some time while they figure out how to respond. On the face of it, they have a choice between two lousy options: either ignore the criticism entirely, which would be antithetical to the scientific process and would look very bad, or address it head-on–which, judging by the vociferousness and near-unanimity of the commentators, is probably going to be a losing battle. Shifting the terms of the debate by insisting on responding only in a peer-reviewed venue doesn’t really change anything, but it does buy the authors two or three weeks. And two or three weeks is worth like, forty attentional cycles in the blogosphere.

Mind you, I’m not saying we should sympathize with the NASA researchers just because they’re in a tough position. I think one of the main reasons the story’s attracted so much attention is precisely because people see it as a case of justice being served. The NASA team called a major press conference ahead of the paper’s publication, published its results in one of the world’s most prestigious science journals, and yet apparently failed to run relatively basic experimental controls in support of its conclusions. If the critics are to be believed, the NASA researchers are either disingenuous or incompetent; either way, we shouldn’t feel sorry for them.

What I do think this episode shows is that the rules of scientific publishing have fundamentally changed in the last few years–and largely for the better. I haven’t been doing science for very long, but even in the halcyon days of 2003, when I started graduate school, science blogging was practically nonexistent, and the main way you’d find out what other people thought about an influential new paper was by talking to people you knew at conferences (which could take several months) or waiting for critiques or replication failures to emerge in other peer-reviewed journals (which could take years). That kind of delay between publication and evaluation is disastrous for science, because in the time it takes for a consensus to emerge that a paper is no good, several research teams might have already started trying to replicate and extend the reported findings, and several dozen other researchers might have uncritically cited their paper peripherally in their own work. This delay is probably why, as John Ioannidis’ work so elegantly demonstrates, major studies published in high-impact journals tend to exert a disproportionate influence on the literature long after they’ve been resoundingly discredited.

The Arsenic Effect, if we can call it that, provides a nice illustration of the impact of new media on scientific communication. It’s a safe bet that there are now very few people who do anything even vaguely related to the NASA team’s research who haven’t been made aware that the reported findings are controversial. Which means that the process of attempting to replicate (or falsify) the findings will proceed much more quickly than it might have ten or twenty years ago, and there probably won’t be very many people who cite the Science paper as compelling evidence of terrestrial arsenic-based life. Perhaps more importantly, as researchers get used to the idea that their high-profile work is going to be instantly evaluated by thousands of pairs of highly trained eyes, any of which might be attached to a highly prolific pair of typing hands, there will be an increasingly strong disincentive to avoid being careless. That isn’t to say that bad science will disappear, of course; just that, in cases where the badness reflects a pressure to tell a good story at all costs, we’ll probably see less of it.

trouble with biomarkers and press releases

The latest issue of the Journal of Neuroscience contains an interesting article by Ecker et al in which the authors attempted to classify people with autism spectrum disorder (ASD) and health controls based on their brain anatomy, and report achieving “a sensitivity and specificity of up to 90% and 80%, respectively.” Before unpacking what that means, and why you probably shouldn’t get too excited (about the clinical implications, at any rate; the science is pretty cool), here’s a snippet from the decidedly optimistic press release that accompanied the study:

“Scientists funded by the Medical Research Council (MRC) have developed a pioneering new method of diagnosing autism in adults. For the first time, a quick brain scan that takes just 15 minutes can identify adults with autism with over 90% accuracy. The method could lead to the screening for autism spectrum disorders in children in the future.”

If you think this sounds too good to be true, that’s because it is. Carl Heneghan explains why in an excellent article in the Guardian:

How the brain scans results are portrayed is one of the simplest mistakes in interpreting diagnostic test accuracy to make. What has happened is, the sensitivity has been taken to be the positive predictive value, which is what you want to know: if I have a positive test do I have the disease? Not, if I have the disease, do I have a positive test? It would help if the results included a measure called the likelihood ratio (LR), which is the likelihood that a given test result would be expected in a patient with the target disorder compared to the likelihood that the same result would be expected in a patient without that disorder. In this case the LR is 4.5. We’ve put up an article if you want to know more on how to calculate the LR.

In the general population the prevalence of autism is 1 in 100; the actual chances of having the disease are 4.5 times more likely given a positive test. This gives a positive predictive value of 4.5%; about 5 in every 100 with a positive test would have autism.

For those still feeling confused and not convinced, let’s think of 10,000 children. Of these 100 (1%) will have autism, 90 of these 100 would have a positive test, 10 are missed as they have a negative test: there’s the 90% reported accuracy by the media.

But what about the 9,900 who don’t have the disease? 7,920 of these will test negative (the specificity3 in the Ecker paper is 80%). But, the real worry though, is the numbers without the disease who test positive. This will be substantial: 1,980 of the 9,900 without the disease. This is what happens at very low prevalences, the numbers falsely misdiagnosed rockets. Alarmingly, of the 2,070 with a positive test, only 90 will have the disease, which is roughly 4.5%.

In other words, if you screened everyone in the population for autism, and assume the best about the classifier reported in the JNeuro article (e.g., that the sample of 20 ASD participants they used is perfectly representative of the broader ASD population, which seems unlikely), only about 1 in 20 people who receive a positive diagnosis would actually deserve one.

Ecker et al object to this characterization, and reply to Heneghan in the comments (through the MRC PR office):

Our test was never designed to screen the entire population of the UK. This is simply not practical in terms of costs and effort, and besides totally  unjustified- why would we screen everybody in the UK for autism if there is no evidence whatsoever that an individual is affected?. The same case applies to other diagnostic tests. Not every single individual in the UK is tested for HIV. Clearly this would be too costly and unnecessary. However, in the group of individuals that are test for the virus, we can be very confident that if the test is positive that means a patient is infected. The same goes for our approach.

Essentially, the argument is that, since people would presumably be sent for an MRI scan because they were already under consideration for an ASD diagnosis, and not at random, the false positive rate would in fact be much lower than 95%, and closer to the 20% reported in the article.

One response to this reply–which is in fact Heneghan’s response in the comments–is to point out that the pre-test probability of ASD would need to be pretty high already in order for the classifier to add much. For instance, even if fully 30% of people who were sent for a scan actually had ASD, the posterior probability of ASD given a positive result would still be only 66% (Heneghan’s numbers, which I haven’t checked). Heneghan nicely contrasts these results with the standard for HIV testing, which “reports sensitivity of 99.7% and specificity of 98.5% for enzyme immunoassay.” Clearly, we have a long way to go before doctors can order MRI-based tests for ASD and feel reasonably confident that a positive result is sufficient grounds for an ASD diagnosis.

Setting Heneghan’s concerns about base rates aside, there’s a more general issue that he doesn’t touch on. It’s one that’s not specific to this particular study, and applies to nearly all studies that attempt to develop “biomarkers” for existing disorders. The problem is that the sensitivity and specificity values that people report for their new diagnostic procedure in these types of studies generally aren’t the true parameters of the procedure. Rather, they’re the sensitivity and specificity under the assumption that the diagnostic procedures used to classify patients and controls in the first place are themselves correct. In other words, in order to believe the results, you have to assume that the researchers correctly classified the subjects into patient and control groups using other procedures. In cases where the gold standard test used to make the initial classification is known to have near 100% sensitivity and specificity (e.g., for the aforementioned HIV tests), one can reasonably ignore this concern. But when we’re talking about mental health disorders, where diagnoses are fuzzy and borderline cases abound, it’s very likely that the “gold standard” isn’t really all that great to begin with.

Concretely,  studies that attempt to develop biomarkers for mental health disorders face two substantial problems. One is that it’s extremely unlikely that the clinical diagnoses are ever perfect; after all, if they were perfect, there’d be little point in trying to develop other diagnostic procedures! In this particular case, the authors selected subjects into the ASD group based on standard clinical instruments and structured interviews. I don’t know that there are many clinicians who’d claim with a straight face that the current diagnostic criteria for ASD (and there are multiple sets to choose from!) are perfect. From my limited knowledge, the criteria for ASD seem to be even more controversial than those for most other mental health disorders (which is saying something, if you’ve been following the ongoing DSM-V saga). So really, the accuracy of the classifier in the present study, even if you put the best face on it and ignore the base rate issue Heneghan brings up, is undoubtedly south of the 90% sensitivity / 80% specificity the authors report. How much south, we just don’t know, because we don’t really have any independent, objective way to determine who “really” should get an ASD diagnosis and who shouldn’t (assuming you think it makes sense to make that kind of dichotomous distinction at all). But 90% accuracy is probably a pipe dream, if for no other reason than it’s hard to imagine that level of consensus about autism spectrum diagnoses.

The second problem is that, because the researchers are using the MRI-based classifier to predict the clinician-based diagnosis, it simply isn’t possible for the former to exceed the accuracy of the latter. That bears repeating, because it’s important: no matter how good the MRI-based classifier is, it can only be as good as the procedures used to make the original diagnosis, and no better. It cannot, by definition, make diagnoses that are any more accurate than the clinicians who screened the participants in the authors’ ASD sample. So when you see the press release say this:

For the first time, a quick brain scan that takes just 15 minutes can identify adults with autism with over 90% accuracy.

You should really read it as this:

The method relies on structural (MRI) brain scans and has an accuracy rate approaching that of conventional clinical diagnosis.

That’s not quite as exciting, obviously, but it’s more accurate.

To be fair, there’s something of a catch-22 here, in that the authors didn’t really have a choice about whether or not to diagnose the ASD group using conventional criteria. If they hadn’t, reviewers and other researchers would have complained that we can’t tell if the ASD group is really an ASD group, because they authors used non-standard criteria. Under the circumstances, they did the only thing they could do. But that doesn’t change the fact that it’s misleading to intimate, as the press release does, that the new procedure might be any better than the old ones. It can’t be, by definition.

Ultimately, if we want to develop brain-based diagnostic tools that are more accurate than conventional clinical diagnoses, we’re going to need to show that these tools are capable of predicting meaningful outcomes that clinician diagnoses can’t. This isn’t an impossible task, but it’s a very difficult one. One approach you could take, for instance, would be to compare the ability of clinician diagnosis and MRI-based diagnosis to predict functional outcomes among subjects at a later point in time. If you could show that MRI-based classification of subjects at an early age was a stronger predictor of receiving an ASD diagnosis later in life than conventional criteria, that would make a really strong case for using the former approach in the real world. Short of that type of demonstration though, the only reason I can imagine wanting to use a procedure that was developed by trying to duplicate the results of an existing procedure is in the event that the new procedure is substantially cheaper or more efficient than the old one. Meaning, it would be reasonable enough to say “well, look, we don’t do quite as well with this approach as we do with a full clinical evaluation, but at least this new approach costs much less.” Unfortunately, that’s not really true in this case, since the price of even a short MRI scan is generally going to outweigh that of a comprehensive evaluation by a psychiatrist or psychotherapist. And while it could theoretically be much faster to get an MRI scan than an appointment with a mental health professional, I suspect that that’s not generally going to be true in practice either.

Having said all that, I hasten to note that all this is really a critique of the MRC press release and subsequently lousy science reporting, and not of the science itself. I actually think the science itself is very cool (but the Neuroskeptic just wrote a great rundown of the methods and results, so there’s not much point in me describing them here). People have been doing really interesting work with pattern-based classifiers for several years now in the neuroimaging literature, but relatively few studies have applied this kind of technique to try and discriminate between different groups of individuals in a clinical setting. While I’m not really optimistic that the technique the authors introduce in this paper is going to change the way diagnosis happens any time soon (or at least, I’d argue that it shouldn’t), there’s no question that the general approach will be an important piece of future efforts to improve clinical diagnoses by integrating biological data with existing approaches. But that’s not going to happen overnight, and in the meantime, I think it’s pretty irresponsible of the MRC to be issuing press releases claiming that its researchers can diagnose autism in adults with 90% accuracy.

ResearchBlogging.orgEcker C, Marquand A, Mourão-Miranda J, Johnston P, Daly EM, Brammer MJ, Maltezos S, Murphy CM, Robertson D, Williams SC, & Murphy DG (2010). Describing the brain in autism in five dimensions–magnetic resonance imaging-assisted diagnosis of autism spectrum disorder using a multiparameter classification approach. The Journal of neuroscience : the official journal of the Society for Neuroscience, 30 (32), 10612-23 PMID: 20702694

a possible link between pesticides and ADHD

A forthcoming article in the journal Pediatrics that’s been getting a lot of press attention suggests that exposure to common pesticides may be associated with a substantially elevated risk of ADHD. More precisely, what the study found was that elevated urinary concentrations of organophosphate metabolites were associated with an increased likelihood of meeting criteria for an ADHD diagnosis. One of the nice things about this study is that the authors used archival data from the (very large) National Health and Nutrition Examination Survey (NHANES), so they were able to control for a relatively broad range of potential confounds (e.g., gender, age, SES, etc.). The primary finding is, of course, still based on observational data, so you wouldn’t necessarily want to conclude that exposure to pesticides causes ADHD. But it’s a finding that converges with previous work in animal models demonstrating that high exposure to organophosphate pesticides causes neurodevelopmental changes, so it’s by no means a crazy hypothesis.

I think it’s really pleasantly surprising to see how responsibly the popular press has covered this story (e.g., this, this, and this). Despite the obvious potential for alarmism, very few articles have led with a headline implying a causal link between pesticides and ADHD. They all say things like “associated with”, “tied to”, or “linked to”, which is exactly right. And many even explicitly mention the size of the effect in question–namely, approximately a 50% increase in risk of ADHD per 10-fold increase in concentration of pesticide metabolites. Given that most of the articles contain cautionary quotes from the study’s authors, I’m guessing the authors really emphasized the study’s limitations when dealing with the press, which is great. In any case, because the basic details of the study have already been amply described elsewhere (I thought this short CBS article was particularly good), I’ll just mention a few random thoughts here:

  • Often, epidemiological studies suffer from a gaping flaw in the sense that the more interesting causal story (and the one that prompts media attention) is far less plausible than other potential explanations (a nice example of this is the recent work on the social contagion of everything from obesity to loneliness). That doesn’t seem to be the case here. Obviously, there are plenty of other reasons you might get a correlation between pesticide metabolites and ADHD risk–for instance, ADHD is substantially heritable, so it could be that parents with a disposition to ADHD also have systematically different dietary habits (i.e., parental dispositions are a common cause of both urinary metabolites and ADHD status in children). But given the aforementioned experimental evidence, it’s not obvious that alternative explanations for the correlation are much more plausible than the causal story linking pesticide exposure to ADHD, so in that sense this is potentially a very important finding.
  • The use of a dichotomous dependent variable (i.e., children either meet criteria for ADHD or don’t; there are no shades of ADHD gray here) is a real problem in this kind of study, because it can make the resulting effects seem deceptively large. The intuitive way we think about the members of a category is to think in terms of prototypes, so that when you think about “ADHD” and “Not-ADHD” categories, you’re probably mentally representing an extremely hyperactive, inattentive child for the former, and a quiet, conscientious kid for the latter. If that’s your mental model, and someone comes along and tells you that pesticide exposure increases the risk of ADHD by 50%, you’re understandably going to freak out, because it’ll seem quite natural to interpret that as a statement that pesticides have a 50% chance of turning average kids into hyperactive ones. But that’s not the right way to think about it. In all likelihood, pesticides aren’t causing a small proportion of kids to go from perfectly average to completely hyperactive; instead, what’s probably happening is that the entire distribution is shifting over slightly. In other words, most kids who are exposed to pesticides (if we assume for the sake of argument that there really is a causal link) are becoming slightly more hyperactive and/or inattentive.
  • Put differently, what happens when you have a strict cut-off for diagnosis is that even small increases in underlying symptoms can result in a qualitative shift in category membership. If ADHD symptoms were measured on a continuous scale (which they actually probably were, before being dichotomized to make things simple and more consistent with previous work), these findings might have been reported as something like “a 10-fold increase in pesticide exposures is associated with a 2-point increase on a 30-point symptom scale,” which would have made it much clearer that, at worst, pesticides are only one of many other contributing factors to ADHD, and almost certainly not nearly as big a factor as some others. That’s not to say we shouldn’t be concerned if subsequent work supports a causal link, but just that we should retain perspective on what’s involved. No one’s suggesting that you’re going to feed your child an unwashed pear or two and end up with a prescription for Ritalin; the more accurate view would be that you might have a minority of kids who are already at risk for ADHD, and this would be just one more precipitating factor.
  • It’s also worth keeping in mind that the relatively large increase in ADHD risk is associated with a ten-fold increase in pesticide metabolites. As the authors note, that corresponds to the difference between the 25th and 75th percentiles in the sample. Although we don’t know exactly what that means in terms of real-world exposure to pesticides (because the authors didn’t have any data on grocery shopping or eating habits), it’s almost certainly a very sizable difference (I won’t get into the reasons why, except to note that the rank-order of pesticide metabolites must be relatively stable among children, or else there wouldn’t be any association with a temporally-extended phenotype like ADHD). So the point is, it’s probably not so easy to go from the 25th to the 75th percentile just by eating a few more fruits and vegetables here and there. So while it’s certainly advisable to try and eat better, and potentially to buy organic produce (if you can afford it), you shouldn’t assume that you can halve your child’s risk of ADHD simply by changing his or her diet slightly. These are, at the end of the day, small effects.
  • The authors report that fully 12% of children in this nationally representative sample met criteria for ADHD (mostly of the inattentive subtype). This, frankly, says a lot more about how silly the diagnostic criteria for ADHD are than about the state of the nation’s children. It’s frankly not plausible to suppose that 1 in 8 children really suffer from what is, in theory at least, a severe, potentially disabling disorder. I’m not trying to trivialize ADHD or argue that there’s no such thing, but simply to point out the dangers of medicalization. Once you’ve reached the point where 1 in every 8 people meet criteria for a serious disorder, the label is in danger of losing all meaning.

ResearchBlogging.orgBouchard, M., Bellinger, D., Wright, R., & Weisskopf, M. (2010). Attention-Deficit/Hyperactivity Disorder and Urinary Metabolites of Organophosphate Pesticides PEDIATRICS DOI: 10.1542/peds.2009-3058

the male brain hurts, or how not to write about science

My wife asked me to blog about this article on CNN because, she said, “it’s really terrible, and it shouldn’t be on CNN”. I usually do what my wife tells me to do, so I’m blogging about it. It’s by Louann Brizendine, M.D., author of the absolutely awful controversial book The Female Brain, and now, its manly counterpart, The Male Brain. From what I can gather, the CNN article, which is titled Love, Sex, and the Male Brain, is a precis of Brizendine’s new book (though I have no intention of reading the book to make sure). The article is pretty short, so I’ll go through the first half of it paragraph-by-paragraph. But I’ll warn you right now that it isn’t pretty, and will likely anger anyone with even a modicum of training in psychology or neuroscience.

Although women the world over have been doing it for centuries, we can’t really blame a guy for being a guy. And this is especially true now that we know that the male and female brains have some profound differences.

Our brains are mostly alike. We are the same species, after all. But the differences can sometimes make it seem like we are worlds apart.

So far, nothing terribly wrong here, just standard pop psychology platitudes. But it goes quickly downhill.

The “defend your turf” area — dorsal premammillary nucleus — is larger in the male brain and contains special circuits to detect territorial challenges by other males. And his amygdala, the alarm system for threats, fear and danger is also larger in men. These brain differences make men more alert than women to potential turf threats.

As Vaughan notes over at Mind Hacks, the dorsal premammillary nucleus (PMD) hasn’t been identified in humans, so it’s unclear exactly what chunk of tissue Brizendine’s referring to–let alone where the evidence that there are gender differences in humans might come from. The claim that the PMD is a “defend your turf” area might be plausible, if oh, I don’t know, you happen to think that the way rats behave under narrowly circumscribed laboratory conditions when confronted by an aggressor is a good guide to normal interactions between human males. (Then again, given that PMD lesions impair rats from running away when exposed to a cat, Brizendine could just as easily have concluded that the dorsal premammillary nucleus is the “fleeing” part of the brain.)

The amygdala claim is marginally less ridiculous: it’s not entirely clear that the amygdala is “the alarm system for threats, fear and danger”, but at least that’s a claim you can make with a straight face, since it’s one fairly common view among neuroscientists. What’s not really defensible is the claim that larger amygdalae “make men more alert than women to potential turf threats”, because (a) there’s limited evidence that the male amygdala really is larger than the female amygdala and (b) if such a difference exists, it’s very small, and (c) it’s not clear in any case how you go from a small between-group difference to the notion that somehow the amygdala is the reason why men maintain little interpersonal fiefdoms and women don’t.

Meanwhile, the “I feel what you feel” part of the brain — mirror-neuron system — is larger and more active in the female brain. So women can naturally get in sync with others’ emotions by reading facial expressions, interpreting tone of voice and other nonverbal emotional cues.

This falls under the rubric of “not even wrong“. The mirror neuron system isn’t a single “part of the brain”; current evidence suggests that neurons that show mirroring properties are widely distributed throughout multiple frontoparietal regions. So I don’t really know what brain region Brizendine is referring to (the fact that she never cites any empirical studies in support of her claims is something of an inconvenience in that respect). And even if I did know, it’s a safe bet it wouldn’t be the “I feel what you feel” brain region, because, as far as I know, no such thing exists. The central claim regarding mirror neurons isn’t that they support empathy per se, but that they support a much more basic type of representation–namely, abstract conceptual (as opposed to sensory/motor) representation of actions. And even that much weaker notion is controversial; for example, Greg Hickok has a couple of recent posts (and a widely circulated paper) arguing against it. No one, as far as I know, has provided any kind of serious evidence linking the mirror neuron system to females’ (modestly) superior nonverbal decoding ability.

Perhaps the biggest difference between the male and female brain is that men have a sexual pursuit area that is 2.5 times larger than the one in the female brain. Not only that, but beginning in their teens, they produce 200 to 250 percent more testosterone than they did during pre-adolescence.

Maybe the silliest paragraph in the whole article. Not only do I not know what region Brizendine is talking about here, I have absolutely no clue what the “sexual pursuit area” might be. It could be just me, I suppose, but I just searched Google Scholar for “sexual pursuit area” and got… zero hits. Is it a visual region? A part of the hypothalamus? The notoriously grabby motor cortex hand area? No one knows, and Brizendine isn’t telling.  Off-hand, I don’t know of any region of the human brain that shows the degree of sexual dimorphism Brizendine claims here.

If testosterone were beer, a 9-year-old boy would be getting the equivalent of a cup a day. But a 15-year-old would be getting the equivalent of nearly two gallons a day. This fuels their sexual engines and makes it impossible for them to stop thinking about female body parts and sex.

If each fiber of chest hair was a tree, a 12-year-old boy would have a Bonsai sitting on the kitchen counter, and a 30-year-old man would own Roosevelt National Forest. What you’re supposed to learn from this analogy, I honestly couldn’t tell you. It’s hard for me to think clearly about trees and hair you see, seeing as how I find it impossible to stop thinking about female body parts while I’m trying to write this.

All that testosterone drives the “Man Trance”– that glazed-eye look a man gets when he sees breasts. As a woman who was among the ranks of the early feminists, I wish I could say that men can stop themselves from entering this trance. But the truth is, they can’t. Their visual brain circuits are always on the lookout for fertile mates. Whether or not they intend to pursue a visual enticement, they have to check out the goods.

To a man, this is the most natural response in the world, so he’s dismayed by how betrayed his wife or girlfriend feels when she sees him eyeing another woman. Men look at attractive women the way we look at pretty butterflies. They catch the male brain’s attention for a second, but then they flit out of his mind. Five minutes later, while we’re still fuming, he’s deciding whether he wants ribs or chicken for dinner. He asks us, “What’s wrong?” We say, “Nothing.” He shrugs and turns on the TV. We smolder and fear that he’ll leave us for another woman.

This actually isn’t so bad if you ignore the condescending “men are animals with no self-control” implication and pretend Brizendine had just made the  indisputably true but utterly banal observation that men, on average, like to ogle women more than women, on average, like to ogle men.

Not surprisingly, the different objectives that men and women have in mating games put us on opposing teams — at least at first. The female brain is driven to seek security and reliability in a potential mate before she has sex. But a male brain is fueled to mate and mate again. Until, that is, he mates for life.

So men are driven to sleep around, again and again… until they stop sleeping around. It’s tautological and profound at the same time!

Despite stereotypes to the contrary, the male brain can fall in love just as hard and fast as the female brain, and maybe more so. When he meets and sets his sights on capturing “the one,” mating with her becomes his prime directive. And when he succeeds, his brain makes an indelible imprint of her. Lust and love collide and he’s hooked.

Failure to operationalize complex construct of “love” in a measurable way… check. Total lack of evidence in support of claim that men and women are equally love-crazy… check. Oblique reference to Star Trek universe… check. What’s not to like?

A man in hot pursuit of a mate doesn’t even remotely resemble a devoted, doting daddy. But that’s what his future holds. When his mate becomes pregnant, she’ll emit pheromones that will waft into his nostrils, stimulating his brain to make more of a hormone called prolactin. Her pheromones will also cause his testosterone production to drop by 30 percent.

You know, on the off-chance that something like this is actually true, I think it’s actually kind of neat. But I just can’t bring myself to do a literature search, because I’m pretty sure I’ll discover that the jury is still out on whether humans even emit and detect pheromones (ok, I know this isn’t a completely baseless claim), or that there’s little to no evidence of a causal relationship between women releasing pheromones and testosterone levels dropping in men. I don’t like to be disappointed, you see; it turns out it’s much easier to just decide what you want to believe ahead of time and then contort available evidence to fit that view.

Anyway, we’re only half-way through the article; Brizendine goes on in similar fashion for several hundred more words. Highlights include the origin of male poker face, the conflation of correlation and causation in sociable elderly men, and the effects of oxytocin on your grandfather. You should go read the reset of it if you practice masochism; I’m too full of rage depressed to write about it any more.

Setting aside the blatant exercise in irresponsible scientific communication (Brizendine has an MD, and appears to be at least nominally affiliated with UCSF’s psychiatry department, so ignorance shouldn’t really be a valid excuse here), I guess what I’d really like to know is what goes through Brizendine’s mind when she writes this sort of dreck. Does she really believe the ludicrous claims she makes? Is she fully aware she’s grossly distorting the empirical evidence if not outright confabulating, and is simply in it for the money? Or does she rationalize it as a case of the ends justifying the means, thinking the message she’s presenting is basically right, so it’s ok if nearly all a few of the details go missing in the process?

I understand that presenting scientific evidence in an accurate and entertaining manner is a difficult business, and many people who work hard at it still get it wrong pretty often (I make mistakes in my posts here all the time!). But many scientists still manage to find time in their busy schedules to write popular science books that present the science in an accessible way without having to make up ridiculous stories just to keep the reader entertained (Steven Pinker, Antonio Damasio, and Dan Gilbert are just a few of the first ones that spring to mind). And then there are amazing science writers like Carl Zimmer and David Dobbs who don’t necessarily have any professional training in the areas they write about, but still put in the time and energy to make sure they get the details right, and consistently write stories that blow me away (the highest compliment I can pay to a science story is that it makes me think “I wish I studied that“, and Zimmer’s articles routinely do that). That type of intellectual honesty is essential, because there’s really no point in going to the trouble of doing most scientific research if people get to disregard any findings they disagree with on ideological or aesthetic grounds, or can make up any evidence they like to fit their claims.

The sad thing is that Brizendine’s new book will probably sell more copies in its first year out than Carl Zimmer’s entire back catalogue. And it’s not going to sell all those copies because it’s a careful meditation on the subtle differences between genders that scientists have uncovered; it’s going to fly off the shelves because it basically regurgitates popular stereotypes about gender differences with a seemingly authoritative scientific backing. Instead of evaluating and challenging many of those notions with actual empirical data, people who read Brizendine’s work will now get to say “science proves it!”, making it that much more difficult for responsible scientists and journalists to tell the public what’s really true about gender differences.

You might say (or at least, Brizendine might say) that this is all well and good, but hopelessly naive and idealistic, and that telling an accurate story is always going to be less important than telling the public what it wants to hear about science, because the latter is the only way to ensure continued funding for and interest in scientific research. This isn’t that uncommon a sentiment; I’ve even heard a number of scientists who I otherwise have a great deal of respect for say something like this. But I think Brizendine’s work underscores the typical outcome of that type of reasoning: once you allow yourself to relax the standards for what counts as evidence, it becomes quite easy to rationalize almost any rhetorical abuse of science, and ultimately you abuse the public’s trust while muddying the waters for working scientists.

As with so many other things, I think Richard Feynman summed up this sentiment best:

I would like to add something that’s not essential to the science, but something I kind of believe, which is that you should not fool the layman when you’re talking as a scientist. I am not trying to tell you what to do about cheating on your wife, or fooling your girlfriend, or something like that, when you’re not trying to be a scientist, but just trying to be an ordinary human being. We’ll leave those problems up to you and your rabbi. I’m talking about a specific, extra type of integrity that is not lying, but bending over backwards to show how you are maybe wrong, that you ought to have when acting as a scientist. And this is our responsibility as scientists, certainly to other scientists, and I think to laymen.

For example, I was a little surprised when I was talking to a friend who was going to go on the radio. He does work on cosmology and astronomy, and he wondered how he would explain what the applications of this work were. “Well,” I said, “there aren’t any.” He said, “Yes, but then we won’t get support for more research of this kind.” I think that’s kind of dishonest. If you’re representing yourself as a scientist, then you should explain to the layman what you’re doing–and if they don’t want to support you under those circumstances, then that’s their decision.

No one doubts that men and women differ from one another, and the study of gender differences is an active and important area of psychology and neuroscience. But I can’t for the life of me see any merit in telling the public that men can’t stop thinking about breasts because they’re full of the beer-equivalent of two gallons of testosterone.

[Update 3/25: plenty of other scathing critiques pop up in the blogosphere today: Language Log, Salon, and Neuronarrative, and no doubt many others...]