In defense of In Defense of Facebook

A long, long time ago (in social media terms), I wrote a post defending Facebook against accusations of ethical misconduct related to a newly-published study in PNAS. I won’t rehash the study, or the accusations, or my comments in any detail here; for that, you can read the original post (I also recommend reading this or this for added context). While I stand by most of what I wrote, as is the nature of things, sometimes new information comes to light, and sometimes people say things that make me change my mind. So I thought I’d post my updated thoughts and reactions. I also left some additional thoughts in a comment on my last post, which I won’t rehash here.

Anyway, in no particular order…

I’m not arguing for a lawless world where companies can do as they like with your data

Some people apparently interpreted my last post as a defense of Facebook’s data use policy in general. It wasn’t. I probably brought this on myself in part by titling the post “In Defense of Facebook”. Maybe I should have called it something like “In Defense of this one particular study done by one Facebook employee”. In any case, I’ll reiterate: I’m categorically not saying that Facebook–or any other company, for that matter–should be allowed to do whatever it likes with its users’ data. There are plenty of valid concerns one could raise about the way companies like Facebook store, manage, and use their users’ data. And for what it’s worth, I’m generally in favor of passing new rules regulating the use of personal data in the private sector. So, contrary to what some posts suggested, I was categorically not advocating for a laissez-faire world in which large corporations get to do as they please with your information, and there’s nothing us little people can do about it.

The point I made in my last post was much narrower than that–namely, that picking on the PNAS study as an example of ethically questionable practices at Facebook was a bad idea, because (a) there aren’t any new risks introduced by this manipulation that aren’t already dwarfed by the risks associated with using Facebook itself (which is not exactly a high-risk enterprise to begin with), and (b) there are literally thousands of experiments just like this being conducted every day by large companies intent on figuring out how best to market their products and services–so Facebook’s study doesn’t stand out in any respect. My point was not that you shouldn’t be concerned about who has your data and how they’re using it, but that it’s deeply counterproductive to go after Facebook for this particular experiment when Facebook is one of the few companies in this arena that actually (occasionally) publish their findings in the scientific literature, instead of hiding them entirely from the light, as almost everyone else does. Of course, that will probably change as a result of this controversy.

I was wrong–A/B testing edition

One claim I made in my last post that was very clearly wrong is this (emphasis added):

What makes the backlash on this issue particularly strange is that I’m pretty sure most people do actually realize that their experience on Facebook (and on other websites, and on TV, and in restaurants, and in museums, and pretty much everywhere else) is constantly being manipulated. I expect that most of the people who’ve been complaining about the Facebook study on Twitter are perfectly well aware that Facebook constantly alters its user experience–I mean, they even see it happen in a noticeable way once in a while, whenever Facebook introduces a new interface.

After watching the commentary over the past two days, I think it’s pretty clear I was wrong about this. A surprisingly large number of people clearly were genuinely unaware that Facebook, Twitter, Google, and other major players in every major industry (not just tech–also banks, groceries, department stores, you name it) are constantly running large-scale, controlled experiments on their users and customers. For instance, here’s a telling comment left on my last post:

The main issue I have with the experiment is that they conducted it without telling us. Given, that would have been counterproductive, but even a small adverse affect is still an adverse affect. I just don’t like the idea that corporations can do stuff to me without my consent. Just my opinion.

Similar sentiments are all over the place. Clearly, the revelation that Facebook regularly experiments on its users without their knowledge was indeed just that to many people–a revelation. I suppose in this sense, there’s potentially a considerable upside to this controversy, inasmuch as it has clearly served to raise awareness of industry-standard practices.

Questions about the ethics of the PNAS paper’s publication

My post focused largely on the question of whether the experiment Facebook conducted was itself illegal or unethical. I took this to be the primary concern of most lay people who have expressed concern about the episode. As I discussed in my post, I think it’s quite clear that (a) the experiment itself is entirely legal, and (b) any ethical objections one could raise are actually much broader objections about the way we regulate data use and consumer privacy, and have nothing to do with Facebook in particular. However, there’s a separate question that does specifically concern Facebook–or really, the authors of the PNAS paper–which is whether the authors, in their efforts to publish their findings, violated any laws or regulations.

When I wrote my post, I was under the impression–based largely on reports of an interview with the PNAS editor, Susan Fiske–that the authors had in fact obtained approval to conduct the study from an IRB, and had simply neglected to include that information in the text (which would have been an editorial lapse, but not an unethical act). I wrote as much in a comment on my post. I was not suggesting–as some seemed to take away–that Facebook doesn’t need to get IRB approval. I was operating on the assumption that it had obtained IRB approval, based on the information available at the time.

In any case, it now appears that may not be exactly what happened. Unfortunately, it’s not yet clear exactly what did happen. One version of events people have suggested is that the study’s authors exploited a loophole in the rules by having Facebook conduct and analyze the experiment without the involvement of the other authors–who only contributed to the genesis of the idea and the writing of the manuscript. However, this interpretation is not unambiguous, and risks maligning the authors’ reputations unfairly, because Adam Kramer’s post explaining the motivation for the experiment suggests that the idea for the experiment originated entirely at Facebook, and was related to internal needs:

The reason we did this research is because we care about the emotional impact of Facebook and the people that use our product. We felt that it was important to investigate the common worry that seeing friends post positive content leads to people feeling negative or left out. At the same time, we were concerned that exposure to friends’ negativity might lead people to avoid visiting Facebook. We didn’t clearly state our motivations in the paper.

How you interpret the ethics of the study thus depends largely on what you believe actually happened. If you believe that the genesis and design of the experiment were driven by Facebook’s internal decision-making, and the decision to publish an interesting finding came only later, then there’s nothing at all ethically questionable about the authors’ behavior. It would have made no more sense to seek out IRB approval for this one experiment than for any of the other in-house experiments Facebook regularly conducts. And there is, again, no question whatsoever that Facebook does not have to get approval from anyone to do experiments that are not for the purpose of systematic, generalizable research.

Moreover, since the non-Facebook authors did in fact ask the IRB to review their proposal to use archival data–and the IRB exempted them from review, as is routinely done for this kind of analysis–there would be no legitimacy to the claim that the authors acted unethically. About the only claim one could raise an eyebrow at is that the authors “didn’t clearly state” their motivations. But since presenting a post-hoc justification for one’s studies that has nothing to do with the original intention is extremely common in psychology (though it shouldn’t be), it’s not really fair to fault Kramer et al for doing something that is standard practice.

If, on the other hand, the idea for the study did originate outside of Facebook, and the authors deliberately attempted to avoid prospective IRB review, then I think it’s fair to say that their behavior was unethical. However, given that the authors were following the letter of the law (if clearly not the spirit), it’s not clear that PNAS should have, or could have, rejected the paper. It certainly should have demanded that information regarding interactions with the IRB be included in the manuscript, and perhaps it could have published some kind of expression of concern alongside the paper. But I agree with Michelle Meyer’s analysis that, in taking the steps they took, the authors are almost certainly operating within the rules, because (a) Facebook itself is not subject to HHS rules, (b) the non-Facebook authors were not technically “engaged in research”, and (c) the archival use of already-collected data by the non-Facebook authors was approved by the Cornell IRB (or rather, the study was exempted from further review).

Absent clear evidence of what exactly happened in the lead-up to publication, I think the appropriate course of action is to withhold judgment. In the interim, what the episode clearly does do is lay bare how ill-prepared the existing HHS regulations are for dealing with the research use of data collected online–particularly when the data was acquired by private entities. Actually, it’s not just research use that’s problematic; it’s clear that many people complaining about Facebook’s conduct this week don’t really give a hoot about the “generalizable knowledge” side of things, and are fundamentally just upset that Facebook is allowed to run these kinds of experiments at all without providing any notification.

In my view, what’s desperately called for is a new set of regulations that provide a unitary code for dealing with consumer data across the board–i.e., in both research and non-research contexts. This leaves aside exactly what such regulations would look like, of course. My personal view is that the right direction to move in is to tighten consumer protection laws to better regulate management and use of private citizens’ data, while simultaneously liberalizing the research use of private datasets that have already been acquired. For example, I would favor a law that (a) forced Facebook and other companies to more clearly and explicitly state how they use their users’ data, (b) provided opt-out options when possible, along with the ability for users to obtain a report of how their data has been used in the past, and (c) gave blanket approval to use data acquired under these conditions for any and all academic research purposes so long as the data are deidentified. Many people will disagree with this, of course, and have very different ideas. That’s fine; the key point is that the conversation we should be having is about how to update and revise the rules governing research vs. non-research uses of data in such a way that situations like the PNAS study don’t come up again.

What Facebook does is not research–until they try to publish it

Much of the outrage over the Facebook experiment is centered around the perception that Facebook shouldn’t be allowed to conduct research on its users without their consent. What many people mean by this, I think, is that Facebook shouldn’t be allowed to conduct any experiments on its users for purposes of learning things about user experience and behavior unless Facebook explicitly asks for permission. A point that I should have clarified in my original post is that Facebook users are, in the normal course of things, not considered participants in a research study, no matter how or how much their emotions are manipulated. That’s because the HHS’s definition of research includes, as a necessary component, that there be an active intention to contribute to generalizable new knowledge.

Now, to my mind, this isn’t a great way to define “research”–I think it’s a good idea to avoid definitions that depend on knowing what people’s intentions were when they did something. But that’s the definition we’re stuck with, and there’s really no ambiguity over whether Facebook’s normal operations–which include constant randomized, controlled experimentation on its users–constitute research in this sense. They clearly don’t. Put simply, if Facebook had eschewed disseminating its results to the broader community, the experiment in question would not have been subject to any HHS regulations whatsoever (though, as Michelle Meyer astutely pointed out, technically the experiment probably isn’t subject to HHS regulation even now, so the point is moot). To reiterate: it’s only the fact that Kramer et al. wanted to publish their results in a scientific journal that opened them up to criticism of research misconduct in the first place.

This observation may not have any impact on your view if your concern is fundamentally about the publication process–i.e., you don’t object to Facebook doing the experiment; what you object to is Facebook trying to disseminate their findings as research. But it should have a strong impact on your views if you were previously under the impression that Facebook’s actions must have violated some existing human subjects regulation or consumer protection law. The laws in the United States–at least as I understand them, and I admittedly am not a lawyer–currently afford you no such protection.

Now, is it a good idea to have two very separate standards, one for research and one for everything else? Probably not. Should Facebook be allowed to do whatever it wants to your user experience so long as it’s covered under the Data Use policy in the user agreement you didn’t read? Probably not. But what’s unequivocally true is that, as it stands right now, your interactions with Facebook–no matter how your user experience, data, or emotions are manipulated–are not considered research unless Facebook manipulates your experience with the express intent of disseminating new knowledge to the world.

Informed consent is not mandatory for research studies

As a last point, there seems to be a very common misconception floating around among commentators that the Facebook experiment was unethical because it didn’t provide informed consent, which is a requirement for all research studies involving experimental manipulation. I addressed this in the comments on my last post in response to other comments:

[I]t’s simply not correct to suggest that all human subjects research requires informed consent. At least in the US (where Facebook is based), the rules governing research explicitly provide for a waiver of informed consent. Directly from the HHS website:

An IRB may approve a consent procedure which does not include, or which alters, some or all of the elements of informed consent set forth in this section, or waive the requirements to obtain informed consent provided the IRB finds and documents that:

(1) The research involves no more than minimal risk to the subjects;

(2) The waiver or alteration will not adversely affect the rights and welfare of the subjects;

(3) The research could not practicably be carried out without the waiver or alteration; and

(4) Whenever appropriate, the subjects will be provided with additional pertinent information after participation.

Granting such waivers is a commonplace occurrence; I myself have had online studies granted waivers before for precisely these reasons. In this particular context, it’s very clear that conditions (1) and (2) are met (because this easily passes the “not different from ordinary experience” test). Further, Facebook can also clearly argue that (3) is met, because explicitly asking for informed consent is likely not viable given internal policy, and would in any case render the experimental manipulation highly suspect (because it would no longer be random). The only point one could conceivably raise questions about is (4), but here again I think there’s a very strong case to be made that Facebook is not about to start providing debriefing information to users every time it changes some aspect of the news feed in pursuit of research, considering that its users have already agreed to its User Agreement, which authorizes this and much more.

Now, if you disagree with the above analysis, that’s fine, but what should be clear enough is that there are many IRBs (and I’ve personally interacted with some of them) that would have authorized a waiver of consent in this particular case without blinking. So this is clearly well within “reasonable people can disagree” territory, rather than “oh my god, this is clearly illegal and unethical!” territory.

I can understand the objection that Facebook should have applied for IRB approval prior to conducting the experiment (though, as I note above, that’s only true if the experiment was initially conducted as research, which is not clear right now). However, it’s important to note that there is no guarantee that an IRB would have insisted on informed consent at all in this case. There’s considerable heterogeneity in different IRBs’ interpretation of the HHS guidelines (and in fact, even across different reviewers within the same IRB), and I don’t doubt that many IRBs would have allowed Facebook’s application to sail through without any problems (see, e.g., this comment on my last post)–though I think there’s a general consensus that a debriefing of some kind would almost certainly be requested.

In defense of Facebook

[UPDATE July 1st: I've now posted some additional thoughts in a second post here.]

It feels a bit strange to write this post’s title, because I don’t find myself defending Facebook very often. But there seems to be some discontent in the socialmediaverse at the moment over a new study in which Facebook data scientists conducted a large-scale–over half a million participants!–experimental manipulation on Facebook in order to show that emotional contagion occurs on social networks. The news that Facebook has been actively manipulating its users’ emotions has, apparently, enraged a lot of people.

The study

Before getting into the sources of that rage–and why I think it’s misplaced–though, it’s worth describing the study and its results. Here’s a description of the basic procedure, from the paper:

The experiment manipulated the extent to which people (N = 689,003) were exposed to emotional expressions in their News Feed. This tested whether exposure to emotions led people to change their own posting behaviors, in particular whether exposure to emotional content led people to post content that was consistent with the exposure—thereby testing whether exposure to verbal affective expressions leads to similar verbal expressions, a form of emotional contagion. People who viewed Facebook in English were qualified for selection into the experiment. Two parallel experiments were conducted for positive and negative emotion: One in which exposure to friends’ positive emotional content in their News Feed was reduced, and one in which exposure to negative emotional content in their News Feed was reduced. In these conditions, when a person loaded their News Feed, posts that contained emotional content of the relevant emotional valence, each emotional post had between a 10% and 90% chance (based on their User ID) of being omitted from their News Feed for that specific viewing.
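For readers who find the procedure easier to follow as code, here is a minimal sketch of the kind of filtering the paper describes. To be clear, this is my own illustration and not Facebook’s actual implementation: the paper says only that the omission chance was between 10% and 90% and was based on the User ID, so the hash-based mapping below, and the names `omission_probability` and `filter_feed`, are assumptions made for the sake of the example.

```python
import hashlib
import random


def omission_probability(user_id: int) -> float:
    """Map a user ID to a fixed omission probability in [0.1, 0.9).

    Hypothetical mapping: the paper does not say how the ID determines the rate.
    """
    digest = hashlib.sha256(str(user_id).encode()).digest()
    fraction = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return 0.1 + 0.8 * fraction


def filter_feed(posts, user_id, rng):
    """Return the posts shown for a single News Feed viewing.

    posts: list of (text, has_target_emotion) pairs. Only posts flagged as
    carrying the targeted emotion can be omitted; all others are always shown.
    """
    p = omission_probability(user_id)
    return [
        (text, flagged)
        for text, flagged in posts
        if not (flagged and rng.random() < p)
    ]


posts = [
    ("Had a great day at the beach!", True),
    ("Anyone know a good plumber?", False),
    ("Best concert ever!", True),
]
shown = filter_feed(posts, user_id=12345, rng=random.Random(0))
```

Note that nothing is ever added to the feed in this sketch; posts flagged with the targeted emotion are sometimes left out of a given viewing, and everything else passes through untouched.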

And here’s their central finding:

What the figure shows is that, in the experimental conditions, where negative or positive emotional posts are censored, users produce correspondingly more positive or negative emotional words in their own status updates. Reducing the number of negative emotional posts users saw led those users to produce more positive, and fewer negative words (relative to the unmodified control condition); conversely, reducing the number of presented positive posts led users to produce more negative and fewer positive words of their own.

Taken at face value, these results are interesting and informative. For the sake of contextualizing the concerns I discuss below, though, two points are worth noting. First, these effects, while highly statistically significant, are tiny. The largest effect size reported had a Cohen’s d of 0.02–meaning that eliminating a substantial proportion of emotional content from a user’s feed had the monumental effect of shifting that user’s own emotional word use by two hundredths of a standard deviation. In other words, the manipulation had a negligible real-world impact on users’ behavior. To put it in intuitive terms, the effect of condition in the Facebook study is roughly comparable to a hypothetical treatment that increased the average height of the male population in the United States by about one twentieth of an inch (given a standard deviation of ~2.8 inches). Theoretically interesting, perhaps, but not very meaningful in practice.
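The height comparison is just the effect size multiplied by the standard deviation. Here is the arithmetic as a small sketch, using only the two numbers given above (the variable names are mine):

```python
# Cohen's d expresses a mean difference in units of standard deviations,
# so the raw shift is simply d multiplied by the standard deviation.
cohens_d = 0.02          # largest effect size reported in the study
sd_height_inches = 2.8   # approximate SD of adult male height used above

shift_inches = cohens_d * sd_height_inches
print(f"{shift_inches:.3f} inches")  # 0.056 inches, i.e. roughly 1/20 of an inch
```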

Second, the fact that users in the experimental conditions produced content with very slightly more positive or negative emotional content doesn’t mean that those users actually felt any differently. It’s entirely possible–and I would argue, even probable–that much of the effect was driven by changes in the expression of ideas or feelings that were already on users’ minds. For example, suppose I log onto Facebook intending to write a status update to the effect that I had an “awesome day today at the beach with my besties!” Now imagine that, as soon as I log in, I see in my news feed that an acquaintance’s father just passed away. I might very well think twice about posting my own message–not necessarily because the news has made me feel sad myself, but because it surely seems a bit unseemly to celebrate one’s own good fortune around people who are currently grieving. I would argue that such subtle behavioral changes, while certainly responsive to others’ emotions, shouldn’t really be considered genuine cases of emotional contagion. Yet given how small the effects were, one wouldn’t need very many such changes to occur in order to produce the observed results. So, at the very least, the jury should still be out on the extent to which Facebook users actually feel differently as a result of this manipulation.

The concerns

Setting aside the rather modest (though still interesting!) results, let’s turn to look at the criticism. Here’s what Katy Waldman, writing in a Slate piece titled “Facebook’s Unethical Experiment”, had to say:

The researchers, who are affiliated with Facebook, Cornell, and the University of California–San Francisco, tested whether reducing the number of positive messages people saw made those people less likely to post positive content themselves. The same went for negative messages: Would scrubbing posts with sad or angry words from someone’s Facebook feed make that person write fewer gloomy updates?

The upshot? Yes, verily, social networks can propagate positive and negative feelings!

The other upshot: Facebook intentionally made thousands upon thousands of people sad.

Or consider an article in The Wire, quoting Jacob Silverman:

“What’s disturbing about how Facebook went about this, though, is that they essentially manipulated the sentiments of hundreds of thousands of users without asking permission (blame the terms of service agreements we all opt into). This research may tell us something about online behavior, but it’s undoubtedly more useful for, and more revealing of, Facebook’s own practices.”

On Twitter, the reaction to the study has been similarly negative. A lot of people appear to be very upset at the revelation that Facebook would actively manipulate its users’ news feeds in a way that could potentially influence their emotions.

Why the concerns are misplaced

To my mind, the concerns expressed in the Slate piece and elsewhere are misplaced, for several reasons. First, they largely mischaracterize the study’s experimental procedures–to the point that I suspect most of the critics haven’t actually bothered to read the paper. In particular, the suggestion that Facebook “manipulated users’ emotions” is quite misleading. Framing it that way tacitly implies that Facebook must have done something specifically designed to induce a different emotional experience in its users. In reality, for users assigned to the experimental condition, Facebook simply removed a variable proportion of status messages that were automatically detected as containing positive or negative emotional words. Let me repeat that: Facebook removed emotional messages for some users. It did not, as many people seem to be assuming, add content specifically intended to induce specific emotions. Now, given that a large amount of content on Facebook is already highly emotional in nature–think about all the people sharing their news of births, deaths, break-ups, etc.–it seems very hard to argue that Facebook would have been introducing new risks to its users even if it had presented some of them with more emotional content. But it’s certainly not credible to suggest that replacing 10% – 90% of emotional content with neutral content constitutes a potentially dangerous manipulation of people’s subjective experience.

Second, it’s not clear what the notion that Facebook users’ experience is being “manipulated” really even means, because the Facebook news feed is, and has always been, a completely contrived environment. I hope that people who are concerned about Facebook “manipulating” user experience in support of research realize that Facebook is constantly manipulating its users’ experience. In fact, by definition, every single change Facebook makes to the site alters the user experience, since there simply isn’t any experience to be had on Facebook that isn’t entirely constructed by Facebook. When you log onto Facebook, you’re not seeing a comprehensive list of everything your friends are doing, nor are you seeing a completely random subset of events. In the former case, you would be overwhelmed with information, and in the latter case, you’d get bored of Facebook very quickly. Instead, what you’re presented with is a carefully curated experience that is, from the outset, crafted in such a way as to create a more engaging experience (read: keeps you spending more time on the site, and coming back more often). The items you get to see are determined by a complex and ever-changing algorithm that you make only a partial contribution to (by indicating what you like, what you want hidden, etc.). It has always been this way, and it’s not clear that it could be any other way. So I don’t really understand what people mean when they sarcastically suggest–as Katy Waldman does in her Slate piece–that “Facebook reserves the right to seriously bum you out by cutting all that is positive and beautiful from your news feed”. Where does Waldman think all that positive and beautiful stuff comes from in the first place? Does she think it spontaneously grows wild in her news feed, free from the meddling and unnatural influence of Facebook engineers?

Third, if you were to construct a scale of possible motives for manipulating users’ behavior–with the global betterment of society at one end, and something really bad at the other end–I submit that conducting basic scientific research would almost certainly be much closer to the former end than would the other standard motives we find on the web–like trying to get people to click on more ads. The reality is that Facebook–and virtually every other large company with a major web presence–is constantly conducting large controlled experiments on user behavior. Data scientists and user experience researchers at Facebook, Twitter, Google, etc. routinely run dozens, hundreds, or thousands of experiments a day, all of which involve random assignment of users to different conditions. Typically, these manipulations aren’t conducted in order to test basic questions about emotional contagion; they’re conducted with the explicit goal of helping to increase revenue. In other words, if the idea that Facebook would actively try to manipulate your behavior bothers you, you should probably stop reading this right now and go close your account. You also should definitely not read this paper suggesting that a single social message on Facebook prior to the last US presidential election may have single-handedly increased national voter turn-out by as much as 0.6%. Oh, and you should probably also stop using Google, YouTube, Yahoo, Twitter, Amazon, and pretty much every other major website–because I can assure you that, in every single case, there are people out there who get paid a good salary to… yes, manipulate your emotions and behavior! For better or worse, this is the world we live in. If you don’t like it, you can abandon the internet, or at the very least close all of your social media accounts. But the suggestion that Facebook is doing something unethical simply by publishing the results of one particular experiment among thousands–and in this case, an experiment featuring a completely innocuous design that, if anything, is probably less motivated by a profit motive than most of what Facebook does–seems kind of absurd.

Fourth, it’s worth keeping in mind that there’s nothing intrinsically evil about the idea that large corporations might be trying to manipulate your experience and behavior. Everybody you interact with–including every one of your friends, family, and colleagues–is constantly trying to manipulate your behavior in various ways. Your mother wants you to eat more broccoli; your friends want you to come get smashed with them at a bar; your boss wants you to stay at work longer and take fewer breaks. We are always trying to get other people to feel, think, and do certain things that they would not otherwise have felt, thought, or done. So the meaningful question is not whether people are trying to manipulate your experience and behavior, but whether they’re trying to manipulate you in a way that aligns with or contradicts your own best interests. The mere fact that Facebook, Google, and Amazon run experiments intended to alter your emotional experience in a revenue-increasing way is not necessarily a bad thing if in the process of making more money off you, those companies also improve your quality of life. I’m not taking a stand one way or the other, mind you, but simply pointing out that without controlled experimentation, the user experience on Facebook, Google, Twitter, etc. would probably be very, very different–and most likely less pleasant. So before we lament the perceived loss of all those “positive and beautiful” items in our Facebook news feeds, we should probably remind ourselves that Facebook’s ability to identify and display those items consistently is itself in no small part a product of its continual effort to experimentally test its offering by, yes, experimentally manipulating its users’ feelings and thoughts.

What makes the backlash on this issue particularly strange is that I’m pretty sure most people do actually realize that their experience on Facebook (and on other websites, and on TV, and in restaurants, and in museums, and pretty much everywhere else) is constantly being manipulated. I expect that most of the people who’ve been complaining about the Facebook study on Twitter are perfectly well aware that Facebook constantly alters its user experience–I mean, they even see it happen in a noticeable way once in a while, whenever Facebook introduces a new interface. Given that Facebook has over half a billion users, it’s a foregone conclusion that every tiny change Facebook makes to the news feed or any other part of its websites induces a change in millions of people’s emotions. Yet nobody seems to complain about this much–presumably because, when you put it this way, it seems kind of silly to suggest that a company whose business model is predicated on getting its users to use its product more would do anything other than try to manipulate its users into, you know, using its product more.

Why the backlash is deeply counterproductive

Now, none of this is meant to suggest that there aren’t legitimate concerns one could raise about Facebook’s more general behavior–or about the immense and growing social and political influence that social media companies like Facebook wield. One can certainly question whether it’s really fair to expect users signing up for a service like Facebook’s to read and understand user agreements containing dozens of pages of dense legalese, or whether it would make sense to introduce new regulations on companies like Facebook to ensure that they don’t acquire or exert undue influence on their users’ behavior (though personally I think that would be unenforceable and kind of silly). So I’m certainly not suggesting that we give Facebook, or any other large web company, a free pass to do as it pleases. What I am suggesting, however, is that even if your real concerns are, at bottom, about the broader social and political context Facebook operates in, using this particular study as a lightning rod for criticism of Facebook is an extremely counterproductive, and potentially very damaging, strategy.

Consider: by far the most likely outcome of the backlash Facebook is currently experiencing is that, in future, its leadership will be less likely to allow its data scientists to publish their findings in the scientific literature. Remember, Facebook is not a research institute expressly designed to further understanding of the human condition; it’s a publicly-traded corporation that exists to create wealth for its shareholders. Facebook doesn’t have to share any of its data or findings with the rest of the world if it doesn’t want to; it could comfortably hoard all of its knowledge and use it for its own ends, and no one else would ever be any wiser for it. The fact that Facebook is willing to allow its data science team to spend at least some of its time publishing basic scientific research that draws on Facebook’s unparalleled resources is something to be commended, not criticized.

There is little doubt that the present backlash will do absolutely nothing to deter Facebook from actually conducting controlled experiments on its users, because A/B testing is a central component of pretty much every major web company’s business strategy at this point–and frankly, Facebook would be crazy not to try to empirically determine how to improve user experience. What criticism of the Kramer et al. article will almost certainly do is decrease the scientific community’s access to, and interaction with, one of the largest and richest sources of data on human behavior in existence. You can certainly take a dim view of Facebook as a company if you like, and you’re free to critique the way they do business to your heart’s content. But haranguing Facebook and other companies like it for publicly disclosing scientifically interesting results of experiments that they are already constantly conducting anyway–and that are directly responsible for many of the positive aspects of the user experience–is not likely to accomplish anything useful. If anything, it’ll only ensure that, going forward, all of Facebook’s societally relevant experimental research is done in the dark, where nobody outside the company can ever find out–or complain–about it.

[UPDATE July 1st: I've posted some additional thoughts in a second post here.]

aftermath of the NYT / Lindstrom debacle

Over the last few days the commotion over Martin Lindstrom’s terrible New York Times iPhone-loving Op-Ed, which I wrote about in my last post, seems to have spread far and wide. Highlights include excellent posts by David Dobbs and the Neurocritic, but really there are too many to list at this point. And the verdict is overwhelmingly negative; I don’t think I’ve seen a single post in defense of Lindstrom, which is probably not a good sign (for him).

In the meantime, Russ Poldrack and over 40 other neuroscientists and psychologists (including me) wrote a letter to the NYT complaining about the Lindstrom Op-Ed, which the NYT has now published. As per usual, they edited down the letter till it almost disappeared. But the original, along with a list of signees, is on Russ’s blog.

Anyway, the fact that the Times published the rebuttal letter is all well and good, but as I mentioned in my last post, the bigger problem is that since the Times doesn’t include links to related content on their articles, people who stumble across the Op-Ed aren’t going to have any way of knowing that it’s been roundly discredited by pretty much the entire web. Lindstrom’s piece was the most emailed article on the Times website for a day or two, but only a tiny fraction of those readers will ever see (or even hear about) the critical response. As far as I know, the NYT hasn’t issued an explanation or apology for publishing the Op-Ed; they’ve simply published the letter and gone on about their business (I guess I can’t fault them for this–if they had to issue a formal apology for every mistake that gets published, they’d have no time for anything else; the trick is really to catch this type of screw-up at the front end). Adding links from each article to related content wouldn’t solve the problem entirely, of course, but it would be something. The fact that the Times’ platform currently doesn’t have this capacity is kind of perplexing.

The other point worth mentioning is that, in the aftermath of the tsunami of criticism he received, Lindstrom left a comment on several blogs (Russ Poldrack and David Dobbs were lucky recipients; sadly, I wasn’t on the guest list). Here’s the full text of the comment:

My first foray into neuro-marketing research was for my New York Times bestseller Buyology: Truth and Lies about Why We Buy. For that book I teamed up with Neurosense, a leading independent neuro-marketing company that specializes in consumer research using functional magnetic resonance imaging (fMRI) headed by Oxford University trained Gemma Calvert, BSc DPhil CPsychol FRSA and Neuro-Insight, a market research company that uses unique brain-imaging technology, called Steady-State Topography (SST), to measure how the brain responds to communications which is lead by Dr. Richard Silberstein, PhD. This was the single largest neuro-marketing study ever conducted—25x larger than any such study to date and cost more than seven million dollars to run.

In the three-year effort scientists scanned the brains of over 2,000 people from all over the world as they were exposed to various marketing and advertising strategies including clever product placements, sneaky subliminal messages, iconic brand logos, shocking health and safety warnings, and provocative product packages. The purpose of all of this was to understand, quite successfully I may add, the key drivers behind why we make the purchasing decisions that we do.

For the research that my recent Op-Ed column in the New York Times was based on I turned to Dr. David Hubbard, a board-certified neurologist and his company MindSign Neuro Marketing, an independently owned fMRI neuro-marketing company. I asked Dr. Hubbard and his team a simple question, “Are we addicted to our iPhones?” After analyzing the brains of 8 men and 8 women between the ages of 18-25 using fMRI technology, MindSign answered my question using standardized answering methods and completely reproducible results. The conclusion was that we are not addicted to our iPhones, we are in love with them.

The thought provoking dialogue that has been generated from the article has been overwhelmingly positive and I look forward to the continued comments from professionals in the field, readers and fans.

Respectfully,

Martin Lindstrom

As evasive responses go, this is a masterpiece; at no point does Lindstrom ever actually address any of the substantive criticisms leveled at him. He spends most of his response name dropping (the list of credentials is almost long enough to make you forget that the rebuttal letter to his Op-Ed was signed by over 40 PhDs) and rambling about previous unrelated neuromarketing work (which may as well not exist, since none of it has ever been made public), and then closes by shifting the responsibility for the study to MindSign, the company he paid to run the iPhone study. The claim that MindSign “answered [his] question using standardized answering methods and completely reproducible results” is particularly ludicrous; as I explained in my last post, there currently aren’t any standardized methods for reading addiction or love off of brain images. And ‘completely reproducible results’ implies that one has, you know, successfully reproduced the results, which is simply false unless Lindstrom is suggesting that MindSign did the same experiment twice. It’s hard to see any “thought provoking dialogue” taking place here, and the neuroimaging community’s response to the Op-Ed column has been, virtually without exception, overwhelmingly negative, not positive (as Lindstrom claims).

That all said, I do think there’s one very positive aspect to this entire saga, and that’s the amazing speed and effectiveness of the response from scientists, science journalists, and other scientifically literate folks. Ten years ago, Lindstrom’s piece might have gone completely unchallenged–and even if someone like Russ Poldrack had written a response, it would probably have appeared much later, been signed by fewer scientists (because coordination would have been much more difficult), and received much less attention. But within 48 hours of Lindstrom’s Op-Ed being published, dozens of critical blog posts had appeared, and hundreds, if not thousands, of people all over the world had tweeted or posted links to these critiques (my last post alone received over 12,000 hits). Scientific discourse, which used to be confined largely to peer-reviewed print journals and annual conferences, now takes place at a remarkable pace online, and it’s fantastic to see social media used in this way. The hope is that as these technologies develop further and scientists take on a more active role in communicating with the public (something that platforms like Twitter and Google+ seem to be facilitating amazingly well), it’ll become increasingly difficult for people like Lindstrom to make crazy pseudoscientific claims without being immediately and visibly called out on it–even in those rare cases when the NYT makes the mistake of leaving one of the biggest microphones on earth open and unmonitored.

the New York Times blows it big time on brain imaging

The New York Times has a terrible, terrible Op-Ed piece today by Martin Lindstrom (who I’m not going to link to, because I don’t want to throw any more bones his way). If you believe Lindstrom, you don’t just like your iPhone a lot; you love it. Literally. And the reason you love it, shockingly, is your brain:

Earlier this year, I carried out an fMRI experiment to find out whether iPhones were really, truly addictive, no less so than alcohol, cocaine, shopping or video games. In conjunction with the San Diego-based firm MindSign Neuromarketing, I enlisted eight men and eight women between the ages of 18 and 25. Our 16 subjects were exposed separately to audio and to video of a ringing and vibrating iPhone.

But most striking of all was the flurry of activation in the insular cortex of the brain, which is associated with feelings of love and compassion. The subjects’ brains responded to the sound of their phones as they would respond to the presence or proximity of a girlfriend, boyfriend or family member.

In short, the subjects didn’t demonstrate the classic brain-based signs of addiction. Instead, they loved their iPhones.

There’s so much wrong with just these three short paragraphs (to say nothing of the rest of the article, which features plenty of other whoppers) that it’s hard to know where to begin. But let’s try. Take first the central premise–that an fMRI experiment could help determine whether iPhones are no less addictive than alcohol or cocaine. The tacit assumption here is that all the behavioral evidence you could muster–say, from people’s reports about how they use their iPhones, or clinicians’ observations about how iPhones affect their users–isn’t sufficient to make that determination; to “really, truly” know if something’s addictive, you need to look at what the brain is doing when people think about their iPhones. This idea is absurd inasmuch as addiction is defined on the basis of its behavioral consequences, not (right now, anyway) by the presence or absence of some biomarker. What makes someone an alcoholic is the fact that they’re dependent on alcohol, have trouble going without it, find that their alcohol use interferes with multiple aspects of their day-to-day life, and generally suffer functional impairment because of it–not the fact that their brain lights up when they look at pictures of Johnnie Walker Red. If someone couldn’t stop drinking–to the point where they lost their job, family, and friends–but their brain failed to display a putative biomarker for addiction, it would be strange indeed to say “well, you show all the signs, but I guess you’re not really addicted to alcohol after all.”

Now, there may come a day (and it will be a great one) when we have biomarkers sufficiently accurate that they can stand in for the much more tedious process of diagnosing someone’s addiction the conventional way. But that day is, to put it gently, a long way off. Right now, if you want to know if iPhones are addictive, the best way to do that is to, well, spend some time observing and interviewing iPhone users (and some quantitative analysis would be helpful).

Of course, it’s not clear what Lindstrom thinks an appropriate biomarker for addiction would be in any case. Presumably it would have something to do with the reward system; but what? Suppose Lindstrom had seen robust activation in the ventral striatum–a critical component of the brain’s reward system–when participants gazed upon the iPhone: what then? Would this have implied people are addicted to iPhones? But people also show striatal activity when gazing on food, money, beautiful faces, and any number of other stimuli. Does that mean the average person is addicted to all of the above? A marker of pleasure or reward, maybe (though even that’s not certain), but addiction? How could a single fMRI experiment with 16 subjects viewing pictures of iPhones confirm or disconfirm the presence of addiction? Lindstrom doesn’t say. I suppose he has good reason not to say: if he really did have access to an accurate fMRI-based biomarker for addiction, he’d be in a position to make millions (billions?) off the technology. To date, no one else has come close to identifying a clinically accurate fMRI biomarker for any kind of addiction (for more technical readers, I’m talking here about cross-validated methods that have both sensitivity and specificity comparable to traditional approaches when applied to new subjects–not individual studies that claim 90% within-sample classification accuracy based on simple regression models). So we should, to put it mildly, be very skeptical that Lindstrom’s study was ever in a position to do what he says it was designed to do.
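For readers who want to see why within-sample accuracy is so uninformative, here is a minimal sketch in Python. Everything in it is made up for illustration: the "brain" features are pure random noise, the labels are arbitrary, and I’ve swapped in a 1-nearest-neighbor classifier for the regression models mentioned above simply because it makes the point in the fewest lines. None of this reflects what Lindstrom or MindSign actually did, which we have no way of knowing.

```python
import random


def make_noise_dataset(n_subjects, n_features, rng):
    """Pure-noise features with labels that have nothing to do with them."""
    features = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
                for _ in range(n_subjects)]
    labels = [i % 2 for i in range(n_subjects)]  # balanced, arbitrary labels
    return features, labels


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_label(query, features, labels, skip=None):
    """Label of the closest stored subject, optionally ignoring index `skip`."""
    best = min((i for i in range(len(features)) if i != skip),
               key=lambda i: squared_distance(query, features[i]))
    return labels[best]


def within_sample_accuracy(features, labels):
    """Score the classifier on the very subjects it has memorized."""
    hits = sum(nearest_label(f, features, labels) == y
               for f, y in zip(features, labels))
    return hits / len(labels)


def leave_one_out_accuracy(features, labels):
    """Score each subject using only the other subjects (cross-validation)."""
    hits = sum(nearest_label(f, features, labels, skip=i) == y
               for i, (f, y) in enumerate(zip(features, labels)))
    return hits / len(labels)


if __name__ == "__main__":
    rng = random.Random(0)
    # 16 hypothetical subjects, 50 hypothetical noise "voxels" each
    features, labels = make_noise_dataset(16, 50, rng)
    print("within-sample:", within_sample_accuracy(features, labels))
    print("leave-one-out:", leave_one_out_accuracy(features, labels))
```

Scored on the data it memorized, the classifier is perfect, because every subject is its own nearest neighbor. Scored on held-out subjects it should hover around chance, because there is nothing in the noise to learn. That gap is why only the cross-validated number says anything about new subjects.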

We should also ask all sorts of salient and important questions about who the people are who are supposedly in love with their iPhones. Who’s the “You” in the “You Love Your iPhone” of the title? We don’t know, because we don’t know who the participants in Lindstrom’s sample were, aside from the fact that they were eight men and eight women aged 18 to 25. But we’d like to know some other important things. For instance, were they selected for specific characteristics? Were they, say, already avid iPhone users? Did they report loving, or being addicted to, their iPhones? If so, would it surprise us that people chosen for their close attachment to their iPhones also showed brain activity patterns typical of close attachment? (Which, incidentally, they actually don’t–but more on that below.) And if not, are we to believe that the average person pulled off the street–who probably has limited experience with iPhones–really responds to the sound of their phones “as they would respond to the presence or proximity of a girlfriend, boyfriend or family member”? Is the takeaway message of Lindstrom’s Op-Ed that iPhones are actually people, as far as our brains are concerned?

In fairness, space in the Times is limited, so maybe it’s not fair to demand this level of detail in the Op-Ed itself. But the bigger problem is that we have no way of evaluating Lindstrom’s claims, period, because (as far as I can tell), his study hasn’t been published or peer-reviewed anywhere. Presumably, it’s proprietary information that belongs to the neuromarketing firm in question. Which is to say, the NYT is basically giving Lindstrom license to talk freely about scientific-sounding findings that can’t actually be independently confirmed, disputed, or critiqued by members of the scientific community with expertise in the very methods Lindstrom is applying (expertise which, one might add, he himself lacks). For all we know, he could have made everything up. To be clear, I don’t really think he did make everything up–but surely, somewhere in the editorial process someone at the NYT should have stepped in and said, “hey, these are pretty strong scientific claims; is there any way we can make your results–on which your whole article hangs–available for other experts to examine?”

This brings us to what might be the biggest whopper of all, and the real driver of the article title: the claim that “most striking of all was the flurry of activation in the insular cortex of the brain, which is associated with feelings of love and compassion”. Russ Poldrack already tore this statement to shreds earlier this morning:

Insular cortex may well be associated with feelings of love and compassion, but this hardly proves that we are in love with our iPhones.  In Tal Yarkoni’s recent paper in Nature Methods, we found that the anterior insula was one of the most highly activated part of the brain, showing activation in nearly 1/3 of all imaging studies!  Further, the well-known studies of love by Helen Fisher and colleagues don’t even show activation in the insula related to love, but instead in classic reward system areas.  So far as I can tell, this particular reverse inference was simply fabricated from whole cloth.  I would have hoped that the NY Times would have learned its lesson from the last episode.

But you don’t have to take Russ’s word for it; if you surf for a few terms on our Neurosynth website, making sure to select “forward inference” under image type, you’ll notice that the insula shows up for almost everything. That’s not an accident; it’s because the insula (or at least the anterior part of the insula) plays a very broad role in goal-directed cognition. It really is activated when you’re doing almost anything that involves, say, following instructions an experimenter gave you, or attending to external stimuli, or mulling over something salient in the environment. You can see this pretty clearly in this modified figure from our Nature Methods paper (I’ve circled the right insula):

Proportion of studies reporting activation at each voxel

The insula is one of a few ‘hotspots’ where activation is reported very frequently in neuroimaging articles (the other major one being the dorsal medial frontal cortex). So, by definition, there can’t be all that much specificity to what the insula is doing, since it pops up so often. To put it differently, as Russ and others have repeatedly pointed out, the fact that a given region activates when people are in a particular psychological state (e.g., love) doesn’t give you license to conclude that that state is present just because you see activity in the region in question. If language, working memory, physical pain, anger, visual perception, motor sequencing, and memory retrieval all activate the insula, then knowing that the insula is active is of very little diagnostic value. That’s not to say that some psychological states might not be more strongly associated with insula activity (again, you can see this on Neurosynth if you switch the image type to ‘reverse inference’ and browse around); it’s just that, probabilistically speaking, the mere fact that the insula is active gives you very little basis for saying anything concrete about what people are experiencing.
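To put rough numbers on the reverse inference problem, here is a small Python sketch that just applies Bayes’ rule. All of the probabilities are hypothetical and chosen for illustration; the one-in-three figure for how often the insula is active is borrowed loosely from the proportion of studies Russ mentions above, and the function name is my own.

```python
def posterior_given_activation(prior, p_active_given_state, p_active_given_other):
    """P(state | region active), computed with Bayes' rule.

    prior                 -- P(state) before looking at the brain
    p_active_given_state  -- P(region active | state present)
    p_active_given_other  -- P(region active | state absent)
    """
    evidence = (p_active_given_state * prior
                + p_active_given_other * (1.0 - prior))
    return p_active_given_state * prior / evidence


if __name__ == "__main__":
    # Hypothetical: 10% prior that "love" is present, and the region
    # activates 90% of the time when it is.
    prior, p_hit = 0.10, 0.90

    # A promiscuous region, active in about a third of everything else.
    print(posterior_given_activation(prior, p_hit, 1.0 / 3.0))

    # A (hypothetical) selective region, active only 2% of the time otherwise.
    print(posterior_given_activation(prior, p_hit, 0.02))
```

With these made-up inputs, seeing activation in the promiscuous region moves you from a 10% prior to only about a one-in-four posterior, whereas the same activation in a selective region would move you to better than 80%. The region’s base rate of activation does most of the work, which is exactly why insula activity tells you so little.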

In fact, to account for Lindstrom’s findings, you don’t have to appeal to love or addiction at all. There’s a much simpler way to explain why seeing or hearing an iPhone might elicit insula activation. For most people, the onset of visual or auditory stimulation is a salient event that causes redirection of attention to the stimulated channel. I’d be pretty surprised, actually, if you could present any picture or sound to participants in an fMRI scanner and not elicit robust insula activity. Orienting and sustaining attention to salient things seems to be a big part of what the anterior insula is doing (whether or not that’s ultimately its ‘core’ function). So the most appropriate conclusion to draw from the fact that viewing iPhone pictures produces increased insula activity is something vague like “people are paying more attention to iPhones”, or “iPhones are particularly salient and interesting objects to humans living in 2011.” Not something like “no, really, you love your iPhone!”

In sum, the NYT screwed up. Lindstrom appears to have a habit of making overblown claims about neuroimaging evidence, so it’s not surprising he would write this type of piece; but the NYT editorial staff is supposedly there to filter out precisely this kind of pseudoscientific advertorial. And they screwed up. It’s a particularly big screw-up given that (a) as of right now, Lindstrom’s Op-Ed is the single most emailed article on the NYT site, and (b) this incident almost perfectly recapitulates another NYT article 4 years ago in which some neuroscientists and neuromarketers wrote a grossly overblown Op-Ed claiming to be able to infer, in detail, people’s opinions about presidential candidates. That time, Russ Poldrack and a bunch of other big names in cognitive neuroscience wrote a concise rebuttal that appeared in the NYT (but unfortunately, isn’t linked to from the original Op-Ed, so anyone who stumbles across the original now has no way of knowing how ridiculous it is). One hopes the NYT follows up in similar fashion this time around. They certainly owe it to their readers–some of whom, if you believe Lindstrom, are now in danger of dumping their current partners for their iPhones.

h/t: Molly Crockett

the ‘decline effect’ doesn’t work that way

Over the last four or five years, there’s been a growing awareness in the scientific community that science is an imperfect process. Not that everyone used to think science was a crystal ball with a direct line to the universe or anything, but there does seem to be a growing recognition that scientists are human beings with human flaws, and are susceptible to common biases that can make it more difficult to fully trust any single finding reported in the literature. For instance, scientists like interesting results more than boring results; we’d rather keep our jobs than lose them; and we have a tendency to see what we want to see, even when it’s only sort-of-kind-of there, and sometimes not there at all. All of these things contrive to produce systematic biases in the kinds of findings that get reported.

The single biggest contributor to the zeitgeist nudge is undoubtedly John Ioannidis (recently profiled in an excellent Atlantic article), whose work I can’t say enough good things about (though I’ve tried). But lots of other people have had a hand in popularizing the same or similar ideas–many of which actually go back several decades. I’ve written a bit about these issues myself in a number of papers (1, 2, 3) and blog posts (1, 2, 3, 4, 5), so I’m partial to such concerns. Still, important as the role of the various selection and publication biases is in charting the course of science, virtually all of the discussions of these issues have had a relatively limited audience. Even Ioannidis’ work, influential as it’s been, has probably been read by no more than a few thousand scientists.

Last week, the debate hit the mainstream when the New Yorker (circulation: ~ 1 million) published an article by Jonah Lehrer suggesting–or at least strongly raising the possibility–that something might be wrong with the scientific method. The full article is behind a paywall, but I can helpfully tell you that some people seem to have un-paywalled it against the New Yorker’s wishes, so if you search for it online, you will find it.

The crux of Lehrer’s argument is that many, and perhaps most, scientific findings fall prey to something called the “decline effect”: initial positive reports of relatively large effects are subsequently followed by gradually decreasing effect sizes, in some cases culminating in a complete absence of an effect in the largest, most recent studies. Lehrer gives a number of colorful anecdotes illustrating this process, and ends on a decidedly skeptical (and frankly, terribly misleading) note:

The decline effect is troubling because it reminds us how difficult it is to prove anything. We like to pretend that our experiments define the truth for us. But that’s often not the case. Just because an idea is true doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true. When the experiments are done, we still have to choose what to believe.

While Lehrer’s article received pretty positive reviews from many non-scientist bloggers (many of whom, dismayingly, seemed to think the take-home message was that since scientists always change their minds, we shouldn’t trust anything they say), science bloggers were generally not very happy with it. Within days, angry mobs of Scientopians and Nature Networkers started murdering unicorns; by the end of the week, the New Yorker offices were reduced to rubble, and the scientists and statisticians who’d given Lehrer quotes were all rumored to be in hiding.

Okay, none of that happened. I’m just trying to keep things interesting. Anyway, because I’ve been characteristically slow on the uptake, by the time I got around to writing the post you’re now reading, about eighty hundred and sixty thousand bloggers had already weighed in on Lehrer’s article. That’s good, because it means I can just direct you to other people’s blogs instead of having to do any thinking myself. So here you go: good posts by Games With Words (whose post tipped me off to the article), Jerry Coyne, Steven Novella, Charlie Petit, and Andrew Gelman, among many others.

Since I’ve blogged about these issues before, and agree with most of what’s been said elsewhere, I’ll only make one point about the article. Which is that about half of the examples Lehrer talks about don’t actually seem to me to qualify as instances of the decline effect–at least as Lehrer defines it. The best example of this comes when Lehrer discusses Jonathan Schooler’s attempt to demonstrate the existence of the decline effect by running a series of ESP experiments:

In 2004, Schooler embarked on an ironic imitation of Rhine’s research: he tried to replicate this failure to replicate. In homage to Rhine’s interests, he decided to test for a parapsychological phenomenon known as precognition. The experiment itself was straightforward: he flashed a set of images to a subject and asked him or her to identify each one. Most of the time, the response was negative—the images were displayed too quickly to register. Then Schooler randomly selected half of the images to be shown again. What he wanted to know was whether the images that got a second showing were more likely to have been identified the first time around. Could subsequent exposure have somehow influenced the initial results? Could the effect become the cause?

The craziness of the hypothesis was the point: Schooler knows that precognition lacks a scientific explanation. But he wasn’t testing extrasensory powers; he was testing the decline effect. “At first, the data looked amazing, just as we’d expected,” Schooler says. “I couldn’t believe the amount of precognition we were finding. But then, as we kept on running subjects, the effect size”–a standard statistical measure–“kept on getting smaller and smaller.” The scientists eventually tested more than two thousand undergraduates. “In the end, our results looked just like Rhine’s,” Schooler said. “We found this strong paranormal effect, but it disappeared on us.”

This is a pretty bad way to describe what’s going on, because it makes it sound like it’s a general principle of data collection that effects systematically get smaller. It isn’t. The variance around the point estimate of effect size certainly gets smaller as samples get larger, but the likelihood of an effect increasing is just as high as the likelihood of it decreasing. The absolutely critical point Lehrer left out is that you would only get the decline effect to show up if you intervened in the data collection or reporting process based on the results you were getting. Instead, most of Lehrer’s article presents the decline effect as if it’s some sort of mystery, rather than the well-understood process that it is. It’s as though Lehrer believes that scientific data has the magical property of telling you less about the world the more of it you have. Which isn’t true, of course; the problem isn’t that science is malfunctioning, it’s that scientists are still (kind of!) human, and are susceptible to typical human biases. The unfortunate net effect is that Lehrer’s article, while tremendously entertaining, achieves exactly the opposite of what good science journalism should do: it sows confusion about the scientific process and makes it easier for people to dismiss the results of good scientific work, instead of helping people develop a critical appreciation for the amazing power science has to tell us about the world.
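If you’d like to convince yourself of this, a quick simulation does the job. The Python sketch below is my own toy setup, not a reconstruction of Schooler’s experiment: the true effect is set to zero, each simulated study records an estimate after its first 20 subjects and again after all 200, and the "selection" step keeps only studies whose early estimate cleared an arbitrary threshold. All of the function names and parameter values are assumptions made for illustration.

```python
import random
import statistics


def simulate_study(true_effect, n_early, n_total, rng):
    """Return (early_estimate, final_estimate) for one simulated study."""
    data = [rng.gauss(true_effect, 1.0) for _ in range(n_total)]
    return statistics.fmean(data[:n_early]), statistics.fmean(data)


def summarize(studies):
    """Mean early estimate, mean final estimate, share of studies that shrank."""
    mean_early = statistics.fmean(e for e, _ in studies)
    mean_final = statistics.fmean(f for _, f in studies)
    share_shrank = sum(f < e for e, f in studies) / len(studies)
    return mean_early, mean_final, share_shrank


def run(n_studies=5000, true_effect=0.0, n_early=20, n_total=200,
        threshold=0.4, seed=0):
    """Compare all studies against those selected for a big early effect."""
    rng = random.Random(seed)
    all_studies = [simulate_study(true_effect, n_early, n_total, rng)
                   for _ in range(n_studies)]
    # Selection on results: only "exciting" early estimates get followed up.
    selected = [s for s in all_studies if s[0] > threshold]
    return summarize(all_studies), summarize(selected)


if __name__ == "__main__":
    unselected, selected = run()
    print("all studies      (early, final, share shrank):", unselected)
    print("selected studies (early, final, share shrank):", selected)
```

Across all studies, the estimate should shrink about half the time and grow the other half, just as you’d expect. Among the studies picked because they started out looking amazing, nearly every estimate should decline toward the true value of zero. The "decline" comes entirely from the selection step, not from anything mysterious about collecting more data.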

what the arsenic effect means for scientific publishing

I don’t know very much about DNA (and by ‘not very much’ I sadly mean ‘next to nothing’), so when someone tells me that life as we know it generally doesn’t use arsenic to make DNA, and that it’s a big deal to find a bacterium that does, I’m willing to believe them. So too, apparently, are at least two or three reviewers for Science, which published a paper last week by a NASA group purporting to demonstrate exactly that.

Turns out the paper might have a few holes. In the last few days, the blogosphere has reached fever pitch as critiques of the article have emerged from every corner; it seems like pretty much everyone with some knowledge of the science in question is unhappy about the paper. Since I’m not in any position to critique the article myself, I’ll take Carl Zimmer’s word for it in Slate yesterday:

Was this merely a case of a few isolated cranks? To find out, I reached out to a dozen experts on Monday. Almost unanimously, they think the NASA scientists have failed to make their case.  “It would be really cool if such a bug existed,” said San Diego State University’s Forest Rohwer, a microbiologist who looks for new species of bacteria and viruses in coral reefs. But, he added, “none of the arguments are very convincing on their own.” That was about as positive as the critics could get. “This paper should not have been published,” said Shelley Copley of the University of Colorado.

Zimmer then follows his Slate piece up with a blog post today in which he provides 13 experts’ unadulterated comments. While there are one or two (somewhat) positive reviews, the consensus clearly seems to be that the Science paper is (very) bad science.

Of course, scientists (yes, even Science reviewers) do occasionally make mistakes, so if we’re being charitable about it, we might chalk it up to human error (though some of the critiques suggest that these are elementary problems that could have been very easily addressed, so it’s possible there’s some disingenuousness involved). But what many bloggers (1, 2, 3, etc.) have found particularly inexcusable is the way NASA and the research team have handled the criticism. Zimmer again, in Slate:

I asked two of the authors of the study if they wanted to respond to the criticism of their paper. Both politely declined by email.

“We cannot indiscriminately wade into a media forum for debate at this time,” declared senior author Ronald Oremland of the U.S. Geological Survey. “If we are wrong, then other scientists should be motivated to reproduce our findings. If we are right (and I am strongly convinced that we are) our competitors will agree and help to advance our understanding of this phenomenon. I am eager for them to do so.”

“Any discourse will have to be peer-reviewed in the same manner as our paper was, and go through a vetting process so that all discussion is properly moderated,” wrote Felisa Wolfe-Simon of the NASA Astrobiology Institute. “The items you are presenting do not represent the proper way to engage in a scientific discourse and we will not respond in this manner.”

A NASA spokesperson basically reiterated this point of view, indicating that NASA scientists weren’t going to respond to criticism of their work unless that criticism appeared in, you know, a respectable, peer-reviewed outlet. (Fortunately, at least one of the critics already has a draft letter to Science up on her blog.)

I don’t think it’s surprising that people who spend much of their free time blogging about science, and think it’s important to discuss scientific issues in a public venue, generally aren’t going to like being told that science blogging isn’t a legitimate form of scientific discourse. Especially considering that the critics here aren’t laypeople without scientific training; they’re well-respected scientists with areas of expertise that are directly relevant to the paper. In this case, dismissing trenchant criticism because it’s on the web rather than in a peer-reviewed journal seems kind of like telling someone who’s screaming at you that your house is on fire that you’re not going to listen to them until they adopt a more polite tone. It just seems counterproductive.

That said, I personally don’t think we should take the NASA team’s statements at face value. I very much doubt that what the NASA researchers are saying really reflects any deep philosophical view about the role of blogs in scientific discourse; it’s much more likely that they’re simply trying to buy some time while they figure out how to respond. On the face of it, they have a choice between two lousy options: either ignore the criticism entirely, which would be antithetical to the scientific process and would look very bad, or address it head-on–which, judging by the vociferousness and near-unanimity of the commentators, is probably going to be a losing battle. Shifting the terms of the debate by insisting on responding only in a peer-reviewed venue doesn’t really change anything, but it does buy the authors two or three weeks. And two or three weeks is worth, like, forty attentional cycles in the blogosphere.

Mind you, I’m not saying we should sympathize with the NASA researchers just because they’re in a tough position. I think one of the main reasons the story’s attracted so much attention is precisely because people see it as a case of justice being served. The NASA team called a major press conference ahead of the paper’s publication, published its results in one of the world’s most prestigious science journals, and yet apparently failed to run relatively basic experimental controls in support of its conclusions. If the critics are to be believed, the NASA researchers are either disingenuous or incompetent; either way, we shouldn’t feel sorry for them.

What I do think this episode shows is that the rules of scientific publishing have fundamentally changed in the last few years–and largely for the better. I haven’t been doing science for very long, but even in the halcyon days of 2003, when I started graduate school, science blogging was practically nonexistent, and the main way you’d find out what other people thought about an influential new paper was by talking to people you knew at conferences (which could take several months) or waiting for critiques or replication failures to emerge in other peer-reviewed journals (which could take years). That kind of delay between publication and evaluation is disastrous for science, because in the time it takes for a consensus to emerge that a paper is no good, several research teams might have already started trying to replicate and extend the reported findings, and several dozen other researchers might have uncritically cited their paper peripherally in their own work. This delay is probably why, as John Ioannidis’ work so elegantly demonstrates, major studies published in high-impact journals tend to exert a disproportionate influence on the literature long after they’ve been resoundingly discredited.

The Arsenic Effect, if we can call it that, provides a nice illustration of the impact of new media on scientific communication. It’s a safe bet that there are now very few people who do anything even vaguely related to the NASA team’s research who haven’t been made aware that the reported findings are controversial. Which means that the process of attempting to replicate (or falsify) the findings will proceed much more quickly than it might have ten or twenty years ago, and there probably won’t be very many people who cite the Science paper as compelling evidence of terrestrial arsenic-based life. Perhaps more importantly, as researchers get used to the idea that their high-profile work is going to be instantly evaluated by thousands of pairs of highly trained eyes, any of which might be attached to a highly prolific pair of typing hands, there will be an increasingly strong incentive to avoid being careless. That isn’t to say that bad science will disappear, of course; just that, in cases where the badness reflects a pressure to tell a good story at all costs, we’ll probably see less of it.

trouble with biomarkers and press releases

The latest issue of the Journal of Neuroscience contains an interesting article by Ecker et al in which the authors attempted to classify people with autism spectrum disorder (ASD) and healthy controls based on their brain anatomy, and report achieving “a sensitivity and specificity of up to 90% and 80%, respectively.” Before unpacking what that means, and why you probably shouldn’t get too excited (about the clinical implications, at any rate; the science is pretty cool), here’s a snippet from the decidedly optimistic press release that accompanied the study:

“Scientists funded by the Medical Research Council (MRC) have developed a pioneering new method of diagnosing autism in adults. For the first time, a quick brain scan that takes just 15 minutes can identify adults with autism with over 90% accuracy. The method could lead to the screening for autism spectrum disorders in children in the future.”

If you think this sounds too good to be true, that’s because it is. Carl Heneghan explains why in an excellent article in the Guardian:

How the brain scans results are portrayed is one of the simplest mistakes in interpreting diagnostic test accuracy to make. What has happened is, the sensitivity has been taken to be the positive predictive value, which is what you want to know: if I have a positive test do I have the disease? Not, if I have the disease, do I have a positive test? It would help if the results included a measure called the likelihood ratio (LR), which is the likelihood that a given test result would be expected in a patient with the target disorder compared to the likelihood that the same result would be expected in a patient without that disorder. In this case the LR is 4.5. We’ve put up an article if you want to know more on how to calculate the LR.

In the general population the prevalence of autism is 1 in 100; the actual chances of having the disease are 4.5 times more likely given a positive test. This gives a positive predictive value of 4.5%; about 5 in every 100 with a positive test would have autism.

For those still feeling confused and not convinced, let’s think of 10,000 children. Of these 100 (1%) will have autism, 90 of these 100 would have a positive test, 10 are missed as they have a negative test: there’s the 90% reported accuracy by the media.

But what about the 9,900 who don’t have the disease? 7,920 of these will test negative (the specificity in the Ecker paper is 80%). But, the real worry though, is the numbers without the disease who test positive. This will be substantial: 1,980 of the 9,900 without the disease. This is what happens at very low prevalences, the numbers falsely misdiagnosed rockets. Alarmingly, of the 2,070 with a positive test, only 90 will have the disease, which is roughly 4.5%.

In other words, if you screened everyone in the population for autism, and assume the best about the classifier reported in the JNeuro article (e.g., that the sample of 20 ASD participants they used is perfectly representative of the broader ASD population, which seems unlikely), only about 1 in 20 people who receive a positive diagnosis would actually deserve one.
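
Heneghan’s 10,000-children walkthrough is easy to reproduce. The sketch below simply redoes his arithmetic using the figures quoted above (1% prevalence, 90% sensitivity, 80% specificity); the function and key names are my own, chosen for illustration.

```python
def screening_counts(population, prevalence, sensitivity, specificity):
    """Split a screened population into true/false positives and negatives."""
    with_condition = population * prevalence
    without_condition = population - with_condition

    true_pos = with_condition * sensitivity       # affected, test positive
    false_neg = with_condition - true_pos         # affected, test negative
    true_neg = without_condition * specificity    # unaffected, test negative
    false_pos = without_condition - true_neg      # unaffected, test positive

    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        "false_pos": false_pos,
        # Positive predictive value: share of positive tests that are correct.
        "ppv": true_pos / (true_pos + false_pos),
    }


counts = screening_counts(10_000, prevalence=0.01, sensitivity=0.90, specificity=0.80)
for name, value in counts.items():
    print(f"{name}: {value:.3f}")
```

The positive predictive value works out to 90 / 2,070, or a bit over 4%, which is where the “roughly 4.5%” and “about 1 in 20” figures come from.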

Ecker et al object to this characterization, and reply to Heneghan in the comments (through the MRC PR office):

Our test was never designed to screen the entire population of the UK. This is simply not practical in terms of costs and effort, and besides totally  unjustified- why would we screen everybody in the UK for autism if there is no evidence whatsoever that an individual is affected?. The same case applies to other diagnostic tests. Not every single individual in the UK is tested for HIV. Clearly this would be too costly and unnecessary. However, in the group of individuals that are test for the virus, we can be very confident that if the test is positive that means a patient is infected. The same goes for our approach.

Essentially, the argument is that, since people would presumably be sent for an MRI scan because they were already under consideration for an ASD diagnosis, and not at random, the false positive rate would in fact be much lower than 95%, and closer to the 20% reported in the article.

One response to this reply–which is in fact Heneghan’s response in the comments–is to point out that the pre-test probability of ASD would need to be pretty high already in order for the classifier to add much. For instance, even if fully 30% of people who were sent for a scan actually had ASD, the posterior probability of ASD given a positive result would still be only 66% (Heneghan’s numbers, which I haven’t checked). Heneghan nicely contrasts these results with the standard for HIV testing, which “reports sensitivity of 99.7% and specificity of 98.5% for enzyme immunoassay.” Clearly, we have a long way to go before doctors can order MRI-based tests for ASD and feel reasonably confident that a positive result is sufficient grounds for an ASD diagnosis.
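
For anyone who wants to check numbers like these, the calculation is short when done in terms of odds and the likelihood ratio Heneghan mentions. The sketch below assumes only the sensitivity and specificity values quoted in this post; the function names are mine, and the 30% pre-test probability is the hypothetical figure from the paragraph above.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """How much more likely a positive result is in affected vs. unaffected people."""
    return sensitivity / (1.0 - specificity)


def posterior_probability(prior, sensitivity, specificity):
    """Probability of the condition given a positive test, via Bayes' rule on odds."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * positive_likelihood_ratio(sensitivity, specificity)
    return posterior_odds / (1.0 + posterior_odds)


# MRI classifier figures reported by Ecker et al.: 90% sensitivity, 80% specificity.
mri_lr = positive_likelihood_ratio(0.90, 0.80)
mri_posterior = posterior_probability(0.30, 0.90, 0.80)

# HIV enzyme immunoassay figures quoted by Heneghan, at the same 30% prior.
hiv_posterior = posterior_probability(0.30, 0.997, 0.985)

print(f"MRI likelihood ratio:     {mri_lr:.1f}")
print(f"MRI posterior at 30%:     {mri_posterior:.3f}")
print(f"HIV assay posterior at 30%: {hiv_posterior:.3f}")
```

With a 30% pre-test probability the MRI classifier gives a posterior of about 0.66, in line with Heneghan’s figure, whereas a test with the HIV assay’s sensitivity and specificity would give a posterior above 0.96 from the same starting point.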

Setting Heneghan’s concerns about base rates aside, there’s a more general issue that he doesn’t touch on. It’s one that’s not specific to this particular study, and applies to nearly all studies that attempt to develop “biomarkers” for existing disorders. The problem is that the sensitivity and specificity values that people report for their new diagnostic procedure in these types of studies generally aren’t the true parameters of the procedure. Rather, they’re the sensitivity and specificity under the assumption that the diagnostic procedures used to classify patients and controls in the first place are themselves correct. In other words, in order to believe the results, you have to assume that the researchers correctly classified the subjects into patient and control groups using other procedures. In cases where the gold standard test used to make the initial classification is known to have near 100% sensitivity and specificity (e.g., for the aforementioned HIV tests), one can reasonably ignore this concern. But when we’re talking about mental health disorders, where diagnoses are fuzzy and borderline cases abound, it’s very likely that the “gold standard” isn’t really all that great to begin with.

Concretely, studies that attempt to develop biomarkers for mental health disorders face two substantial problems. One is that it’s extremely unlikely that the clinical diagnoses are ever perfect; after all, if they were perfect, there’d be little point in trying to develop other diagnostic procedures! In this particular case, the authors selected subjects into the ASD group based on standard clinical instruments and structured interviews. I don’t know that there are many clinicians who’d claim with a straight face that the current diagnostic criteria for ASD (and there are multiple sets to choose from!) are perfect. From my limited knowledge, the criteria for ASD seem to be even more controversial than those for most other mental health disorders (which is saying something, if you’ve been following the ongoing DSM-V saga). So really, the accuracy of the classifier in the present study, even if you put the best face on it and ignore the base rate issue Heneghan brings up, is undoubtedly south of the 90% sensitivity / 80% specificity the authors report. How much south, we just don’t know, because we don’t really have any independent, objective way to determine who “really” should get an ASD diagnosis and who shouldn’t (assuming you think it makes sense to make that kind of dichotomous distinction at all). But 90% accuracy is probably a pipe dream, if for no other reason than it’s hard to imagine that level of consensus about autism spectrum diagnoses.

The second problem is that, because the researchers are using the MRI-based classifier to predict the clinician-based diagnosis, it simply isn’t possible for the former to exceed the accuracy of the latter. That bears repeating, because it’s important: no matter how good the MRI-based classifier is, it can only be as good as the procedures used to make the original diagnosis, and no better. It cannot, by definition, make diagnoses that are any more accurate than the clinicians who screened the participants in the authors’ ASD sample. So when you see the press release say this:

For the first time, a quick brain scan that takes just 15 minutes can identify adults with autism with over 90% accuracy.

You should really read it as this:

The method relies on structural (MRI) brain scans and has an accuracy rate approaching that of conventional clinical diagnosis.

That’s not quite as exciting, obviously, but it’s more accurate.

To be fair, there’s something of a catch-22 here, in that the authors didn’t really have a choice about whether or not to diagnose the ASD group using conventional criteria. If they hadn’t, reviewers and other researchers would have complained that we can’t tell if the ASD group is really an ASD group, because the authors used non-standard criteria. Under the circumstances, they did the only thing they could do. But that doesn’t change the fact that it’s misleading to intimate, as the press release does, that the new procedure might be any better than the old ones. It can’t be, by definition.

Ultimately, if we want to develop brain-based diagnostic tools that are more accurate than conventional clinical diagnoses, we’re going to need to show that these tools are capable of predicting meaningful outcomes that clinician diagnoses can’t. This isn’t an impossible task, but it’s a very difficult one. One approach you could take, for instance, would be to compare the ability of clinician diagnosis and MRI-based diagnosis to predict functional outcomes among subjects at a later point in time. If you could show that MRI-based classification of subjects at an early age was a stronger predictor of receiving an ASD diagnosis later in life than conventional criteria, that would make a really strong case for using the former approach in the real world. Short of that type of demonstration though, the only reason I can imagine wanting to use a procedure that was developed by trying to duplicate the results of an existing procedure is in the event that the new procedure is substantially cheaper or more efficient than the old one. Meaning, it would be reasonable enough to say “well, look, we don’t do quite as well with this approach as we do with a full clinical evaluation, but at least this new approach costs much less.” Unfortunately, that’s not really true in this case, since the price of even a short MRI scan is generally going to outweigh that of a comprehensive evaluation by a psychiatrist or psychotherapist. And while it could theoretically be much faster to get an MRI scan than an appointment with a mental health professional, I suspect that that’s not generally going to be true in practice either.

Having said all that, I hasten to note that all this is really a critique of the MRC press release and the subsequent lousy science reporting, and not of the science itself. I actually think the science itself is very cool (but the Neuroskeptic just wrote a great rundown of the methods and results, so there’s not much point in me describing them here). People have been doing really interesting work with pattern-based classifiers for several years now in the neuroimaging literature, but relatively few studies have applied this kind of technique to try and discriminate between different groups of individuals in a clinical setting. While I’m not really optimistic that the technique the authors introduce in this paper is going to change the way diagnosis happens any time soon (or at least, I’d argue that it shouldn’t), there’s no question that the general approach will be an important piece of future efforts to improve clinical diagnoses by integrating biological data with existing approaches. But that’s not going to happen overnight, and in the meantime, I think it’s pretty irresponsible of the MRC to be issuing press releases claiming that its researchers can diagnose autism in adults with 90% accuracy.

Ecker C, Marquand A, Mourão-Miranda J, Johnston P, Daly EM, Brammer MJ, Maltezos S, Murphy CM, Robertson D, Williams SC, & Murphy DG (2010). Describing the brain in autism in five dimensions–magnetic resonance imaging-assisted diagnosis of autism spectrum disorder using a multiparameter classification approach. The Journal of neuroscience : the official journal of the Society for Neuroscience, 30 (32), 10612-23 PMID: 20702694

a possible link between pesticides and ADHD

A forthcoming article in the journal Pediatrics that’s been getting a lot of press attention suggests that exposure to common pesticides may be associated with a substantially elevated risk of ADHD. More precisely, what the study found was that elevated urinary concentrations of organophosphate metabolites were associated with an increased likelihood of meeting criteria for an ADHD diagnosis. One of the nice things about this study is that the authors used archival data from the (very large) National Health and Nutrition Examination Survey (NHANES), so they were able to control for a relatively broad range of potential confounds (e.g., gender, age, SES, etc.). The primary finding is, of course, still based on observational data, so you wouldn’t necessarily want to conclude that exposure to pesticides causes ADHD. But it’s a finding that converges with previous work in animal models demonstrating that high exposure to organophosphate pesticides causes neurodevelopmental changes, so it’s by no means a crazy hypothesis.

I think it’s really pleasantly surprising to see how responsibly the popular press has covered this story (e.g., this, this, and this). Despite the obvious potential for alarmism, very few articles have led with a headline implying a causal link between pesticides and ADHD. They all say things like “associated with”, “tied to”, or “linked to”, which is exactly right. And many even explicitly mention the size of the effect in question–namely, approximately a 50% increase in risk of ADHD per 10-fold increase in concentration of pesticide metabolites. Given that most of the articles contain cautionary quotes from the study’s authors, I’m guessing the authors really emphasized the study’s limitations when dealing with the press, which is great. In any case, because the basic details of the study have already been amply described elsewhere (I thought this short CBS article was particularly good), I’ll just mention a few random thoughts here:

  • Often, epidemiological studies suffer from a gaping flaw in the sense that the more interesting causal story (and the one that prompts media attention) is far less plausible than other potential explanations (a nice example of this is the recent work on the social contagion of everything from obesity to loneliness). That doesn’t seem to be the case here. Obviously, there are plenty of other reasons you might get a correlation between pesticide metabolites and ADHD risk–for instance, ADHD is substantially heritable, so it could be that parents with a disposition to ADHD also have systematically different dietary habits (i.e., parental dispositions are a common cause of both urinary metabolites and ADHD status in children). But given the aforementioned experimental evidence, it’s not obvious that alternative explanations for the correlation are much more plausible than the causal story linking pesticide exposure to ADHD, so in that sense this is potentially a very important finding.
  • The use of a dichotomous dependent variable (i.e., children either meet criteria for ADHD or don’t; there are no shades of ADHD gray here) is a real problem in this kind of study, because it can make the resulting effects seem deceptively large. The intuitive way we think about the members of a category is to think in terms of prototypes, so that when you think about “ADHD” and “Not-ADHD” categories, you’re probably mentally representing an extremely hyperactive, inattentive child for the former, and a quiet, conscientious kid for the latter. If that’s your mental model, and someone comes along and tells you that pesticide exposure increases the risk of ADHD by 50%, you’re understandably going to freak out, because it’ll seem quite natural to interpret that as a statement that pesticides have a 50% chance of turning average kids into hyperactive ones. But that’s not the right way to think about it. In all likelihood, pesticides aren’t causing a small proportion of kids to go from perfectly average to completely hyperactive; instead, what’s probably happening is that the entire distribution is shifting over slightly. In other words, most kids who are exposed to pesticides (if we assume for the sake of argument that there really is a causal link) are becoming slightly more hyperactive and/or inattentive.
  • Put differently, what happens when you have a strict cut-off for diagnosis is that even small increases in underlying symptoms can result in a qualitative shift in category membership. If ADHD symptoms were measured on a continuous scale (which they actually probably were, before being dichotomized to make things simple and more consistent with previous work), these findings might have been reported as something like “a 10-fold increase in pesticide exposures is associated with a 2-point increase on a 30-point symptom scale,” which would have made it much clearer that, at worst, pesticides are only one of many contributing factors to ADHD, and almost certainly not nearly as big a factor as some others. That’s not to say we shouldn’t be concerned if subsequent work supports a causal link, but just that we should retain perspective on what’s involved. No one’s suggesting that you’re going to feed your child an unwashed pear or two and end up with a prescription for Ritalin; the more accurate view would be that you might have a minority of kids who are already at risk for ADHD, and this would be just one more precipitating factor.
  • It’s also worth keeping in mind that the relatively large increase in ADHD risk is associated with a ten-fold increase in pesticide metabolites. As the authors note, that corresponds to the difference between the 25th and 75th percentiles in the sample. Although we don’t know exactly what that means in terms of real-world exposure to pesticides (because the authors didn’t have any data on grocery shopping or eating habits), it’s almost certainly a very sizable difference (I won’t get into the reasons why, except to note that the rank-order of pesticide metabolites must be relatively stable among children, or else there wouldn’t be any association with a temporally-extended phenotype like ADHD). So the point is, it’s probably not so easy to go from the 25th to the 75th percentile just by eating a few more fruits and vegetables here and there. So while it’s certainly advisable to try and eat better, and potentially to buy organic produce (if you can afford it), you shouldn’t assume that you can halve your child’s risk of ADHD simply by changing his or her diet slightly. These are, at the end of the day, small effects.
  • The authors report that fully 12% of children in this nationally representative sample met criteria for ADHD (mostly of the inattentive subtype). This, frankly, says a lot more about how silly the diagnostic criteria for ADHD are than about the state of the nation’s children. It’s frankly not plausible to suppose that 1 in 8 children really suffer from what is, in theory at least, a severe, potentially disabling disorder. I’m not trying to trivialize ADHD or argue that there’s no such thing, but simply to point out the dangers of medicalization. Once you’ve reached the point where 1 in every 8 people meet criteria for a serious disorder, the label is in danger of losing all meaning.
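
The point about cut-offs in the bullets above can be illustrated with a back-of-the-envelope calculation. The sketch below is not a reanalysis of the Bouchard et al. data. It assumes, purely for illustration, that symptom scores are normally distributed, that a fixed cut-off flags 12% of children at baseline, and that the risk then rises by 50%; it asks how far the whole distribution would have to shift to produce that increase.

```python
from statistics import NormalDist


def shift_for_risk_increase(base_rate, relative_increase):
    """Mean shift (in SD units) that raises the share above a fixed cut-off.

    Assumes normally distributed scores with unit variance (an illustrative
    simplification, not something reported in the study).
    """
    standard = NormalDist()
    cutoff = standard.inv_cdf(1.0 - base_rate)       # fixed diagnostic cut-off
    new_rate = base_rate * (1.0 + relative_increase)
    # After shifting the mean up by `shift`, the cut-off sits at
    # (cutoff - shift) SDs above the new mean.
    return cutoff - standard.inv_cdf(1.0 - new_rate)


shift = shift_for_risk_increase(base_rate=0.12, relative_increase=0.50)

# Sanity check: share above the original cut-off after the shift.
cutoff = NormalDist().inv_cdf(1.0 - 0.12)
rate_after_shift = 1.0 - NormalDist(mu=shift).cdf(cutoff)

print(f"required shift: {shift:.2f} SD")
print(f"rate after shift: {rate_after_shift:.3f}")
```

Under these assumptions, a shift of roughly a quarter of a standard deviation in the whole distribution is enough to turn a 12% rate into an 18% rate, which is the sense in which a “50% increase in risk” can reflect a modest change in the typical child.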

Bouchard, M., Bellinger, D., Wright, R., & Weisskopf, M. (2010). Attention-Deficit/Hyperactivity Disorder and Urinary Metabolites of Organophosphate Pesticides PEDIATRICS DOI: 10.1542/peds.2009-3058

the male brain hurts, or how not to write about science

My wife asked me to blog about this article on CNN because, she said, “it’s really terrible, and it shouldn’t be on CNN”. I usually do what my wife tells me to do, so I’m blogging about it. It’s by Louann Brizendine, M.D., author of the absolutely awful controversial book The Female Brain, and now, its manly counterpart, The Male Brain. From what I can gather, the CNN article, which is titled Love, Sex, and the Male Brain, is a precis of Brizendine’s new book (though I have no intention of reading the book to make sure). The article is pretty short, so I’ll go through the first half of it paragraph-by-paragraph. But I’ll warn you right now that it isn’t pretty, and will likely anger anyone with even a modicum of training in psychology or neuroscience.

Although women the world over have been doing it for centuries, we can’t really blame a guy for being a guy. And this is especially true now that we know that the male and female brains have some profound differences.

Our brains are mostly alike. We are the same species, after all. But the differences can sometimes make it seem like we are worlds apart.

So far, nothing terribly wrong here, just standard pop psychology platitudes. But it goes quickly downhill.

The “defend your turf” area — dorsal premammillary nucleus — is larger in the male brain and contains special circuits to detect territorial challenges by other males. And his amygdala, the alarm system for threats, fear and danger is also larger in men. These brain differences make men more alert than women to potential turf threats.

As Vaughan notes over at Mind Hacks, the dorsal premammillary nucleus (PMD) hasn’t been identified in humans, so it’s unclear exactly what chunk of tissue Brizendine’s referring to–let alone where the evidence that there are gender differences in humans might come from. The claim that the PMD is a “defend your turf” area might be plausible, if oh, I don’t know, you happen to think that the way rats behave under narrowly circumscribed laboratory conditions when confronted by an aggressor is a good guide to normal interactions between human males. (Then again, given that PMD lesions impair rats from running away when exposed to a cat, Brizendine could just as easily have concluded that the dorsal premammillary nucleus is the “fleeing” part of the brain.)

The amygdala claim is marginally less ridiculous: it’s not entirely clear that the amygdala is “the alarm system for threats, fear and danger”, but at least that’s a claim you can make with a straight face, since it’s one fairly common view among neuroscientists. What’s not really defensible is the claim that larger amygdalae “make men more alert than women to potential turf threats”, because (a) there’s limited evidence that the male amygdala really is larger than the female amygdala, (b) if such a difference exists, it’s very small, and (c) it’s not clear in any case how you go from a small between-group difference to the notion that somehow the amygdala is the reason why men maintain little interpersonal fiefdoms and women don’t.

Meanwhile, the “I feel what you feel” part of the brain — mirror-neuron system — is larger and more active in the female brain. So women can naturally get in sync with others’ emotions by reading facial expressions, interpreting tone of voice and other nonverbal emotional cues.

This falls under the rubric of “not even wrong“. The mirror neuron system isn’t a single “part of the brain”; current evidence suggests that neurons that show mirroring properties are widely distributed throughout multiple frontoparietal regions. So I don’t really know what brain region Brizendine is referring to (the fact that she never cites any empirical studies in support of her claims is something of an inconvenience in that respect). And even if I did know, it’s a safe bet it wouldn’t be the “I feel what you feel” brain region, because, as far as I know, no such thing exists. The central claim regarding mirror neurons isn’t that they support empathy per se, but that they support a much more basic type of representation–namely, abstract conceptual (as opposed to sensory/motor) representation of actions. And even that much weaker notion is controversial; for example, Greg Hickok has a couple of recent posts (and a widely circulated paper) arguing against it. No one, as far as I know, has provided any kind of serious evidence linking the mirror neuron system to females’ (modestly) superior nonverbal decoding ability.

Perhaps the biggest difference between the male and female brain is that men have a sexual pursuit area that is 2.5 times larger than the one in the female brain. Not only that, but beginning in their teens, they produce 200 to 250 percent more testosterone than they did during pre-adolescence.

Maybe the silliest paragraph in the whole article. Not only do I not know what region Brizendine is talking about here, I have absolutely no clue what the “sexual pursuit area” might be. It could be just me, I suppose, but I just searched Google Scholar for “sexual pursuit area” and got… zero hits. Is it a visual region? A part of the hypothalamus? The notoriously grabby motor cortex hand area? No one knows, and Brizendine isn’t telling.  Off-hand, I don’t know of any region of the human brain that shows the degree of sexual dimorphism Brizendine claims here.

If testosterone were beer, a 9-year-old boy would be getting the equivalent of a cup a day. But a 15-year-old would be getting the equivalent of nearly two gallons a day. This fuels their sexual engines and makes it impossible for them to stop thinking about female body parts and sex.

If each fiber of chest hair was a tree, a 12-year-old boy would have a Bonsai sitting on the kitchen counter, and a 30-year-old man would own Roosevelt National Forest. What you’re supposed to learn from this analogy, I honestly couldn’t tell you. It’s hard for me to think clearly about trees and hair you see, seeing as how I find it impossible to stop thinking about female body parts while I’m trying to write this.

All that testosterone drives the “Man Trance”– that glazed-eye look a man gets when he sees breasts. As a woman who was among the ranks of the early feminists, I wish I could say that men can stop themselves from entering this trance. But the truth is, they can’t. Their visual brain circuits are always on the lookout for fertile mates. Whether or not they intend to pursue a visual enticement, they have to check out the goods.

To a man, this is the most natural response in the world, so he’s dismayed by how betrayed his wife or girlfriend feels when she sees him eyeing another woman. Men look at attractive women the way we look at pretty butterflies. They catch the male brain’s attention for a second, but then they flit out of his mind. Five minutes later, while we’re still fuming, he’s deciding whether he wants ribs or chicken for dinner. He asks us, “What’s wrong?” We say, “Nothing.” He shrugs and turns on the TV. We smolder and fear that he’ll leave us for another woman.

This actually isn’t so bad if you ignore the condescending “men are animals with no self-control” implication and pretend Brizendine had just made the indisputably true but utterly banal observation that men, on average, like to ogle women more than women, on average, like to ogle men.

Not surprisingly, the different objectives that men and women have in mating games put us on opposing teams — at least at first. The female brain is driven to seek security and reliability in a potential mate before she has sex. But a male brain is fueled to mate and mate again. Until, that is, he mates for life.

So men are driven to sleep around, again and again… until they stop sleeping around. It’s tautological and profound at the same time!

Despite stereotypes to the contrary, the male brain can fall in love just as hard and fast as the female brain, and maybe more so. When he meets and sets his sights on capturing “the one,” mating with her becomes his prime directive. And when he succeeds, his brain makes an indelible imprint of her. Lust and love collide and he’s hooked.

Failure to operationalize complex construct of “love” in a measurable way… check. Total lack of evidence in support of claim that men and women are equally love-crazy… check. Oblique reference to Star Trek universe… check. What’s not to like?

A man in hot pursuit of a mate doesn’t even remotely resemble a devoted, doting daddy. But that’s what his future holds. When his mate becomes pregnant, she’ll emit pheromones that will waft into his nostrils, stimulating his brain to make more of a hormone called prolactin. Her pheromones will also cause his testosterone production to drop by 30 percent.

You know, on the off-chance that something like this is actually true, I think it’s actually kind of neat. But I just can’t bring myself to do a literature search, because I’m pretty sure I’ll discover that the jury is still out on whether humans even emit and detect pheromones (ok, I know this isn’t a completely baseless claim), or that there’s little to no evidence of a causal relationship between women releasing pheromones and testosterone levels dropping in men. I don’t like to be disappointed, you see; it turns out it’s much easier to just decide what you want to believe ahead of time and then contort available evidence to fit that view.

Anyway, we’re only half-way through the article; Brizendine goes on in similar fashion for several hundred more words. Highlights include the origin of male poker face, the conflation of correlation and causation in sociable elderly men, and the effects of oxytocin on your grandfather. You should go read the rest of it if you practice masochism; I’m too depressed to write about it any more.

Setting aside the blatant exercise in irresponsible scientific communication (Brizendine has an MD, and appears to be at least nominally affiliated with UCSF’s psychiatry department, so ignorance shouldn’t really be a valid excuse here), I guess what I’d really like to know is what goes through Brizendine’s mind when she writes this sort of dreck. Does she really believe the ludicrous claims she makes? Is she fully aware she’s grossly distorting the empirical evidence if not outright confabulating, and is simply in it for the money? Or does she rationalize it as a case of the ends justifying the means, thinking the message she’s presenting is basically right, so it’s ok if a few of the details go missing in the process?

I understand that presenting scientific evidence in an accurate and entertaining manner is a difficult business, and many people who work hard at it still get it wrong pretty often (I make mistakes in my posts here all the time!). But many scientists still manage to find time in their busy schedules to write popular science books that present the science in an accessible way without having to make up ridiculous stories just to keep the reader entertained (Steven Pinker, Antonio Damasio, and Dan Gilbert are just a few of the first ones that spring to mind). And then there are amazing science writers like Carl Zimmer and David Dobbs who don’t necessarily have any professional training in the areas they write about, but still put in the time and energy to make sure they get the details right, and consistently write stories that blow me away (the highest compliment I can pay to a science story is that it makes me think “I wish I studied that”, and Zimmer’s articles routinely do that). That type of intellectual honesty is essential, because there’s really no point in going to the trouble of doing most scientific research if people get to disregard any findings they disagree with on ideological or aesthetic grounds, or can make up any evidence they like to fit their claims.

The sad thing is that Brizendine’s new book will probably sell more copies in its first year out than Carl Zimmer’s entire back catalogue. And it’s not going to sell all those copies because it’s a careful meditation on the subtle differences between genders that scientists have uncovered; it’s going to fly off the shelves because it basically regurgitates popular stereotypes about gender differences with a seemingly authoritative scientific backing. Instead of evaluating and challenging many of those notions with actual empirical data, people who read Brizendine’s work will now get to say “science proves it!”, making it that much more difficult for responsible scientists and journalists to tell the public what’s really true about gender differences.

You might say (or at least, Brizendine might say) that this is all well and good, but hopelessly naive and idealistic, and that telling an accurate story is always going to be less important than telling the public what it wants to hear about science, because the latter is the only way to ensure continued funding for and interest in scientific research. This isn’t that uncommon a sentiment; I’ve even heard a number of scientists who I otherwise have a great deal of respect for say something like this. But I think Brizendine’s work underscores the typical outcome of that type of reasoning: once you allow yourself to relax the standards for what counts as evidence, it becomes quite easy to rationalize almost any rhetorical abuse of science, and ultimately you abuse the public’s trust while muddying the waters for working scientists.

As with so many other things, I think Richard Feynman summed up this sentiment best:

I would like to add something that’s not essential to the science, but something I kind of believe, which is that you should not fool the layman when you’re talking as a scientist. I am not trying to tell you what to do about cheating on your wife, or fooling your girlfriend, or something like that, when you’re not trying to be a scientist, but just trying to be an ordinary human being. We’ll leave those problems up to you and your rabbi. I’m talking about a specific, extra type of integrity that is not lying, but bending over backwards to show how you are maybe wrong, that you ought to have when acting as a scientist. And this is our responsibility as scientists, certainly to other scientists, and I think to laymen.

For example, I was a little surprised when I was talking to a friend who was going to go on the radio. He does work on cosmology and astronomy, and he wondered how he would explain what the applications of this work were. “Well,” I said, “there aren’t any.” He said, “Yes, but then we won’t get support for more research of this kind.” I think that’s kind of dishonest. If you’re representing yourself as a scientist, then you should explain to the layman what you’re doing–and if they don’t want to support you under those circumstances, then that’s their decision.

No one doubts that men and women differ from one another, and the study of gender differences is an active and important area of psychology and neuroscience. But I can’t for the life of me see any merit in telling the public that men can’t stop thinking about breasts because they’re full of the beer-equivalent of two gallons of testosterone.

[Update 3/25: plenty of other scathing critiques pop up in the blogosphere today: Language Log, Salon, and Neuronarrative, and no doubt many others...]