Archive for the ‘news’ Category

aftermath of the NYT / Lindstrom debacle

Thursday, October 6th, 2011

Over the last few days the commotion over Martin Lindstrom’s terrible New York Times iPhone-loving Op-Ed, which I wrote about in my last post, seems to have spread far and wide. Highlights include excellent posts by David Dobbs and the Neurocritic, but really there are too many to list at this point. And the verdict is overwhelmingly negative; I don’t think I’ve seen a single post in defense of Lindstrom, which is probably not a good sign (for him).

In the meantime, Russ Poldrack and over 40 other neuroscientists and psychologists (including me) wrote a letter to the NYT complaining about the Lindstrom Op-Ed, which the NYT has now published. As per usual, they edited down the letter till it almost disappeared. But the original, along with a list of signees, is on Russ’s blog.

Anyway, the fact that the Times published the rebuttal letter is all well and good, but as I mentioned in my last post, the bigger problem is that since the Times doesn’t include links to related content on their articles, people who stumble across the Op-Ed aren’t going to have any way of knowing that it’s been roundly discredited by pretty much the entire web. Lindstrom’s piece was the most emailed article on the Times website for a day or two, but only a tiny fraction of those readers will ever see (or even hear about) the critical response. As far as I know, the NYT hasn’t issued an explanation or apology for publishing the Op-Ed; they’ve simply published the letter and gone on about their business (I guess I can’t fault them for this–if they had to issue a formal apology for every mistake that gets published, they’d have no time for anything else; the trick is really to catch this type of screw-up at the front end). Adding links from each article to related content wouldn’t solve the problem entirely, of course, but it would be something. The fact that the Times’ platform currently doesn’t have this capacity is kind of perplexing.

The other point worth mentioning is that, in the aftermath of the tsunami of criticism he received, Lindstrom left a comment on several blogs (Russ Poldrack and David Dobbs were lucky recipients; sadly, I wasn’t on the guest list). Here’s the full text of the comment:

My first foray into neuro-marketing research was for my New York Times bestseller Buyology: Truth and Lies about Why We Buy. For that book I teamed up with Neurosense, a leading independent neuro-marketing company that specializes in consumer research using functional magnetic resonance imaging (fMRI) headed by Oxford University trained Gemma Calvert, BSc DPhil CPsychol FRSA and Neuro-Insight, a market research company that uses unique brain-imaging technology, called Steady-State Topography (SST), to measure how the brain responds to communications which is lead by Dr. Richard Silberstein, PhD. This was the single largest neuro-marketing study ever conducted—25x larger than any such study to date and cost more than seven million dollars to run.

In the three-year effort scientists scanned the brains of over 2,000 people from all over the world as they were exposed to various marketing and advertising strategies including clever product placements, sneaky subliminal messages, iconic brand logos, shocking health and safety warnings, and provocative product packages. The purpose of all of this was to understand, quite successfully I may add, the key drivers behind why we make the purchasing decisions that we do.

For the research that my recent Op-Ed column in the New York Times was based on I turned to Dr. David Hubbard, a board-certified neurologist and his company MindSign Neuro Marketing, an independently owned fMRI neuro-marketing company. I asked Dr. Hubbard and his team a simple question, “Are we addicted to our iPhones?” After analyzing the brains of 8 men and 8 women between the ages of 18-25 using fMRI technology, MindSign answered my question using standardized answering methods and completely reproducible results. The conclusion was that we are not addicted to our iPhones, we are in love with them.

The thought provoking dialogue that has been generated from the article has been overwhelmingly positive and I look forward to the continued comments from professionals in the field, readers and fans.

Respectfully,

Martin Lindstrom

As evasive responses go, this is a masterpiece; at no point does Lindstrom ever actually address any of the substantive criticisms leveled at him. He spends most of his response name dropping (the list of credentials is almost long enough to make you forget that the rebuttal letter to his Op-Ed was signed by over 40 PhDs) and rambling about previous unrelated neuromarketing work (which may as well not exist, since none of it has ever been made public), and then closes by shifting the responsibility for the study to MindSign, the company he paid to run the iPhone study. The claim that MindSign “answered [his] question using standardized answering methods and completely reproducible results” is particularly ludicrous; as I explained in my last post, there currently aren’t any standardized methods for reading addiction or love off of brain images. And ‘completely reproducible results’ implies that one has, you know, successfully reproduced the results, which is simply false unless Lindstrom is suggesting that MindSign did the same experiment twice. It’s hard to see any “thought provoking dialogue” taking place here, and the neuroimaging community’s response to the Op-Ed column has been, virtually without exception, overwhelmingly negative, not positive (as Lindstrom claims).

That all said, I do think there’s one very positive aspect to this entire saga, and that’s the amazing speed and effectiveness of the response from scientists, science journalists, and other scientifically literate folks. Ten years ago, Lindstrom’s piece might have gone completely unchallenged–and even if someone like Russ Poldrack had written a response, it would probably have appeared much later, been signed by fewer scientists (because coordination would have been much more difficult), and received much less attention. But within 48 hours of Lindstrom’s Op-Ed being published, dozens of critical blog posts had appeared, and hundreds, if not thousands, of people all over the world had tweeted or posted links to these critiques (my last post alone received over 12,000 hits). Scientific discourse, which used to be confined largely to peer-reviewed print journals and annual conferences, now takes place at a remarkable pace online, and it’s fantastic to see social media used in this way. The hope is that as these technologies develop further and scientists take on a more active role in communicating with the public (something that platforms like Twitter and Google+ seem to be facilitating amazingly well), it’ll become increasingly difficult for people like Lindstrom to make crazy pseudoscientific claims without being immediately and visibly called out on it–even in those rare cases when the NYT makes the mistake of leaving one of the biggest microphones on earth open and unmonitored.

the New York Times blows it big time on brain imaging

Saturday, October 1st, 2011

The New York Times has a terrible, terrible Op-Ed piece today by Martin Lindstrom (who I’m not going to link to, because I don’t want to throw any more bones his way). If you believe Lindstrom, you don’t just like your iPhone a lot; you love it. Literally. And the reason you love it, shockingly, is your brain:

Earlier this year, I carried out an fMRI experiment to find out whether iPhones were really, truly addictive, no less so than alcohol, cocaine, shopping or video games. In conjunction with the San Diego-based firm MindSign Neuromarketing, I enlisted eight men and eight women between the ages of 18 and 25. Our 16 subjects were exposed separately to audio and to video of a ringing and vibrating iPhone.

But most striking of all was the flurry of activation in the insular cortex of the brain, which is associated with feelings of love and compassion. The subjects’ brains responded to the sound of their phones as they would respond to the presence or proximity of a girlfriend, boyfriend or family member.

In short, the subjects didn’t demonstrate the classic brain-based signs of addiction. Instead, they loved their iPhones.

There’s so much wrong with just these three short paragraphs (to say nothing of the rest of the article, which features plenty of other whoppers) that it’s hard to know where to begin. But let’s try. Take first the central premise–that an fMRI experiment could help determine whether iPhones are no less addictive than alcohol or cocaine. The tacit assumption here is that all the behavioral evidence you could muster–say, from people’s reports about how they use their iPhones, or clinicians’ observations about how iPhones affect their users–isn’t sufficient to make that determination; to “really, truly” know if something’s addictive, you need to look at what the brain is doing when people think about their iPhones. This idea is absurd inasmuch as addiction is defined on the basis of its behavioral consequences, not (right now, anyway) by the presence or absence of some biomarker. What makes someone an alcoholic is the fact that they’re dependent on alcohol, have trouble going without it, find that their alcohol use interferes with multiple aspects of their day-to-day life, and generally suffer functional impairment because of it–not the fact that their brain lights up when they look at pictures of Johnnie Walker Red. If someone couldn’t stop drinking–to the point where they lost their job, family, and friends–but their brain failed to display a putative biomarker for addiction, it would be strange indeed to say “well, you show all the signs, but I guess you’re not really addicted to alcohol after all.”

Now, there may come a day (and it will be a great one) when we have biomarkers sufficiently accurate that they can stand in for the much more tedious process of diagnosing someone’s addiction the conventional way. But that day is, to put it gently, a long way off. Right now, if you want to know if iPhones are addictive, the best way to do that is to, well, spend some time observing and interviewing iPhone users (and some quantitative analysis would be helpful).

Of course, it’s not clear what Lindstrom thinks an appropriate biomarker for addiction would be in any case. Presumably it would have something to do with the reward system; but what? Suppose Lindstrom had seen robust activation in the ventral striatum–a critical component of the brain’s reward system–when participants gazed upon the iPhone: what then? Would this have implied people are addicted to iPhones? But people also show striatal activity when gazing on food, money, beautiful faces, and any number of other stimuli. Does that mean the average person is addicted to all of the above? A marker of pleasure or reward, maybe (though even that’s not certain), but addiction? How could a single fMRI experiment with 16 subjects viewing pictures of iPhones confirm or disconfirm the presence of addiction? Lindstrom doesn’t say. I suppose he has good reason not to say: if he really did have access to an accurate fMRI-based biomarker for addiction, he’d be in a position to make millions (billions?) off the technology. To date, no one else has come close to identifying a clinically accurate fMRI biomarker for any kind of addiction (for more technical readers, I’m talking here about cross-validated methods that have both sensitivity and specificity comparable to traditional approaches when applied to new subjects–not individual studies that claim 90% within-sample classification accuracy based on simple regression models). So we should, to put it mildly, be very skeptical that Lindstrom’s study was ever in a position to do what he says it was designed to do.
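For the more technical readers, the gap between within-sample accuracy and accuracy on new subjects can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are pure random noise, the sample sizes are arbitrary, and a 1-nearest-neighbour classifier stands in for the regression models mentioned above simply because it needs no libraries. None of it comes from any actual neuroimaging study.

```python
import random

def make_noise_dataset(n_subjects, n_features, rng):
    """Random features paired with random 0/1 labels: there is nothing real to learn."""
    return [([rng.gauss(0, 1) for _ in range(n_features)], rng.randint(0, 1))
            for _ in range(n_subjects)]

def predict_1nn(train, features):
    """Return the label of the closest training subject (squared Euclidean distance)."""
    def dist(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], features))
    return min(train, key=dist)[1]

def accuracy(train, test):
    """Fraction of test subjects whose label the classifier gets right."""
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / len(test)

rng = random.Random(0)                           # fixed seed so the run is repeatable
train = make_noise_dataset(40, 10, rng)          # hypothetical "study sample"
new_subjects = make_noise_dataset(200, 10, rng)  # hypothetical fresh subjects

within_sample = accuracy(train, train)        # each subject is its own nearest neighbour
held_out = accuracy(train, new_subjects)      # no signal exists, so this sits near chance

print(f"within-sample accuracy: {within_sample:.0%}")
print(f"accuracy on new subjects: {held_out:.0%}")
```

The within-sample figure is perfect even though the labels are coin flips, which is why only accuracy on subjects the classifier has never seen says anything about clinical usefulness.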

We should also ask all sorts of salient and important questions about who the people are who are supposedly in love with their iPhones. Who’s the “You” in the “You Love Your iPhone” of the title? We don’t know, because we don’t know who the participants in Lindstrom’s sample were, aside from the fact that they were eight men and eight women aged 18 to 25. But we’d like to know some other important things. For instance, were they selected for specific characteristics? Were they, say, already avid iPhone users? Did they report loving, or being addicted to, their iPhones? If so, would it surprise us that people chosen for their close attachment to their iPhones also showed brain activity patterns typical of close attachment? (Which, incidentally, they actually don’t–but more on that below.) And if not, are we to believe that the average person pulled off the street–who probably has limited experience with iPhones–really responds to the sound of their phones “as they would respond to the presence or proximity of a girlfriend, boyfriend or family member”? Is the takeaway message of Lindstrom’s Op-Ed that iPhones are actually people, as far as our brains are concerned?

In fairness, space in the Times is limited, so maybe it’s not fair to demand this level of detail in the Op-Ed itself. But the bigger problem is that we have no way of evaluating Lindstrom’s claims, period, because (as far as I can tell), his study hasn’t been published or peer-reviewed anywhere. Presumably, it’s proprietary information that belongs to the neuromarketing firm in question. Which is to say, the NYT is basically giving Lindstrom license to talk freely about scientific-sounding findings that can’t actually be independently confirmed, disputed, or critiqued by members of the scientific community with expertise in the very methods Lindstrom is applying (expertise which, one might add, he himself lacks). For all we know, he could have made everything up. To be clear, I don’t really think he did make everything up–but surely, somewhere in the editorial process someone at the NYT should have stepped in and said, “hey, these are pretty strong scientific claims; is there any way we can make your results–on which your whole article hangs–available for other experts to examine?”

This brings us to what might be the biggest whopper of all, and the real driver of the article title: the claim that “most striking of all was the flurry of activation in the insular cortex of the brain, which is associated with feelings of love and compassion”. Russ Poldrack already tore this statement to shreds earlier this morning:

Insular cortex may well be associated with feelings of love and compassion, but this hardly proves that we are in love with our iPhones.  In Tal Yarkoni’s recent paper in Nature Methods, we found that the anterior insula was one of the most highly activated part of the brain, showing activation in nearly 1/3 of all imaging studies!  Further, the well-known studies of love by Helen Fisher and colleagues don’t even show activation in the insula related to love, but instead in classic reward system areas.  So far as I can tell, this particular reverse inference was simply fabricated from whole cloth.  I would have hoped that the NY Times would have learned its lesson from the last episode.

But you don’t have to take Russ’s word for it; if you surf for a few terms on our Neurosynth website, making sure to select “forward inference” under image type, you’ll notice that the insula shows up for almost everything. That’s not an accident; it’s because the insula (or at least the anterior part of the insula) plays a very broad role in goal-directed cognition. It really is activated when you’re doing almost anything that involves, say, following instructions an experimenter gave you, or attending to external stimuli, or mulling over something salient in the environment. You can see this pretty clearly in this modified figure from our Nature Methods paper (I’ve circled the right insula):

Proportion of studies reporting activation at each voxel

The insula is one of a few ‘hotspots’ where activation is reported very frequently in neuroimaging articles (the other major one being the dorsal medial frontal cortex). So, by definition, there can’t be all that much specificity to what the insula is doing, since it pops up so often. To put it differently, as Russ and others have repeatedly pointed out, the fact that a given region activates when people are in a particular psychological state (e.g., love) doesn’t give you license to conclude that that state is present just because you see activity in the region in question. If language, working memory, physical pain, anger, visual perception, motor sequencing, and memory retrieval all activate the insula, then knowing that the insula is active is of very little diagnostic value. That’s not to say that some psychological states might not be more strongly associated with insula activity (again, you can see this on Neurosynth if you switch the image type to ‘reverse inference’ and browse around); it’s just that, probabilistically speaking, the mere fact that the insula is active gives you very little basis for saying anything concrete about what people are experiencing.
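The probabilistic point can be sketched with Bayes' rule. The numbers below are hypothetical, chosen only for illustration: a 5% prior that participants are in the state of interest, an 80% chance the region activates in that state, and two different base rates for activation in every other state (a selective region at 2%, and an insula-like region at 33%, loosely echoing the "nearly 1/3 of all imaging studies" figure quoted earlier). This is a minimal sketch of the reasoning, not how Neurosynth actually computes its reverse inference maps.

```python
def posterior(prior, p_act_given_state, p_act_given_other):
    """P(state | activation) from Bayes' rule, for a single region and a single state."""
    # Total probability of seeing activation, whether or not the state is present.
    evidence = prior * p_act_given_state + (1 - prior) * p_act_given_other
    return prior * p_act_given_state / evidence

# Hypothetical values, not estimates from any real dataset.
PRIOR_LOVE = 0.05          # assumed prior that a participant is "in love"
P_ACT_GIVEN_LOVE = 0.80    # assumed chance the region activates when they are

selective_region = posterior(PRIOR_LOVE, P_ACT_GIVEN_LOVE, p_act_given_other=0.02)
insula_like_region = posterior(PRIOR_LOVE, P_ACT_GIVEN_LOVE, p_act_given_other=0.33)

print(f"selective region:   P(love | active) = {selective_region:.2f}")
print(f"insula-like region: P(love | active) = {insula_like_region:.2f}")
```

With these made-up inputs, activation in the selective region would push the probability of the state to roughly two in three, while the same activation in the promiscuous region leaves it at barely more than one in ten. The forward association is identical in both cases; only the base rate of activation differs.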

In fact, to account for Lindstrom’s findings, you don’t have to appeal to love or addiction at all. There’s a much simpler way to explain why seeing or hearing an iPhone might elicit insula activation. For most people, the onset of visual or auditory stimulation is a salient event that causes redirection of attention to the stimulated channel. I’d be pretty surprised, actually, if you could present any picture or sound to participants in an fMRI scanner and not elicit robust insula activity. Orienting and sustaining attention to salient things seems to be a big part of what the anterior insula is doing (whether or not that’s ultimately its ‘core’ function). So the most appropriate conclusion to draw from the fact that viewing iPhone pictures produces increased insula activity is something vague like “people are paying more attention to iPhones”, or “iPhones are particularly salient and interesting objects to humans living in 2011.” Not something like “no, really, you love your iPhone!”

In sum, the NYT screwed up. Lindstrom appears to have a habit of making overblown claims about neuroimaging evidence, so it’s not surprising he would write this type of piece; but the NYT editorial staff is supposedly there to filter out precisely this kind of pseudoscientific advertorial. And they screwed up. It’s a particularly big screw-up given that (a) as of right now, Lindstrom’s Op-Ed is the single most emailed article on the NYT site, and (b) this incident almost perfectly recapitulates another NYT article 4 years ago in which some neuroscientists and neuromarketers wrote a grossly overblown Op-Ed claiming to be able to infer, in detail, people’s opinions about presidential candidates. That time, Russ Poldrack and a bunch of other big names in cognitive neuroscience wrote a concise rebuttal that appeared in the NYT (but unfortunately, isn’t linked to from the original Op-Ed, so anyone who stumbles across the original now has no way of knowing how ridiculous it is). One hopes the NYT follows up in similar fashion this time around. They certainly owe it to their readers–some of whom, if you believe Lindstrom, are now in danger of dumping their current partners for their iPhones.

h/t: Molly Crockett

the APS likes me!

Wednesday, May 4th, 2011

Somehow I wound up profiled in this month’s issue of the APS Observer as a “Rising Star”. I’d like to believe this means I’m a really big deal now, but I suspect what it actually means is that someone on the nominating committee at APS has extraordinarily bad judgment. I say this in no small part because I know some of the other people who were named Rising Stars quite well (congrats to Karl Szpunar, Jason Chan, and Alan Castel, among many other people!), so I’m pretty sure I can distinguish people who actually deserve this from, say, me.

Of course, I’m not going to look a gift horse in the mouth. And I’m certainly thrilled to be picked for this. I know these things are kind of a crapshoot, but it still feels really nice. So while the part of my brain that understands measurement error is saying “meh, luck of the draw,” that other part of my brain that likes to be told it’s awesome is in the middle of a three-day coke bender right now*. The only regret both parts of the brain have is that there isn’t any money attached to the award–or even a token prize like, say, a free statistician for a year. But I don’t think I’m going to push my luck by complaining to APS about it.

One thing I like a lot about the format of the Rising Star awards is they give you a full page to talk about yourself and your research. If there’s one thing I like to talk about, it’s myself. Usually, you can’t talk about yourself for very long before people start giving you dirty looks. But in this case, it’s sanctioned, so I guess it’s okay. In any case, the kind folks at the Observer sent me a series of seven questions to answer. And being an upstanding gentleman who likes to be given fancy awards, I promptly obliged. I figured they would just run what I sent them with minor edits… but I WAS VERY WRONG. They promptly disassembled nearly all of my brilliant observations and advice and replaced them with some very tame ramblings. So if you actually bother to read my responses, and happen to fall asleep halfway through, you’ll know who to blame. But just to set the record straight, I figured I would run through each of the boilerplate questions I was asked, and show you the answer that was printed in the Observer as compared to what I actually wrote**:

What does your research focus on?

What they printed: Most of my current research focuses on what you might call psychoinformatics: the application of information technology to psychology, with the aim of advancing our ability to study the human mind and brain. I’m interested in developing new ways to acquire, synthesize, and share data in psychology and cognitive neuroscience. Some of the projects I’ve worked on include developing new ways to measure personality more efficiently, adapting computer science metrics of string similarity to visual word recognition, modeling fMRI data on extremely short timescales, and conducting large-scale automated synthesis of published neuroimaging findings. The common theme that binds these disparate projects together is the desire to develop new ways of conceptualizing and addressing psychological problems; I believe very strongly in the transformative power of good methods.

What I actually said: I don’t know! There’s so much interesting stuff to think about! I can’t choose!

What drew you to this line of research? Why is it exciting to you?

What they printed: Technology enriches and improves our lives in every domain, and science is no exception. In the biomedical sciences in particular, many revolutionary discoveries would have been impossible without substantial advances in information technology. Entire subfields of research in molecular biology and genetics are now synonymous with bioinformatics, and neuroscience is currently also experiencing something of a neuroinformatics revolution. The same trend is only just beginning to emerge in psychology, but we’re already able to do amazing things that would have been unthinkable 10 or 20 years ago. For instance, we can now collect data from thousands of people all over the world online, sample people’s inner thoughts and feelings in real time via their phones, harness enormous datasets released by governments and corporations to study everything from how people navigate their spatial world to how they interact with their friends, and use high-performance computing platforms to solve previously intractable problems through large-scale simulation. Over the next few years, I think we’re going to see transformative changes in the way we study the human mind and brain, and I find that a tremendously exciting thing to be involved in.

What I actually said: I like psychology a lot, and I like technology a lot. Why not combine them!

Who were/are your mentors or psychological influences?

What they printed: I’ve been fortunate to have outstanding teachers and mentors at every stage of my training. I actually started my academic career quite disinterested in science and owe my career trajectory in no small part to two stellar philosophy professors (Rob Stainton and Chris Viger) who convinced me as an undergraduate that engaging with empirical data was a surprisingly good way to discover how the world really works. I can’t possibly do justice to all the valuable lessons my graduate and postdoctoral mentors have taught me, so let me just pick a few out of a hat. Among many other things, Todd Braver taught me how to talk through problems collaboratively and keep recursively questioning the answers to problems until a clear understanding materializes. Randy Larsen taught me that patience really is a virtue, despite my frequent misgivings. Tor Wager has taught me to think more programmatically about my research and to challenge myself to learn new skills. All of these people are living proof that you can be an ambitious, hard-working, and productive scientist and still be extraordinarily kind and generous with your time. I don’t think I embody those qualities myself right now, but at least I know what to shoot for.

What I actually said: Richard Feynman, Richard Hamming, and my mother. Not necessarily in that order.

To what do you attribute your success in the science?

What they printed: Mostly to blind luck. So far I’ve managed to stumble from one great research and mentoring situation to another. I’ve been fortunate to have exceptional advisors who’ve provided me with the perfect balance of freedom and guidance and amazing colleagues and friends who’ve been happy to help me out with ideas and resources whenever I’m completely out of my depth — which is most of the time.

To the extent that I can take personal credit for anything, I think I’ve been good about pursuing ideas I’m passionate about and believe in, even when they seem unlikely to pay off at first. I’m also a big proponent of exploratory research; I think pure exploration is tremendously undervalued in psychology. Many of my projects have developed serendipitously, as a result of asking, “What happens if we try doing it this way?”

What I actually said: Mostly to blind luck.

What’s your future research agenda?

What they printed: I’d like to develop technology-based research platforms that improve psychologists’ ability to answer existing questions while simultaneously opening up entirely new avenues of research. That includes things like developing ways to collect large amounts of data more efficiently, tracking research participants over time, automatically synthesizing the results of published studies, building online data repositories and collaboration tools, and more. I know that all sounds incredibly vague, and if you have some ideas about how to go about any of it, I’d love to collaborate! And by collaborate, I mean that I’ll brew the coffee and you’ll do the work.

What I actually said: Trading coffee for publications?

Any advice for even younger psychological scientists? What would you tell someone just now entering graduate school or getting their PhD?

What they printed: The responsible thing would probably be to say “Don’t go to graduate school.” But if it’s too late for that, I’d recommend finding brilliant mentors and colleagues and serving them coffee exactly the way they like it. Failing that, find projects you’re passionate about, work with people you enjoy being around, develop good technical skills, and don’t be afraid to try out crazy ideas. Leave your office door open, and talk to everyone you can about the research they’re doing, even if it doesn’t seem immediately relevant. Good ideas can come from anywhere and often do.

What I actually said: “Don’t go to graduate school.”

What publication are you most proud of or feel has been most important to your career?

What they printed: Yarkoni, T., Poldrack, R. A., Nichols, T. E., Van Essen, D. C., & Wager, T. D. (2011). Large-scale automated synthesis of human functional neuroimaging data. Manuscript submitted for publication.

In this paper, we introduce a highly automated platform for synthesizing data from thousands of published functional neuroimaging studies. We used a combination of text mining, meta-analysis, and machine learning to automatically generate maps of brain activity for hundreds of different psychological concepts, and we showed that these results could be used to “decode” cognitive states from brain activity in individual human subjects in a relatively open-ended way. I’m very proud of this work, and I’m quite glad that my co-authors agreed to make me first author in return for getting their coffee just right. Unfortunately, the paper isn’t published yet, so you’ll just have to take my word for it that it’s really neat stuff. And if you’re thinking, “Isn’t it awfully convenient that his best paper is unpublished?”… why, yes. Yes it is.

What I actually said: …actually, that’s almost exactly what I said. Except they inserted that bit about trading coffee for co-authorship. Really all I had to do was ask my co-authors nicely.

Anyway, like I said, it’s really nice to be honored in this way, even if I don’t really deserve it (and that’s not false modesty–I’m generally the first to tell other people when I think I’ve done something awesome). But I’m a firm believer in regression to the mean, so I suspect the run of good luck won’t last. In a few years, when I’ve done almost no new original work, failed to land a tenure-track job, and dropped out of academia to ride horses around the racetrack***, you can tell people that you knew me back when I was a Rising Star. Right before you tell them you don’t know what the hell happened.

———————————-

* But not really.

** Totally lying. Pretty much every word is as I wrote it. And the Observer staff were great.

*** Hopefully none of these things will happen. Except the jockey thing; that would be awesome.

trouble with biomarkers and press releases

Sunday, August 15th, 2010

The latest issue of the Journal of Neuroscience contains an interesting article by Ecker et al in which the authors attempted to classify people with autism spectrum disorder (ASD) and healthy controls based on their brain anatomy, and report achieving “a sensitivity and specificity of up to 90% and 80%, respectively.” Before unpacking what that means, and why you probably shouldn’t get too excited (about the clinical implications, at any rate; the science is pretty cool), here’s a snippet from the decidedly optimistic press release that accompanied the study:

“Scientists funded by the Medical Research Council (MRC) have developed a pioneering new method of diagnosing autism in adults. For the first time, a quick brain scan that takes just 15 minutes can identify adults with autism with over 90% accuracy. The method could lead to the screening for autism spectrum disorders in children in the future.”

If you think this sounds too good to be true, that’s because it is. Carl Heneghan explains why in an excellent article in the Guardian:

How the brain scans results are portrayed is one of the simplest mistakes in interpreting diagnostic test accuracy to make. What has happened is, the sensitivity has been taken to be the positive predictive value, which is what you want to know: if I have a positive test do I have the disease? Not, if I have the disease, do I have a positive test? It would help if the results included a measure called the likelihood ratio (LR), which is the likelihood that a given test result would be expected in a patient with the target disorder compared to the likelihood that the same result would be expected in a patient without that disorder. In this case the LR is 4.5. We’ve put up an article if you want to know more on how to calculate the LR.

In the general population the prevalence of autism is 1 in 100; the actual chances of having the disease are 4.5 times more likely given a positive test. This gives a positive predictive value of 4.5%; about 5 in every 100 with a positive test would have autism.

For those still feeling confused and not convinced, let’s think of 10,000 children. Of these 100 (1%) will have autism, 90 of these 100 would have a positive test, 10 are missed as they have a negative test: there’s the 90% reported accuracy by the media.

But what about the 9,900 who don’t have the disease? 7,920 of these will test negative (the specificity in the Ecker paper is 80%). But, the real worry though, is the numbers without the disease who test positive. This will be substantial: 1,980 of the 9,900 without the disease. This is what happens at very low prevalences, the numbers falsely misdiagnosed rockets. Alarmingly, of the 2,070 with a positive test, only 90 will have the disease, which is roughly 4.5%.

In other words, if you screened everyone in the population for autism, and assume the best about the classifier reported in the JNeuro article (e.g., that the sample of 20 ASD participants they used is perfectly representative of the broader ASD population, which seems unlikely), only about 1 in 20 people who receive a positive diagnosis would actually deserve one.
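If you'd like to check Heneghan's arithmetic yourself, here's a minimal sketch in Python. It takes the reported 90% sensitivity and 80% specificity at face value and uses his assumed prevalence of 1 in 100; the function name `screening_counts` is purely illustrative and doesn't come from the paper or the Guardian piece.

```python
def screening_counts(population, prevalence, sensitivity, specificity):
    """Split a screened population into the four cells of a 2x2 diagnostic table.

    Returns (true_pos, false_neg, true_neg, false_pos) as floats.
    """
    affected = population * prevalence
    unaffected = population - affected
    true_pos = affected * sensitivity       # affected people who test positive
    false_neg = affected - true_pos         # affected people the test misses
    true_neg = unaffected * specificity     # unaffected people who test negative
    false_pos = unaffected - true_neg       # unaffected people flagged anyway
    return true_pos, false_neg, true_neg, false_pos


# Heneghan's scenario: 10,000 children, 1% prevalence, 90% / 80% test.
tp, fn, tn, fp = screening_counts(10_000, 0.01, 0.90, 0.80)

# Positive predictive value: of everyone who tests positive, who is affected?
ppv = tp / (tp + fp)

# Positive likelihood ratio: sensitivity / (1 - specificity).
lr_positive = 0.90 / (1 - 0.80)

print(round(tp), round(fn), round(tn), round(fp))
print(f"PPV = {ppv:.1%}, LR+ = {lr_positive:.1f}")
```

The counts come out as 90, 10, 7,920 and 1,980, matching the quoted passage. The exact positive predictive value is 90 / 2,070, or about 4.3%, which is the figure Heneghan rounds to "roughly 4.5%".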

Ecker et al object to this characterization, and reply to Heneghan in the comments (through the MRC PR office):

Our test was never designed to screen the entire population of the UK. This is simply not practical in terms of costs and effort, and besides totally  unjustified- why would we screen everybody in the UK for autism if there is no evidence whatsoever that an individual is affected?. The same case applies to other diagnostic tests. Not every single individual in the UK is tested for HIV. Clearly this would be too costly and unnecessary. However, in the group of individuals that are test for the virus, we can be very confident that if the test is positive that means a patient is infected. The same goes for our approach.

Essentially, the argument is that, since people would presumably be sent for an MRI scan because they were already under consideration for an ASD diagnosis, and not at random, the false positive rate would in fact be much lower than 95%, and closer to the 20% reported in the article.

One response to this reply–which is in fact Heneghan’s response in the comments–is to point out that the pre-test probability of ASD would need to be pretty high already in order for the classifier to add much. For instance, even if fully 30% of people who were sent for a scan actually had ASD, the posterior probability of ASD given a positive result would still be only 66% (Heneghan’s numbers, which I haven’t checked). Heneghan nicely contrasts these results with the standard for HIV testing, which “reports sensitivity of 99.7% and specificity of 98.5% for enzyme immunoassay.” Clearly, we have a long way to go before doctors can order MRI-based tests for ASD and feel reasonably confident that a positive result is sufficient grounds for an ASD diagnosis.
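For anyone who does want to check the 66% figure, here's a rough sketch of how the positive predictive value moves with the pre-test probability. The only inputs are the sensitivity and specificity figures quoted above; the function name and the particular list of pre-test probabilities (other than 1% and 30%) are my own illustrative choices, and applying a 30% pre-test probability to the HIV assay is just for comparison, not a claim about who actually gets tested.

```python
def positive_predictive_value(prior, sensitivity, specificity):
    """Bayes' rule for a binary test: P(condition | positive result)."""
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# MRI classifier, taking the reported 90% / 80% at face value.
for prior in (0.01, 0.10, 0.30, 0.50):
    ppv = positive_predictive_value(prior, 0.90, 0.80)
    print(f"MRI classifier, pre-test probability {prior:.0%}: PPV {ppv:.0%}")

# Enzyme immunoassay figures quoted by Heneghan (99.7% / 98.5%),
# evaluated at the same 30% pre-test probability purely for comparison.
hiv_ppv = positive_predictive_value(0.30, 0.997, 0.985)
print(f"HIV immunoassay, pre-test probability 30%: PPV {hiv_ppv:.0%}")
```

At a 30% pre-test probability the classifier's positive predictive value works out to 0.27 / 0.41, or about 66%, so Heneghan's number holds up. With the HIV assay's figures, the same pre-test probability gives roughly 97%.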

Setting Heneghan’s concerns about base rates aside, there’s a more general issue that he doesn’t touch on. It’s one that’s not specific to this particular study, and applies to nearly all studies that attempt to develop “biomarkers” for existing disorders. The problem is that the sensitivity and specificity values that people report for their new diagnostic procedure in these types of studies generally aren’t the true parameters of the procedure. Rather, they’re the sensitivity and specificity under the assumption that the diagnostic procedures used to classify patients and controls in the first place are themselves correct. In other words, in order to believe the results, you have to assume that the researchers correctly classified the subjects into patient and control groups using other procedures. In cases where the gold standard test used to make the initial classification is known to have near 100% sensitivity and specificity (e.g., for the aforementioned HIV tests), one can reasonably ignore this concern. But when we’re talking about mental health disorders, where diagnoses are fuzzy and borderline cases abound, it’s very likely that the “gold standard” isn’t really all that great to begin with.

Concretely, studies that attempt to develop biomarkers for mental health disorders face two substantial problems. One is that it’s extremely unlikely that the clinical diagnoses are ever perfect; after all, if they were perfect, there’d be little point in trying to develop other diagnostic procedures! In this particular case, the authors selected subjects into the ASD group based on standard clinical instruments and structured interviews. I don’t know that there are many clinicians who’d claim with a straight face that the current diagnostic criteria for ASD (and there are multiple sets to choose from!) are perfect. From my limited knowledge, the criteria for ASD seem to be even more controversial than those for most other mental health disorders (which is saying something, if you’ve been following the ongoing DSM-V saga). So really, the accuracy of the classifier in the present study, even if you put the best face on it and ignore the base rate issue Heneghan brings up, is undoubtedly south of the 90% sensitivity / 80% specificity the authors report. How much south, we just don’t know, because we don’t really have any independent, objective way to determine who “really” should get an ASD diagnosis and who shouldn’t (assuming you think it makes sense to make that kind of dichotomous distinction at all). But 90% accuracy is probably a pipe dream, if for no other reason than it’s hard to imagine that level of consensus about autism spectrum diagnoses.

The second problem is that, because the researchers are using the MRI-based classifier to predict the clinician-based diagnosis, it simply isn’t possible for the former to exceed the accuracy of the latter. That bears repeating, because it’s important: no matter how good the MRI-based classifier is, it can only be as good as the procedures used to make the original diagnosis, and no better. It cannot, by definition, make diagnoses that are any more accurate than the clinicians who screened the participants in the authors’ ASD sample. So when you see the press release say this:

For the first time, a quick brain scan that takes just 15 minutes can identify adults with autism with over 90% accuracy.

You should really read it as this:

The method relies on structural (MRI) brain scans and has an accuracy rate approaching that of conventional clinical diagnosis.

That’s not quite as exciting, obviously, but it’s more accurate.

To be fair, there’s something of a catch-22 here, in that the authors didn’t really have a choice about whether or not to diagnose the ASD group using conventional criteria. If they hadn’t, reviewers and other researchers would have complained that we can’t tell if the ASD group is really an ASD group, because the authors used non-standard criteria. Under the circumstances, they did the only thing they could do. But that doesn’t change the fact that it’s misleading to intimate, as the press release does, that the new procedure might be any better than the old ones. It can’t be, by definition.

Ultimately, if we want to develop brain-based diagnostic tools that are more accurate than conventional clinical diagnoses, we’re going to need to show that these tools are capable of predicting meaningful outcomes that clinician diagnoses can’t. This isn’t an impossible task, but it’s a very difficult one. One approach you could take, for instance, would be to compare the ability of clinician diagnosis and MRI-based diagnosis to predict functional outcomes among subjects at a later point in time. If you could show that MRI-based classification of subjects at an early age was a stronger predictor of receiving an ASD diagnosis later in life than conventional criteria, that would make a really strong case for using the former approach in the real world. Short of that type of demonstration though, the only reason I can imagine wanting to use a procedure that was developed by trying to duplicate the results of an existing procedure is in the event that the new procedure is substantially cheaper or more efficient than the old one. Meaning, it would be reasonable enough to say “well, look, we don’t do quite as well with this approach as we do with a full clinical evaluation, but at least this new approach costs much less.” Unfortunately, that’s not really true in this case, since the price of even a short MRI scan is generally going to outweigh that of a comprehensive evaluation by a psychiatrist or psychotherapist. And while it could theoretically be much faster to get an MRI scan than an appointment with a mental health professional, I suspect that that’s not generally going to be true in practice either.

Having said all that, I hasten to note that all this is really a critique of the MRC press release and subsequently lousy science reporting, and not of the science itself. I actually think the science itself is very cool (but the Neuroskeptic just wrote a great rundown of the methods and results, so there’s not much point in me describing them here). People have been doing really interesting work with pattern-based classifiers for several years now in the neuroimaging literature, but relatively few studies have applied this kind of technique to try and discriminate between different groups of individuals in a clinical setting. While I’m not really optimistic that the technique the authors introduce in this paper is going to change the way diagnosis happens any time soon (or at least, I’d argue that it shouldn’t), there’s no question that the general approach will be an important piece of future efforts to improve clinical diagnoses by integrating biological data with existing approaches. But that’s not going to happen overnight, and in the meantime, I think it’s pretty irresponsible of the MRC to be issuing press releases claiming that its researchers can diagnose autism in adults with 90% accuracy.

Ecker C, Marquand A, Mourão-Miranda J, Johnston P, Daly EM, Brammer MJ, Maltezos S, Murphy CM, Robertson D, Williams SC, & Murphy DG (2010). Describing the brain in autism in five dimensions–magnetic resonance imaging-assisted diagnosis of autism spectrum disorder using a multiparameter classification approach. The Journal of neuroscience : the official journal of the Society for Neuroscience, 30 (32), 10612-23 PMID: 20702694

elsewhere on the net, vacation edition

Tuesday, July 13th, 2010

I’m hanging out in Boston for a few days, so blogging will probably be sporadic or nonexistent. Which is to say, you probably won’t notice any difference.

The last post on the Dunning-Kruger effect somehow managed to rack up 10,000 hits in 48 hours; but that was last week. Today I looked at my stats again, and the blog is back to a more normal 300 hits, so I feel like it’s safe to blog again. Here are some neat (and totally unrelated) links from the past week:

  • OKCupid has another one of those nifty posts showing off all the cool things they can learn from their gigantic userbase (who else gets to say things like “this analysis includes 1.51 million users’ data”???). Apparently, tall people (claim to) have more sex, attractive photos are more likely to be out of date, and most people who claim to be bisexual aren’t really bisexual.
  • After a few months off, my department-mate Chris Chatham is posting furiously again over at Developing Intelligence, with a series of excellent posts reviewing recent work on cognitive control and the perils of fMRI research. I’m not really sure what Chris spent his blogging break doing, but given the frequency with which he’s been posting lately, my suspicion is that he spent it secretly writing blog posts.
  • Mark Liberman points out a fundamental inconsistency in the way we view attributions of authorship: we get appropriately angry at academics who pass someone else’s work off as their own, but think it’s just fine for politicians to pay speechwriters to write for them. It’s an interesting question, and leads to an intimately related, and even more important question–namely, will anyone get mad at me if I pay someone else to write a blog post for me about someone else’s blog post discussing people getting angry at people paying or not paying other people to write material for other people that they do or don’t own the copyright on?
  • I like oohing and aahing over large datasets, and the Guardian’s Data Blog provides a nice interface to some of the most ooh- and aah-able datasets out there. [via R-Chart]
  • Ed Yong has a characteristically excellent write-up about recent work on the magnetic vision of birds. Yong also does link dump posts better than anyone else, so you should probably stop reading this one right now and read his instead.
  • You’ve probably heard about this already, but some time last week, the brain trust at ScienceBlogs made the amazingly clever decision to throw away their integrity by selling PepsiCo its very own “science” blog. Predictably, a lot of the bloggers weren’t happy with the decision, and many have now moved on to greener pastures; Carl Zimmer’s keeping score. Personally, I don’t have anything intelligent to add to everything that’s already been said; I’m literally dumbfounded.
  • Andrew Gelman takes apart an obnoxious letter from pollster John Zogby to Nate Silver of fivethirtyeight.com. I guess now we know that Zogby didn’t get where he is by not being an ass to other people.
  • Vaughan Bell of Mind Hacks points out that neuroplasticity isn’t a new concept, and was discussed seriously in the literature as far back as the 1800s. Apparently our collective views about the malleability of mind are not, themselves, very plastic.
  • NPR ran a three-part story by Barbara Bradley Hagerty on the emerging and somewhat uneasy relationship between neuroscience and the law. The articles are pretty good, but much better, in my opinion, was the Talk of the Nation episode that featured Hagerty as a guest alongside Joshua Greene, Kent Kiehl, and Stephen Morse–people who’ve all contributed in various ways to the emerging discipline of NeuroLaw. It’s a really interesting set of interviews and discussions. For what it’s worth, I think I agree with just about everything Greene has to say about these issues–except that he says things much more eloquently than I think them.
  • Okay, this one’s totally frivolous, but does anyone want to buy me one of these things? I don’t even like dried food; I just think it would be fun to stick random things in there and watch them come out pale, dried husks of their former selves. Is it morbid to enjoy watching the life slowly being sucked out of apples and mushrooms?

and the runner up is…

Tuesday, June 22nd, 2010

This one’s a bit of a head-scratcher. Thomson-Reuters just released its 2009 Journal Citation Report–essentially a comprehensive ranking of scientific journals by their impact factor (IF). The odd part, as reported by Bob Grant in The Scientist, is that the journal with the second-highest IF is Acta Crystallographica – Section A–ahead of heavyweights like the New England Journal of Medicine. For perspective, the same journal had an IF of 2.051 in 2008. The reason for the jump?

A single article published in a 2008 issue of the journal seems to be responsible for the meteoric rise in the Acta Crystallographica – Section A’s impact factor. “A short history of SHELX,” by University of Göttingen crystallographer George Sheldrick, which reviewed the development of the computer system SHELX, has been cited more than 6,600 times, according to ISI. This paper includes a sentence that essentially instructs readers to cite the paper they’re reading — “This paper could serve as a general literature citation when one or more of the open-source SHELX programs (and the Bruker AXS version SHELXTL) are employed in the course of a crystal-structure determination.” (Note: This may be a good way to boost your citations.)

Setting aside the good career advice (and yes, I’ve made a mental note to include the phrase “this paper could serve as a general literature citation…” in my next paper), it’s perplexing that Thomson-Reuters didn’t downweight Acta Crystallographica’s IF considerably given the obvious outlier. There’s no question they would have noticed that the second-ranked journal was only there in virtue of one article, so I’m curious what the thought process was. Perhaps the deliberation went something like this:

Thomson-Reuters statistician A: We need to take it out! We can’t have a journal with an impact factor of 2 last year beat out the NEJM!

Thomson-Reuters statistician B: But if we take it out, it’ll look like we tampered with the IF!

TRS-A: But we already tamper with the IF! No one knows how we come up with these numbers! Sometimes we can’t even replicate our own results ourselves! And anyway, it’s really not a big deal if we just leave the article in; scientists know better than to think Acta Crystallographica is the second most influential science journal on the planet. They’ll figure it out.

TRS-B: But that’s like asking them to just disregard our numbers! If you’re supposed to ignore the impact factor in cases where it contradicts your perception of journal quality, what’s the point of having an impact factor at all?

TRS-A: Beats me.

So okay, I’m sure it didn’t go down quite like that. But it’s still pretty weird.

And now, having bitched about how arbitrary the IF is, I’m going to go off and spend the next 15 minutes perusing the psychology and neuroscience journal rankings…

elsewhere on the net

Friday, June 4th, 2010

Some neat links from the past few weeks:

  • You Are Not So Smart: A celebration of self-delusion. An excellent blog by journalist David McRaney that deconstructs common myths about the way the mind works.
  • NPR has a great story by Jon Hamilton about the famous saga of Einstein’s brain and what it’s helped teach us about brain function. [via Carl Zimmer]
  • The Neuroskeptic has a characteristically excellent 1,000 word explanation of how fMRI works.
  • David Rock has an interesting post on some recent work from Baumeister’s group purportedly showing that it’s good to believe in free will (whether or not it exists). My own feeling about this is that Baumeister’s not really studying people’s philosophical views about free will, but rather a construct closely related to self-efficacy and locus of control. But it’s certainly an interesting line of research.
  • The Prodigal Academic is a great new blog about all things academic. I’ve found it particularly interesting since several of the posts so far have been about job searches and job-seeking–something I’ll be experiencing my fill of over the next few months.
  • Prof-like Substance has a great 5-part series (1, 2, 3, 4, 5) on how blogging helps him as an academic. My own (much less eloquent) thoughts on that are here.
  • Cameron Neylon makes a nice case for the development of social webs for data mining.
  • Speaking of data mining, Michael Driscoll of Dataspora has an interesting pair of posts extolling the virtues of Big Data.
  • And just to balance things out, there’s this article in the New York Times by John Allen Paulos that offers some cautionary words about the challenges of using empirical data to support policy decisions.
  • On a totally science-less note, some nifty drawings (or is that photos?) by Ben Heine (via Crooked Brains):

fMRI, not coming to a courtroom near you so soon after all

Friday, June 4th, 2010

That’s a terribly constructed title, I know, but bear with me. A couple of weeks ago I blogged about a courtroom case in Tennessee where the defense was trying to introduce fMRI to the courtroom as a way of proving the defendant’s innocence (his brain, apparently, showed no signs of guilt). The judge’s verdict is now in, and…. fMRI is out. In United States v. Lorne Semrau, Judge Pham recommended that the government’s motion to exclude fMRI scans from consideration be granted. That’s the outcome I think most respectable cognitive neuroscientists were hoping for; as many people associated with the case or interviewed about it have noted (and as the judge recognized), there just isn’t a shred of evidence to suggest that fMRI has any utility as a lie detector in real-world situations.

The judge’s decision, which you can download in PDF form here (hat-tip: Thomas Nadelhoffer), is really quite elegant, and worth reading (or at least skimming through). He even manages some subtle snark in places. For instance (my italics):

Regarding the existence and maintenance of standards, Dr. Laken testified as to the protocols and controlling standards that he uses for his own exams. Because the use of fMRI-based lie detection is still in its early stages of development, standards controlling the real-life application have not yet been established. Without such standards, a court cannot adequately evaluate the reliability of a particular lie detection examination. Cordoba, 194 F.3d at 1061. Assuming, arguendo, that the standards testified to by Dr. Laken could satisfy Daubert, it appears that Dr. Laken violated his own protocols when he re-scanned Dr. Semrau on the AIMS tests SIQs, after Dr. Semrau was found “deceptive” on the first AIMS tests scan. None of the studies cited by Dr. Laken involved the subject taking a second exam after being found to have been deceptive on the first exam. His decision to conduct a third test begs the question whether a fourth scan would have revealed Dr. Semrau to be deceptive again.

The absence of real-life error rates, lack of controlling standards in the industry for real-life exams, and Dr. Laken’s apparent deviation from his own protocols are negative factors in the analysis of whether fMRI-based lie detection is scientifically valid. See Bonds, 12 F.3d at 560.

The reference here is to the fact that Laken and his company scanned Semrau (the defendant) on three separate occasions. The first two scans were planned ahead of time, but the third apparently wasn’t:

From the first scan, which included SIQs relating to defrauding the government, the results showed that Dr. Semrau was “not deceptive.” However, from the second scan, which included SIQs relating to AIMS tests, the results showed that Dr. Semrau was “being deceptive.” According to Dr. Laken, “testing indicates that a positive test result in a person purporting to tell the truth is accurate only 6% of the time.” Dr. Laken also believed that the second scan may have been affected by Dr. Semrau’s fatigue. Based on his findings on the second test, Dr. Laken suggested that Dr. Semrau be administered another fMRI test on the AIMS tests topic, but this time with shorter questions and conducted later in the day to reduce the effects of fatigue. … The third scan was conducted on January 12, 2010 at around 7:00 p.m., and according to Dr. Laken, Dr. Semrau tolerated it well and did not express any fatigue. Dr. Laken reviewed this data on January 18, 2010, and concluded that Dr. Semrau was not deceptive. He further stated that based on his prior studies, “a finding such as this is 100% accurate in determining truthfulness from a truthful person.”

I may very well be misunderstanding something here (and so might the judge), but if the positive predictive value of the test is only 6%, I’m guessing that the probability that the test is seriously miscalibrated is somewhat higher than 6%. Especially since the base rate for lying among people who are accused of committing serious fraud is probably reasonably high (this matters, because when base rates are very low, low positive predictive values are not unexpected). But then, no one really knows how to calibrate these tests properly, because the data you’d need to do that simply don’t exist. Serious validation of fMRI as a tool for lie detection would require assembling a large set of brain scans from defendants accused of various crimes (real crimes, not simulated ones) and using that data to predict whether those defendants were ultimately found guilty or not. There really isn’t any substitute for doing a serious study of that sort, but as far as I know, no one’s done it yet. Fortunately, the few judges who’ve had to rule on the courtroom use of fMRI seem to recognize that.


elsewhere on the net

Wednesday, March 31st, 2010

I’ve been swamped with work lately, so blogging has taken a backseat. I keep a text file on my desktop of interesting things I’d like to blog about; normally, about three-quarters of the links I paste into it go unblogged, but in the last couple of weeks it’s more like 98%. So here are some things I’ve found interesting recently, in no particular order:

It’s World Water Day 2010! Or at least it was a week ago, which is when I should have linked to these really moving photos.

Carl Zimmer has a typically brilliant (and beautifully illustrated) article in the New York Times about “Unseen Beasts, Then and Now”:

Somewhere in England, about 600 years ago, an artist sat down and tried to paint an elephant. There was just one problem: he had never seen one.

John Horgan writes a surprisingly bad guest blog post for Scientific American in which he basically accuses neuroscientists (not a neuroscientist or some neuroscientists, but all of us, collectively) of selling out by working with the US military. I’m guessing that the number of working neuroscientists who’ve ever received any sort of military funding is somewhere south of 10%, and is probably much smaller than the corresponding proportion in any number of other scientific disciplines, but why let data get in the way of a good anecdote or two. [via Peter Reiner]

Mark Liberman follows up his first critique of Louann Brizendine’s new “book” The Male Brain with a second one, now that he’s actually got his hands on a copy. Verdict: the book is still terrible. Mark was also kind enough to answer my question about what the mysterious “sexual pursuit area” is. Apparently it’s the medial preoptic area. And the claim that this area governs sexual behavior in humans and is 2.5 times larger in males is, once again, based entirely on work in the rat.

Commuting sucks. Jonah Lehrer discusses evidence from happiness studies (by way of David Brooks) suggesting that most people would be much happier living in a smaller house close to work than a larger house that requires a lengthy commute:

According to the calculations of Frey and Stutzer, a person with a one-hour commute has to earn 40 percent more money to be as satisfied with life as someone who walks to the office.

I’ve taken these findings to heart, and whenever my wife and I move now, we prioritize location over space. We’re currently paying through the nose to live in a 750 square foot apartment near downtown Boulder. It’s about half the size of our old place in St. Louis, but it’s close to everything, including our work, and we love living here.

The modern human brain is much bigger than it used to be, but we didn’t get that way overnight. John Hawks disputes Colin Blakemore’s claim that “the human brain got bigger by accident and not through evolution”.

Attitudes toward causal modeling of correlational (and even some experimental) data differ widely. Sanjay Srivastava leans (or maybe used to lean) toward the permissive side; Andrew Gelman is skeptical. There’s been a flurry of recent work suggesting that causal modeling techniques like mediation analysis and SEM suffer from a number of serious and underappreciated problems, and after reading this paper by Bullock, Green and Ha, I guess I’m inclined to agree.

A landmark ruling by a New York judge yesterday has the potential to invalidate existing patents on genes, which currently cover about 20% of the human genome in some form. Daniel MacArthur has an excellent summary.

internet use causes depression! or not.

Thursday, February 4th, 2010

I have a policy of not saying negative things about people (or places, or things) on this blog, and I think I’ve generally been pretty good about adhering to that policy. But I also think it’s important for scientists to speak up in cases where journalists or other scientists misrepresent scientific research in a way that could have a potentially large impact on people’s behavior, and this is one of those cases. All day long, media outlets have been full of reports about a new study that purportedly reveals that the internet–that most faithful of friends, always just a click away with its soothing, warm embrace–has a dark side: using it makes you depressed!

In fairness, most of the stories have been careful to note that the study only “links” heavy internet use to depression, without necessarily implying that internet use causes depression. And the authors acknowledge that point themselves:

“While many of us use the Internet to pay bills, shop and send emails, there is a small subset of the population who find it hard to control how much time they spend online, to the point where it interferes with their daily activities,” said researcher Dr. Catriona Morrison, of the University of Leeds, in a statement. “Our research indicates that excessive Internet use is associated with depression, but what we don’t know is which comes first. Are depressed people drawn to the Internet or does the Internet cause depression?”

So you might think all’s well in the world of science and science journalism. But in other places, the study’s authors weren’t nearly so circumspect. For example, the authors suggest that 1.2% of the population can be considered addicted to the internet–a rate they claim is double that of compulsive gambling; and they suggest that their results “feed the public speculation that overengagement in websites that serve/replace a social function might be linked to maladaptive psychological functioning,” and “add weight to the recent suggestion that IA should be taken seriously as a distinct psychiatric construct.”

These are pretty strong claims; if the study’s findings are to be believed, we should at least be seriously considering the possibility that using the internet is making some of us depressed. At worst, we should be diagnosing people with internet addiction and doing… well, presumably something to treat them.

The trouble is that it’s not at all clear that the study’s findings should be believed. Or at least, it’s not clear that they really support any of the statements made above.

Let’s start with what the study (note: restricted access) actually shows. The authors, Catriona Morrison and Helen Gore (M&G), surveyed 1,319 subjects via UK-based social networking sites. They had participants fill out 3 self-report measures: the Internet Addiction Test (IAT), which measures dissatisfaction with one’s internet usage; the Internet Function Questionnaire, which asks respondents to indicate the relative proportion of time they spend on different internet activities (e.g., e-mail, social networking, porn, etc.); and the Beck Depression Inventory (BDI), a very widely-used measure of depression.

M&G identify a number of findings, three of which appear to support most of their conclusions. First, they report a very strong positive correlation (r = .49) between internet addiction and depression scores; second, they identify a small group of 18 subjects (1.2%) who they argue qualify as internet addicts (IA group) based on their scores on the IAT; and third, they suggest that people who used the internet more heavily “spent proportionately more time on online gaming sites, sexually gratifying websites, browsing, online communities and chat sites.”

These findings may sound compelling, but there are a number of methodological shortcomings of the study that make them very difficult to interpret in any meaningful way. As far as I can tell, none of these concerns are addressed in the paper:

First, participants were recruited online, via social networking sites. This introduces a huge selection bias: you can’t expect to obtain accurate estimates of how much, and how adaptively, people use the internet by sampling only from the population of internet users! It’s the equivalent of trying to establish cell phone usage patterns by randomly dialing only land-line numbers. Not a very good idea. And note that, not only could the study not reach people who don’t use the internet, but it was presumably also more likely to oversample from heavy internet users. The more time a person spends online, the greater the chance they’d happen to run into the authors’ recruitment ad. People who only check their email a couple of times a week would be very unlikely to participate in the study. So the bottom line is, the 1.2% figure the authors arrive at is almost certainly a gross overestimate of the rate in the general population. The true proportion of people who meet the authors’ criteria for internet addiction is probably much lower. It’s hard to believe the authors weren’t aware of the issue of selection bias, and the massive problem it presents for their estimates, yet they failed to mention it anywhere in their paper.
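To make the oversampling argument concrete, here is a toy simulation. Everything in it is an assumption of mine rather than anything from M&G's data: the exponential distribution of weekly hours online, the rule that the heaviest 0.3% of users count as "addicted", and the idea that the chance of seeing a recruitment ad is proportional to time spent online.

```python
import random


def simulate_recruitment(n_population=100_000, n_sample=1_319, seed=0):
    """Compare the population rate of heavy use with the rate in an ad-recruited sample."""
    rng = random.Random(seed)
    # Hypothetical weekly hours online: mostly light users, with a long right tail.
    hours = [rng.expovariate(1 / 10) for _ in range(n_population)]
    # Hypothetical rule: roughly the heaviest 0.3% of the population count as "addicted".
    cutoff = sorted(hours)[int(0.997 * n_population)]
    true_rate = sum(h >= cutoff for h in hours) / n_population
    # Assumed recruitment: the chance of seeing the ad is proportional to hours online.
    sample = rng.choices(hours, weights=hours, k=n_sample)
    sample_rate = sum(h >= cutoff for h in sample) / n_sample
    return true_rate, sample_rate


true_rate, sample_rate = simulate_recruitment()
print(f"population rate: {true_rate:.2%}, recruited-sample rate: {sample_rate:.2%}")
```

Under these made-up assumptions the recruited sample contains a noticeably higher share of heavy users than the population it was drawn from, which is the direction of bias described above. The size of the gap depends entirely on the assumed distribution and recruitment rule, so it says nothing about how far off the 1.2% figure actually is.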

Second, the cut-off score for being placed in the IA group appears to be completely arbitrary. The Internet Addiction Test itself was developed by Kimberly Young in a 1998 book entitled “Caught in the Net: How to Recognize the Signs of Internet Addiction–and a Winning Strategy to Recovery”. The test was introduced, as far as I can tell (I haven’t read the entire book, just skimmed it in Google Books), with no real psychometric validation. The cut-off of 80 points out of a maximum 100 possible as a threshold for addiction appears to be entirely arbitrary (in fact, in Young’s book, she defines the cut-off as 70; for reasons that are unclear, M&G adopted a cut-off of 80). That is, it’s not like Young conducted extensive empirical analysis and determined that people with scores of X or above were functionally impaired in a way that people with scores below X weren’t; by all appearances, she simply picked numerically convenient cut-offs (20 – 39 is average; 40 – 69 indicates frequent problems; and 70+ basically means the internet is destroying your life). Any small change in the numerical cut-off would have translated into a large change in the proportion of people in M&G’s sample who met criteria for internet addiction, making the 1.2% figure seem even more arbitrary.
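A quick sketch of how sensitive a prevalence figure is to the choice of cut-off. The scores below are simulated, not M&G's: I'm assuming totals on a 20-100 scale (consistent with the bands quoted above) and a roughly normal distribution whose mean and spread I picked by hand.

```python
import random


def share_above(scores, cutoff):
    """Fraction of respondents scoring at or above the cut-off."""
    return sum(s >= cutoff for s in scores) / len(scores)


rng = random.Random(1)
# Hypothetical questionnaire totals, clipped to 20-100; mean 45 and SD 15 are invented.
scores = [min(100, max(20, round(rng.gauss(45, 15)))) for _ in range(1_319)]

for cutoff in (70, 75, 80, 85):
    print(cutoff, f"{share_above(scores, cutoff):.1%}")
```

Because the cut-off sits far out in the tail, where the distribution is thin and falling fast, moving it by ten points changes the "addicted" share several-fold in this toy sample. The same logic applies to any threshold placed in the tail of a score distribution.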

Third, M&G claim that the Internet Function Questionnaire they used asks respondents to indicate the proportion of time on the internet that they spend on each of several different activities. For example, given the question “How much of your time online do you spend on e-mail?”, your options would be 0-20%, 21-40%, and so on. You would presume that all the different activities should sum to 100%; after all, you can’t really spend 80% of your online time gaming, and then another 80% looking at porn–unless you’re either a very talented gamer, or have an interesting taste in “games”. Yet, when M&G report absolute numbers for the different activities in tables, they’re not given in percentages at all. Instead, one of the table captions indicates that the values are actually coded on a 6-point Likert scale ranging from “rarely/never” to “very frequently”. Hopefully you can see why this is a problem: if you claim (as M&G do) that your results reflect the relative proportion of time that people spend on different activities, you shouldn’t be allowing people to essentially say anything they like for each activity. Given that people with high IA scores report spending more time overall than they’d like online, is it any surprise if they also report spending more time on individual online activities? The claim that high-IA scorers spend “proportionately more” time on some activities just doesn’t seem to be true–at least, not based on the data M&G report. This might also explain how it could be that IA scores correlated positively with nearly all individual activities. That simply couldn’t be true for real proportions (if you spend proportionately more time on e-mail, you must be spending proportionately less time somewhere else), but it makes perfect sense if the response scale is actually anchored with vague terms like “rarely” and “frequently”.
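The point about proportions can be checked numerically. If each person's shares sum to 1, the covariances between any score and the individual shares must sum to zero, so they can't all be positive. Frequency ratings have no such constraint. The sketch below uses simulated data only; the `score` variable is a stand-in for an IAT total, and the number of activities is arbitrary.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


rng = random.Random(2)
n_people, n_activities = 500, 5
score = [rng.gauss(0, 1) for _ in range(n_people)]  # stand-in for an IAT total

# True proportions: each person's shares across activities sum to 1.
raw = [[rng.random() for _ in range(n_activities)] for _ in range(n_people)]
shares = [[v / sum(row) for v in row] for row in raw]
cov_with_shares = [
    covariance(score, [row[k] for row in shares]) for k in range(n_activities)
]

# Frequency-style ratings (no sum constraint): a general tendency can lift every item.
ratings = [[s + rng.gauss(0, 1) for _ in range(n_activities)] for s in score]
cov_with_ratings = [
    covariance(score, [row[k] for row in ratings]) for k in range(n_activities)
]

print("sum of covariances with shares:", sum(cov_with_shares))
print("all rating covariances positive:", all(c > 0 for c in cov_with_ratings))
```

The first sum comes out at zero up to floating-point error regardless of the data, because the shares always add up to a constant. The ratings, by construction, all move with the score, which is the pattern you'd expect if the response scale is really "rarely" to "very frequently".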

Fourth, M&G consider two possibilities for the positive correlation between IAT and depression scores: (a) increased internet use causes depression, and (b) depression causes increased internet use. But there’s a third, and to my mind far more plausible, explanation: people who are depressed tend to have more negative self-perceptions, and are much more likely to endorse virtually any question that asks about dissatisfaction with one’s own behavior. Here are a couple of examples of questions on the IAT: “How often do you fear that life without the Internet would be boring, empty, and joyless?” “How often do you try to cut down the amount of time you spend on-line and fail?” Notice that there are really two components to these kinds of questions. One component is internet-specific: to what extent are people specifically concerned about their behavior online, versus in other domains? The other component is a general hedonic one, and has to do with how dissatisfied you are with stuff in general. Now, is there any doubt that, other things being equal, someone who’s depressed is going to be more likely to endorse an item that asks how often they fail at something? Or how often their life feels empty and joyless–irrespective of cause? No, of course not. Depressive people tend to ruminate and worry about all sorts of things. No doubt internet usage is one of those things, but that hardly makes it special or interesting. I’d be willing to bet money that if you created a Shoelace Tying Questionnaire that had questions like “How often do you worry about your ability to tie your shoelaces securely?” and “How often do you try to keep your shoelaces from coming undone and fail?”, you’d also get a positive correlation with BDI scores. Basically, depression and trait negative affect tend to correlate positively with virtually every measure that has a major evaluative component. That’s not news. To the contrary, given the types of questions on the IAT, it would have been astonishing if there wasn’t a robust positive correlation with depression.
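Here is a toy version of that third explanation. All of it is simulated under assumptions I chose: a latent negative-affect trait feeds both a depression score and an IAT-like score, actual time online is generated independently of mood, and the weights are hand-picked so the correlations land in a plausible range. None of these numbers come from M&G's data.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(3)
n = 1_319
negative_affect = [rng.gauss(0, 1) for _ in range(n)]  # latent trait (assumed)
hours_online = [rng.gauss(0, 1) for _ in range(n)]     # independent of mood by construction

# Depression score: negative affect plus noise.
depression = [a + rng.gauss(0, 1) for a in negative_affect]
# IAT-like score: a little actual use, a lot of general dissatisfaction, plus noise.
iat_like = [
    0.3 * h + a + rng.gauss(0, 1) for h, a in zip(hours_online, negative_affect)
]

print("IAT-like vs depression:", round(pearson_r(iat_like, depression), 2))
print("IAT-like vs hours online:", round(pearson_r(iat_like, hours_online), 2))
print("hours online vs depression:", round(pearson_r(hours_online, depression), 2))
```

In this setup the questionnaire correlates substantially with depression even though time online and depression are unrelated by construction. That doesn't show the artifactual explanation is the right one for the real data, only that a sizeable questionnaire-to-questionnaire correlation is exactly what it would produce.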

Fifth, and related to the previous point, no evidence is ever actually provided that people with high IAT scores differ in their objective behavior from those with low scores. Remember, this is all based on self-report. And not just self-report, but vague self-report. As far as I can tell, M&G never asked respondents to estimate how much time they spent online in a given week. So it’s entirely possible that people who report spending too much time online don’t actually spend much more time online than anyone else; they just feel that way (again, possibly because of a generally negative disposition). There’s actually some support for this idea: A 2004 study that sought to validate the IAT psychometrically found only a .22 correlation between IAT scores and self-reported time spent online. Now, a .22 correlation is perfectly meaningful, and it suggests that people who feel they spend too much time online also estimate that they really do spend more time online (though, again, bias is a possibility here too). But it’s a much smaller correlation than the one between IAT scores and depression, which fits with the above idea that there may not be any real “link” between internet use and depression above and beyond the fact that depressed individuals are more likely to endorse negatively-worded items.

Finally, even if you ignore the above considerations, and decide to conclude that there is in fact a non-artifactual correlation between depression and internet use, there’s really no reason you would conclude that that’s a bad thing (which M&G hedge on, and many of the news articles haven’t hesitated to play up). It’s entirely plausible that the reason depressed individuals might spend more time online is because it’s an effective form of self-medication. If you’re someone who has trouble mustering up the energy to engage with the outside world, or someone who’s socially inhibited, online communities might provide you with a way to fulfill your social needs in a way that you would otherwise not have been able to. So it’s quite conceivable that heavy internet use makes people less depressed, not more; it’s just that the people who are more likely to use the internet heavily are more depressed to begin with. I’m not suggesting that this is in fact true (I find the artifactual explanation for the IAT-BDI correlation suggested above much more plausible), but just that the so-called “dark side” of the internet could actually be a very good thing.

In sum, what can we learn from M&G’s paper? Not that much. To be fair, I don’t necessarily think it’s a terrible paper; it has its limitations, but every paper does. The problem isn’t so much that the paper is bad; it’s that the findings it contains were blown entirely out of proportion, and twisted to support headlines (most of them involving the phrase “The Dark Side”) that they couldn’t possibly support. The internet may or may not cause depression (probably not), but you’re not going to get much traction on that question by polling a sample of internet respondents, using measures that have a conceptual overlap with depression, and defining groups based on arbitrary cut-offs. The jury is still out, of course, but these findings by themselves don’t really give us any reason to reconsider or try to change our online behavior.

Morrison, C., & Gore, H. (2010). The Relationship between Excessive Internet Use and Depression: A Questionnaire-Based Study of 1,319 Young People and Adults. Psychopathology, 43(2), 121-126. DOI: 10.1159/000277001