In defense of In Defense of Facebook

A long, long time ago (in social media terms), I wrote a post defending Facebook against accusations of ethical misconduct related to a newly-published study in PNAS. I won’t rehash the study, or the accusations, or my comments in any detail here; for that, you can read the original post (I also recommend reading this or this for added context). While I stand by most of what I wrote, as is the nature of things, sometimes new information comes to light, and sometimes people say things that make me change my mind. So I thought I’d post my updated thoughts and reactions. I also left some additional thoughts in a comment on my last post, which I won’t rehash here.

Anyway, in no particular order…

I’m not arguing for a lawless world where companies can do as they like with your data

Some people apparently interpreted my last post as a defense of Facebook’s data use policy in general. It wasn’t. I probably brought this on myself in part by titling the post “In Defense of Facebook”. Maybe I should have called it something like “In Defense of this one particular study done by one Facebook employee”. In any case, I’ll reiterate: I’m categorically not saying that Facebook–or any other company, for that matter–should be allowed to do whatever it likes with its users’ data. There are plenty of valid concerns one could raise about the way companies like Facebook store, manage, and use their users’ data. And for what it’s worth, I’m generally in favor of passing new rules regulating the use of personal data in the private sector. So, contrary to what some posts suggested, I was categorically not advocating for a laissez-faire world in which large corporations get to do as they please with your information, and there’s nothing us little people can do about it.

The point I made in my last post was much narrower than that–namely, that picking on the PNAS study as an example of ethically questionable practices at Facebook was a bad idea, because (a) there aren’t any new risks introduced by this manipulation that aren’t already dwarfed by the risks associated with using Facebook itself (which is not exactly a high-risk enterprise to begin with), and (b) there are literally thousands of experiments just like this being conducted every day by large companies intent on figuring out how best to market their products and services–so Facebook’s study doesn’t stand out in any respect. My point was not that you shouldn’t be concerned about who has your data and how they’re using it, but that it’s deeply counterproductive to go after Facebook for this particular experiment when Facebook is one of the few companies in this arena that actually (occasionally) publish their findings in the scientific literature, instead of hiding them entirely from the light, as almost everyone else does. Of course, that will probably change as a result of this controversy.

I was wrong–A/B testing edition

One claim I made in my last post that was very clearly wrong is this (emphasis added):

What makes the backlash on this issue particularly strange is that I’m pretty sure most people do actually realize that their experience on Facebook (and on other websites, and on TV, and in restaurants, and in museums, and pretty much everywhere else) is constantly being manipulated. I expect that most of the people who’ve been complaining about the Facebook study on Twitter are perfectly well aware that Facebook constantly alters its user experience–I mean, they even see it happen in a noticeable way once in a while, whenever Facebook introduces a new interface.

After watching the commentary over the past two days, I think it’s pretty clear I was wrong about this. A surprisingly large number of people clearly were genuinely unaware that Facebook, Twitter, Google, and other major players in every major industry (not just tech–also banks, groceries, department stores, you name it) are constantly running large-scale, controlled experiments on their users and customers. For instance, here’s a telling comment left on my last post:

The main issue I have with the experiment is that they conducted it without telling us. Given, that would have been counterproductive, but even a small adverse affect is still an adverse affect. I just don’t like the idea that corporations can do stuff to me without my consent. Just my opinion.

Similar sentiments are all over the place. Clearly, the revelation that Facebook regularly experiments on its users without their knowledge was indeed just that to many people–a revelation. I suppose in this sense, there’s potentially a considerable upside to this controversy, inasmuch as it has clearly served to raise awareness of industry-standard practices.

Questions about the ethics of the PNAS paper’s publication

My post focused largely on the question of whether the experiment Facebook conducted was itself illegal or unethical. I took this to be the primary concern of most lay people who have expressed concern about the episode. As I discussed in my post, I think it’s quite clear that (a) the experiment itself is entirely legal, and (b) any ethical objections one could raise are actually much broader objections about the way we regulate data use and consumer privacy, and have nothing to do with Facebook in particular. However, there’s a separate question that does specifically concern Facebook–or really, the authors of the PNAS paper–which is whether the authors, in their efforts to publish their findings, violated any laws or regulations.

When I wrote my post, I was under the impression–based largely on reports of an interview with the PNAS editor, Susan Fiske–that the authors had in fact obtained approval to conduct the study from an IRB, and had simply neglected to include that information in the text (which would have been an editorial lapse, but not an unethical act). I wrote as much in a comment on my post. I was not suggesting–as some seemed to take away–that Facebook doesn’t need to get IRB approval. I was operating on the assumption that it had obtained IRB approval, based on the information available at the time.

In any case, it now appears that may not be exactly what happened. Unfortunately, it’s not yet clear exactly what did happen. One version of events people have suggested is that the study’s authors exploited a loophole in the rules by having Facebook conduct and analyze the experiment without the involvement of the other authors–who only contributed to the genesis of the idea and the writing of the manuscript. However, this interpretation is not unambiguous, and risks maligning the authors’ reputations unfairly, because Adam Kramer’s post explaining the motivation for the experiment suggests that the idea for the experiment originated entirely at Facebook, and was related to internal needs:

The reason we did this research is because we care about the emotional impact of Facebook and the people that use our product. We felt that it was important to investigate the common worry that seeing friends post positive content leads to people feeling negative or left out. At the same time, we were concerned that exposure to friends’ negativity might lead people to avoid visiting Facebook. We didn’t clearly state our motivations in the paper.

How you interpret the ethics of the study thus depends largely on what you believe actually happened. If you believe that the genesis and design of the experiment were driven by Facebook’s internal decision-making, and the decision to publish an interesting finding came only later, then there’s nothing at all ethically questionable about the authors’ behavior. It would have made no more sense to seek out IRB approval for this one experiment than for any of the other in-house experiments Facebook regularly conducts. And there is, again, no question whatsoever that Facebook does not have to get approval from anyone to do experiments that are not for the purpose of systematic, generalizable research.

Moreover, since the non-Facebook authors did in fact ask the IRB to review their proposal to use archival data–and the IRB exempted them from review, as is routinely done for this kind of analysis–there would be no legitimacy to the claim that the authors acted unethically. About the only claim one could raise an eyebrow at is that the authors “didn’t clearly state” their motivations. But since presenting a post-hoc justification for one’s studies that has nothing to do with the original intention is extremely common in psychology (though it shouldn’t be), it’s not really fair to fault Kramer et al. for doing something that is standard practice.

If, on the other hand, the idea for the study did originate outside of Facebook, and the authors deliberately attempted to avoid prospective IRB review, then I think it’s fair to say that their behavior was unethical. However, given that the authors were following the letter of the law (if clearly not the spirit), it’s not clear that PNAS should have, or could have, rejected the paper. It certainly should have demanded that information regarding interactions with the IRB be included in the manuscript, and perhaps it could have published some kind of expression of concern alongside the paper. But I agree with Michelle Meyer’s analysis that, in taking the steps they took, the authors are almost certainly operating within the rules, because (a) Facebook itself is not subject to HHS rules, (b) the non-Facebook authors were not technically “engaged in research”, and (c) the archival use of already-collected data by the non-Facebook authors was approved by the Cornell IRB (or rather, the study was exempted from further review).

Absent clear evidence of what exactly happened in the lead-up to publication, I think the appropriate course of action is to withhold judgment. In the interim, what the episode clearly does do is lay bare how ill-prepared the existing HHS regulations are for dealing with the research use of data collected online–particularly when the data was acquired by private entities. Actually, it’s not just research use that’s problematic; it’s clear that many people complaining about Facebook’s conduct this week don’t really give a hoot about the “generalizable knowledge” side of things, and are fundamentally just upset that Facebook is allowed to run these kinds of experiments at all without providing any notification.

In my view, what’s desperately called for is a new set of regulations that provide a unitary code for dealing with consumer data across the board–i.e., in both research and non-research contexts. This leaves aside exactly what such regulations would look like, of course. My personal view is that the right direction to move in is to tighten consumer protection laws to better regulate management and use of private citizens’ data, while simultaneously liberalizing the research use of private datasets that have already been acquired. For example, I would favor a law that (a) forced Facebook and other companies to more clearly and explicitly state how they use their users’ data, (b) provided opt-out options when possible, along with the ability for users to obtain a report of how their data has been used in the past, and (c) gave blanket approval to use data acquired under these conditions for any and all academic research purposes so long as the data are deidentified. Many people will disagree with this, of course, and have very different ideas. That’s fine; the key point is that the conversation we should be having is about how to update and revise the rules governing research vs. non-research uses of data in such a way that situations like the PNAS study don’t come up again.

What Facebook does is not research–until they try to publish it

Much of the outrage over the Facebook experiment is centered around the perception that Facebook shouldn’t be allowed to conduct research on its users without their consent. What many people mean by this, I think, is that Facebook shouldn’t be allowed to conduct any experiments on its users for purposes of learning things about user experience and behavior unless Facebook explicitly asks for permission. A point that I should have clarified in my original post is that Facebook users are, in the normal course of things, not considered participants in a research study, no matter how or how much their emotions are manipulated. That’s because the HHS’s definition of research includes, as a necessary component, that there be an active intention to contribute to generalizable new knowledge.

Now, to my mind, this isn’t a great way to define “research”–I think it’s a good idea to avoid definitions that depend on knowing what people’s intentions were when they did something. But that’s the definition we’re stuck with, and there’s really no ambiguity over whether Facebook’s normal operations–which include constant randomized, controlled experimentation on its users–constitute research in this sense. They clearly don’t. Put simply, if Facebook had eschewed disseminating its results to the broader community, the experiment in question would not have been subject to any HHS regulations whatsoever (though, as Michelle Meyer astutely pointed out, technically the experiment probably isn’t subject to HHS regulation even now, so the point is moot). To reiterate: it’s only the fact that Kramer et al. wanted to publish their results in a scientific journal that opened them up to criticism of research misconduct in the first place.

This observation may not have any impact on your view if your concern is fundamentally about the publication process–i.e., you don’t object to Facebook doing the experiment; what you object to is Facebook trying to disseminate their findings as research. But it should have a strong impact on your views if you were previously under the impression that Facebook’s actions must have violated some existing human subjects regulation or consumer protection law. The laws in the United States–at least as I understand them, and I admittedly am not a lawyer–currently afford you no such protection.

Now, is it a good idea to have two very separate standards, one for research and one for everything else? Probably not. Should Facebook be allowed to do whatever it wants to your user experience so long as it’s covered under the Data Use policy in the user agreement you didn’t read? Probably not. But what’s unequivocally true is that, as it stands right now, your interactions with Facebook–no matter how your user experience, data, or emotions are manipulated–are not considered research unless Facebook manipulates your experience with the express intent of disseminating new knowledge to the world.

Informed consent is not mandatory for research studies

As a last point, there seems to be a very common misconception floating around among commentators that the Facebook experiment was unethical because it didn’t provide informed consent, which is a requirement for all research studies involving experimental manipulation. I addressed this in the comments on my last post in response to other comments:

[I]t’s simply not correct to suggest that all human subjects research requires informed consent. At least in the US (where Facebook is based), the rules governing research explicitly provide for a waiver of informed consent. Directly from the HHS website:

An IRB may approve a consent procedure which does not include, or which alters, some or all of the elements of informed consent set forth in this section, or waive the requirements to obtain informed consent provided the IRB finds and documents that:

(1) The research involves no more than minimal risk to the subjects;

(2) The waiver or alteration will not adversely affect the rights and welfare of the subjects;

(3) The research could not practicably be carried out without the waiver or alteration; and

(4) Whenever appropriate, the subjects will be provided with additional pertinent information after participation.

Granting such waivers is a commonplace occurrence; I myself have had online studies granted waivers before for precisely these reasons. In this particular context, it’s very clear that conditions (1) and (2) are met (because this easily passes the “not different from ordinary experience” test). Further, Facebook can also clearly argue that (3) is met, because explicitly asking for informed consent is likely not viable given internal policy, and would in any case render the experimental manipulation highly suspect (because it would no longer be random). The only point one could conceivably raise questions about is (4), but here again I think there’s a very strong case to be made that Facebook is not about to start providing debriefing information to users every time it changes some aspect of the news feed in pursuit of research, considering that its users have already agreed to its User Agreement, which authorizes this and much more.

Now, if you disagree with the above analysis, that’s fine, but what should be clear enough is that there are many IRBs (and I’ve personally interacted with some of them) that would have authorized a waiver of consent in this particular case without blinking. So this is clearly well within “reasonable people can disagree” territory, rather than “oh my god, this is clearly illegal and unethical!” territory.

I can understand the objection that Facebook should have applied for IRB approval prior to conducting the experiment (though, as I note above, that’s only true if the experiment was initially conducted as research, which is not clear right now). However, it’s important to note that there is no guarantee that an IRB would have insisted on informed consent at all in this case. There’s considerable heterogeneity in different IRBs’ interpretation of the HHS guidelines (and in fact, even across different reviewers within the same IRB), and I don’t doubt that many IRBs would have allowed Facebook’s application to sail through without any problems (see, e.g., this comment on my last post)–though I think there’s a general consensus that a debriefing of some kind would almost certainly be requested.

In defense of Facebook

[UPDATE July 1st: I’ve now posted some additional thoughts in a second post here.]

It feels a bit strange to write this post’s title, because I don’t find myself defending Facebook very often. But there seems to be some discontent in the socialmediaverse at the moment over a new study in which Facebook data scientists conducted a large-scale–over half a million participants!–experimental manipulation on Facebook in order to show that emotional contagion occurs on social networks. The news that Facebook has been actively manipulating its users’ emotions has, apparently, enraged a lot of people.

The study

Before getting into the sources of that rage–and why I think it’s misplaced–though, it’s worth describing the study and its results. Here’s a description of the basic procedure, from the paper:

The experiment manipulated the extent to which people (N = 689,003) were exposed to emotional expressions in their News Feed. This tested whether exposure to emotions led people to change their own posting behaviors, in particular whether exposure to emotional content led people to post content that was consistent with the exposure—thereby testing whether exposure to verbal affective expressions leads to similar verbal expressions, a form of emotional contagion. People who viewed Facebook in English were qualified for selection into the experiment. Two parallel experiments were conducted for positive and negative emotion: One in which exposure to friends’ positive emotional content in their News Feed was reduced, and one in which exposure to negative emotional content in their News Feed was reduced. In these conditions, when a person loaded their News Feed, posts that contained emotional content of the relevant emotional valence, each emotional post had between a 10% and 90% chance (based on their User ID) of being omitted from their News Feed for that specific viewing.
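For readers who find code easier to follow than prose, here is a rough sketch of the kind of procedure that passage describes. To be clear, this is my own illustration and not Facebook’s code: the paper says only that the omission probability was between 10% and 90% and was based on User ID, so the hash-based mapping, the function names, and the is_emotional flag below are all assumptions.

```python
import hashlib
import random


def omission_probability(user_id: int) -> float:
    """Map a user ID to a fixed omission probability between 0.10 and 0.90.

    Assumption: the paper does not say how the probability was derived
    from the User ID; hashing is just one deterministic way to do it.
    """
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    fraction = int.from_bytes(digest[:8], "big") / (2**64 - 1)  # in [0, 1]
    return 0.10 + 0.80 * fraction


def filter_feed(posts, user_id, rng=random):
    """Return the posts shown for one viewing of the feed.

    Each post is a (text, is_emotional) pair, where is_emotional is a
    hypothetical flag marking posts of the targeted emotional valence.
    Flagged posts are dropped independently with the user's probability;
    all other posts are always kept.
    """
    p_omit = omission_probability(user_id)
    return [
        (text, is_emotional)
        for text, is_emotional in posts
        if not (is_emotional and rng.random() < p_omit)
    ]


feed = [("Best day ever!", True), ("Bus was late.", False), ("So happy!", True)]
shown = filter_feed(feed, user_id=12345, rng=random.Random(0))
print(shown)
```

Note that nothing is ever added to the feed in this sketch; flagged posts are only withheld for a given viewing, which is the feature of the design that matters for the discussion below.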

And here’s their central finding:

What the figure shows is that, in the experimental conditions, where negative or positive emotional posts are censored, users produce correspondingly more positive or negative emotional words in their own status updates. Reducing the number of negative emotional posts users saw led those users to produce more positive, and fewer negative words (relative to the unmodified control condition); conversely, reducing the number of presented positive posts led users to produce more negative and fewer positive words of their own.

Taken at face value, these results are interesting and informative. For the sake of contextualizing the concerns I discuss below, though, two points are worth noting. First, these effects, while highly statistically significant, are tiny. The largest effect size reported had a Cohen’s d of 0.02–meaning that eliminating a substantial proportion of emotional content from a user’s feed had the monumental effect of shifting that user’s own emotional word use by two hundredths of a standard deviation. In other words, the manipulation had a negligible real-world impact on users’ behavior. To put it in intuitive terms, the effect of condition in the Facebook study is roughly comparable to a hypothetical treatment that increased the average height of the male population in the United States by about one twentieth of an inch (given a standard deviation of ~2.8 inches). Theoretically interesting, perhaps, but not very meaningful in practice.
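To make the arithmetic behind that height comparison explicit, here is a minimal sketch. It uses only the two numbers quoted above (d = 0.02 and a standard deviation of roughly 2.8 inches); the function name is mine, not anything from the paper.

```python
def raw_shift(cohens_d: float, sd: float) -> float:
    """Convert a standardized effect size (Cohen's d) into raw units.

    Cohen's d is the mean difference divided by the standard deviation,
    so multiplying d by the standard deviation recovers the raw difference.
    """
    return cohens_d * sd


# Values quoted in the text: d = 0.02, SD of adult male height ~2.8 inches.
shift_inches = raw_shift(0.02, 2.8)
print(f"{shift_inches:.3f} inches")  # 0.056 inches, i.e. roughly 1/20 of an inch
```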

Second, the fact that users in the experimental conditions produced content with very slightly more positive or negative emotional content doesn’t mean that those users actually felt any differently. It’s entirely possible–and I would argue, even probable–that much of the effect was driven by changes in the expression of ideas or feelings that were already on users’ minds. For example, suppose I log onto Facebook intending to write a status update to the effect that I had an “awesome day today at the beach with my besties!” Now imagine that, as soon as I log in, I see in my news feed that an acquaintance’s father just passed away. I might very well think twice about posting my own message–not necessarily because the news has made me feel sad myself, but because it surely seems a bit unseemly to celebrate one’s own good fortune around people who are currently grieving. I would argue that such subtle behavioral changes, while certainly responsive to others’ emotions, shouldn’t really be considered genuine cases of emotional contagion. Yet given how small the effects were, one wouldn’t need very many such changes to occur in order to produce the observed results. So, at the very least, the jury should still be out on the extent to which Facebook users actually feel differently as a result of this manipulation.

The concerns

Setting aside the rather modest (though still interesting!) results, let’s turn to look at the criticism. Here’s what Katy Waldman, writing in a Slate piece titled “Facebook’s Unethical Experiment”, had to say:

The researchers, who are affiliated with Facebook, Cornell, and the University of California–San Francisco, tested whether reducing the number of positive messages people saw made those people less likely to post positive content themselves. The same went for negative messages: Would scrubbing posts with sad or angry words from someone’s Facebook feed make that person write fewer gloomy updates?

The upshot? Yes, verily, social networks can propagate positive and negative feelings!

The other upshot: Facebook intentionally made thousands upon thousands of people sad.

Or consider an article in The Wire, quoting Jacob Silverman:

“What’s disturbing about how Facebook went about this, though, is that they essentially manipulated the sentiments of hundreds of thousands of users without asking permission (blame the terms of service agreements we all opt into). This research may tell us something about online behavior, but it’s undoubtedly more useful for, and more revealing of, Facebook’s own practices.”

On Twitter, the reaction to the study has been similarly negative. A lot of people appear to be very upset at the revelation that Facebook would actively manipulate its users’ news feeds in a way that could potentially influence their emotions.

Why the concerns are misplaced

To my mind, the concerns expressed in the Slate piece and elsewhere are misplaced, for several reasons. First, they largely mischaracterize the study’s experimental procedures–to the point that I suspect most of the critics haven’t actually bothered to read the paper. In particular, the suggestion that Facebook “manipulated users’ emotions” is quite misleading. Framing it that way tacitly implies that Facebook must have done something specifically designed to induce a different emotional experience in its users. In reality, for users assigned to the experimental condition, Facebook simply removed a variable proportion of status messages that were automatically detected as containing positive or negative emotional words. Let me repeat that: Facebook removed emotional messages for some users. It did not, as many people seem to be assuming, add content specifically intended to induce specific emotions. Now, given that a large amount of content on Facebook is already highly emotional in nature–think about all the people sharing their news of births, deaths, break-ups, etc.–it seems very hard to argue that Facebook would have been introducing new risks to its users even if it had presented some of them with more emotional content. But it’s certainly not credible to suggest that replacing 10% – 90% of emotional content with neutral content constitutes a potentially dangerous manipulation of people’s subjective experience.

Second, it’s not clear what the notion that Facebook users’ experience is being “manipulated” really even means, because the Facebook news feed is, and has always been, a completely contrived environment. I hope that people who are concerned about Facebook “manipulating” user experience in support of research realize that Facebook is constantly manipulating its users’ experience. In fact, by definition, every single change Facebook makes to the site alters the user experience, since there simply isn’t any experience to be had on Facebook that isn’t entirely constructed by Facebook. When you log onto Facebook, you’re not seeing a comprehensive list of everything your friends are doing, nor are you seeing a completely random subset of events. In the former case, you would be overwhelmed with information, and in the latter case, you’d get bored of Facebook very quickly. Instead, what you’re presented with is a carefully curated experience that is, from the outset, crafted in such a way as to create a more engaging experience (read: keeps you spending more time on the site, and coming back more often). The items you get to see are determined by a complex and ever-changing algorithm that you make only a partial contribution to (by indicating what you like, what you want hidden, etc.). It has always been this way, and it’s not clear that it could be any other way. So I don’t really understand what people mean when they sarcastically suggest–as Katy Waldman does in her Slate piece–that “Facebook reserves the right to seriously bum you out by cutting all that is positive and beautiful from your news feed”. Where does Waldman think all that positive and beautiful stuff comes from in the first place? Does she think it spontaneously grows wild in her news feed, free from the meddling and unnatural influence of Facebook engineers?

Third, if you were to construct a scale of possible motives for manipulating users’ behavior–with the global betterment of society at one end, and something really bad at the other end–I submit that conducting basic scientific research would almost certainly be much closer to the former end than would the other standard motives we find on the web–like trying to get people to click on more ads. The reality is that Facebook–and virtually every other large company with a major web presence–is constantly conducting large controlled experiments on user behavior. Data scientists and user experience researchers at Facebook, Twitter, Google, etc. routinely run dozens, hundreds, or thousands of experiments a day, all of which involve random assignment of users to different conditions. Typically, these manipulations aren’t conducted in order to test basic questions about emotional contagion; they’re conducted with the explicit goal of helping to increase revenue. In other words, if the idea that Facebook would actively try to manipulate your behavior bothers you, you should probably stop reading this right now and go close your account. You also should definitely not read this paper suggesting that a single social message on Facebook prior to the last US presidential election may have single-handedly increased national voter turn-out by as much as 0.6%. Oh, and you should probably also stop using Google, YouTube, Yahoo, Twitter, Amazon, and pretty much every other major website–because I can assure you that, in every single case, there are people out there who get paid a good salary to… yes, manipulate your emotions and behavior! For better or worse, this is the world we live in. If you don’t like it, you can abandon the internet, or at the very least close all of your social media accounts. But the suggestion that Facebook is doing something unethical simply by publishing the results of one particular experiment among thousands–and in this case, an experiment featuring a completely innocuous design that, if anything, is probably less driven by profit than most of what Facebook does–seems kind of absurd.

Fourth, it’s worth keeping in mind that there’s nothing intrinsically evil about the idea that large corporations might be trying to manipulate your experience and behavior. Everybody you interact with–including every one of your friends, family, and colleagues–is constantly trying to manipulate your behavior in various ways. Your mother wants you to eat more broccoli; your friends want you to come get smashed with them at a bar; your boss wants you to stay at work longer and take fewer breaks. We are always trying to get other people to feel, think, and do certain things that they would not otherwise have felt, thought, or done. So the meaningful question is not whether people are trying to manipulate your experience and behavior, but whether they’re trying to manipulate you in a way that aligns with or contradicts your own best interests. The mere fact that Facebook, Google, and Amazon run experiments intended to alter your emotional experience in a revenue-increasing way is not necessarily a bad thing if in the process of making more money off you, those companies also improve your quality of life. I’m not taking a stand one way or the other, mind you, but simply pointing out that without controlled experimentation, the user experience on Facebook, Google, Twitter, etc. would probably be very, very different–and most likely less pleasant. So before we lament the perceived loss of all those “positive and beautiful” items in our Facebook news feeds, we should probably remind ourselves that Facebook’s ability to identify and display those items consistently is itself in no small part a product of its continual effort to experimentally test its offering by, yes, experimentally manipulating its users’ feelings and thoughts.

What makes the backlash on this issue particularly strange is that I’m pretty sure most people do actually realize that their experience on Facebook (and on other websites, and on TV, and in restaurants, and in museums, and pretty much everywhere else) is constantly being manipulated. I expect that most of the people who’ve been complaining about the Facebook study on Twitter are perfectly well aware that Facebook constantly alters its user experience–I mean, they even see it happen in a noticeable way once in a while, whenever Facebook introduces a new interface. Given that Facebook has over half a billion users, it’s a foregone conclusion that every tiny change Facebook makes to the news feed or any other part of its websites induces a change in millions of people’s emotions. Yet nobody seems to complain about this much–presumably because, when you put it this way, it seems kind of silly to suggest that a company whose business model is predicated on getting its users to use its product more would do anything other than try to manipulate its users into, you know, using its product more.

Why the backlash is deeply counterproductive

Now, none of this is meant to suggest that there aren’t legitimate concerns one could raise about Facebook’s more general behavior–or about the immense and growing social and political influence that social media companies like Facebook wield. One can certainly question whether it’s really fair to expect users signing up for a service like Facebook’s to read and understand user agreements containing dozens of pages of dense legalese, or whether it would make sense to introduce new regulations on companies like Facebook to ensure that they don’t acquire or exert undue influence on their users’ behavior (though personally I think that would be unenforceable and kind of silly). So I’m certainly not suggesting that we give Facebook, or any other large web company, a free pass to do as it pleases. What I am suggesting, however, is that even if your real concerns are, at bottom, about the broader social and political context Facebook operates in, using this particular study as a lightning rod for criticism of Facebook is an extremely counterproductive, and potentially very damaging, strategy.

Consider: by far the most likely outcome of the backlash Facebook is currently experiencing is that, in future, its leadership will be less likely to allow its data scientists to publish their findings in the scientific literature. Remember, Facebook is not a research institute expressly designed to further understanding of the human condition; it’s a publicly-traded corporation that exists to create wealth for its shareholders. Facebook doesn’t have to share any of its data or findings with the rest of the world if it doesn’t want to; it could comfortably hoard all of its knowledge and use it for its own ends, and no one else would ever be any wiser for it. The fact that Facebook is willing to allow its data science team to spend at least some of its time publishing basic scientific research that draws on Facebook’s unparalleled resources is something to be commended, not criticized.

There is little doubt that the present backlash will do absolutely nothing to deter Facebook from actually conducting controlled experiments on its users, because A/B testing is a central component of pretty much every major web company’s business strategy at this point–and frankly, Facebook would be crazy not to try to empirically determine how to improve user experience. What criticism of the Kramer et al. article will almost certainly do is decrease the scientific community’s access to, and interaction with, one of the largest and richest sources of data on human behavior in existence. You can certainly take a dim view of Facebook as a company if you like, and you’re free to critique the way they do business to your heart’s content. But haranguing Facebook and other companies like it for publicly disclosing scientifically interesting results of experiments that it is already constantly conducting anyway–and that are directly responsible for many of the positive aspects of the user experience–is not likely to accomplish anything useful. If anything, it’ll only ensure that, going forward, all of Facebook’s societally relevant experimental research is done in the dark, where nobody outside the company can ever find out–or complain–about it.

[UPDATE July 1st: I’ve posted some additional thoughts in a second post here.]

aftermath of the NYT / Lindstrom debacle

Over the last few days the commotion over Martin Lindstrom’s terrible New York Times iPhone-loving Op-Ed, which I wrote about in my last post, seems to have spread far and wide. Highlights include excellent posts by David Dobbs and the Neurocritic, but really there are too many to list at this point. And the verdict is overwhelmingly negative; I don’t think I’ve seen a single post in defense of Lindstrom, which is probably not a good sign (for him).

In the meantime, Russ Poldrack and over 40 other neuroscientists and psychologists (including me) wrote a letter to the NYT complaining about the Lindstrom Op-Ed, which the NYT has now published. As per usual, they edited down the letter till it almost disappeared. But the original, along with a list of signees, is on Russ’s blog.

Anyway, the fact that the Times published the rebuttal letter is all well and good, but as I mentioned in my last post, the bigger problem is that since the Times doesn’t include links to related content on their articles, people who stumble across the Op-Ed aren’t going to have any way of knowing that it’s been roundly discredited by pretty much the entire web. Lindstrom’s piece was the most emailed article on the Times website for a day or two, but only a tiny fraction of those readers will ever see (or even hear about) the critical response. As far as I know, the NYT hasn’t issued an explanation or apology for publishing the Op-Ed; they’ve simply published the letter and gone on about their business (I guess I can’t fault them for this–if they had to issue a formal apology for every mistake that gets published, they’d have no time for anything else; the trick is really to catch this type of screw-up at the front end). Adding links from each article to related content wouldn’t solve the problem entirely, of course, but it would be something. The fact that the Times’ platform currently doesn’t have this capacity is kind of perplexing.

The other point worth mentioning is that, in the aftermath of the tsunami of criticism he received, Lindstrom left a comment on several blogs (Russ Poldrack and David Dobbs were lucky recipients; sadly, I wasn’t on the guest list). Here’s the full text of the comment:

My first foray into neuro-marketing research was for my New York Times bestseller Buyology: Truth and Lies about Why We Buy. For that book I teamed up with Neurosense, a leading independent neuro-marketing company that specializes in consumer research using functional magnetic resonance imaging (fMRI) headed by Oxford University trained Gemma Calvert, BSc DPhil CPsychol FRSA and Neuro-Insight, a market research company that uses unique brain-imaging technology, called Steady-State Topography (SST), to measure how the brain responds to communications which is lead by Dr. Richard Silberstein, PhD. This was the single largest neuro-marketing study ever conducted—25x larger than any such study to date and cost more than seven million dollars to run.

In the three-year effort scientists scanned the brains of over 2,000 people from all over the world as they were exposed to various marketing and advertising strategies including clever product placements, sneaky subliminal messages, iconic brand logos, shocking health and safety warnings, and provocative product packages. The purpose of all of this was to understand, quite successfully I may add, the key drivers behind why we make the purchasing decisions that we do.

For the research that my recent Op-Ed column in the New York Times was based on I turned to Dr. David Hubbard, a board-certified neurologist and his company MindSign Neuro Marketing, an independently owned fMRI neuro-marketing company. I asked Dr. Hubbard and his team a simple question, “Are we addicted to our iPhones?” After analyzing the brains of 8 men and 8 women between the ages of 18-25 using fMRI technology, MindSign answered my question using standardized answering methods and completely reproducible results. The conclusion was that we are not addicted to our iPhones, we are in love with them.

The thought provoking dialogue that has been generated from the article has been overwhelmingly positive and I look forward to the continued comments from professionals in the field, readers and fans.

Respectfully,

Martin Lindstrom

As evasive responses go, this is a masterpiece; at no point does Lindstrom ever actually address any of the substantive criticisms leveled at him. He spends most of his response name dropping (the list of credentials is almost long enough to make you forget that the rebuttal letter to his Op-Ed was signed by over 40 PhDs) and rambling about previous unrelated neuromarketing work (which may as well not exist, since none of it has ever been made public), and then closes by shifting the responsibility for the study to MindSign, the company he paid to run the iPhone study. The claim that MindSign “answered [his] question using standardized answering methods and completely reproducible results” is particularly ludicrous; as I explained in my last post, there currently aren’t any standardized methods for reading addiction or love off of brain images. And ‘completely reproducible results’ implies that one has, you know, successfully reproduced the results, which is simply false unless Lindstrom is suggesting that MindSign did the same experiment twice. It’s hard to see any “thought provoking dialogue” taking place here, and the neuroimaging community’s response to the Op-Ed column has been, virtually without exception, overwhelmingly negative, not positive (as Lindstrom claims).

That all said, I do think there’s one very positive aspect to this entire saga, and that’s the amazing speed and effectiveness of the response from scientists, science journalists, and other scientifically literate folks. Ten years ago, Lindstrom’s piece might have gone completely unchallenged–and even if someone like Russ Poldrack had written a response, it would probably have appeared much later, been signed by fewer scientists (because coordination would have been much more difficult), and received much less attention. But within 48 hours of Lindstrom’s Op-Ed being published, dozens of critical blog posts had appeared, and hundreds, if not thousands, of people all over the world had tweeted or posted links to these critiques (my last post alone received over 12,000 hits). Scientific discourse, which used to be confined largely to peer-reviewed print journals and annual conferences, now takes place at a remarkable pace online, and it’s fantastic to see social media used in this way. The hope is that as these technologies develop further and scientists take on a more active role in communicating with the public (something that platforms like Twitter and Google+ seem to be facilitating amazingly well), it’ll become increasingly difficult for people like Lindstrom to make crazy pseudoscientific claims without being immediately and visibly called out on it–even in those rare cases when the NYT makes the mistake of leaving one of the biggest microphones on earth open and unmonitored.

the New York Times blows it big time on brain imaging

The New York Times has a terrible, terrible Op-Ed piece today by Martin Lindstrom (who I’m not going to link to, because I don’t want to throw any more bones his way). If you believe Lindstrom, you don’t just like your iPhone a lot; you love it. Literally. And the reason you love it, shockingly, is your brain:

Earlier this year, I carried out an fMRI experiment to find out whether iPhones were really, truly addictive, no less so than alcohol, cocaine, shopping or video games. In conjunction with the San Diego-based firm MindSign Neuromarketing, I enlisted eight men and eight women between the ages of 18 and 25. Our 16 subjects were exposed separately to audio and to video of a ringing and vibrating iPhone.

But most striking of all was the flurry of activation in the insular cortex of the brain, which is associated with feelings of love and compassion. The subjects’ brains responded to the sound of their phones as they would respond to the presence or proximity of a girlfriend, boyfriend or family member.

In short, the subjects didn’t demonstrate the classic brain-based signs of addiction. Instead, they loved their iPhones.

There’s so much wrong with just these three short paragraphs (to say nothing of the rest of the article, which features plenty of other whoppers) that it’s hard to know where to begin. But let’s try. Take first the central premise–that an fMRI experiment could help determine whether iPhones are no less addictive than alcohol or cocaine. The tacit assumption here is that all the behavioral evidence you could muster–say, from people’s reports about how they use their iPhones, or clinicians’ observations about how iPhones affect their users–isn’t sufficient to make that determination; to “really, truly” know if something’s addictive, you need to look at what the brain is doing when people think about their iPhones. This idea is absurd inasmuch as addiction is defined on the basis of its behavioral consequences, not (right now, anyway) by the presence or absence of some biomarker. What makes someone an alcoholic is the fact that they’re dependent on alcohol, have trouble going without it, find that their alcohol use interferes with multiple aspects of their day-to-day life, and generally suffer functional impairment because of it–not the fact that their brain lights up when they look at pictures of Johnnie Walker Red. If someone couldn’t stop drinking–to the point where they lost their job, family, and friends–but their brain failed to display a putative biomarker for addiction, it would be strange indeed to say “well, you show all the signs, but I guess you’re not really addicted to alcohol after all.”

Now, there may come a day (and it will be a great one) when we have biomarkers sufficiently accurate that they can stand in for the much more tedious process of diagnosing someone’s addiction the conventional way. But that day is, to put it gently, a long way off. Right now, if you want to know if iPhones are addictive, the best way to do that is to, well, spend some time observing and interviewing iPhone users (and some quantitative analysis would be helpful).

Of course, it’s not clear what Lindstrom thinks an appropriate biomarker for addiction would be in any case. Presumably it would have something to do with the reward system; but what? Suppose Lindstrom had seen robust activation in the ventral striatum–a critical component of the brain’s reward system–when participants gazed upon the iPhone: what then? Would this have implied people are addicted to iPhones? But people also show striatal activity when gazing on food, money, beautiful faces, and any number of other stimuli. Does that mean the average person is addicted to all of the above? A marker of pleasure or reward, maybe (though even that’s not certain), but addiction? How could a single fMRI experiment with 16 subjects viewing pictures of iPhones confirm or disconfirm the presence of addiction? Lindstrom doesn’t say. I suppose he has good reason not to say: if he really did have access to an accurate fMRI-based biomarker for addiction, he’d be in a position to make millions (billions?) off the technology. To date, no one else has come close to identifying a clinically accurate fMRI biomarker for any kind of addiction (for more technical readers, I’m talking here about cross-validated methods that have both sensitivity and specificity comparable to traditional approaches when applied to new subjects–not individual studies that claim 90% within-sample classification accuracy based on simple regression models). So we should, to put it mildly, be very skeptical that Lindstrom’s study was ever in a position to do what he says it was designed to do.
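For readers who want to see why the within-sample versus cross-validated distinction matters, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the data are pure random noise, the subject and voxel counts are arbitrary, and I've swapped in a nearest-neighbor classifier (the hypothetical helper `nearest_label`) rather than a regression model, simply because it makes the problem easy to see. It is not a reconstruction of any published analysis.

```python
import random

def nearest_label(x, points, labels):
    """Return the label of the training point closest to x (squared Euclidean distance)."""
    best = min(range(len(points)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, points[i])))
    return labels[best]

random.seed(0)
n_subjects, n_voxels = 200, 20  # arbitrary sizes, chosen for illustration only

# Pure noise: the fake "brain data" and the fake diagnostic labels are unrelated.
data = [[random.gauss(0, 1) for _ in range(n_voxels)] for _ in range(n_subjects)]
labels = [random.choice([0, 1]) for _ in range(n_subjects)]

# Within-sample: each subject is classified by a model that has already seen them.
within = sum(nearest_label(x, data, labels) == y
             for x, y in zip(data, labels)) / n_subjects

# Leave-one-out cross-validation: each subject is held out before being classified.
hits = 0
for i in range(n_subjects):
    train_x = data[:i] + data[i + 1:]
    train_y = labels[:i] + labels[i + 1:]
    hits += nearest_label(data[i], train_x, train_y) == labels[i]
cross_validated = hits / n_subjects

print(f"within-sample accuracy:   {within:.2f}")
print(f"cross-validated accuracy: {cross_validated:.2f}")
```

The within-sample number comes out perfect even though there is nothing to find, because every subject is their own nearest neighbor; the cross-validated number should hover around chance. That gap is why accuracy measured on the same subjects used to build a classifier says very little about how it would do on new people.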

We should also ask all sorts of salient and important questions about who the people are who are supposedly in love with their iPhones. Who’s the “You” in the “You Love Your iPhone” of the title? We don’t know, because we don’t know who the participants in Lindstrom’s sample were, aside from the fact that they were eight men and eight women aged 18 to 25. But we’d like to know some other important things. For instance, were they selected for specific characteristics? Were they, say, already avid iPhone users? Did they report loving, or being addicted to, their iPhones? If so, would it surprise us that people chosen for their close attachment to their iPhones also showed brain activity patterns typical of close attachment? (Which, incidentally, they actually don’t–but more on that below.) And if not, are we to believe that the average person pulled off the street–who probably has limited experience with iPhones–really responds to the sound of their phones “as they would respond to the presence or proximity of a girlfriend, boyfriend or family member”? Is the takeaway message of Lindstrom’s Op-Ed that iPhones are actually people, as far as our brains are concerned?

In fairness, space in the Times is limited, so maybe it’s not fair to demand this level of detail in the Op-Ed itself. But the bigger problem is that we have no way of evaluating Lindstrom’s claims, period, because (as far as I can tell), his study hasn’t been published or peer-reviewed anywhere. Presumably, it’s proprietary information that belongs to the neuromarketing firm in question. Which is to say, the NYT is basically giving Lindstrom license to talk freely about scientific-sounding findings that can’t actually be independently confirmed, disputed, or critiqued by members of the scientific community with expertise in the very methods Lindstrom is applying (expertise which, one might add, he himself lacks). For all we know, he could have made everything up. To be clear, I don’t really think he did make everything up–but surely, somewhere in the editorial process someone at the NYT should have stepped in and said, “hey, these are pretty strong scientific claims; is there any way we can make your results–on which your whole article hangs–available for other experts to examine?”

This brings us to what might be the biggest whopper of all, and the real driver of the article title: the claim that “most striking of all was the flurry of activation in the insular cortex of the brain, which is associated with feelings of love and compassion”. Russ Poldrack already tore this statement to shreds earlier this morning:

Insular cortex may well be associated with feelings of love and compassion, but this hardly proves that we are in love with our iPhones.  In Tal Yarkoni’s recent paper in Nature Methods, we found that the anterior insula was one of the most highly activated part of the brain, showing activation in nearly 1/3 of all imaging studies!  Further, the well-known studies of love by Helen Fisher and colleagues don’t even show activation in the insula related to love, but instead in classic reward system areas.  So far as I can tell, this particular reverse inference was simply fabricated from whole cloth.  I would have hoped that the NY Times would have learned its lesson from the last episode.

But you don’t have to take Russ’s word for it; if you surf for a few terms on our Neurosynth website, making sure to select “forward inference” under image type, you’ll notice that the insula shows up for almost everything. That’s not an accident; it’s because the insula (or at least the anterior part of the insula) plays a very broad role in goal-directed cognition. It really is activated when you’re doing almost anything that involves, say, following instructions an experimenter gave you, or attending to external stimuli, or mulling over something salient in the environment. You can see this pretty clearly in this modified figure from our Nature Methods paper (I’ve circled the right insula):

Proportion of studies reporting activation at each voxel

The insula is one of a few ‘hotspots’ where activation is reported very frequently in neuroimaging articles (the other major one being the dorsal medial frontal cortex). So, by definition, there can’t be all that much specificity to what the insula is doing, since it pops up so often. To put it differently, as Russ and others have repeatedly pointed out, the fact that a given region activates when people are in a particular psychological state (e.g., love) doesn’t give you license to conclude that that state is present just because you see activity in the region in question. If language, working memory, physical pain, anger, visual perception, motor sequencing, and memory retrieval all activate the insula, then knowing that the insula is active is of very little diagnostic value. That’s not to say that some psychological states might not be more strongly associated with insula activity (again, you can see this on Neurosynth if you switch the image type to ‘reverse inference’ and browse around); it’s just that, probabilistically speaking, the mere fact that the insula is active gives you very little basis for saying anything concrete about what people are experiencing.
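To put rough numbers on that probabilistic point, here is a small Python sketch of the reverse-inference arithmetic using Bayes' rule. All of the probabilities are made up for illustration (only the one-in-three base rate echoes the figure Russ mentions above), and `posterior` is a hypothetical helper, not anything taken from Neurosynth.

```python
def posterior(p_act_given_state, p_state, p_act_given_other):
    """Bayes' rule: P(state | activation) from the forward probabilities and the prior."""
    p_act = p_act_given_state * p_state + p_act_given_other * (1 - p_state)
    return p_act_given_state * p_state / p_act

# Hypothetical numbers, chosen only to illustrate the logic.
prior_love = 0.05          # assumed prior probability that "love" is in play
p_region_if_love = 0.40    # assumed: region is active in 40% of "love" studies

# A promiscuous region that is active in about 1/3 of everything else.
promiscuous = posterior(p_region_if_love, prior_love, 0.33)

# An imaginary selective region that is active in only 2% of everything else.
selective = posterior(p_region_if_love, prior_love, 0.02)

print(f"P(love | promiscuous region active) = {promiscuous:.2f}")
print(f"P(love | selective region active)   = {selective:.2f}")
```

Under these assumptions, seeing the promiscuous region light up moves you from a 5% prior to roughly 6%, which is to say almost nowhere, whereas the same forward probability in a selective region would take you to about even odds. The forward inference (love activates the region) is identical in both cases; what differs is how often the region shows up for everything else.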

In fact, to account for Lindstrom’s findings, you don’t have to appeal to love or addiction at all. There’s a much simpler way to explain why seeing or hearing an iPhone might elicit insula activation. For most people, the onset of visual or auditory stimulation is a salient event that causes redirection of attention to the stimulated channel. I’d be pretty surprised, actually, if you could present any picture or sound to participants in an fMRI scanner and not elicit robust insula activity. Orienting and sustaining attention to salient things seems to be a big part of what the anterior insula is doing (whether or not that’s ultimately its ‘core’ function). So the most appropriate conclusion to draw from the fact that viewing iPhone pictures produces increased insula activity is something vague like “people are paying more attention to iPhones”, or “iPhones are particularly salient and interesting objects to humans living in 2011.” Not something like “no, really, you love your iPhone!”

In sum, the NYT screwed up. Lindstrom appears to have a habit of making overblown claims about neuroimaging evidence, so it’s not surprising he would write this type of piece; but the NYT editorial staff is supposedly there to filter out precisely this kind of pseudoscientific advertorial. And they screwed up. It’s a particularly big screw-up given that (a) as of right now, Lindstrom’s Op-Ed is the single most emailed article on the NYT site, and (b) this incident almost perfectly recapitulates another NYT article 4 years ago in which some neuroscientists and neuromarketers wrote a grossly overblown Op-Ed claiming to be able to infer, in detail, people’s opinions about presidential candidates. That time, Russ Poldrack and a bunch of other big names in cognitive neuroscience wrote a concise rebuttal that appeared in the NYT (but unfortunately, isn’t linked to from the original Op-Ed, so anyone who stumbles across the original now has no way of knowing how ridiculous it is). One hopes the NYT follows up in similar fashion this time around. They certainly owe it to their readers–some of whom, if you believe Lindstrom, are now in danger of dumping their current partners for their iPhones.

h/t: Molly Crockett

what the arsenic effect means for scientific publishing

I don’t know very much about DNA (and by ‘not very much’ I sadly mean ‘next to nothing’), so when someone tells me that life as we know it generally doesn’t use arsenic to make DNA, and that it’s a big deal to find a bacterium that does, I’m willing to believe them. So too, apparently, are at least two or three reviewers for Science, which published a paper last week by a NASA group purporting to demonstrate exactly that.

Turns out the paper might have a few holes. In the last few days, the blogosphere has reached fever delirium pitch as critiques of the article have emerged from every corner; it seems like pretty much everyone with some knowledge of the science in question is unhappy about the paper. Since I’m not in any position to critique the article myself, I’ll take Carl Zimmer’s word for it in Slate yesterday:

Was this merely a case of a few isolated cranks? To find out, I reached out to a dozen experts on Monday. Almost unanimously, they think the NASA scientists have failed to make their case.  “It would be really cool if such a bug existed,” said San Diego State University’s Forest Rohwer, a microbiologist who looks for new species of bacteria and viruses in coral reefs. But, he added, “none of the arguments are very convincing on their own.” That was about as positive as the critics could get. “This paper should not have been published,” said Shelley Copley of the University of Colorado.

Zimmer then follows his Slate piece up with a blog post today in which he provides 13 experts’ unadulterated comments. While there are one or two (somewhat) positive reviews, the consensus clearly seems to be that the Science paper is (very) bad science.

Of course, scientists (yes, even Science reviewers) do occasionally make mistakes, so if we’re being charitable about it, we might chalk it up to human error (though some of the critiques suggest that these are elementary problems that could have been very easily addressed, so it’s possible there’s some disingenuousness involved). But what many bloggers (1, 2, 3, etc.) have found particularly inexcusable is the way NASA and the research team have handled the criticism. Zimmer again, in Slate:

I asked two of the authors of the study if they wanted to respond to the criticism of their paper. Both politely declined by email.

“We cannot indiscriminately wade into a media forum for debate at this time,” declared senior author Ronald Oremland of the U.S. Geological Survey. “If we are wrong, then other scientists should be motivated to reproduce our findings. If we are right (and I am strongly convinced that we are) our competitors will agree and help to advance our understanding of this phenomenon. I am eager for them to do so.”

“Any discourse will have to be peer-reviewed in the same manner as our paper was, and go through a vetting process so that all discussion is properly moderated,” wrote Felisa Wolfe-Simon of the NASA Astrobiology Institute. “The items you are presenting do not represent the proper way to engage in a scientific discourse and we will not respond in this manner.”

A NASA spokesperson basically reiterated this point of view, indicating that NASA scientists weren’t going to respond to criticism of their work unless that criticism appeared in, you know, a respectable, peer-reviewed outlet. (Fortunately, at least one of the critics already has a draft letter to Science up on her blog.)

I don’t think it’s surprising that people who spend much of their free time blogging about science, and think it’s important to discuss scientific issues in a public venue, generally aren’t going to like being told that science blogging isn’t a legitimate form of scientific discourse. Especially considering that the critics here aren’t laypeople without scientific training; they’re well-respected scientists with areas of expertise that are directly relevant to the paper. In this case, dismissing trenchant criticism because it’s on the web rather than in a peer-reviewed journal seems kind of like telling someone who’s screaming at you that your house is on fire that you’re not going to listen to them until they adopt a more polite tone. It just seems counterproductive.

That said, I personally don’t think we should take the NASA team’s statements at face value. I very much doubt that what the NASA researchers are saying really reflects any deep philosophical view about the role of blogs in scientific discourse; it’s much more likely that they’re simply trying to buy some time while they figure out how to respond. On the face of it, they have a choice between two lousy options: either ignore the criticism entirely, which would be antithetical to the scientific process and would look very bad, or address it head-on–which, judging by the vociferousness and near-unanimity of the commentators, is probably going to be a losing battle. Shifting the terms of the debate by insisting on responding only in a peer-reviewed venue doesn’t really change anything, but it does buy the authors two or three weeks. And two or three weeks is worth like, forty attentional cycles in the blogosphere.

Mind you, I’m not saying we should sympathize with the NASA researchers just because they’re in a tough position. I think one of the main reasons the story’s attracted so much attention is precisely because people see it as a case of justice being served. The NASA team called a major press conference ahead of the paper’s publication, published its results in one of the world’s most prestigious science journals, and yet apparently failed to run relatively basic experimental controls in support of its conclusions. If the critics are to be believed, the NASA researchers are either disingenuous or incompetent; either way, we shouldn’t feel sorry for them.

What I do think this episode shows is that the rules of scientific publishing have fundamentally changed in the last few years–and largely for the better. I haven’t been doing science for very long, but even in the halcyon days of 2003, when I started graduate school, science blogging was practically nonexistent, and the main way you’d find out what other people thought about an influential new paper was by talking to people you knew at conferences (which could take several months) or waiting for critiques or replication failures to emerge in other peer-reviewed journals (which could take years). That kind of delay between publication and evaluation is disastrous for science, because in the time it takes for a consensus to emerge that a paper is no good, several research teams might have already started trying to replicate and extend the reported findings, and several dozen other researchers might have uncritically cited their paper peripherally in their own work. This delay is probably why, as John Ioannidis’ work so elegantly demonstrates, major studies published in high-impact journals tend to exert a disproportionate influence on the literature long after they’ve been resoundingly discredited.

The Arsenic Effect, if we can call it that, provides a nice illustration of the impact of new media on scientific communication. It’s a safe bet that there are now very few people who do anything even vaguely related to the NASA team’s research who haven’t been made aware that the reported findings are controversial. Which means that the process of attempting to replicate (or falsify) the findings will proceed much more quickly than it might have ten or twenty years ago, and there probably won’t be very many people who cite the Science paper as compelling evidence of terrestrial arsenic-based life. Perhaps more importantly, as researchers get used to the idea that their high-profile work is going to be instantly evaluated by thousands of pairs of highly trained eyes, any of which might be attached to a highly prolific pair of typing hands, there will be an increasingly strong incentive to avoid being careless. That isn’t to say that bad science will disappear, of course; just that, in cases where the badness reflects a pressure to tell a good story at all costs, we’ll probably see less of it.

internet use causes depression! or not.

I have a policy of not saying negative things about people (or places, or things) on this blog, and I think I’ve generally been pretty good about adhering to that policy. But I also think it’s important for scientists to speak up in cases where journalists or other scientists misrepresent scientific research in a way that could have a potentially large impact on people’s behavior, and this is one of those cases. All day long, media outlets have been full of reports about a new study that purportedly reveals that the internet–that most faithful of friends, always just a click away with its soothing, warm embrace–has a dark side: using it makes you depressed!

In fairness, most of the stories have been careful to note that the study only “links” heavy internet use to depression, without necessarily implying that internet use causes depression. And the authors acknowledge that point themselves:

“While many of us use the Internet to pay bills, shop and send emails, there is a small subset of the population who find it hard to control how much time they spend online, to the point where it interferes with their daily activities,” said researcher Dr. Catriona Morrison, of the University of Leeds, in a statement. “Our research indicates that excessive Internet use is associated with depression, but what we don’t know is which comes first. Are depressed people drawn to the Internet or does the Internet cause depression?”

So you might think all’s well in the world of science and science journalism. But in other places, the study’s authors weren’t nearly so circumspect. For example, the authors suggest that 1.2% of the population can be considered addicted to the internet–a rate they claim is double that of compulsive gambling; and they suggest that their results “feed the public speculation that overengagement in websites that serve/replace a social function might be linked to maladaptive psychological functioning,” and “add weight to the recent suggestion that IA should be taken seriously as a distinct psychiatric construct.”

These are pretty strong claims; if the study’s findings are to be believed, we should at least be seriously considering the possibility that using the internet is making some of us depressed. At worst, we should be diagnosing people with internet addiction and doing… well, presumably something to treat them.

The trouble is that it’s not at all clear that the study’s findings should be believed. Or at least, it’s not clear that they really support any of the statements made above.

Let’s start with what the study (note: restricted access) actually shows. The authors, Catriona Morrison and Helen Gore (M&G), surveyed 1,319 subjects via UK-based social networking sites. They had participants fill out 3 self-report measures: the Internet Addiction Test (IAT), which measures dissatisfaction with one’s internet usage; the Internet Function Questionnaire, which asks respondents to indicate the relative proportion of time they spend on different internet activities (e.g., e-mail, social networking, porn, etc.); and the Beck Depression Inventory (BDI), a very widely-used measure of depression.

M&G identify a number of findings, three of which appear to support most of their conclusions. First, they report a very strong positive correlation (r = .49) between internet addiction and depression scores; second, they identify a small group of 18 subjects (1.2%) who they argue qualify as internet addicts (IA group) based on their scores on the IAT; and third, they suggest that people who used the internet more heavily “spent proportionately more time on online gaming sites, sexually gratifying websites, browsing, online communities and chat sites.”

These findings may sound compelling, but there are a number of methodological shortcomings of the study that make them very difficult to interpret in any meaningful way. As far as I can tell, none of these concerns are addressed in the paper:

First, participants were recruited online, via social networking sites. This introduces a huge selection bias: you can’t expect to obtain accurate estimates of how much, and how adaptively, people use the internet by sampling only from the population of internet users! It’s the equivalent of trying to establish cell phone usage patterns by randomly dialing only land-line numbers. Not a very good idea. And note that, not only could the study not reach people who don’t use the internet, but it was presumably also more likely to oversample from heavy internet users. The more time a person spends online, the greater the chance they’d happen to run into the authors’ recruitment ad. People who only check their email a couple of times a week would be very unlikely to participate in the study. So the bottom line is, the 1.2% figure the authors arrive at is almost certainly a gross overestimate. The true proportion of people who meet the authors’ criteria for internet addiction is probably much lower. It’s hard to believe the authors weren’t aware of the issue of selection bias, and the massive problem it presents for their estimates, yet they failed to mention it anywhere in their paper.
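To make the oversampling point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration, not anything taken from M&G’s data: the distribution of weekly hours online, the 40-hour “heavy user” cutoff, and the idea that the chance of seeing a recruitment ad is proportional to time spent online. Only the sample size of 1,319 is borrowed from the study.

```python
import random


def simulate_recruitment(n_people=100_000, n_sample=1_319, heavy_cutoff=40.0, seed=0):
    """Share of heavy users in a made-up population vs. an online-recruited sample."""
    rng = random.Random(seed)
    # Hypothetical weekly hours online: right-skewed, with a mean of 10 hours.
    hours = [rng.expovariate(1 / 10) for _ in range(n_people)]
    heavy = [h > heavy_cutoff for h in hours]
    population_rate = sum(heavy) / n_people

    # Assumption: the chance of seeing the ad is proportional to hours online.
    # Sampling is done with replacement to keep the sketch short.
    recruited = rng.choices(range(n_people), weights=hours, k=n_sample)
    sample_rate = sum(heavy[i] for i in recruited) / n_sample
    return population_rate, sample_rate


if __name__ == "__main__":
    population_rate, sample_rate = simulate_recruitment()
    print(f"heavy users in population: {population_rate:.1%}")
    print(f"heavy users in sample:     {sample_rate:.1%}")
```

Under these particular assumptions, heavy users should turn up in the recruited sample at roughly five times their rate in the population. The exact multiplier depends entirely on the made-up distribution; the direction of the bias is the point.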

Second, the cut-off score for being placed in the IA group appears to be completely arbitrary. The Internet Addiction Test itself was developed by Kimberly Young in a 1998 book entitled “Caught in the Net: How to Recognize the Signs of Internet Addiction–and a Winning Strategy to Recovery”. The test was introduced, as far as I can tell (I haven’t read the entire book, just skimmed it in Google Books), with no real psychometric validation. The cut-off of 80 points out of a maximum 100 possible as a threshold for addiction appears to be entirely arbitrary (in fact, in Young’s book, she defines the cut-off as 70; for reasons that are unclear, M&G adopted a cut-off of 80). That is, it’s not like Young conducted extensive empirical analysis and determined that people with scores of X or above were functionally impaired in a way that people with scores below X weren’t; by all appearances, she simply picked numerically convenient cut-offs (20 – 39 is average; 40 – 69 indicates frequent problems; and 70+ basically means the internet is destroying your life). Any small change in the numerical cut-off would have translated into a large change in the proportion of people in M&G’s sample who met criteria for internet addiction, making the 1.2% figure seem even more arbitrary.
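Here is a rough sketch of how sensitive a prevalence figure can be to the choice of cut-off. The scores are simulated, not M&G’s: I’m assuming, purely for illustration, a roughly bell-shaped distribution of IAT totals centered at 35 with a standard deviation of 15, clipped to the 20–100 range implied by the bands above.

```python
import random


def simulate_iat_scores(n=100_000, mean=35.0, sd=15.0, seed=0):
    """Draw hypothetical IAT totals (assumed distribution), clipped to 20-100."""
    rng = random.Random(seed)
    return [min(100.0, max(20.0, rng.gauss(mean, sd))) for _ in range(n)]


def share_above(scores, cutoff):
    """Fraction of scores at or above the cut-off."""
    return sum(s >= cutoff for s in scores) / len(scores)


if __name__ == "__main__":
    scores = simulate_iat_scores()
    for cutoff in (70, 75, 80):
        print(f"cut-off {cutoff}: {share_above(scores, cutoff):.2%} flagged as addicted")
```

Because the cut-offs sit far out in the tail of the assumed distribution, sliding the threshold from 80 down to 70 changes the flagged share several-fold. With a different assumed distribution the numbers would differ, but any threshold placed in a thin tail will behave this way.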

Third, M&G claim that the Internet Function Questionnaire they used asks respondents to indicate the proportion of time on the internet that they spend on each of several different activities. For example, given the question “How much of your time online do you spend on e-mail?”, your options would be 0-20%, 21-40%, and so on. You would presume that all the different activities should sum to 100%; after all, you can’t really spend 80% of your online time gaming, and then another 80% looking at porn–unless you’re either a very talented gamer, or have an interesting taste in “games”. Yet, when M&G report absolute numbers for the different activities in tables, they’re not given in percentages at all. Instead, one of the table captions indicates that the values are actually coded on a 6-point Likert scale ranging from “rarely/never” to “very frequently”. Hopefully you can see why this is a problem: if you claim (as M&G do) that your results reflect the relative proportion of time that people spend on different activities, you shouldn’t be allowing people to essentially say anything they like for each activity. Given that people with high IA scores report spending more time overall than they’d like online, is it any surprise if they also report spending more time on individual online activities? The claim that high-IA scorers spend “proportionately more” time on some activities just doesn’t seem to be true–at least, not based on the data M&G report. This might also explain how it could be that IA scores correlated positively with nearly all individual activities. That simply couldn’t be true for real proportions (if you spend proportionately more time on e-mail, you must be spending proportionately less time somewhere else), but it makes perfect sense if the response scale is actually anchored with vague terms like “rarely” and “frequently”.
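The claim about real proportions can be checked with a small sketch. If each person’s activity shares sum to 1, then the covariances between any other variable and those shares must sum to zero, so they can’t all be positive. The data below are simulated and the “IA score” is a hypothetical stand-in (tied to the first activity plus noise); nothing here comes from M&G’s tables.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def covariances_with_shares(n_people=1_000, n_activities=5, seed=0):
    """Covariance of a hypothetical IA score with each activity's share of online time."""
    rng = random.Random(seed)
    ia_scores, shares = [], []
    for _ in range(n_people):
        raw = [rng.random() for _ in range(n_activities)]
        total = sum(raw)
        person = [r / total for r in raw]  # true proportions: they sum to 1
        shares.append(person)
        # Hypothetical IA score: driven by the first activity's share, plus noise.
        ia_scores.append(person[0] + rng.gauss(0, 0.1))
    return [
        covariance(ia_scores, [person[k] for person in shares])
        for k in range(n_activities)
    ]


if __name__ == "__main__":
    covs = covariances_with_shares()
    print("covariances:", [round(c, 4) for c in covs])
    print("sum of covariances:", sum(covs))
```

The covariances add up to zero (apart from floating-point rounding), so a positive relationship with one activity’s share forces a negative one somewhere else. A pattern of positive correlations with nearly every activity is only possible if the responses aren’t really proportions.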

Fourth, M&G consider two possibilities for the positive correlation between IAT and depression scores: (a) increased internet use causes depression, and (b) depression causes increased internet use. But there’s a third, and to my mind far more plausible, explanation: people who are depressed tend to have more negative self-perceptions, and are much more likely to endorse virtually any question that asks about dissatisfaction with one’s own behavior. Here are a couple of examples of questions on the IAT: “How often do you fear that life without the Internet would be boring, empty, and joyless?” “How often do you try to cut down the amount of time you spend on-line and fail?” Notice that there are really two components to these kinds of questions. One component is internet-specific: to what extent are people specifically concerned about their behavior online, versus in other domains? The other component is a general hedonic one, and has to do with how dissatisfied you are with stuff in general. Now, is there any doubt that, other things being equal, someone who’s depressed is going to be more likely to endorse an item that asks how often they fail at something? Or how often their life feels empty and joyless–irrespective of cause? No, of course not. Depressed people tend to ruminate and worry about all sorts of things. No doubt internet usage is one of those things, but that hardly makes it special or interesting. I’d be willing to bet money that if you created a Shoelace Tying Questionnaire that had questions like “How often do you worry about your ability to tie your shoelaces securely?” and “How often do you try to keep your shoelaces from coming undone and fail?”, you’d also get a positive correlation with BDI scores. Basically, depression and trait negative affect tend to correlate positively with virtually every measure that has a major evaluative component. That’s not news. To the contrary, given the types of questions on the IAT, it would have been astonishing if there weren’t a robust positive correlation with depression.
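The Shoelace argument is easy to sketch in code. In the toy model below, which is my own assumption and not anything estimated from real questionnaires, a single “negative affect” trait feeds into both a depression score and an evaluative questionnaire score. The behavior-specific part of the questionnaire (shoelaces, internet, whatever) is generated independently of depression, so any correlation that shows up comes purely from the shared trait.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_shared_negative_affect(n=5_000, seed=0):
    """Correlation between a depression score and an evaluative questionnaire score
    when the only thing they share is general negative affect (hypothetical model)."""
    rng = random.Random(seed)
    depression, questionnaire = [], []
    for _ in range(n):
        negative_affect = rng.gauss(0, 1)
        depression.append(negative_affect + rng.gauss(0, 0.5))
        # Behavior-specific concern, drawn independently of depression.
        specific_concern = rng.gauss(0, 1)
        questionnaire.append(negative_affect + specific_concern)
    return pearson_r(depression, questionnaire)


if __name__ == "__main__":
    print(f"r = {simulate_shared_negative_affect():.2f}")
```

With these made-up variances the implied correlation is about .63, even though the behavior-specific component has nothing whatsoever to do with depression. The specific value is an artifact of the chosen noise levels; the point is only that a sizable positive correlation needs no internet-specific explanation.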

Fifth, and related to the previous point, no evidence is ever actually provided that people with high IAT scores differ in their objective behavior from those with low scores. Remember, this is all based on self-report. And not just self-report, but vague self-report. As far as I can tell, M&G never asked respondents to estimate how much time they spent online in a given week. So it’s entirely possible that people who report spending too much time online don’t actually spend much more time online than anyone else; they just feel that way (again, possibly because of a generally negative disposition). There’s actually some support for this idea: A 2004 study that sought to validate the IAT psychometrically found only a .22 correlation between IAT scores and self-reported time spent online. Now, a .22 correlation is perfectly meaningful, and it suggests that people who feel they spend too much time online also estimate that they really do spend more time online (though, again, bias is a possibility here too). But it’s a much smaller correlation than the one between IAT scores and depression, which fits with the above idea that there may not be any real “link” between internet use and depression above and beyond the fact that depressed individuals are more likely to endorse negatively-worded items.

Finally, even if you ignore the above considerations, and decide to conclude that there is in fact a non-artifactual correlation between depression and internet use, there’s really no reason you would conclude that that’s a bad thing (which M&G hedge on, and many of the news articles haven’t hesitated to play up). It’s entirely plausible that the reason depressed individuals might spend more time online is because it’s an effective form of self-medication. If you’re someone who has trouble mustering up the energy to engage with the outside world, or someone who’s socially inhibited, online communities might provide you with a way to fulfill your social needs in a way that you would otherwise not have been able to. So it’s quite conceivable that heavy internet use makes people less depressed, not more; it’s just that the people who are more likely to use the internet heavily are more depressed to begin with. I’m not suggesting that this is in fact true (I find the artifactual explanation for the IAT-BDI correlation suggested above much more plausible), but just that the so-called “dark side” of the internet could actually be a very good thing.

In sum, what can we learn from M&G’s paper? Not that much. To be fair, I don’t necessarily think it’s a terrible paper; it has its limitations, but every paper does. The problem isn’t so much that the paper is bad; it’s that the findings it contains were blown entirely out of proportion, and twisted to support headlines (most of them involving the phrase “The Dark Side”) that they couldn’t possibly support. The internet may or may not cause depression (probably not), but you’re not going to get much traction on that question by polling a sample of internet respondents, using measures that have a conceptual overlap with depression, and defining groups based on arbitrary cut-offs. The jury remains out, of course, but these findings by themselves don’t really give us any reason to reconsider or try to change our online behavior.

Morrison, C., & Gore, H. (2010). The Relationship between Excessive Internet Use and Depression: A Questionnaire-Based Study of 1,319 Young People and Adults. Psychopathology, 43(2), 121-126. DOI: 10.1159/000277001