
the ‘decline effect’ doesn’t work that way

Over the last four or five years, there’s been a growing awareness in the scientific community that science is an imperfect process. Not that everyone used to think science was a crystal ball with a direct line to the universe or anything, but there does seem to be a growing recognition that scientists are human beings with human flaws, and are susceptible to common biases that can make it more difficult to fully trust any single finding reported in the literature. For instance, scientists like interesting results more than boring results; we’d rather keep our jobs than lose them; and we have a tendency to see what we want to see, even when it’s only sort-of-kind-of there, and sometimes not there at all. All of these things conspire to produce systematic biases in the kinds of findings that get reported.

The single biggest contributor to the zeitgeist shift is undoubtedly John Ioannidis (recently profiled in an excellent Atlantic article), whose work I can’t say enough good things about (though I’ve tried). But lots of other people have had a hand in popularizing the same or similar ideas–many of which actually go back several decades. I’ve written a bit about these issues myself in a number of papers (1, 2, 3) and blog posts (1, 2, 3, 4, 5), so I’m partial to such concerns. Still, important as the role of the various selection and publication biases is in charting the course of science, virtually all of the discussions of these issues have had a relatively limited audience. Even Ioannidis’ work, influential as it’s been, has probably been read by no more than a few thousand scientists.

Last week, the debate hit the mainstream when the New Yorker (circulation: ~ 1 million) published an article by Jonah Lehrer suggesting–or at least strongly raising the possibility–that something might be wrong with the scientific method. The full article is behind a paywall, but I can helpfully tell you that some people seem to have un-paywalled it against the New Yorker’s wishes, so if you search for it online, you will find it.

The crux of Lehrer’s argument is that many, and perhaps most, scientific findings fall prey to something called the “decline effect”: initial positive reports of relatively large effects are subsequently followed by gradually decreasing effect sizes, in some cases culminating in a complete absence of an effect in the largest, most recent studies. Lehrer gives a number of colorful anecdotes illustrating this process, and ends on a decidedly skeptical (and frankly, terribly misleading) note:

The decline effect is troubling because it reminds us how difficult it is to prove anything. We like to pretend that our experiments define the truth for us. But that’s often not the case. Just because an idea is true doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true. When the experiments are done, we still have to choose what to believe.

While Lehrer’s article received pretty positive reviews from many non-scientist bloggers (many of whom, dismayingly, seemed to think the take-home message was that since scientists always change their minds, we shouldn’t trust anything they say), science bloggers were generally not very happy with it. Within days, angry mobs of Scientopians and Nature Networkers started murdering unicorns; by the end of the week, the New Yorker offices were reduced to rubble, and the scientists and statisticians who’d given Lehrer quotes were all rumored to be in hiding.

Okay, none of that happened. I’m just trying to keep things interesting. Anyway, because I’ve been characteristically slow on the uptake, by the time I got around to writing this post you’re now reading, about eight hundred and sixty thousand bloggers had already weighed in on Lehrer’s article. That’s good, because it means I can just direct you to other people’s blogs instead of having to do any thinking myself. So here you go: good posts by Games With Words (whose post tipped me off to the article), Jerry Coyne, Steven Novella, Charlie Petit, and Andrew Gelman, among many others.

Since I’ve blogged about these issues before, and agree with most of what’s been said elsewhere, I’ll only make one point about the article. Which is that about half of the examples Lehrer talks about don’t actually seem to me to qualify as instances of the decline effect–at least as Lehrer defines it. The best example of this comes when Lehrer discusses Jonathan Schooler’s attempt to demonstrate the existence of the decline effect by running a series of ESP experiments:

In 2004, Schooler embarked on an ironic imitation of Rhine’s research: he tried to replicate this failure to replicate. In homage to Rhine’s interests, he decided to test for a parapsychological phenomenon known as precognition. The experiment itself was straightforward: he flashed a set of images to a subject and asked him or her to identify each one. Most of the time, the response was negative–the images were displayed too quickly to register. Then Schooler randomly selected half of the images to be shown again. What he wanted to know was whether the images that got a second showing were more likely to have been identified the first time around. Could subsequent exposure have somehow influenced the initial results? Could the effect become the cause?

The craziness of the hypothesis was the point: Schooler knows that precognition lacks a scientific explanation. But he wasn’t testing extrasensory powers; he was testing the decline effect. “At first, the data looked amazing, just as we’d expected,” Schooler says. “I couldn’t believe the amount of precognition we were finding. But then, as we kept on running subjects, the effect size”–a standard statistical measure–“kept on getting smaller and smaller.” The scientists eventually tested more than two thousand undergraduates. “In the end, our results looked just like Rhine’s,” Schooler said. “We found this strong paranormal effect, but it disappeared on us.”

This is a pretty bad way to describe what’s going on, because it makes it sound like it’s a general principle of data collection that effects systematically get smaller. It isn’t. The variance around the point estimate of effect size certainly gets smaller as samples get larger, but the likelihood of an effect increasing is just as high as the likelihood of it decreasing. The absolutely critical point Lehrer left out is that you would only get the decline effect to show up if you intervened in the data collection or reporting process based on the results you were getting. Instead, most of Lehrer’s article presents the decline effect as if it’s some sort of mystery, rather than the well-understood process that it is. It’s as though Lehrer believes that scientific data has the magical property of telling you less about the world the more of it you have. Which isn’t true, of course; the problem isn’t that science is malfunctioning, it’s that scientists are still (kind of!) human, and are susceptible to typical human biases. The unfortunate net effect is that Lehrer’s article, while tremendously entertaining, achieves exactly the opposite of what good science journalism should do: it sows confusion about the scientific process and makes it easier for people to dismiss the results of good scientific work, instead of helping people develop a critical appreciation for the amazing power science has to tell us about the world.
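To make the point concrete, here’s a quick simulation (my own sketch, not anything from Lehrer’s article, with made-up parameters): if you run many studies of a fixed true effect and report all of them, the average estimate is unbiased; it’s only when you condition on statistical significance that the reported effects are inflated, so that follow-ups then appear to “decline” back toward the truth.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_study(true_d=0.3, n=20):
    """Run one two-group study; return observed Cohen's d and whether p < .05."""
    a = rng.normal(true_d, 1, n)
    b = rng.normal(0, 1, n)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    d = (a.mean() - b.mean()) / pooled_sd
    t = d * np.sqrt(n / 2)          # two-sample t statistic
    return d, abs(t) > 2.02         # approximate critical t for df = 38

results = [simulate_study() for _ in range(20_000)]
all_d = np.array([d for d, _ in results])
sig_d = np.array([d for d, sig in results if sig])

print(f"mean effect across all studies:          {all_d.mean():.2f}")  # close to the true 0.30
print(f"mean effect among 'significant' studies: {sig_d.mean():.2f}")  # inflated well above 0.30
```

A replication of one of the “significant” studies will, on average, come in well below the initial report, not because data degrade over time, but because the initial estimate was selected for being large.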

fourteen questions about selection bias, circularity, nonindependence, etc.

A new paper published online this week in the Journal of Cerebral Blood Flow & Metabolism discusses the infamous problem of circular analysis in fMRI research. The paper is aptly titled “Everything you never wanted to know about circular analysis, but were afraid to ask,” and is authored by several well-known biostatisticians and cognitive neuroscientists–to wit, Niko Kriegeskorte, Martin Lindquist, Tom Nichols, Russ Poldrack, and Ed Vul. The paper has an interesting format, and one that I really like: it’s set up as a series of fourteen questions related to circular analysis, and each author answers each question in 100 words or less.

I won’t bother going over the gist of the paper, because the Neuroskeptic already beat me to the punch in an excellent post a couple of days ago (actually, that’s how I found out about the paper); instead, I’ll just give my own answers to the same set of questions raised in the paper. And since blog posts don’t have the same length constraints as NPG journals, I’m going to be characteristically long-winded and ignore the 100 word limit…

(1) Is circular analysis a problem in systems and cognitive neuroscience?

Yes, it’s a huge problem. That said, I think the term ‘circular’ is somewhat misleading here, because it has the connotation that an analysis is completely vacuous. Truly circular analyses–i.e., those where an initial analysis is performed, and the researchers then conduct a “follow-up” analysis that literally adds no new information–are relatively rare in fMRI research. Much more common are cases where there’s some dependency between two different analyses, but the second one still adds some novel information.

(2) How widespread are slight distortions and serious errors caused by circularity in the neuroscience literature?

I think Nichols sums it up nicely here:

TN: False positives due to circularity are minimal; biased estimates of effect size are common. False positives due to brushing off the multiple testing problem (e.g., ‘P<0.001 uncorrected’ and crossing your fingers) remain pervasive.

The only thing I’d add to this is that the bias in effect size estimates is not only common, but, in most cases, is probably very large.

(3) Are circular estimates useful measures of effect size?

Yes and no. They’re less useful than unbiased measures of effect size. But given that the vast majority of effects reported in whole-brain fMRI analyses (and, more generally, analyses in most fields) are likely to be inflated to some extent, the only way to ensure we don’t rely on circular estimates of effect size would be to disregard effect size estimates entirely, which doesn’t seem prudent.

(4) Should circular estimates of effect size be presented in papers and, if so, how?

Yes, because the only principled alternatives are to either (a) never report effect sizes (which seems much too drastic), or (b) report the results of every single test performed, irrespective of the result (i.e., to never give selection bias an opportunity to rear its head). Neither of these is reasonable. We should generally report effect sizes for all key effects, but they should be accompanied by appropriate confidence intervals. As Lindquist notes:

In general, it may be useful to present any effect size estimate as confidence intervals, so that readers can see for themselves how much uncertainty is related to the point estimate.

A key point I’d add is that the width of the reported CIs should match the threshold used to identify results in the first place. In other words, if you conduct a whole brain analysis at p < .001, you should report all resulting effects with 99.9% CIs, and not 95% CIs. I think this simple step would go a considerable ways towards conveying the true uncertainty surrounding most point estimates in fMRI studies.
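As a rough illustration (the voxel and all its numbers are hypothetical, using the standard Fisher z approximation for a correlation), a result selected at p < .001 can look impressively tight with a conventional 95% CI, while the threshold-matched 99.9% CI tells a much more honest story:

```python
import numpy as np
from scipy import stats

def matched_ci(estimate, se, alpha):
    """Confidence interval whose coverage (1 - alpha) matches the selection threshold alpha."""
    z = stats.norm.ppf(1 - alpha / 2)
    return estimate - z * se, estimate + z * se

# hypothetical voxel: r = .55 in n = 20 subjects, found in a whole-brain search at p < .001
r, n = 0.55, 20
z_r, se = np.arctanh(r), 1 / np.sqrt(n - 3)   # Fisher z transform and its standard error

for alpha, label in [(0.05, "95% CI  "), (0.001, "99.9% CI")]:
    lo, hi = matched_ci(z_r, se, alpha)
    print(f"{label}: [{np.tanh(lo):+.2f}, {np.tanh(hi):+.2f}]")
```

For this hypothetical voxel the 99.9% interval actually includes zero, which is exactly the uncertainty the mismatched 95% interval hides.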

(5) Are effect size estimates important/useful for neuroscience research, and why?

I think my view here is closest to Ed Vul’s:

Yes, very much so. Null-hypothesis testing is insufficient for most goals of neuroscience because it can only indicate that a brain region is involved to some nonzero degree in some task contrast. This is likely to be true of most combinations of task contrasts and brain regions when measured with sufficient power.

I’d go further than Ed does though, and say that in a sense, effect size estimates are the only things that matter. As Ed notes, there are few if any cases where it’s plausible to suppose that the effect of some manipulation on brain activation is really zero. The brain is a very dense causal system–almost any change in one variable is going to have downstream effects on many, and perhaps most, others. So the real question we care about is almost never “is there or isn’t there an effect,” it’s whether there’s an effect that’s big enough to actually care about. (This problem isn’t specific to fMRI research, of course; it’s been a persistent source of criticism of null hypothesis significance testing for many decades.)

People sometimes try to deflect this concern by saying that they’re not trying to make any claims about how big an effect is, but only about whether or not one can reject the null–i.e., whether any kind of effect is present or not. I’ve never found this argument convincing, because whether or not you own up to it, you’re always making an effect size claim whenever you conduct a hypothesis test. Testing against a null of zero is equivalent to saying that you care about any effect that isn’t exactly zero, which is simply false. No one in fMRI research cares about r or d values of 0.0001, yet we routinely conduct tests whose results could be consistent with those types of effect sizes.

Since we’re always making implicit claims about effect sizes when we conduct hypothesis tests, we may as well make them explicit so that they can be evaluated properly. If you only care about correlations greater than 0.1, there’s no sense in hiding that fact; why not explicitly test against a null range of -0.1 to 0.1, instead of a meaningless null of zero?
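This kind of minimum-effect test is easy to set up. Here’s a sketch using the Fisher z approximation for correlations (the .1 boundary and the sample numbers are purely illustrative):

```python
import numpy as np
from scipy import stats

def min_effect_p(r, n, r0=0.1):
    """One-sided p-value for H0: rho <= r0 vs. H1: rho > r0, via the Fisher z approximation.
    With r0 = 0, this reduces to the usual test against a nil hypothesis."""
    z = (np.arctanh(r) - np.arctanh(r0)) * np.sqrt(n - 3)
    return 1 - stats.norm.cdf(z)

# r = .12 in n = 2,000: wildly 'significant' against zero, unimpressive against
# a boundary of .1 (the smallest correlation we've decided we care about)
print(f"p against rho = 0:  {min_effect_p(0.12, 2000, r0=0.0):.2e}")
print(f"p against rho = .1: {min_effect_p(0.12, 2000, r0=0.1):.2f}")
```

Same data, same estimate; the only thing that changes is that the null hypothesis now reflects what we actually believe.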

(6) What is the best way to accurately estimate effect sizes from imaging data?

Use large samples, conduct multivariate analyses, report results comprehensively, use meta-analysis… I don’t think there’s any single way to ensure accurate effect size estimates, but plenty of things help. Maybe the most general recommendation is to ensure adequate power (see below), which will naturally minimize effect size inflation.

(7) What makes data sets independent? Are different sets of subjects required?

Most of the authors think (as I do too) that different sets of subjects are indeed required in order to ensure independence. Here’s Nichols:

Only data sets collected on distinct individuals can be assured to be independent. Splitting an individual’s data (e.g., using run 1 and run 2 to create two data sets) does not yield independence at the group level, as each subject’s true random effect will correlate the data sets.

Put differently, splitting data within subjects only eliminates measurement error, and not sampling error. You could in theory measure activation perfectly reliably (in which case the two halves of subjects’ data would be perfectly correlated) and still have grossly inflated effects, simply because the multivariate distribution of scores in your sample doesn’t accurately reflect the distribution in the population. So, as Nichols points out, you always need new subjects if you want to be absolutely certain your analyses are independent. But since this generally isn’t feasible, I’d argue we should worry less about whether or not our data sets are completely independent, and more about reporting results in a way that makes the presence of any bias as clear as possible.
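A small simulation makes the within-subject splitting problem concrete (my own toy example, with invented parameters and no true effect anywhere): voxels selected from a run-1 group map stay inflated when re-estimated in run 2 of the same subjects, because the subject-level random effects are shared across runs; only a fresh sample drops back to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sub, n_vox = 16, 5000
tau, sigma = 0.5, 0.5                        # between-subject sd, within-run noise sd

u = rng.normal(0, tau, (n_sub, n_vox))       # subject random effects (true group effect = 0)
run1 = u + rng.normal(0, sigma, (n_sub, n_vox))
run2 = u + rng.normal(0, sigma, (n_sub, n_vox))

# select voxels showing a 'significant' group effect in run 1
t1 = run1.mean(0) / (run1.std(0, ddof=1) / np.sqrt(n_sub))
sel = t1 > 2.5

# re-estimate those voxels in run 2 (same subjects) vs. in a brand-new sample
fresh = rng.normal(0, tau, (n_sub, n_vox)) + rng.normal(0, sigma, (n_sub, n_vox))
print(f"selected voxels, run 2 (same subjects): {run2.mean(0)[sel].mean():+.2f}")  # still inflated
print(f"selected voxels, new subjects:          {fresh.mean(0)[sel].mean():+.2f}") # near zero
```

Note that the run-2 estimates are biased even though run 2’s measurement noise is completely independent of run 1’s; the sampling error (who happened to end up in the sample) carries over untouched.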

(8) What information can one glean from data selected for a certain effect?

I think this is kind of a moot question, since virtually all data are susceptible to some form of selection bias (scientists generally don’t write papers detailing all the analyses they conducted that didn’t pan out!). As I note above, I think it’s a bad idea to disregard effect sizes entirely; they’re actually what we should be focusing most of our attention on. Better to report confidence intervals that accurately reflect the selection procedure and make the uncertainty around the point estimate clear.

(9) Are visualizations of nonindependent data helpful to illustrate the claims of a paper?

Not in cases where there’s an extremely strong dependency between the selection criteria and the effect size estimate. In cases of weak to moderate dependency, visualization is fine so long as confidence bands are plotted alongside the best fit. Again, the key is to always be explicit about the limitations of the analysis and provide some indication of the uncertainty involved.

(10) Should data exploration be discouraged in favor of valid confirmatory analyses?

No. I agree with Poldrack’s sentiment here:

Our understanding of brain function remains incredibly crude, and limiting research to the current set of models and methods would virtually guarantee scientific failure. Exploration of new approaches is thus critical, but the findings must be confirmed using new samples and convergent methods.

(11) Is a confirmatory analysis safer than an exploratory analysis in terms of drawing neuroscientific conclusions?

In principle, sure, but in practice, it’s virtually impossible to determine which reported analyses really started out their lives as confirmatory analyses and which started life out as exploratory analyses and then mysteriously evolved into “a priori” predictions once the paper was written. I’m not saying there’s anything wrong with this–everyone reports results strategically to some extent–just that I don’t know that the distinction between confirmatory and exploratory analyses is all that meaningful in practice. Also, as the previous point makes clear, safety isn’t the only criterion we care about; we also want to discover new and unexpected findings, which requires exploration.

(12) What makes a whole-brain mapping analysis valid? What constitutes sufficient adjustment for multiple testing?

From a hypothesis testing standpoint, you need to ensure adequate control of the family-wise error (FWE) rate or false discovery rate (FDR). But as I suggested above, I think this only ensures validity in a limited sense; it doesn’t ensure that the results are actually going to be worth caring about. If you want to feel confident that any effects that survive are meaningfully large, you need to do the extra work up front and define what constitutes a meaningful effect size (and then test against that).

(13) How much power should a brain-mapping analysis have to be useful?

As much as possible! Concretely, the conventional target of 80% seems like a good place to start. But as I’ve argued before (e.g., here), that would require more than doubling conventional sample sizes in most cases. The reality is that fMRI studies are expensive, so we’re probably stuck with underpowered analyses for the foreseeable future. So we need to find other ways to compensate for that (e.g., relying more heavily on meta-analytic effect size estimates).
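For a ballpark sense of what that 80% target demands, here’s a quick power calculation for a between-subjects comparison, assuming a medium effect of d = 0.5 (the numbers are illustrative, and the normal approximation is slightly optimistic relative to an exact t-based calculation):

```python
import numpy as np
from scipy import stats

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test (normal approximation)."""
    z_crit = stats.norm.ppf(1 - alpha / 2)
    ncp = d * np.sqrt(n_per_group / 2)       # noncentrality parameter
    return 1 - stats.norm.cdf(z_crit - ncp)

for n in (16, 32, 64):
    print(f"n = {n:2d} per group: power = {power_two_sample(0.5, n):.2f}")
```

With a typical fMRI-sized sample of 16 per group, power for a medium effect comes out under a third; you need roughly 64 per group to hit the conventional 80% target.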

(14) In which circumstances are nonindependent selective analyses acceptable for scientific publication?

It depends on exactly what’s problematic about the analysis. Analyses that are truly circular and provide no new information should never be reported, but those constitute only a small fraction of all analyses. More commonly, the nonindependence simply amounts to selection bias: researchers tend to report only those results that achieve statistical significance, thereby inflating apparent effect sizes. I think the solution to this is to still report all key effect sizes, but to ensure they’re accompanied by confidence intervals and appropriate qualifiers.

Kriegeskorte N, Lindquist MA, Nichols TE, Poldrack RA, & Vul E (2010). Everything you never wanted to know about circular analysis, but were afraid to ask. Journal of Cerebral Blood Flow & Metabolism. PMID: 20571517

internet use causes depression! or not.

I have a policy of not saying negative things about people (or places, or things) on this blog, and I think I’ve generally been pretty good about adhering to that policy. But I also think it’s important for scientists to speak up in cases where journalists or other scientists misrepresent scientific research in a way that could have a potentially large impact on people’s behavior, and this is one of those cases. All day long, media outlets have been full of reports about a new study that purportedly reveals that the internet–that most faithful of friends, always just a click away with its soothing, warm embrace–has a dark side: using it makes you depressed!

In fairness, most of the stories have been careful to note that the study only “links” heavy internet use to depression, without necessarily implying that internet use causes depression. And the authors acknowledge that point themselves:

“While many of us use the Internet to pay bills, shop and send emails, there is a small subset of the population who find it hard to control how much time they spend online, to the point where it interferes with their daily activities,” said researcher Dr. Catriona Morrison, of the University of Leeds, in a statement. “Our research indicates that excessive Internet use is associated with depression, but what we don’t know is which comes first. Are depressed people drawn to the Internet or does the Internet cause depression?”

So you might think all’s well in the world of science and science journalism. But in other places, the study’s authors weren’t nearly so circumspect. For example, the authors suggest that 1.2% of the population can be considered addicted to the internet–a rate they claim is double that of compulsive gambling; and they suggest that their results “feed the public speculation that overengagement in websites that serve/replace a social function might be linked to maladaptive psychological functioning,” and “add weight to the recent suggestion that IA should be taken seriously as a distinct psychiatric construct.”

These are pretty strong claims; if the study’s findings are to be believed, we should at least be seriously considering the possibility that using the internet is making some of us depressed. At worst, we should be diagnosing people with internet addiction and doing… well, presumably something to treat them.

The trouble is that it’s not at all clear that the study’s findings should be believed. Or at least, it’s not clear that they really support any of the statements made above.

Let’s start with what the study (note: restricted access) actually shows. The authors, Catriona Morrison and Helen Gore (M&G), surveyed 1,319 subjects via UK-based social networking sites. They had participants fill out 3 self-report measures: the Internet Addiction Test (IAT), which measures dissatisfaction with one’s internet usage; the Internet Function Questionnaire, which asks respondents to indicate the relative proportion of time they spend on different internet activities (e.g., e-mail, social networking, porn, etc.); and the Beck Depression Inventory (BDI), a very widely-used measure of depression.

M&G identify a number of findings, three of which appear to support most of their conclusions. First, they report a very strong positive correlation (r = .49) between internet addiction and depression scores; second, they identify a small group of 18 subjects (1.2%) who they argue qualify as internet addicts (IA group) based on their scores on the IAT; and third, they suggest that people who used the internet more heavily “spent proportionately more time on online gaming sites, sexually gratifying websites, browsing, online communities and chat sites.”

These findings may sound compelling, but there are a number of methodological shortcomings of the study that make them very difficult to interpret in any meaningful way. As far as I can tell, none of these concerns are addressed in the paper:

First, participants were recruited online, via social networking sites. This introduces a huge selection bias: you can’t expect to obtain accurate estimates of how much, and how adaptively, people use the internet by sampling only from the population of internet users! It’s the equivalent of trying to establish cell phone usage patterns by randomly dialing only land-line numbers. Not a very good idea. And note that, not only could the study not reach people who don’t use the internet, but it was presumably also more likely to oversample from heavy internet users. The more time a person spends online, the greater the chance they’d happen to run into the authors’ recruitment ad. People who only check their email a couple of times a week would be very unlikely to participate in the study. So the bottom line is, the 1.2% figure the authors arrive at is almost certainly a gross overestimate. The true proportion of people who meet the authors’ criteria for internet addiction is probably much lower. It’s hard to believe the authors weren’t aware of the issue of selection bias, and the massive problem it presents for their estimates, yet they failed to mention it anywhere in their paper.
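Here’s a toy simulation of the problem (every number in it is invented purely for illustration): if the chance of encountering the recruitment ad scales with time spent online, heavy users are massively oversampled, and the “addiction” rate in the sample lands far above the population rate.

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical population: weekly hours online, log-normally distributed
hours = rng.lognormal(mean=1.5, sigma=1.0, size=200_000)
problem_use = hours > 60                     # arbitrary 'problem use' threshold

# probability of ever seeing the recruitment ad scales with time spent online
seen_ad = rng.random(hours.size) < hours / hours.max()

print(f"'problem use' rate in the population:     {problem_use.mean():.2%}")
print(f"'problem use' rate among those recruited: {problem_use[seen_ad].mean():.2%}")
```

The recruited sample’s rate is an order of magnitude too high, for no reason other than how the sample was drawn.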

Second, the cut-off score for being placed in the IA group appears to be completely arbitrary. The Internet Addiction Test itself was developed by Kimberly Young in a 1998 book entitled “Caught in the Net: How to Recognize the Signs of Internet Addiction–and a Winning Strategy to Recovery”. The test was introduced, as far as I can tell (I haven’t read the entire book, just skimmed it in Google Books), with no real psychometric validation. The cut-off of 80 points out of a maximum 100 possible as a threshold for addiction appears to be entirely arbitrary (in fact, in Young’s book, she defines the cut-off as 70; for reasons that are unclear, M&G adopted a cut-off of 80). That is, it’s not like Young conducted extensive empirical analysis and determined that people with scores of X or above were functionally impaired in a way that people with scores below X weren’t; by all appearances, she simply picked numerically convenient cut-offs (20 – 39 is average; 40 – 69 indicates frequent problems; and 70+ basically means the internet is destroying your life). Any small change in the numerical cut-off would have translated into a large change in the proportion of people in M&G’s sample who met criteria for internet addiction, making the 1.2% figure seem even more arbitrary.

Third, M&G claim that the Internet Function Questionnaire they used asks respondents to indicate the proportion of time on the internet that they spend on each of several different activities. For example, given the question “How much of your time online do you spend on e-mail?”, your options would be 0-20%, 21-40%, and so on. You would presume that all the different activities should sum to 100%; after all, you can’t really spend 80% of your online time gaming, and then another 80% looking at porn–unless you’re either a very talented gamer, or have an interesting taste in “games”. Yet, when M&G report absolute numbers for the different activities in tables, they’re not given in percentages at all. Instead, one of the table captions indicates that the values are actually coded on a 6-point Likert scale ranging from “rarely/never” to “very frequently”. Hopefully you can see why this is a problem: if you claim (as M&G do) that your results reflect the relative proportion of time that people spend on different activities, you shouldn’t be allowing people to essentially say anything they like for each activity. Given that people with high IA scores report spending more time overall than they’d like online, is it any surprise if they also report spending more time on individual online activities? The claim that high-IA scorers spend “proportionately more” time on some activities just doesn’t seem to be true–at least, not based on the data M&G report. This might also explain how it could be that IA scores correlated positively with nearly all individual activities. That simply couldn’t be true for real proportions (if you spend proportionately more time on e-mail, you must be spending proportionately less time somewhere else), but it makes perfect sense if the response scale is actually anchored with vague terms like “rarely” and “frequently”.

Fourth, M&G consider two possibilities for the positive correlation between IAT and depression scores: (a) increased internet use causes depression, and (b) depression causes increased internet use. But there’s a third, and to my mind far more plausible, explanation: people who are depressed tend to have more negative self-perceptions, and are much more likely to endorse virtually any question that asks about dissatisfaction with one’s own behavior. Here are a couple of examples of questions on the IAT: “How often do you fear that life without the Internet would be boring, empty, and joyless?” “How often do you try to cut down the amount of time you spend on-line and fail?” Notice that there are really two components to these kinds of questions. One component is internet-specific: to what extent are people specifically concerned about their behavior online, versus in other domains? The other component is a general hedonic one, and has to do with how dissatisfied you are with stuff in general. Now, is there any doubt that, other things being equal, someone who’s depressed is going to be more likely to endorse an item that asks how often they fail at something? Or how often their life feels empty and joyless–irrespective of cause? No, of course not. Depressive people tend to ruminate and worry about all sorts of things. No doubt internet usage is one of those things, but that hardly makes it special or interesting. I’d be willing to bet money that if you created a Shoelace Tying Questionnaire that had questions like “How often do you worry about your ability to tie your shoelaces securely?” and “How often do you try to keep your shoelaces from coming undone and fail?”, you’d also get a positive correlation with BDI scores. Basically, depression and trait negative affect tend to correlate positively with virtually every measure that has a major evaluative component. That’s not news. To the contrary, given the types of questions on the IAT, it would have been astonishing if there wasn’t a robust positive correlation with depression.
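The shared-evaluative-component story is easy to demonstrate by simulation (a deliberately silly example, with made-up factor loadings): give two questionnaires nothing in common except that both partly tap general negative affect, and a healthy positive correlation appears anyway.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
neg_affect = rng.normal(0, 1, n)             # latent trait: general negative affect

# each scale = a loading on negative affect + its own specific variance;
# the shoelace items have no substantive connection to depression at all
bdi      = 0.7 * neg_affect + rng.normal(0, 1, n)
shoelace = 0.4 * neg_affect + rng.normal(0, 1, n)   # 'Shoelace Tying Questionnaire'

r = np.corrcoef(bdi, shoelace)[0, 1]
print(f"BDI x Shoelace Tying Questionnaire: r = {r:.2f}")
```

A reliably positive correlation, with zero causal connection between shoelaces and depression in either direction.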

Fifth, and related to the previous point, no evidence is ever actually provided that people with high IAT scores differ in their objective behavior from those with low scores. Remember, this is all based on self-report. And not just self-report, but vague self-report. As far as I can tell, M&G never asked respondents to estimate how much time they spent online in a given week. So it’s entirely possible that people who report spending too much time online don’t actually spend much more time online than anyone else; they just feel that way (again, possibly because of a generally negative disposition). There’s actually some support for this idea: A 2004 study that sought to validate the IAT psychometrically found only a .22 correlation between IAT scores and self-reported time spent online. Now, a .22 correlation is perfectly meaningful, and it suggests that people who feel they spend too much time online also estimate that they really do spend more time online (though, again, bias is a possibility here too). But it’s a much smaller correlation than the one between IAT scores and depression, which fits with the above idea that there may not be any real “link” between internet use and depression above and beyond the fact that depressed individuals are more likely to endorse negatively-worded items.

Finally, even if you ignore the above considerations, and decide to conclude that there is in fact a non-artifactual correlation between depression and internet use, there’s really no reason you would conclude that that’s a bad thing (which M&G hedge on, and many of the news articles haven’t hesitated to play up). It’s entirely plausible that the reason depressed individuals might spend more time online is because it’s an effective form of self-medication. If you’re someone who has trouble mustering up the energy to engage with the outside world, or someone who’s socially inhibited, online communities might provide you with a way to fulfill your social needs in a way that you would otherwise not have been able to. So it’s quite conceivable that heavy internet use makes people less depressed, not more; it’s just that the people who are more likely to use the internet heavily are more depressed to begin with. I’m not suggesting that this is in fact true (I find the artifactual explanation for the IAT-BDI correlation suggested above much more plausible), but just that the so-called “dark side” of the internet could actually be a very good thing.

In sum, what can we learn from M&G’s paper? Not that much. To be fair, I don’t necessarily think it’s a terrible paper; it has its limitations, but every paper does. The problem isn’t so much that the paper is bad; it’s that the findings it contains were blown entirely out of proportion, and twisted to support headlines (most of them involving the phrase “The Dark Side”) that they couldn’t possibly support. The internet may or may not cause depression (probably not), but you’re not going to get much traction on that question by polling a sample of internet respondents, using measures that have a conceptual overlap with depression, and defining groups based on arbitrary cut-offs. The jury is still out, of course, but these findings by themselves don’t really give us any reason to reconsider or try to change our online behavior.

Morrison, C., & Gore, H. (2010). The relationship between excessive internet use and depression: A questionnaire-based study of 1,319 young people and adults. Psychopathology, 43(2), 121-126. DOI: 10.1159/000277001