Archive for the ‘research’ Category

more on the absence of brain training effects

Tuesday, May 4th, 2010

A little while ago I blogged about the recent Owen et al Nature study on the (null) effects of cognitive training. My take on the study, which found essentially no effect of cognitive training on generalized cognitive performance, was largely positive. In response, Martin Walker, founder of Mind Sparke, maker of Brain Fitness Pro software, left this comment:

I’ve done regular aerobic training for pretty much my entire life, but I’ve never had the kind of mental boost from exercise that I have had from dual n-back training. I’ve also found that n-back training helps my mood.

There was a foundational problem with the BBC study in that it didn’t provide anywhere near the intensity of training that would be required to show transfer. The null hypothesis was a forgone conclusion. It seems hard to believe that the scientists didn’t know this before they began and were setting out to debunk the populist brain game hype.

I think there are a couple of points worth making. One is the standard rejoinder that one anecdotal report doesn’t count for very much. That’s not meant as a jibe at Walker in particular, but simply as a general observation about the fallibility of human judgment. Many people are perfectly convinced that homeopathic solutions have dramatically improved their quality of life, but that doesn’t mean we should take homeopathy seriously. Of course, I’m not suggesting that cognitive training programs are as ineffectual as homeopathy–in my post, I suggested they may well have some effect–but simply that personal testimonials are no substitute for controlled studies.

With respect to the (also anecdotal) claim that aerobic exercise hasn’t worked for Walker, it’s worth noting that the effects of aerobic exercise on cognitive performance take time to develop. No one expects a single brisk 20-minute jog to dramatically improve cognitive performance. If you’ve been exercising regularly your whole life, the question isn’t whether exercise will improve your cognitive function–it’s whether not doing any exercise for a month or two would lead to poorer performance. That is, if Walker stopped exercising, would his cognitive performance suffer? It would be a decidedly unhealthy hypothesis to test, of course, but that would really be the more reasonable prediction. I don’t think anyone thinks that a person in excellent physical condition would benefit further from physical exercise; the point is precisely that most people aren’t in excellent physical shape. In any event, as I noted in my post, the benefits of aerobic exercise are clearly largest for older adults who were previously relatively sedentary. There’s much less evidence for large effects of aerobic exercise on cognitive performance in young or middle-aged adults.

The more substantive question Walker raises has to do with whether the tasks Owen et al used were too easy to support meaningful improvement. I think this is a reasonable question, but I don’t think the answer is as straightforward as Walker suggests. For one thing, participants in the Owen et al study did show substantial gains in performance on the training tasks (just not the untrained tasks), so it’s not like they were at ceiling. That is, the training tasks clearly weren’t easy. Second, participants varied widely in the number of training sessions they performed, and yet, as the authors note, the correlation between amount of training and cognitive improvement was negligible. So if you extrapolate from the observed pattern, it doesn’t look particularly favorable. Third, Owen et al used 12 different training tasks that spanned a broad range of cognitive abilities. While one can quibble with any individual task, it’s hard to reconcile the overall pattern of null results with the notion that cognitive training produces robust effects. Surely at least some of these measures should have led to a noticeable overall effect if they successfully produced transfer. But they didn’t.

To reiterate what I said in my earlier post, I’m not saying that cognitive training has absolutely no effect. No study is perfect, and it’s conceivable that more robust effects might be observed given a different design. But the Owen et al study is, to put it bluntly, the largest study of cognitive training conducted to date by about two orders of magnitude, and that counts for a lot in an area of research dominated by relatively small studies that have generally produced mixed findings. So, in the absence of contradictory evidence from another large training study, I don’t see any reason to second-guess Owen et al’s conclusions.

Lastly, I don’t think Walker is in any position to cast aspersions on people’s motivations (“It seems hard to believe that the scientists didn’t know this before they began and were setting out to debunk the populist brain game hype”). While I don’t think that his financial stake in brain training programs necessarily impugns his evaluation of the Owen et al study, it can’t exactly promote impartiality either. And for what it’s worth, I dug around the Mind Sparke website and couldn’t find any “scientific proof” that the software works (which is what the website claims)–just some vague allusions to customer testimonials and some citations of other researchers’ published work (none of which, as far as I can tell, used Brain Fitness Pro for training).

undergraduates are WEIRD

Tuesday, April 27th, 2010

This month’s issue of Nature Neuroscience contains an editorial lambasting the excessive reliance of psychologists on undergraduate college samples, which, it turns out, are pretty unrepresentative of humanity at large. The impetus for the editorial is a mammoth in-press review of cross-cultural studies by Joseph Henrich and colleagues, which, the authors suggest, collectively indicate that “samples drawn from Western, Educated, Industrialized, Rich and Democratic (WEIRD) societies … are among the least representative populations one could find for generalizing about humans.” I’ve only skimmed the article, but aside from the clever acronym, you could do a lot worse than these (rather graphic) opening paragraphs:

In the tropical forests of New Guinea the Etoro believe that for a boy to achieve manhood he must ingest the semen of his elders. This is accomplished through ritualized rites of passage that require young male initiates to fellate a senior member (Herdt, 1984; Kelley, 1980). In contrast, the nearby Kaluli maintain that  male initiation is only properly done by ritually delivering the semen through the initiate’s anus, not his mouth. The Etoro revile these Kaluli practices, finding them disgusting. To become a man in these societies, and eventually take a wife, every boy undergoes these initiations. Such boy-inseminating practices, which  are enmeshed in rich systems of meaning and imbued with local cultural values, were not uncommon among the traditional societies of Melanesia and Aboriginal Australia (Herdt, 1993), as well as in Ancient Greece and Tokugawa Japan.

Such in-depth studies of seemingly “exotic” societies, historically the province of anthropology, are crucial for understanding human behavioral and psychological variation. However, this paper is not about these peoples. It’s about a truly unusual group: people from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) societies. In particular, it’s about the Western, and more specifically American, undergraduates who form the bulk of the database in the experimental branches of psychology, cognitive science, and economics, as well as allied fields (hereafter collectively labeled the “behavioral sciences”). Given that scientific knowledge about human psychology is largely based on findings from this subpopulation, we ask just how representative are these typical subjects in light of the available comparative database. How justified are researchers in assuming a species-level generality for their findings? Here, we review the evidence regarding how WEIRD people compare to other
populations.

Anyway, it looks like a good paper. Based on a cursory read, the conclusions the authors draw seem pretty reasonable, if a bit strong. I think most researchers do already recognize that our dependence on undergraduates is unhealthy in many respects; it’s just that it’s difficult to break the habit, because the alternative is to spend a lot more time and money chasing down participants (and there are limits to that too; it just isn’t feasible for most researchers to conduct research with Etoro populations in New Guinea). Then again, just because it’s hard to do science the right way doesn’t really make it OK to do it the wrong way. So, to the extent that we care about our results generalizing across the entire human species (which, in many cases, we don’t), we should probably be investing more energy in weaning ourselves off undergraduates and trying to recruit more diverse samples.

cognitive training doesn’t work (much, if at all)

Friday, April 23rd, 2010

There’s a beautiful paper in Nature this week by Adrian Owen and colleagues that provides what’s probably as close to definitive evidence as you can get in any single study that “brain training” programs don’t work. Or at least, to the extent that they do work, the effects are so weak they’re probably not worth caring about.

Owen et al used a very clever approach to demonstrate their point. Rather than spending their time running small-sample studies that require people to come into the lab over multiple sessions (an expensive and very time-intensive effort that’s ultimately still usually underpowered), they teamed up with the BBC program ‘Bang Goes The Theory‘. Participants were recruited via the tv show, and were directed to an experimental website where they created accounts, engaged in “pre-training” cognitive testing, and then could repeatedly log on over the course of six weeks to perform a series of cognitive tasks supposedly capable of training executive abilities. After the training period, participants again performed the same battery of cognitive tests, enabling the researchers to compare performance pre- and post-training.

Of course, you expect robust practice effects with this kind of thing (i.e., participants would almost certainly do better on the post-training battery than on the pre-training battery solely because they’d been exposed to the tasks and had some practice). So Owen et al randomly assigned participants logging on to the website to two different training programs (involving different types of training tasks) or to a control condition in which participants answered obscure trivia questions rather than doing any sort of intensive cognitive training per se. The beauty of doing this all online was that the authors were able to obtain gargantuan sample sizes (several thousand in each condition), ensuring that statistical power wasn’t going to be an issue. Indeed, Owen et al focus almost explicitly on effect sizes rather than p values, because, as they point out, once you have several thousand participants in each group, almost everything is going to be statistically significant, so it’s really the effect sizes that matter.

The critical comparison was whether the experimental groups showed greater improvements in performance post-training than the control group did. And the answer, generally speaking, was no. Across four different tasks, the differences in training-related gains in the experimental group relative to the control group were always either very small (no larger than about a fifth of a standard deviation), or even nonexistent (to the extent that for some comparisons, the control group improved more than the experimental groups!). So the upshot is that if there is any benefit of cognitive training (and it’s not at all clear that there is, based on the data), it’s so small that it’s probably not worth caring about. Here’s the key figure:

owen_et_al

You could argue that the fact the y-axis spans the full range of possible values (rather than fitting the range of observed variation) is a bit misleading, since it’s only going to make any effects seem even smaller. But even so, it’s pretty clear these are not exactly large effects (and note that the key comparison is not the difference between light and dark bars, but the relative change from light to dark across the different groups).

Now, people who are invested (either intellectually or financially) in the efficacy of cognitive training programs might disagree, arguing that an effect of one-fifth of a standard deviation isn’t actually a tiny effect, and that there are arguably many situations in which that would be a meaningful boost in performance. But that’s the best possible estimate, and probably overstates the actual benefit. And there’s also the opportunity cost to consider: the average participant completed 20 – 30 training sessions, which, even at just 20 minutes a session (an estimate based on the description of the length of each of the training tasks), would take about 8 – 10 hours to complete (and some participants no doubt spent many more hours in training).  That’s a lot of time that could have been invested in other much more pleasant things, some of which might also conceivably improve cognitive ability (e.g., doing Sudoku puzzles, which many people actually seem to enjoy). Owen et al put it nicely:

To illustrate the size of the transfer effects observed in this study, consider the following representative example from the data. The increase in the number of digits that could be remembered following training on tests designed, at least in part, to improve memory (for example, in experimental group 2) was three-hundredth of a digit. Assuming a linear relationship between time spent training and improvement, it would take almost four years of training to remember one extra digit. Moreover, the control group improved by two-tenths of a digit, with no formal memory training at all.

If someone asked you if you wanted to spend six weeks doing a “brain training” program that would provide those kinds of returns, you’d probably politely (or impolitely) refuse. Especially since it’s not like most of us spend much of our time doing digit span tasks anyway; odds are that the kinds of real-world problems we’d like to perform a little better at (say, something trivial like figuring out what to buy or not to buy at the grocery store) are even further removed from the tasks Owen et al (and other groups) have used to test for transfer, so any observable benefits in the real world would presumably be even smaller.

Of course, no study is perfect, and there are three potential concerns I can see. The first is that it’s possible that there are subgroups within the tested population who do benefit much more from the cognitive training. That is, the miniscule overall effect could be masking heterogeneity within the sample, such that some people (say, maybe men above 60 with poor diets who don’t like intellectual activities) benefit much more. The trouble with this line of reasoning, though, is that the overall effects in the entire sample are so small that you’re pretty much forced to conclude that either (a) any group that benefits substantially from the training is a very small proportion of the total sample, or (b) that there are actually some people who suffer as a result of cognitive training, effectively balancing out the gains seen by other people. Neither of these possibilities seem particularly attractive.

The second concern is that it’s conceivable that the control group isn’t perfectly matched to the experimental group, because, by the authors’ own admission, the retention rate was much lower in the control group. Participants were randomly assigned to the three groups, but only about two-thirds as many control participants completed the study. The higher drop-out rate was apparently due to the fact that the obscure trivia questions used as a control task were pretty boring. The reason that’s a potential problem is that attrition wasn’t random, so there may be a systematic difference between participants in the experimental conditions and those in the control conditions. In particular, it’s possible that the remaining control participants had a higher tolerance for boredom and/or were somewhat smarter or more intellectual on average (answering obscure trivia questions clearly isn’t everyone’s cup of tea). If that were true, the lack of any difference between experimental and control conditions might be due to participant differences rather than an absence of a true training effect. Unfortunately, it’s hard to determine whether this might be true, because (as far as I can tell) Owen et al don’t provide the raw mean performance scores on the pre- and post-training testing for each group, but only report the changes in performance. What you’d want to know is that the control participants didn’t do substantially better or worse on the pre-training testing than the experimental participants (due to selective attrition of low-performing subjects), which might make changes in performance difficult to interpret. But at face value, it doesn’t seem very plausible that this would be a serious issue.

Lastly, Owen et al do report a small positive correlation between number of training sessions performed (which was under participants’ control) and gains in performance on the post-training test. Now, this effect was, as the authors note, very small (a maximal Spearman’s rho of .06), so that it’s also not really likely to have practical implications. Still, it does suggest that performance increases as a function of practice. So if we’re being pedantic, we should say that intensive cognitive training may improve cognitive performance in a generalized way, but that the effect is really minuscule and probably not worth the time and effort required to do the training in the first place. Which isn’t exactly the type of careful and measured claim that the people who sell brain training programs are generally interested in making.

At any rate, setting aside the debate over whether cognitive training works or not, one thing that’s perplexed me for a long time about the training literature is why people focus to such an extent on cognitive training rather than other training regimens that produce demonstrably larger transfer effects. I’m thinking in particular of aerobic exercise, which produces much more robust and replicable effects on cognitive performance. There’s a nice meta-analysis by Colcombe and colleagues that found effect sizes on the order of half a standard deviation and up for physical exercise in older adults–and effects were particularly large for the most heavily g-loaded tasks. Now, even if you allow for publication bias and other manifestations of the fudge factor, it’s almost certain that the true effect of physical exercise on cognitive performance is substantially larger than the (very small) effects of cognitive training as reported by Owen et al and others.

The bottom line is that, based on everything we know at the moment, the evidence seems to pretty strongly suggest that if your goal is to improve cognitive function, you’re more likely to see meaningful results by jogging or swimming regularly than by doing crossword puzzles or N-back tasks–particularly if you’re older. And of course, a pleasant side effect is that exercise also improves your health and (for at least some people) mood, which I don’t think N-back tasks do. Actually, many of the participants I’ve tested will tell you that doing the N-back is a distinctly dysphoric experience.

On a completely unrelated note, it’s kind of neat to see a journal like Nature publish what is essentially a null result. It goes to show that people do care about replication failures in some cases–namely, in those cases when the replication failure contradicts a relatively large existing literature, and is sufficiently highly powered to actually say something interesting about the likely effect sizes in question.

ResearchBlogging.org
Owen AM, Hampshire A, Grahn JA, Stenton R, Dajani S, Burns AS, Howard RJ, & Ballard CG (2010). Putting brain training to the test. Nature PMID: 20407435

what’s adaptive about depression?

Friday, February 26th, 2010

Jonah Lehrer has an interesting article in the NYT magazine about a recent Psych Review article by Paul Andrews and J. Anderson Thomson. The basic claim Andrews and Thomson make in their paper is that depression is “an adaptation that evolved as a response to complex problems and whose function is to minimize disruption of rumination and sustain analysis of complex problems”. Lehrer’s article is, as always, engaging, and he goes out of his way to obtain some critical perspectives from other researchers not affiliated with Andrews & Thomson’s work. It’s definitely worth a read.

In reading Lehrer’s article and the original paper, two things struck me. One is that I think Lehrer slightly exaggerates the novelty of Andrews and Thomson’s contribution. The novel suggestion of their paper isn’t that depression can be adaptive under the right circumstances (I think most people already believe that, and as Lehrer notes, the idea traces back a long way); it’s that the specific adaptive purpose of depression is to facilitate solving of complex problems. I think Andrews and Thomson’s paper received a somewhat critical reception (which Lehrer discusses) not so much because people found the suggestion that depression might be adaptive objectionable, but because there are arguably more plausible things depression could have been selected for. Lehrer mentions a few:

Other scientists, including Randolph Nesse at the University of Michigan, say that complex psychiatric disorders like depression rarely have simple evolutionary explanations. In fact, the analytic-rumination hypothesis is merely the latest attempt to explain the prevalence of depression. There is, for example, the “plea for help” theory, which suggests that depression is a way of eliciting assistance from loved ones. There’s also the “signal of defeat” hypothesis, which argues that feelings of despair after a loss in social status help prevent unnecessary attacks; we’re too busy sulking to fight back. And then there’s “depressive realism”: several studies have found that people with depression have a more accurate view of reality and are better at predicting future outcomes. While each of these speculations has scientific support, none are sufficient to explain an illness that afflicts so many people. The moral, Nesse says, is that sadness, like happiness, has many functions.

Personally, I find these other suggestions more plausible than the Andrews and Thomson story (if still not terribly compelling). There are a variety of reasons for this (see Jerry Coyne’s twin posts for some of them, along with the many excellent comments), but one pretty big one is that is that they’re all at least somewhat more consistent with a continuity hypothesis under which many of the selection pressures that influenced the structure of the human mind have been at work in our lineage for millions of years. That’s to say, if you believe in a “signal of defeat” account, you don’t have to come up with complex explanations for why human depression is adaptive (the problem being that other mammals don’t seem to show an affinity for ruminating over complex analytical problems); you can just attribute depression to much more general selection pressures found in other animals as well.

One hypothesis I particularly like in this respect, related to the signal-of-defeat account, is that depression is essentially just a human manifestation of a general tendency toward low self-confidence and aggression. The value of low self-confidence is pretty obvious: you don’t challenge the alpha male, so you don’t get into fights; you only chase prey you think you can safely catch; and so on. Now suppose humans inherited this basic architecture from our ancestral apes. In human societies there’s still a clear potential benefit to being subservient and non-confrontational; it’s a low-risk, low-reward strategy. If you don’t bother anyone, you’re probably not going to get the girl impress the opposite sex very much, but at least you won’t get clubbed over the head by a competitor very often. So there’s a sensible argument to be made for frequency dependent selection for depression-related traits (the reason it’s likely to be frequency dependent is that if you ever had a population made up entirely of self-doubting, non-aggressive individuals, being more aggressive would probably become highly advantageous, so at some point, you’d achieve a stable equilibrium).

So where does rumination–the main focus of the Andrews and Thomson paper–come into the picture? Well, I don’t know for sure, but here’s a pretty plausible just-so story: once you evolve the capacity to reason intelligently about yourself, you now have a higher cognitive system that’s naturally going to want to understand why it feels the way it does so often. If you’re someone who feels pretty upset about things much of the time, you’re going to think about those things a lot. So… you ruminate. And that’s really all you need! Saying that depression is adaptive doesn’t require you to think of every aspect of depression (e.g., rumination) as a complex and human-specific adaptation; it seems more parsimonious to see depressive rumination as a non-adaptive by-product of a more general and (potentially) adaptive disposition to experience negative affect.  On this type of account, ruminating isn’t actually helping a depressed person solve any problems at all. In fact, you could even argue that rumination shouldn’t make you feel better, or it would defeat the very purpose of having a depressive nature in the first place. In other words, it’s entirely consistent with the basic argument that depression is adaptive under some circumstances that the very purpose of rumination might be to keep depressed people in a depressed state. I don’t have any direct evidence for this, of course; it’s a just-so story. But it’s one that is, in my opinion (a) more plausible and (b) more consistent with indirect evidence (e.g., that rumination generally doesn’t seem to make people feel better!) than the Andrews and Thomson view.

The other thing that struck me about the Andrews and Thomson paper, and to a lesser extent, Lehrer’s article, is that the focus is (intentionally) squarely on whether and why depression is adaptive from an evolutionary standpoint. But it’s not clear that the average person suffering from depression really cares, or should care, about whether their depression exists for some distant evolutionary reason. What’s much more germane to someone suffering from depression is whether their depression is actually increasing their quality of life, and in that respect, it’s pretty difficult to make a positive case. The argument that rumination is adaptive because it helps you solve complex analytical problems is only compelling if you think that those problems are really worth mulling over deeply in the first place. For most of the things that depressed people tend to ruminate over (most of which aren’t life-changing decisions, but trivial things like whether your co-workers hate you because of the unfashionable shirt you wore to work yesterday), that just doesn’t seem to be the case. So the argument becomes circular: rumination helps you solve problems that a happier person probably wouldn’t have been bothered by in the first place. Now, that isn’t to say that there aren’t some very specific environments in which depression might still be adaptive today; it’s just that there don’t seem to be very many of them. If you look at the data, it’s quite clear that, on average, depression has very negative effects. People lose friends, jobs, and the joy of life because of their depression; it’s hard to see what monumental problem-solving insight could possibly compensate for that in most cases. By way of analogy, saying that depression is adaptive because it promotes rumination seems kind of like saying that cigarettes serve an adaptive purpose because they make nicotine withdrawal go away. Well, maybe. But wouldn’t you rather not have the withdrawal symptoms to begin with?

To be clear, I’m not suggesting that we should view depression solely in pathological terms, and should entirely write off the possibility that there are some potentially adaptive aspects to depression (or personality traits that go along with it). Rather, the point is that, if you’re suffering from depression, it’s not clear what good it’ll do you to learn that some of your ancestors may have benefited from their depressive natures. (By the same token, you wouldn’t expect a person suffering from sickle-cell anemia to gain much comfort from learning that they carry two copies of a mutation that, in a heterozygous carrier, would confer a strong resistance to malaria.) Conversely, there’s a very real danger here, in the sense that, if Andrews and Thomson are wrong about rumination being adaptive, they might be telling people it’s OK to ruminate when in fact excessive rumination could be encouraging further depression. My sense is that that’s actually the received wisdom right now (i.e., much of cognitive-behavioral therapy is focused on getting depressed individuals to recognize their ruminative cycles and break out of them). So the concern is that too much publicity might be a bad thing in this case, and, far from heralding the arrival of a new perspective on the conceptualization and treatment of depression, may actually be hurting some people. Ultimately, of course, it’s an empirical matter, and certainly not one I have any conclusive answers to. But what I can quite confidently assert in the meantime is that the Lehrer article is an enjoyable read, so long as you read it with a healthy dose of skepticism.

ResearchBlogging.org
Andrews, P., & Thomson, J. (2009). The bright side of being blue: Depression as an adaptation for analyzing complex problems. Psychological Review, 116 (3), 620-654 DOI: 10.1037/a0016242

internet use causes depression! or not.

Thursday, February 4th, 2010

I have a policy of not saying negative things about people (or places, or things) on this blog, and I think I’ve generally been pretty good about adhering to that policy. But I also think it’s important for scientists to speak up in cases where journalists or other scientists misrepresent scientific research in a way that could have a potentially large impact on people’s behavior, and this is one of those cases. All day long, media outlets have been full of reports about a new study that purportedly reveals that the internet–that most faithful of friends, always just a click away with its soothing, warm embrace–has a dark side: using it makes you depressed!

In fairness, most of the stories have been careful to note that the  study only “links” heavy internet use to depression, without necessarily implying that internet use causes depression. And the authors acknowledge that point themselves:

“While many of us use the Internet to pay bills, shop and send emails, there is a small subset of the population who find it hard to control how much time they spend online, to the point where it interferes with their daily activities,” said researcher Dr. Catriona Morrison, of the University of Leeds, in a statement. “Our research indicates that excessive Internet use is associated with depression, but what we don’t know is which comes first. Are depressed people drawn to the Internet or does the Internet cause depression?”

So you might think all’s well in the world of science and science journalism. But in other places, the study’s authors weren’t nearly so circumspect. For example, the authors suggest that 1.2% of the population can be considered addicted to the internet–a rate they claim is double that of compulsive gambling; and they suggest that their results “feed the public speculation that overengagement in websites that serve/replace a social function might be linked to maladaptive psychological functioning,” and “add weight to the recent suggestion that IA should be taken seriously as a distinct psychiatric construct.”

These are pretty strong claims; if the study’s findings are to be believed, we should at least be seriously considering the possibility that using the internet is making some of us depressed. At worst, we should be diagnosing people with internet addiction and doing… well, presumably something to treat them.

The trouble is that it’s not at all clear that the study’s findings should be believed. Or at least, it’s not clear that they really support any of the statements made above.

Let’s start with what the study (note: restricted access) actually shows. The authors, Catriona Morrison and Helen Gore (M&G), surveyed 1,319 subjects via UK-based social networking sites. They had participants fill out 3 self-report measures: the Internet Addiction Test (IAT), which measures dissatisfaction with one’s internet usage; the Internet Function Questionnaire, which asks respondents to indicate the relative proportion of time they spend on different internet activities (e.g., e-mail, social networking, porn, etc.); and the Beck Depression Inventory (BDI), a very widely-used measure of depression.

M&G identify a number of findings, three of which appear to support most of their conclusions. First, they report a very strong positive correlation (r = .49) between internet addiction and depression scores; second, they identify a small group of 18 subjects (1.2%) who they argue qualify as internet addicts (IA group) based on their scores on the IAT; and third, they suggest that people who used the internet more heavily “spent proportionately more time on online gaming sites, sexually gratifying websites, browsing, online communities and chat sites.”

These findings may sound compelling, but there are a number of methodological shortcomings of the study that make them very difficult to interpret in any meaningful way. As far as I can tell, none of these concerns are addressed in the paper:

First, participants were recruited online, via social networking sites. This introduces a huge selection bias: you can’t expect to obtain accurate estimates of how much, and how adaptively, people use the internet by sampling only from the population of internet users! It’s the equivalent of trying to establish cell phone usage patterns by randomly dialing only land-line numbers. Not a very good idea. And note that, not only could the study not reach people who don’t use the internet, but it was presumably also more likely to oversample from heavy internet users. The more time a person spends online, the greater the chance they’d happen to run into the authors recruitment ad. People who only check their email a couple of times a week would be very unlikely to participate in the study. So the bottom line is, the 1.2% figure the authors arrive at is almost certainly a gross overestimate. The true proportion of people who meet the authors’ criteria for internet addiction is probably much lower. It’s hard to believe the authors weren’t aware of the issue of selection bias, and the massive problem it presents for their estimates, yet they failed to mention it anywhere in their paper.

Second, the cut-off score for being placed in the IA group appears to be completely arbitrary. The Internet Addiction Test itself was developed by Kimberly Young in a 1998 book entitled “Caught in the Net: How to Recognize the Signs of Internet Addiction–and a Winning Strategy to Recovery”. The test was introduced, as far as I can tell (I haven’t read the entire book, just skimmed it in Google Books), with no real psychometric validation. The cut-off of 80 points out of a maximum 100 possible as a threshold for addiction appears to be entirely arbitrary (in fact, in Young’s book, she defines the cut-off as 70; for reasons that are unclear, M&G adopted a cut-off of 80). That is, it’s not like Young conducted extensive empirical analysis and determined that people with scores of X or above were functionally impaired in a way that people with scores below X weren’t; by all appearances, she simply picked numerically convenient cut-offs (20 – 39 is average; 40 – 69 indicates frequent problems; and 70+ basically means the internet is destroying your life). Any small change in the numerical cut-off would have translated into a large change in the proportion of people in M&G’s sample who met criteria for internet addiction, making the 1.2% figure seem even more arbitrary.

Third, M&G claim that the Internet Function Questionnaire they used asks respondents to indicate the proportion of time on the internet that they spend on each of several different activities. For example, given the question “How much of your time online do you spend on e-mail?”, your options would be 0-20%, 21-40%, and so on. You would presume that all the different activities should sum to 100%; after all, you can’t really spend 80% of your online time gaming, and then another 80% looking at porn–unless you’re either a very talented gamer, or have an interesting taste in “games”. Yet, when M&G report absolute numbers for the different activities in tables, they’re not given in percentages at all. Instead, one of the table captions indicates that the values are actually coded on a 6-point Likert scale ranging from “rarely/never” to “very frequently”. Hopefully you can see why this is a problem: if you claim (as M&G do) that your results reflect the relative proportion of time that people spend on different activities, you shouldn’t be allowing people to essentially say anything they like for each activity. Given that people with high IA scores report spending more time overall than they’d like online, is it any surprise if they also report spending more time on individual online activities? The claim that high-IA scorers spend “proportionately more” time on some activities just doesn’t seem to be true–at least, not based on the data M&G report. This might also explain how it could be that IA scores correlated positively with nearly all individual activities. That simply couldn’t be true for real proportions (if you spend proportionately more time on e-mail, you must be spending proportionately less time somewhere else), but it makes perfect sense if the response scale is actually anchored with vague terms like “rarely” and “frequently”.

Fourth, M&G consider two possibilities for the positive correlation between IAT and depression scores: (a) increased internet use causes depression, and (b) depression causes increased internet use. But there’s a third, and to my mind far more plausible, explanation: people who are depressed tend to have more negative self-perceptions, and are much more likely to endorse virtually any question that asks about dissatisfaction with one’s own behavior. Here are a couple of examples of questions on the IAT: “How often do you fear that life without the Internet would be boring, empty, and joyless?” “How often do you try to cut down the amount of time you spend on-line and fail?” Notice that there are really two components to these kinds of questions. One component is internet-specific: to what extent are people specifically concerned about their behavior online, versus in other domains? The other component is a general hedonic one, and has to do with how dissatisfied you are with stuff in general. Now, is there any doubt that, other things being equal, someone who’s depressed is going to be more likely to endorse an item that asks how often they fail at something? Or how often their life feels empty and joyless–irrespective of cause? No, of course not. Depressive people tend to ruminate and worry about all sorts of things. No doubt internet usage is one of those things, but that hardly makes it special or interesting. I’d be willing to bet money that if you created a Shoelace Tying Questionnaire that had questions like “How often do you worry about your ability to tie your shoelaces securely?” and “How often do you try to keep your shoelaces from coming undone and fail?”, you’d also get a positive correlation with BDI scores. Basically, depression and trait negative affect tend to correlate positively with virtually every measure that has a major evaluative component. That’s not news. To the contrary, given the types of questions on the IAT, it would have been astonishing if there wasn’t a robust positive correlation with depression.

Fifth, and related to the previous point, no evidence is ever actually provided that people with high IAT scores differ in their objective behavior from those with low scores. Remember, this is all based on self-report. And not just self-report, but vague self-report. As far as I can tell, M&G never asked respondents to estimate how much time they spent online in a given week. So it’s entirely possible that people who report spending too much time online don’t actually spend much more time online than anyone else; they just feel that way (again, possibly because of a generally negative disposition). There’s actually some support for this idea: A 2004 study that sought to validate the IAT psychometrically found only a .22 correlation between IAT scores and self-reported time spent online. Now, a .22 correlation is perfectly meaningful, and it suggests that people who feel they spend too much time online also estimate that they really do spend more time online (though, again, bias is a possibility here too). But it’s a much smaller correlation than the one between IAT scores and depression, which fits with the above idea that there may not be any real “link” between internet use and depression above and beyond the fact that depressed individuals are more likely to more negatively-worded items.

Finally, even if you ignore the above considerations, and decide to conclude that there is in fact a non-artifactual correlation between depression and internet use, there’s really no reason you would conclude that that’s a bad thing (which M&G hedge on, and many of the news articles haven’t hesitated to play up). It’s entirely plausible that the reason depressed individuals might spend more time online is because it’s an effective form of self-medication. If you’re someone who has trouble mustering up the energy to engage with the outside world, or someone who’s socially inhibited, online communities might provide you with a way to fulfill your social needs in a way that you would otherwise not have been able to. So it’s quite conceivable that heavy internet use makes people less depressed, not more; it’s just that the people who are more likely to use the internet heavily are more depressed to begin with. I’m not suggesting that this is in fact true (I find the artifactual explanation for the IAT-BDI correlation suggested above much more plausible), but just that the so-called “dark side” of the internet could actually be a very good thing.

In sum, what can we learn from M&G’s paper? Not that much. To be fair, I don’t necessarily think it’s a terrible paper; it has its limitations, but every paper does. The problem isn’t so much that the paper is bad; it’s that the findings it contains were blown entirely out of proportion, and twisted to support headlines (most of them involving the phrase “The Dark Side”) that they couldn’t possibly support. The internet may or may not cause depression (probably not), but you’re not going to get much traction on that question by polling a sample of internet respondents, using measures that have a conceptual overlap with depression, and defining groups based on arbitrary cut-offs. The jury remains open, of course, but these findings by themselves don’t really give us any reason to reconsider or try to change our online behavior.

ResearchBlogging.org
Morrison, C., & Gore, H. (2010). The Relationship between Excessive Internet Use and Depression: A Questionnaire-Based Study of 1,319 Young People and Adults Psychopathology, 43 (2), 121-126 DOI: 10.1159/000277001

better tools for mining the scientific literature

Sunday, January 31st, 2010

Freethinker’s Asylum has a great post reviewing a number of tools designed to help researchers mine the scientific literature–an increasingly daunting task. The impetus for the post is this article in the latest issue of Nature (note: restricted access), but the FA post discusses a lot of tools that the Nature article doesn’t, and focuses in particular on websites that are currently active and publicly accessible, rather than on proprietary tools currently under development in dark basement labs and warehouses. I hadn’t seen most of these before, but am looking forward to trying them out–e.g., pubget:

When you create an account, pubget signs in to your institution and allows you to search the subscribed resources. When you find a reference you want, just click the pdf icon and there it is. No clicking through to content provider websites. You can tag references as “keepers” to come back to them later, or search for the newest articles from a particular journal.

Sounds pretty handy…

Many of the other sites–as well as most of those discussed in the Nature article–focus on data and literature mining in specific fields, e.g., PubGene and PubAnatomy. These services, which allow you to use specific keywords or topics (e.g., specific genes) to constrain literature searches, aren’t very useful to me personally. But it’s worth pointing out that there are some emerging services that fill much the same niche in the world of cognitive neuroscience that I’m more familiar with. The one that currently looks most promising, in my opinion, is the Cognitive Atlas project led by Russ Poldrack, which is “a collaborative knowledge building project that aims to develop a knowledge base (or ontology) that characterizes the state of current thought in cognitive science. … The Cognitive Atlas aims to capture knowledge from users with expertise in psychology, cognitive science, and neuroscience.”

The Cognitive Atlas is officially still in beta, and you need to have a background in cognitive neuroscience in order to sign up to contribute. But there’s already some content you can navigate, and the site, despite being in the early stages of development, is already pretty impressive. In the interest of full disclosure, as well as shameless plugging, I should note that Russ will be giving a talk about the Cognitive Atlas project as part of a symposium I’m chairing at CNS in Montreal this year. So if you want to learn more about it, stop by! Meantime, check out the Freethinker’s Asylum post for links to all sorts of other interesting tools…

how to measure 200 personality scales in 200 items

Sunday, January 17th, 2010

One of the frustrating things about personality research–for both researchers and participants–is that personality is usually measured using self-report questionnaires, and filling out self-report questionnaires can take a very long time. It doesn’t have to take a very long time, mind you; some questionnaires are very short, like the widely-used Ten-Item Personality Inventory (TIPI), which might take you a whole 2 minutes to fill out on a bad day. So you can measure personality quickly if you have to. But more often than not, researchers want to reliably measure a broad range of different personality traits, and that typically requires administering one or more long-ish questionnaires. For example, in my studies, I often give participants a battery of measures to fill out that includes some combination of the NEO-PI-R, EPQ-R, BIS/BAS scales, UPPS, GRAPES, BDI, TMAS, STAI, and a number of others. That’s a large set of acronyms, and yet it’s just a small fraction of what’s out there; every personality psychologist has his or her own set of favorite measures, and at personality conferences, duels-to-the-death often break out over silly things like whether measure X is better than measure Y, or whether measures A and B can be used interchangeably when no one’s looking. Personality measurement is a pretty intense sport.

The trouble with the way we usually measure personality is that it’s wildly inefficient, for two reasons. One is that many measures are much longer than they need to be. It’s not uncommon to see measures that score each personality trait using a dozen or more different items. In theory, the benefit of this type of redundancy is that you get a more reliable measure, because the error terms associated with individual items tends to cancel out. For example, if you want to know if I’m a depressive kind of guy, you shouldn’t just ask me, “hey, are you depressed?”, because lots of random factors could influence my answer to that one question. Instead, you should ask me a bunch of different questions, like: “hey, are you depressed?” and “why so glum, chum?”, and “does somebody need a hug?”. Adding up responses from multiple items is generally going to give you a more reliable measure. But in practice, it turns out that you typically don’t need more than a handful of items to measure most traits reliably. When people develop “short forms” of measures, the abbreviated scales often have just 4 – 5 items per trait, usually with relatively little loss of reliability and validity. So the fact that most of the measures we use have so many items on them is sort of a waste of both researchers’ and participants’ time.

The other reason personality measurement is inefficient is that most researchers recognize that different personality measures tend to measure related aspects of personality, and yet we persist in administering a whole bunch of questionnaires with similar content to our participants. If you’ve ever participated in a psychology experiment that involved filling out personality questionnaires, there’s a good chance you’ve wondered whether you’re just filling out the same questionnaire over and over. Well you are–kind of. Because the space of personality variation is limited (people can only differ from one another in so many ways), and because many personality constructs have complex interrelationships with one another, personality measures usually end up asking similarly-worded questions. So for example, one measure might give you Extraversion and Agreeableness scores whereas another gives you Dominance and Affiliation scores. But then it turns out that the former pair of dimensions can be “rotated” into the latter two; it’s just a matter of how you partition (or label) the variance. So really, when a researcher gives his or her participants a dozen measures to fill out, that’s not because anyone thinks that there are really a dozen completely different sets of traits to measures; it’s more because we recognize that each instrument gives you a slightly different take on personality, and we tend to think that having multiple potential viewpoints is generally a good thing.

Inefficient personality measurement isn’t inevitable; as I’ve already alluded to above, a number of researchers have developed abbreviated versions of common inventories that capture most of the same variance as much longer instruments. Probably the best-known example is the aforementioned TIPI, developed by Sam Gosling and colleagues, which gives you a workable index of people’s relative standing on the so-called Big Five dimensions of personality. But there are relatively few such abbreviated measures. And to the best of my knowledge, the ones that do exist are all focused on abbreviating a single personality measure. That’s unfortunate, because if you believe that most personality inventories have a substantial amount of overlap, it follows that you should be able to recapture scores on multiple different personality inventories using just one set of (non-redundant) items.

That’s exactly what I try to demonstrate in a paper to be published in the Journal of Research in Personality. The article’s entitled “The abbreviation of personality: How to measure 200 personality scales in 200 items“, which is a pretty accurate, if admittedly somewhat grandiose, description of the contents. The basic goal of the paper is two-fold. First, I develop an automated method for abbreviating personality inventories (or really, any kind of measure with multiple items and/or dimensions). The idea here is to shorten the time and effort required in order to generate shorter versions of existing measures, which should hopefully encourage more researchers to create such short forms. The approach I develop relies heavily on genetic algorithms, which are tools for programmatically obtaining high-quality solutions to high-dimensional problems using simple evolutionary principles. I won’t go into the details (read the paper if you want them!), but I think it works quite well. In the first two studies reported in the paper (data for which were very generously provided by Sam Gosling and Lew Goldberg, respectively), I show that you can reduce the length of existing measures (using the Big Five Inventory and the NEO-PI-R as two examples) quite dramatically with minimal loss of validity. It only takes a few minutes to generate the abbreviated measures, so in theory, it should be possible to build up a database of abbreviated versions of many different measures. I’ve started to put together a site that might eventually serve that purpose (shortermeasures.com), but it’s still in the preliminary stages of development, and may or may not get off the ground.

The other main goal of the paper is to show that the same general approach can be applied to simultaneously abbreviate more than one different measure. To make the strongest case I could think of, I took 8 different broadband personality inventories (“broadband” here just means they each measure a relatively large number of personality traits) that collectively comprise 203 different personality scales and 2,091 different items. Using the same genetic algorithm-based approach, I then reduce these 8 measures down to a single inventory that contains only 181 items (hence the title of the paper). I named the inventory the AMBI (Analog to Multiple Broadband Inventories), and it’s now freely available for use (items and scoring keys are provided both in the paper and at shortermeasures.com). It’s certainly not perfect–it does a much better job capturing some scales than others–but if you have limited time available for personality measures, and still want a reasonably comprehensive survey of different traits, I think it does a really nice job. Certainly, I’d argue it’s better than having to administer many hundreds (if not thousands) of different items to achieve the same effect. So if you have about 15 – 20 minutes to spare in a study and want some personality data, please consider trying out the AMBI!

ResearchBlogging.org

Yarkoni, T. (2010). The Abbreviation of Personality, or how to Measure 200 Personality Scales with 200 Items Journal of Research in Personality DOI: 10.1016/j.jrp.2010.01.002

solving the file drawer problem by making the internet the drawer

Thursday, November 26th, 2009

UPDATE 11/22/2011 — Hal Pashler’s group at UCSD just introduced a new website called PsychFileDrawer that’s vastly superior in every way to the prototype I mention in the post below; be sure to check it out!

Science is a difficult enterprise, so scientists have many problems. One particularly nasty problem is the File Drawer Problem. The File Drawer Problem is actually related to another serious scientific problem known as the Desk Problem. The Desk Problem is that many scientists have messy desks covered with overflowing stacks of papers, which can make it very hard to find things on one’s desk–or, for that matter, to clear enough space to lay down another stack of papers.  A common solution to the Desk Problem is to shove all of those papers into one’s file drawer. Which brings us to the the File Drawer Problem. The File Drawer Problem refers to the fact that, eventually, even the best-funded of scientists run out of room in their file drawers.

Ok, so that’s not exactly right. What the file drawer problem–a term coined by Robert Rosenthal in a seminal 1979 article–really refers to is the fact that null results tend to go unreported in the scientific literature at a much higher rate than positive findings, because journals don’t like to publish papers that say “we didn’t find anything”, and as a direct consequence, authors don’t like to write papers that say “journals won’t want to publish this”.

Because of this blatant prejudice systematic bias against null results, the eventual resting place of many a replication failure is its author’s file drawer. The reason this is a problem is that, over the long term, if only (or mostly) positive findings ever get published, researchers can get a very skewed picture of how strong an effect really is. To illustrate, let’s say that Joe X publishes a study showing that people with lawn gnomes in their front yards tend to be happier than people with no lawn gnomes in their yards. Intuitive as that result may be, someone is inevitably going to get the crazy idea that this effect is worth replicating once or twice before we all stampede toward Home Depot or the Container Store with our wallets out (can you tell I’ve never bought a lawn gnome before?). So let’s say Suzanna Y and Ramesh Z each independently try to replicate the effect in their labs (meaning, they command their graduate students to do it). And they find… nothing! No effect. Turns out, people with lawn gnomes are just as miserable as the rest of us. Well, you don’t need a PhD in lawn decoration to recognize that Suzanna Y and Ramesh Z are not going to have much luck publishing their findings in very prestigious journals–or for that matter, in any journals. So those findings get buried into their file drawers, where they will live out the rest of their days with very sad expressions on their numbers.

Now let’s iterate this process several times. Every couple of years, some enterprising young investigator will decide she’s going to try to replicate that cool effect from 2009, since no one else seems to have bothered to do it. This goes on for a while, with plenty of null results, until eventually, just by chance, someone gets lucky (if you can call a false positive lucky) and publishes a successful replication. And also, once in a blue moon, someone who gets a null result actually bothers to forces their graduate student to write it up, and successfully gets out a publication that very carefully explains that, no, Virginia, lawn gnomes don’t really make you happy. So, over time, a small literature on the hedonic effects of lawn gnomes accumulates.

Eventually, someone else comes across this small literature and notices that it contains “mixed findings”, with some studies finding an effect, and others finding no effect. So this special someone–let’s call them the Master of the Gnomes–decides to do a formal meta-analysis. (A meta-analysis is basically just a fancy way of taking a bunch of other people’s studies, throwing them in a blender, and pouring out the resulting soup into a publication of your very own.) Now you can see why the failure to publish null results is going to be problematic: What the Master of the Gnomes doesn’t know about, the Master of the Gnomes can’t publish about. So any resulting meta-analytic estimate of the association between lawn gnomes and subjective well-being is going to be biased in the positive directio. That is, there’s a good chance that the meta-analysis will end up saying lawn gnomes make people very happy,when in reality lawn gnomes only make people a little happy, or don’t make people happy at all.

There are lots of ways to try to get around the file drawer problem, of course. One approach is to call up everyone you know who you think might have ever done any research on lawn gnomes and ask if you could take a brief peek into their file drawer. But meta-analysts are often very introverted people with no friends, so they may not know any other researchers. Or they might be too shy to ask other people for their data. And then too, some researchers are very protective of their file drawers, because in some cases, they’re hiding more than just papers in there. Bottom line, it’s not always easy to identify all of the null results that are out there.

A very different way to deal with the file drawer problem, and one suggested by Rosenthal in his 1979 article, is to compute a file drawer number, which is basically a number that tells you how many null results that you don’t know about would have to exist in people’s file drawers before the meta-analytic effect size estimate was itself rendered null. So, for example, let’s say you do a meta-analysis of 28 studies, and find that your best estimate, taking all studies into account, is that the standardized effect size (Cohen’s d) is 0.63, which is quite a large effect, and is statistically different from 0 at, say, the p < .00000001 level. Intuitively, that may seem like a lot of zeros, but being a careful methodologist, you decide you’d like a more precise definition of “a lot”. So you compute the file drawer number (in one of its many permutations), and it turns out that there would have to be 4,640,204 null results out there in people’s file drawers before the meta-analytic effect size became statistically non-significant. That’s a lot of studies, and it’s doubtful that there are even that many people studying lawn gnomes, so you can probably feel comfortable that there really is an association there, and that it’s fairly large.

The problem, of course, is that it doesn’t always turn out that way. Sometimes you do the meta-analysis and find that your meta-analytic effect is cutting it pretty close, and that it would only take, say, 12 null results to render the effect non-significant. At that point, the file drawer N is no help; no amount of statistical cleverness is going to give you the extrasensory ability to peer into people’s file drawers at a distance. Moreover, even in cases where you can feel relatively confident that there couldn’t possibly be enough null results out there to make your effect go away entirely, it’s still possible that there are enough null results out there to substantially weaken it. Generally speaking, the file drawer N is a number you compute because you have to, not because you want to. In an ideal world, you’d always have all the information readily available at your fingertips, and all that would be left for you to do is toss it all in the blender and hit “liquify” “meta-analyze”. But of course, we don’t live in an ideal world; we live in a horrible world full of things like tsunamis, lip syncing, and publication bias.

This brings me, in a characteristically long-winded way, to the point of this post. The fact that researchers often don’t have access to other researchers’ findings–null result or not–is in many ways a vestige of the fact that, until recently, there was no good way to rapidly and easily communicate one’s findings to others in an informal way. Of course, the telephone has been around for a long time, and the postal service has been around even longer. But the problem with telling other people what you found on the telephone is that they have to be listening, and you don’t really know ahead of time who’s going to want to hear about your findings. When Rosenthal was writing about file drawers in the 80s, there wasn’t any bulletin board where people could post their findings for all to see without going to the trouble of actually publishing them, so it made sense to focus on ways to work around the file drawer problem instead of through it.

These days, we do have a bulletin board where researchers can post their null results: The internet. In theory, an online database of null results presents an ideal solution to the file drawer problem: Instead of tossing their replication failures into a folder somewhere, researchers could spend a minute or two entering just a minimal amount of information into an online database, and that information would then live on in perpetuity, accessible to anyone else who cared to come along and enter the right keyword into the search box. Such a system could benefit everyone involved: researchers who ended up with unpublishable results could salvage at least some credit for their efforts, and ensure that their work wasn’t entirely lost to the sands of time; prospective meta-analysts could simplify the task of hunting down relevant findings in unlikely places; and scientists contemplating embarking on a new line of research that built heavily on an older finding could do a cursory search to see if other people had already tried (and failed) to replicate the foundational effect.

Sounds good, right? At least, that was my thought process last year, when I spent some time building an online database that could serve as this type of repository for null (and, occasionally, not-null) results. I got a working version up and running at failuretoreplicate.com, and was hoping to spend some time begging people to use it trying to write it up as a short paper, but then I started sinking into the quicksand of my dissertation, and promptly forgot about it. What jogged my memory was this post a couple of days ago, which describes a database, called the Negatome, that contains “a collection of protein and domain (functional units of proteins) pairs thatare unlikely to be engaged in direct physical interactions”. This isn’t exactly the same thing as a database of null results, and is in a completely different field, but it was close enough to rekindle my interest and motivate me to dust off the site I built last year. So now the site is here, and it’s effectively open for business.

I should confess up front that I don’t harbor any great hopes of this working; I suspect it will be quite difficult to build the critical mass needed to make something like this work. Still, I’d like to try. The site is officially in beta, so stuff will probably still break occasionally, but it’s basically functional. You can create an account instantly and immediately start adding studies; it only takes a minute or two per study. There’s no need to enter much in the way of detail; the point isn’t to provide an alternative to peer-reviewed publication, but rather to provide a kind of directory service that researchers could use as a cursory tool for locating relevant information. All you have to do is enter a brief description of the effect you tried to replicate, an indication of whether or not you succeeded, and what branch of psychology the effect falls under. There are plenty of other fields you can enter (e.g., searchable tags, sample sizes, description of procedures, etc.), but they’re almost all optional. The goal is really to make this as effortless as possible for people to use, so that there is no virtually no cost to contributing.

Anyway, right now there’s nothing on the site except a single lonely record I added in order to get things started. I’d be very grateful to anyone who wants to help this project off the ground by adding a study or two. There are full editing and deletion capabilities, so you can always delete anything you add later on if you decide you don’t want to share after all. My hope is that, given enough community involvement and a large enough userbase, this could eventually become a valuable resource psychologists could rely on when trying to establish how likely a finding is to replicate, or when trying to identify relevant studies to include in meta-analyses. You do want to help figure out what effect those sneaky, sneaky lawn gnomes have on our collective mental health, right?

tuesday at 3 pm works for me

Wednesday, October 21st, 2009

Apparently, Tuesday at 3 pm is the best time to suggest as a meeting time–that’s when people have the most flexibility available in their schedule. At least, that’s the conclusion drawn by a study based on data from WhenIsGood, a free service that helps with meeting scheduling. There’s not much to the study beyond the conclusion I just gave away; not surprisingly, people don’t like to meet before 10 or 11 am or after 4 pm, and there’s very little difference in availability across different days of the week.

What I find neat about this isn’t so much the results of the study itself as the fact that it was done at all. I’m a big proponent of using commercial website data for research purposes–I’m about to submit a paper that relies almost entirely on content pulled using the Blogger API, and am working on another project that makes extensive use of the Twitter API. The scope of the datasets one can assemble via these APIs is simply unparalleled; for example, there’s no way I could ever realistically collect writing samples of 50,000+ words from 500+ participants in a laboratory setting, yet the ability to programmatically access blogspot.com blog contents makes the task trivial. And of course, many websites collect data of a kind that just isn’t available off-line. For example, the folks at OKCupid are able to continuously pump out interesting data on people’s online dating habits because they have comprehensive data on interactions between literally millions of prospective dating partners. If you want to try to generate that sort of data off-line, I hope you have a really large lab.

Of course, I recognize that in this case, the WhenIsGood study really just amounts to a glorified press release. You can tell that’s what it is from the URL, which literally includes the “press/” directory in its path. So I’m certainly not naive enough to think that Web 2.0 companies are publishing interesting research based on their proprietary data solely out of the goodness of their hearts. Quite the opposite. But I think in this case the desire for publicity works in researchers’ favor: It’s precisely because virtually any press is considered good press that many of these websites would probably be happy to let researchers play with their massive (de-identified) datasets. It’s just that, so far, hardly anyone’s asked. The Web 2.0 world is a largely untapped resource that researchers (or at least, psychologists) are only just beginning to take advantage of.

I suspect that this will change in the relatively near future. Five or ten years from now, I imagine that a relatively large chunk of the research conducted in many area of psychology (particularly social and personality psychology) will rely heavily on massive datasets derived from commercial websites. And then we’ll all wonder in amazement at how we ever put up with the tediousness of collecting real-world data from two or three hundred college students at a time, when all of this online data was just lying around waiting for someone to come take a peek at it.