There’s a beautiful paper in Nature this week by Adrian Owen and colleagues that provides what’s probably as close to definitive evidence as you can get in any single study that “brain training” programs don’t work. Or at least, to the extent that they do work, the effects are so weak they’re probably not worth caring about.
Owen et al used a very clever approach to demonstrate their point. Rather than spending their time running small-sample studies that require people to come into the lab over multiple sessions (an expensive and very time-intensive effort that’s ultimately still usually underpowered), they teamed up with the BBC program ‘Bang Goes The Theory’. Participants were recruited via the TV show and directed to an experimental website where they created accounts, completed “pre-training” cognitive testing, and then could repeatedly log on over the course of six weeks to perform a series of cognitive tasks supposedly capable of training executive abilities. After the training period, participants again performed the same battery of cognitive tests, enabling the researchers to compare performance pre- and post-training.
Of course, you expect robust practice effects with this kind of thing (i.e., participants would almost certainly do better on the post-training battery than on the pre-training battery solely because they’d been exposed to the tasks and had some practice). So Owen et al randomly assigned participants logging on to the website to one of two different training programs (involving different types of training tasks) or to a control condition in which participants answered obscure trivia questions rather than doing any sort of intensive cognitive training per se. The beauty of doing this all online was that the authors were able to obtain gargantuan sample sizes (several thousand in each condition), ensuring that statistical power wasn’t going to be an issue. Indeed, Owen et al focus almost exclusively on effect sizes rather than p values, because, as they point out, once you have several thousand participants in each group, almost everything is going to be statistically significant, so it’s really the effect sizes that matter.
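To make the point about huge samples concrete, here is a rough sketch of my own (not an analysis from the paper). It assumes two equal-sized groups and a normal approximation, and the effect size and sample sizes are hypothetical values picked purely for illustration.

```python
import math

def two_sample_p_value(d, n_per_group):
    """Two-sided p value for a standardized mean difference d between two
    equal-sized groups, using a normal (z) approximation."""
    # The standard error of the difference, in SD units, is sqrt(2 / n).
    z = d * math.sqrt(n_per_group / 2)
    # Two-sided tail probability of a standard normal beyond |z|.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical numbers: the same tiny effect (d = 0.05) at two sample sizes.
for n in (50, 4000):
    print(f"n per group = {n}: p = {two_sample_p_value(0.05, n):.3f}")
```

With fifty people per group, a twentieth of a standard deviation is nowhere near significant; with four thousand per group, the very same effect dips below the conventional .05 threshold. The effect hasn’t gotten any more interesting, which is exactly why the effect size is the thing to look at.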
The critical comparison was whether the experimental groups showed greater improvements in performance post-training than the control group did. And the answer, generally speaking, was no. Across four different tasks, the differences in training-related gains in the experimental group relative to the control group were always either very small (no larger than about a fifth of a standard deviation), or even nonexistent (to the extent that for some comparisons, the control group improved more than the experimental groups!). So the upshot is that if there is any benefit of cognitive training (and it’s not at all clear that there is, based on the data), it’s so small that it’s probably not worth caring about. Here’s the key figure:
You could argue that the fact the y-axis spans the full range of possible values (rather than fitting the range of observed variation) is a bit misleading, since it’s only going to make any effects seem even smaller. But even so, it’s pretty clear these are not exactly large effects (and note that the key comparison is not the difference between light and dark bars, but the relative change from light to dark across the different groups).
Now, people who are invested (either intellectually or financially) in the efficacy of cognitive training programs might disagree, arguing that an effect of one-fifth of a standard deviation isn’t actually a tiny effect, and that there are arguably many situations in which that would be a meaningful boost in performance. But that’s the best possible estimate, and probably overstates the actual benefit. And there’s also the opportunity cost to consider: the average participant completed 20 to 30 training sessions, which, even at just 20 minutes a session (an estimate based on the description of the length of each of the training tasks), would take about 7 to 10 hours to complete (and some participants no doubt spent many more hours in training). That’s a lot of time that could have been invested in other much more pleasant things, some of which might also conceivably improve cognitive ability (e.g., doing Sudoku puzzles, which many people actually seem to enjoy). Owen et al put it nicely:
To illustrate the size of the transfer effects observed in this study, consider the following representative example from the data. The increase in the number of digits that could be remembered following training on tests designed, at least in part, to improve memory (for example, in experimental group 2) was three-hundredth of a digit. Assuming a linear relationship between time spent training and improvement, it would take almost four years of training to remember one extra digit. Moreover, the control group improved by two-tenths of a digit, with no formal memory training at all.
If someone asked you if you wanted to spend six weeks doing a “brain training” program that would provide those kinds of returns, you’d probably politely (or impolitely) refuse. Especially since it’s not like most of us spend much of our time doing digit span tasks anyway; odds are that the kinds of real-world problems we’d like to perform a little better at (say, something trivial like figuring out what to buy or not to buy at the grocery store) are even further removed from the tasks Owen et al (and other groups) have used to test for transfer, so any observable benefits in the real world would presumably be even smaller.
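The “almost four years” figure in the quote is just linear extrapolation, and it’s easy to check. A minimal sketch, assuming the three-hundredths-of-a-digit gain accrued over the full six-week training period and that a year is 52 weeks (the function name is mine, not anything from the paper):

```python
TRAINING_WEEKS = 6      # length of the training period
GAIN_DIGITS = 0.03      # three-hundredths of a digit, per the quote
WEEKS_PER_YEAR = 52     # assumption used for the unit conversion

def years_to_gain(target_digits, gain=GAIN_DIGITS, weeks=TRAINING_WEEKS):
    """Years of training needed to gain target_digits, assuming improvement
    scales linearly with time spent training."""
    weeks_needed = target_digits / gain * weeks
    return weeks_needed / WEEKS_PER_YEAR

print(f"{years_to_gain(1.0):.1f} years to remember one extra digit")
```

That works out to 200 weeks of training, a bit under four years, for a single extra digit.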
Of course, no study is perfect, and there are three potential concerns I can see. The first is that it’s possible that there are subgroups within the tested population who do benefit much more from the cognitive training. That is, the minuscule overall effect could be masking heterogeneity within the sample, such that some people (say, maybe men above 60 with poor diets who don’t like intellectual activities) benefit much more. The trouble with this line of reasoning, though, is that the overall effects in the entire sample are so small that you’re pretty much forced to conclude that either (a) any group that benefits substantially from the training is a very small proportion of the total sample, or (b) there are actually some people who suffer as a result of cognitive training, effectively balancing out the gains seen by other people. Neither of these possibilities seems particularly attractive.
The second concern is that it’s conceivable that the control group isn’t perfectly matched to the experimental groups, because, by the authors’ own admission, the retention rate was much lower in the control group. Participants were randomly assigned to the three groups, but only about two-thirds as many control participants completed the study. The higher drop-out rate was apparently due to the fact that the obscure trivia questions used as a control task were pretty boring. The reason that’s a potential problem is that attrition wasn’t random, so there may be a systematic difference between participants in the experimental conditions and those in the control condition. In particular, it’s possible that the remaining control participants had a higher tolerance for boredom and/or were somewhat smarter or more intellectual on average (answering obscure trivia questions clearly isn’t everyone’s cup of tea). If that were true, the lack of any difference between experimental and control conditions might be due to participant differences rather than an absence of a true training effect. Unfortunately, it’s hard to determine whether this might be true, because (as far as I can tell) Owen et al don’t provide the raw mean performance scores on the pre- and post-training testing for each group, but only report the changes in performance. What you’d want to know is that the control participants didn’t do substantially better or worse on the pre-training testing than the experimental participants (due to selective attrition of low-performing subjects), since a baseline difference of that kind would make changes in performance difficult to interpret. But at face value, it doesn’t seem very plausible that this would be a serious issue.
Lastly, Owen et al do report a small positive correlation between number of training sessions performed (which was under participants’ control) and gains in performance on the post-training test. Now, this effect was, as the authors note, very small (a maximal Spearman’s rho of .06), so it’s not really likely to have practical implications either. Still, it does suggest that performance increases as a function of practice. So if we’re being pedantic, we should say that intensive cognitive training may improve cognitive performance in a generalized way, but that the effect is really minuscule and probably not worth the time and effort required to do the training in the first place. Which isn’t exactly the type of careful and measured claim that the people who sell brain training programs are generally interested in making.
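To get a feel for just how small a rho of .06 is, one rough and admittedly crude move is to square it, which gives the proportion of variance in one variable’s ranks that is shared with the other’s. This is my own back-of-the-envelope illustration rather than anything reported in the paper:

```python
def shared_rank_variance(rho):
    """Proportion of variance in one variable's ranks that is shared with
    the other's, given a Spearman rank correlation rho."""
    return rho ** 2

rho_max = 0.06  # the largest correlation reported, per the text above
print(f"{shared_rank_variance(rho_max):.2%} of rank variance shared")
```

In other words, even in the best case, the number of sessions completed accounts for well under one percent of the variation in how much people improved.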
At any rate, setting aside the debate over whether cognitive training works or not, one thing that’s perplexed me for a long time about the training literature is why people focus to such an extent on cognitive training rather than other training regimens that produce demonstrably larger transfer effects. I’m thinking in particular of aerobic exercise, which produces much more robust and replicable effects on cognitive performance. There’s a nice meta-analysis by Colcombe and colleagues that found effect sizes on the order of half a standard deviation and up for physical exercise in older adults–and effects were particularly large for the most heavily g-loaded tasks. Now, even if you allow for publication bias and other manifestations of the fudge factor, it’s almost certain that the true effect of physical exercise on cognitive performance is substantially larger than the (very small) effects of cognitive training as reported by Owen et al and others.
The bottom line is that, based on everything we know at the moment, the evidence seems to pretty strongly suggest that if your goal is to improve cognitive function, you’re more likely to see meaningful results by jogging or swimming regularly than by doing crossword puzzles or N-back tasks–particularly if you’re older. And of course, a pleasant side effect is that exercise also improves your health and (for at least some people) mood, which I don’t think N-back tasks do. Actually, many of the participants I’ve tested will tell you that doing the N-back is a distinctly dysphoric experience.
On a completely unrelated note, it’s kind of neat to see a journal like Nature publish what is essentially a null result. It goes to show that people do care about replication failures in some cases–namely, in those cases when the replication failure contradicts a relatively large existing literature, and is sufficiently highly powered to actually say something interesting about the likely effect sizes in question.
Owen AM, Hampshire A, Grahn JA, Stenton R, Dajani S, Burns AS, Howard RJ, & Ballard CG (2010). Putting brain training to the test. Nature PMID: 20407435