Tag Archives: neuroscience

what exactly is it that 53% of neuroscience articles fail to do?

[UPDATE: Jake Westfall points out in the comments that the paper discussed here appears to have made a pretty fundamental mistake that I then carried over to my post. I've updated the post accordingly.]

A new paper in Nature Neuroscience by Emmeke Aarts and colleagues argues that neuroscientists should start using hierarchical (or multilevel) models in their work in order to account for the nested structure of their data. From the abstract:

In neuroscience, experimental designs in which multiple observations are collected from a single research object (for example, multiple neurons from one animal) are common: 53% of 314 reviewed papers from five renowned journals included this type of data. These so-called ‘nested designs’ yield data that cannot be considered to be independent, and so violate the independency assumption of conventional statistical methods such as the t test. Ignoring this dependency results in a probability of incorrectly concluding that an effect is statistically significant that is far higher (up to 80%) than the nominal α level (usually set at 5%). We discuss the factors affecting the type I error rate and the statistical power in nested data, methods that accommodate dependency between observations and ways to determine the optimal study design when data are nested. Notably, optimization of experimental designs nearly always concerns collection of more truly independent observations, rather than more observations from one research object.

I don’t have any objection to the advocacy for hierarchical models; that much seems perfectly reasonable. If you have nested data, where each subject (or Petri dish or animal or whatever) provides multiple samples, it’s sensible to try to account for as many systematic sources of variance as you can. That point may have been made many times before, but it never hurts to make it again.

What I do find surprising though–and frankly, have a hard time believing–is the idea that 53% of neuroscience articles are at serious risk of Type I error inflation because they fail to account for nesting. This seems to me to be what the abstract implies, yet it’s a much stronger claim that doesn’t actually follow just from the observation that virtually no studies that have reported nested data have used hierarchical models for analysis. What it also requires is for all of those studies that use “conventional” (i.e., non-hierarchical) analyses to have actively ignored the nesting structure and treated repeated measurements as if they in fact came from entirely different subjects or clusters.

To make this concrete, suppose we have a dataset made up of 400 observations, consisting of 20 subjects who each provided 10 trials in 2 different experimental conditions (i.e., 20 x 2 x 10 = 400). And suppose the thing we ultimately want to know is whether or not there’s a statistical difference in outcome between the two conditions. There are at least three ways we could set up our comparison:

  1. Ignore the grouping variable (i.e., subject) entirely, effectively giving us 200 observations in each condition. We then conduct the test as if we have 200 independent observations in each condition.
  2. Average the 10 trials in each condition within each subject first, then conduct the test on the subject means. In this case, we effectively have 20 observations in each condition (1 per subject).
  3. Explicitly include the effects of both subject and trial in our model. In this case we have 400 observations, but we’re explicitly accounting for the correlation between trials within a given subject, so that the statistical comparison of conditions effectively has somewhere between 20 and 400 “observations” (or degrees of freedom).

Now, none of these approaches is strictly “wrong”, in that there could be specific situations in which any one of them would be called for. But as a general rule, the first approach is almost never appropriate. The reason is that we typically want to draw conclusions that generalize across the cases in the higher level of the hierarchy, and don’t have any intrinsic interest in the individual trials themselves. In the above example, we’re asking whether people, on average, behave differently in the two conditions. If we treat our data as if we had 200 subjects in each condition, effectively concatenating trials across all subjects, we’re ignoring the fact that the responses acquired from each subject will tend to be correlated (i.e., Jane Doe’s behavior on Trial 2 will tend to be more similar to her own behavior on Trial 1 than to another subject’s behavior on Trial 1). So we’re pretending that we know something about 200 different individuals sampled at random from the population, when in fact we only know something about 20 different individuals. The upshot, if we use approach (1), is that we’re going to end up answering a question quite different from the one we think we’re answering. [Update: Jake Westfall points out in the comments below that we won't necessarily inflate Type I error rate. Rather, the net effect of failing to model the nesting structure properly will depend on the relative amount of within-cluster vs. between-cluster variance. The answer we get will, however, usually deviate considerably from the answer we would get using approaches (2) or (3).]

By contrast, approaches (2) and (3) will, in most cases, produce pretty similar results. It’s true that the hierarchical approach is generally a more sensible thing to do, and will tend to provide a better estimate of the true population difference between the two conditions. However, it’s probably better to describe approach (2) as suboptimal, and not as wrong. So long as the subjects in our toy example above are in fact sampled at random, it’s pretty reasonable to assume that we have exactly 20 independent observations, and analyze our data accordingly. Our resulting estimates might not be quite as good as they could have been, but we’re unlikely to miss the mark by much.
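For anyone who wants to poke at the difference between approaches (1) and (2), here is a minimal simulation sketch of the toy design above (20 subjects, 2 conditions, 10 trials each). Everything numeric in it is an assumption made up for illustration: the effect size, the subject-level and trial-level standard deviations, and the random seed. Treating approach (2) as a paired t-test on subject means is also my reading, since the same subjects contribute to both conditions. Approach (3) is left out because fitting a hierarchical model needs a dedicated mixed-model library.

```python
import math
import random
import statistics


def simulate(n_subjects=20, n_trials=10, effect=0.3,
             subject_sd=1.0, trial_sd=1.0, seed=0):
    """Return one (condition_a_trials, condition_b_trials) pair per subject.

    All parameter values are illustrative assumptions, not taken from any paper.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        baseline = rng.gauss(0.0, subject_sd)  # subject-specific offset shared by all trials
        cond_a = [baseline + rng.gauss(0.0, trial_sd) for _ in range(n_trials)]
        cond_b = [baseline + effect + rng.gauss(0.0, trial_sd) for _ in range(n_trials)]
        data.append((cond_a, cond_b))
    return data


def pooled_trials_t(data):
    """Approach (1): ignore subject; two-sample t-test over all trials."""
    a = [x for cond_a, _ in data for x in cond_a]
    b = [x for _, cond_b in data for x in cond_b]
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(b) - statistics.mean(a)) / se, na + nb - 2


def subject_means_t(data):
    """Approach (2): average within subject, then paired t-test on the means."""
    diffs = [statistics.mean(cond_b) - statistics.mean(cond_a)
             for cond_a, cond_b in data]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se, len(diffs) - 1


data = simulate()
t1, df1 = pooled_trials_t(data)
t2, df2 = subject_means_t(data)
print(f"approach (1): t = {t1:.2f}, df = {df1}")
print(f"approach (2): t = {t2:.2f}, df = {df2}")
```

In this balanced design the two approaches share the same numerator (the difference in condition means), so any gap between the two t statistics comes entirely from the error term and the degrees of freedom. Changing `subject_sd` relative to `trial_sd` is a quick way to see how the between-cluster vs. within-cluster variance mix changes what approach (1) reports.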

To return to the Aarts et al paper, the key question is what exactly the authors mean when they say in their abstract that:

In neuroscience, experimental designs in which multiple observations are collected from a single research object (for example, multiple neurons from one animal) are common: 53% of 314 reviewed papers from five renowned journals included this type of data. These so-called ‘nested designs’ yield data that cannot be considered to be independent, and so violate the independency assumption of conventional statistical methods such as the t test. Ignoring this dependency results in a probability of incorrectly concluding that an effect is statistically significant that is far higher (up to 80%) than the nominal α level (usually set at 5%).

I’ve underlined the key phrases here. It seems to me that the implication the reader is supposed to draw from this is that roughly 53% of the neuroscience literature is at high risk of reporting spurious results. But in reality this depends entirely on whether the authors mean that 53% of studies are modeling trial-level data but ignoring the nesting structure (as in approach 1 above), or that 53% of studies in the literature aren’t using hierarchical models, even though they may be doing nothing terribly wrong otherwise (e.g., because they’re using approach (2) above).

Unfortunately, the rest of the manuscript doesn’t really clarify the matter. Here’s the section in which the authors report how they obtained that 53% number:

To assess the prevalence of nested data and the ensuing problem of inflated type I error rate in neuroscience, we scrutinized all molecular, cellular and developmental neuroscience research articles published in five renowned journals (Science, Nature, Cell, Nature Neuroscience and every month’s first issue of Neuron) in 2012 and the first six months of 2013. Unfortunately, precise evaluation of the prevalence of nesting in the literature is hampered by incomplete reporting: not all studies report whether multiple measurements were taken from each research object and, if so, how many. Still, at least 53% of the 314 examined articles clearly concerned nested data, of which 44% specifically reported the number of observations per cluster with a minimum of five observations per cluster (that is, for robust multilevel analysis a minimum of five observations per cluster is required11, 12). The median number of observations per cluster, as reported in literature, was 13 (Fig. 1a), yet conventional analysis methods were used in all of these reports.

This is, as far as I can see, still ambiguous. The only additional information provided here is that 44% of studies specifically reported the number of observations per cluster. Unfortunately this still doesn’t tell us whether the effective degrees of freedom used in the statistical tests in those papers included nested observations, or instead averaged over nested observations within each group or subject prior to analysis.

Lest this seem like a rather pedantic statistical point, I hasten to emphasize that a lot hangs on it. The potential implications for the neuroscience literature are very different under each of these two scenarios. If it is in fact true that 53% of studies are inappropriately using a “fixed-effects” model (approach 1)–which seems to me to be what the Aarts et al abstract implies–the upshot is that a good deal of neuroscience research is in very bad statistical shape, and the authors will have done the community a great service by drawing attention to the problem. On the other hand, if the vast majority of the studies in that 53% are actually doing their analyses in a perfectly reasonable–if perhaps suboptimal–way, then the Aarts et al article seems rather alarmist. It would, of course, still be true that hierarchical models should be used more widely, but the cost of failing to switch would be much lower than seems to be implied.

I’ve emailed the corresponding author to ask for a clarification. I’ll update this post if I get a reply. In the meantime, I’m interested in others’ thoughts as to the likelihood that around half of the neuroscience literature involves inappropriate reporting of fixed-effects analyses. I guess personally I would be very surprised if this were the case, though it wouldn’t be unprecedented–e.g., I gather that in the early days of neuroimaging, the SPM analysis package used a fixed-effects model by default, resulting in quite a few publications reporting grossly inflated t/z/F statistics. But that was many years ago, and in the literatures I read regularly (in psychology and cognitive neuroscience), this problem rarely arises any more. A priori, I would have expected the same to be true in cellular and molecular neuroscience.

a human and a monkey walk into an fMRI scanner…

Tor Wager and I have a “news and views” piece in Nature Methods this week; we discuss a paper by Mantini and colleagues (in the same issue) introducing a new method for identifying functional brain homologies across different species–essentially, identifying brain regions in humans and monkeys that seem to do roughly the same thing even if they’re not located in the same place anatomically. Mantini et al make some fairly strong claims about what their approach tells us about the evolution of the human brain (namely, that some cortical regions have undergone expansion relative to monkeys, while others have adapted substantively new functions). For reasons we articulate in our commentary, I’m personally not so convinced by the substantive conclusions, but I do think the core idea underlying the method is a very clever and potentially useful one:

Their technique, interspecies activity correlation (ISAC), uses functional magnetic resonance imaging (fMRI) to identify brain regions in which humans and monkeys exposed to the same dynamic stimulus—a 30-minute clip from the movie The Good, the Bad and the Ugly—show correlated patterns of activity (Fig. 1). The premise is that homologous regions should have similar patterns of activity across species. For example, a brain region sensitive to a particular configuration of features, including visual motion, hands, faces, object and others, should show a similar time course of activity in both species—even if its anatomical location differs across species and even if the precise features that drive the area’s neurons have not yet been specified.

Mo Costandi has more on the paper in an excellent Guardian piece (and I’m not just saying that because he quoted me a few times). All in all, I think it’s a very exciting method, and it’ll be interesting to see how it’s applied in future studies. I think there’s a fairly broad class of potential applications based loosely around the same idea of searching for correlated patterns. It’s an idea that’s already been used by Uri Hasson (an author on the Mantini et al paper) and others fairly widely in the fMRI literature to identify functional correspondences across different subjects; but you can easily imagine conceptually similar applications in other fields too–e.g., correlating gene expression profiles across species in order to identify structural homologies (actually, one could probably try this out pretty easily using the mouse and human data available in the Allen Brain Atlas).
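To make the “searching for correlated patterns” idea a bit more tangible, here is a toy sketch. It is emphatically not the ISAC pipeline from Mantini et al; it only illustrates the bare premise that regions doing the same job should show similar time courses regardless of where they sit. The region names, the two synthetic “stimulus-driven” signals, the noise level, and the seed are all hypothetical.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


rng = random.Random(1)
n_timepoints = 200

# Two made-up signals standing in for responses to different stimulus features.
signal_faces = [rng.gauss(0.0, 1.0) for _ in range(n_timepoints)]
signal_motion = [rng.gauss(0.0, 1.0) for _ in range(n_timepoints)]


def noisy(signal, noise_sd=0.5):
    """Add independent measurement noise to a signal (noise_sd is an assumption)."""
    return [s + rng.gauss(0.0, noise_sd) for s in signal]


# Hypothetical regions: labels carry no anatomical meaning.
human_regions = {"human_A": noisy(signal_faces), "human_B": noisy(signal_motion)}
monkey_regions = {"monkey_X": noisy(signal_motion), "monkey_Y": noisy(signal_faces)}

# For each human region, pick the monkey region with the most similar time course.
best_match = {}
for h_name, h_ts in human_regions.items():
    best_match[h_name] = max(
        monkey_regions, key=lambda m: pearson(h_ts, monkey_regions[m])
    )

print(best_match)
```

The matching here uses only the time courses, never the labels or any notion of location, which is the property that makes the approach interesting. The same few lines would work unchanged on any other pair of profile sets, gene expression included.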

Mantini D, Hasson U, Betti V, Perrucci MG, Romani GL, Corbetta M, Orban GA, & Vanduffel W (2012). Interspecies activity correlations reveal functional correspondence between monkey and human brain areas. Nature Methods. PMID: 22306809

Wager, T., & Yarkoni, T. (2012). Establishing homology between monkey and human brains. Nature Methods. DOI: 10.1038/nmeth.1869

of postdocs and publishing models: two opportunities of (possible) interest

I don’t usually use this blog to advertise things (so please don’t send me requests to publicize your third cousin’s upcoming bar mitzvah), but I think these two opportunities are pretty cool. They also happen to be completely unrelated, but I’m too lazy to write two separate posts, so…

Opportunity 1: We’re hiring!

Well, not me personally, but a guy I know. My current postdoc advisor, Tor Wager, is looking to hire up to 4 postdocs in the next few months to work on various NIH-funded projects related to the neural substrates of pain and emotion. You would get to play with fun things like fMRI scanners, thermal stimulators, and machine learning techniques. Oh, and snow, because we’re located in Boulder, Colorado. So we have. A lot. Of snow.

Anyway, Tor is great to work with, the lab is full of amazing people and great resources, and Boulder is a fantastic place to live, so if you have (or expect to soon have) a PhD in affective/cognitive neuroscience or related field and a background in pain/emotion research and/or fMRI analysis and/or machine learning and/or psychophysiology, you should consider applying! See this flyer for more details. And no, I’m not being paid to say this.

Opportunity 2: Design the new science!

That’s a cryptic way of saying that there’s a forthcoming special issue of Frontiers in Computational Neuroscience that’s going to focus on “Visions for Open Evaluation of Scientific Papers by Post-Publication Peer Review.” As far as I can tell, that basically means that if you’re like every other scientist, and think there’s more to scientific evaluation than the number of publications and citations one has, you now have an opportunity to design a perfect evaluation system of your very own–meaning, of course, that system in which you end up at or near the very top.

In all seriousness though, this seems like a really great idea, and I think it’s the kind of thing that could actually have a very large impact on how we’re all doing–or at least communicating–science 10 or 20 years from now. The special issue will be edited by Niko Kriegeskorte, whose excellent ideas about scientific publishing I’ve previously blogged about, and Diana Deca. Send them your best ideas! And then, if it’s not too much trouble, put my name on your paper. You know, as a finder’s fee. Abstracts are due January 15th.

the naming of things

Let’s suppose you were charged with the important task of naming all the various subdisciplines of neuroscience that have anything to do with the field of research we now know as psychology. You might come up with some or all of the following terms, in no particular order:

  • Neuropsychology
  • Biological psychology
  • Neurology
  • Cognitive neuroscience
  • Cognitive science
  • Systems neuroscience
  • Behavioral neuroscience
  • Psychiatry

That’s just a partial list; you’re resourceful, so there are probably others (biopsychology? psychobiology? psychoneuroimmunology?). But it’s a good start. Now suppose you decided to make a game out of it, and threw a dinner party where each guest received a copy of your list (discipline names only–no descriptions!) and had to guess what they thought people in that field study. If your nomenclature made any sense at all, and tried to respect the meanings of the individual words used to generate the compound words or phrases in your list, your guests might hazard something like the following guesses:

  • Neuropsychology: “That’s the intersection of neuroscience and psychology. Meaning, the study of the neural mechanisms underlying cognitive function.”
  • Biological psychology: “Similar to neuropsychology, but probably broader. Like, it includes the role of genes and hormones and kidneys in cognitive function.”
  • Neurology: “The pure study of the brain, without worrying about all of that associated psychological stuff.”
  • Cognitive neuroscience: “Well if it doesn’t mean the same thing as neuropsychology and biological psychology, then it probably refers to the branch of neuroscience that deals with how we think and reason. Kind of like cognitive psychology, only with brains!”
  • Cognitive science: “Like cognitive neuroscience, but not just for brains. It’s the study of human cognition in general.”
  • Systems neuroscience: “Mmm… I don’t really know. The study of how the brain functions as a whole system?”
  • Behavioral neuroscience: “Easy: it’s the study of the relationship between brain and behavior. For example, how we voluntarily generate actions.”
  • Psychiatry: “That’s the branch of medicine that concerns itself with handing out multicolored pills that do funny things to your thoughts and feelings. Of course.”

If this list seems sort of sensible to you, you probably live in a wonderful world where compound words mean what you intuitively think they mean, the subject matter of scientific disciplines can be transparently discerned, and terms that sound extremely similar have extremely similar referents rather than referring to completely different fields of study. Unfortunately, that world is not the world we happen to actually inhabit. In our world, most of the disciplines at the intersection of psychology and neuroscience have funny names that reflect accidents of history, and tell you very little about what the people in that field actually study.

Here’s the list your guests might hand back in this world, if you ever made the terrible, terrible mistake of inviting a bunch of working scientists to dinner:

  • Neuropsychology: The study of how brain damage affects cognition and behavior. Most often focusing on the effects of brain lesions in humans, and typically relying primarily on behavioral evaluations (i.e., no large magnetic devices that take photographs of the space inside people’s skulls). People who call themselves neuropsychologists are overwhelmingly trained as clinical psychologists, and many of them work in big white buildings with a red cross on the front. Note that this isn’t the definition of neuropsychology that Wikipedia gives you; Wikipedia seems to think that neuropsychology is “the basic scientific discipline that studies the structure and function of the brain related to specific psychological processes and overt behaviors.” Nice try, Wikipedia, but that’s much too general. You didn’t even use the words ‘brain damage’, ‘lesion’, or ‘patient’ in the first sentence.
  • Biological psychology: To be perfectly honest, I’m going to have to step out of dinner-guest character for a moment and admit I don’t really have a clue what biological psychologists study. I can’t remember the last time I heard someone refer to themselves as a biological psychologist. To an approximation, I think biological psychology differs from, say, cognitive neuroscience in placing greater emphasis on everything outside of higher cognitive processes (sensory systems, autonomic processes, the four F’s, etc.). But that’s just idle speculation based largely on skimming through the chapter names of my old “Biological Psychology” textbook. What I can recklessly assert is that you really don’t want to trust the Wikipedia definition here, because when you type ‘biological psychology’ into that little box that says ‘search’ on Wikipedia, it redirects you to the behavioral neuroscience entry. And that can’t be right, because, as we’ll see in a moment, behavioral neuroscience refers to something very different…
  • Neurology: Hey, look! A Wikipedia entry that doesn’t lie to our face! It says neurology is “a medical specialty dealing with disorders of the nervous system. Specifically, it deals with the diagnosis and treatment of all categories of disease involving the central, peripheral, and autonomic nervous systems, including their coverings, blood vessels, and all effector tissue, such as muscle.” That’s a definition I can get behind, and I think 9 out of 10 dinner guests would probably agree (the tenth is probably drunk). But then, I’m not (that kind of) doctor, so who knows.
  • Cognitive neuroscience: In principle, cognitive neuroscience actually means more or less what it sounds like it means. It’s the study of the neural mechanisms underlying cognitive function. In practice, it all goes to hell in a handbasket when you consider that you can prefix ‘cognitive neuroscience’ with pretty much any adjective you like and end up with a valid subdiscipline. Developmental cognitive neuroscience? Check. Computational cognitive neuroscience? Check. Industrial/organizational cognitive neuroscience? Amazingly, no; until just now, that phrase did not exist on the internet. But by the time you read this, Google will probably have a record of this post, which is really all it takes to legitimate I/OCN as a valid field of inquiry. It’s just that easy to create a new scientific discipline, so be very afraid–things are only going to get messier.
  • Cognitive science: A field that, by most accounts, lives up to its name. Well, kind of. Cognitive science sounds like a blanket term for pretty much everything that has to do with cognition, and it sort of is. You have psychology and linguistics and neuroscience and philosophy and artificial intelligence all represented. I’ve never been to the annual CogSci conference, but I hear it’s a veritable orgy of interdisciplinary activity. Still, I think there’s a definite bias towards some fields at the expense of others. Neuroscientists (of any stripe), for instance, rarely call themselves cognitive scientists. Conversely, philosophers of mind or language love to call themselves cognitive scientists, and the cynic in me says it’s because it means they get to call themselves scientists. Also, in terms of content and coverage, there seems to be a definite emphasis among self-professed cognitive scientists on computational and mathematical modeling, and not so much emphasis on developing neuroscience-based models (though neural network models are popular). Still, if you’re scoring terms based on clarity of usage, cognitive science should score at least an 8.5 / 10.
  • Systems neuroscience: The study of neural circuits and the dynamics of information flow in the central nervous system (note: I stole part of that definition from MIT’s BCS website, because MIT people are SMART). Systems neuroscience doesn’t overlap much with psychology; you can’t defensibly argue that the temporal dynamics of neuronal assemblies in sensory cortex have anything to do with human cognition, right? I just threw this in to make things even more confusing.
  • Behavioral neuroscience: This one’s really great, because it has almost nothing to do with what you think it does. Well, okay, it does have something to do with behavior. But it’s almost exclusively animal behavior. People who refer to themselves as behavioral neuroscientists are generally in the business of poking rats in the brain with very small, sharp, glass objects; they typically don’t care much for human beings (professionally, that is). I guess that kind of makes sense when you consider that you can have rats swim and jump and eat and run while electrodes are implanted in their heads, whereas most of the time when we study human brains, they’re sitting motionless in (a) a giant magnet, (b) a chair, or (c) a jar full of formaldehyde. So maybe you could make an argument that since humans don’t get to BEHAVE very much in our studies, people who study humans can’t call themselves behavioral neuroscientists. But that would be a very bad argument to make, and many of the people who work in the so-called “behavioral sciences” and do nothing but study human behavior would probably be waiting to thump you in the hall the next time they saw you.
  • Psychiatry: The branch of medicine that concerns itself with handing out multicolored pills that do funny things to your thoughts and feelings. Of course.

Anyway, the basic point of all this long-winded nonsense is just that, for all that stuff we tell undergraduates about how science is such a wonderful way to achieve clarity about the way the world works, scientists–or at least, neuroscientists and psychologists–tend to carve up their disciplines in pretty insensible ways. That doesn’t mean we’re dumb, of course; to the people who work in a field, the clarity (or lack thereof) of the terminology makes little difference, because you only need to acquire it once (usually in your first nine years of grad school), and after that you always know what people are talking about. Come to think of it, I’m pretty sure the whole point of learning big words is that once you’ve successfully learned them, you can stop thinking deeply about what they actually mean.

It is kind of annoying, though, to have to explain to undergraduates that, DUH, the class they really want to take given their interests is OBVIOUSLY cognitive neuroscience and NOT neuropsychology or biological psychology. I mean, can’t they read? Or to pedantically point out to someone you just met at a party that saying “the neurological mechanisms of such-and-such” makes them sound hopelessly unsophisticated, and what they should really be saying is “the neural mechanisms,” or “the neurobiological mechanisms”, or (for bonus points) “the neurophysiological substrates”. Or, you know, to try (unsuccessfully) to convince your mother on the phone that even though it’s true that you study the relationship between brains and behavior, the field you work in has very little to do with behavioral neuroscience, and so you really aren’t an expert on that new study reported in that article she just read in the paper the other day about that interesting thing that’s relevant to all that stuff we all do all the time.

The point is, the world would be a slightly better place if cognitive science, neuropsychology, and behavioral neuroscience all meant what they seem like they should mean. But only very slightly better.

Anyway, aside from my burning need to complain about trivial things, I bring these ugly terminological matters up partly out of idle curiosity. And what I’m idly curious about is this: does this kind of confusion feature prominently in other disciplines too, or is psychology-slash-neuroscience just, you know, “special”? My intuition is that it’s the latter; subdiscipline names in other areas just seem so sensible to me whenever I hear them. For instance, I’m fairly confident that organic chemists study the chemistry of Orgas, and I assume condensed matter physicists spend their days modeling the dynamics of teapots. Right? Yes? No? Perhaps my three regular readers can enlighten me in the comments…

fMRI, not coming to a courtroom near you so soon after all

That’s a terribly constructed title, I know, but bear with me. A couple of weeks ago I blogged about a courtroom case in Tennessee where the defense was trying to introduce fMRI to the courtroom as a way of proving the defendant’s innocence (his brain, apparently, showed no signs of guilt). The judge’s verdict is now in, and…. fMRI is out. In United States v. Lorne Semrau, Judge Pham recommended that the government’s motion to exclude fMRI scans from consideration be granted. That’s the outcome I think most respectable cognitive neuroscientists were hoping for; as many people associated with the case or interviewed about it have noted (and as the judge recognized), there just isn’t a shred of evidence to suggest that fMRI has any utility as a lie detector in real-world situations.

The judge’s decision, which you can download in PDF form here (hat-tip: Thomas Nadelhoffer), is really quite elegant, and worth reading (or at least skimming through). He even manages some subtle snark in places. For instance (my italics):

Regarding the existence and maintenance of standards, Dr. Laken testified as to the protocols and controlling standards that he uses for his own exams. Because the use of fMRI-based lie detection is still in its early stages of development, standards controlling the real-life application have not yet been established. Without such standards, a court cannot adequately evaluate the reliability of a particular lie detection examination. Cordoba, 194 F.3d at 1061. Assuming, arguendo, that the standards testified to by Dr. Laken could satisfy Daubert, it appears that Dr. Laken violated his own protocols when he re-scanned Dr. Semrau on the AIMS tests SIQs, after Dr. Semrau was found “deceptive” on the first AIMS tests scan. None of the studies cited by Dr. Laken involved the subject taking a second exam after being found to have been deceptive on the first exam. His decision to conduct a third test begs the question whether a fourth scan would have revealed Dr. Semrau to be deceptive again.

The absence of real-life error rates, lack of controlling standards in the industry for real-life exams, and Dr. Laken’s apparent deviation from his own protocols are negative factors in the analysis of whether fMRI-based lie detection is scientifically valid. See Bonds, 12 F.3d at 560.

The reference here is to the fact that Laken and his company scanned Semrau (the defendant) on three separate occasions. The first two scans were planned ahead of time, but the third apparently wasn’t:

From the first scan, which included SIQs relating to defrauding the government, the results showed that Dr. Semrau was “not deceptive.” However, from the second scan, which included SIQs relating to AIMS tests, the results showed that Dr. Semrau was “being deceptive.” According to Dr. Laken, “testing indicates that a positive test result in a person purporting to tell the truth is accurate only 6% of the time.” Dr. Laken also believed that the second scan may have been affected by Dr. Semrau’s fatigue. Based on his findings on the second test, Dr. Laken suggested that Dr. Semrau be administered another fMRI test on the AIMS tests topic, but this time with shorter questions and conducted later in the day to reduce the effects of fatigue. … The third scan was conducted on January 12, 2010 at around 7:00 p.m., and according to Dr. Laken, Dr. Semrau tolerated it well and did not express any fatigue. Dr. Laken reviewed this data on January 18, 2010, and concluded that Dr. Semrau was not deceptive. He further stated that based on his prior studies, “a finding such as this is 100% accurate in determining truthfulness from a truthful person.”

I may very well be misunderstanding something here (and so might the judge), but if the positive predictive value of the test is only 6%, I’m guessing that the probability that the test is seriously miscalibrated is somewhat higher than 6%. Especially since the base rate for lying among people who are accused of committing serious fraud is probably reasonably high (this matters, because when base rates are very low, low positive predictive values are not unexpected). But then, no one really knows how to calibrate these tests properly, because the data you’d need to do that simply don’t exist. Serious validation of fMRI as a tool for lie detection would require assembling a large set of brain scans from defendants accused of various crimes (real crimes, not simulated ones) and using that data to predict whether those defendants were ultimately found guilty or not. There really isn’t any substitute for doing a serious study of that sort, but as far as I know, no one’s done it yet. Fortunately, the few judges who’ve had to rule on the courtroom use of fMRI seem to recognize that.
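To make the base-rate point concrete, here’s a minimal sketch of the arithmetic in Python. The sensitivity, specificity, and base rates below are made-up numbers for illustration; none of them come from the ruling or from Laken’s studies, and the function name is mine.

```python
def positive_predictive_value(sensitivity, specificity, base_rate):
    """P(actually lying | test says "deceptive"), via Bayes' rule."""
    true_positives = sensitivity * base_rate
    false_positives = (1.0 - specificity) * (1.0 - base_rate)
    return true_positives / (true_positives + false_positives)


# Hypothetical test: flags 90% of liars and clears 90% of truth-tellers.
for base_rate in (0.01, 0.5):
    ppv = positive_predictive_value(0.9, 0.9, base_rate)
    print(f"base rate {base_rate:.2f} -> PPV {ppv:.2f}")
```

With these invented numbers, the very same test has a positive predictive value of about 8% when only 1 in 100 people tested is lying, and 90% when half of them are. A low PPV is therefore unremarkable at low base rates, and much harder to explain away at high ones.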

elsewhere on the net

I’ve been swamped with work lately, so blogging has taken a backseat. I keep a text file on my desktop of interesting things I’d like to blog about; normally, about three-quarters of the links I paste into it go unblogged, but in the last couple of weeks it’s more like 98%. So here are some things I’ve found interesting recently, in no particular order:

It’s World Water Day 2010! Or at least it was a week ago, which is when I should have linked to these really moving photos.

Carl Zimmer has a typically brilliant (and beautifully illustrated) article in the New York Times about “Unseen Beasts, Then and Now“:

Somewhere in England, about 600 years ago, an artist sat down and tried to paint an elephant. There was just one problem: he had never seen one.

John Horgan writes a surprisingly bad guest blog post for Scientific American in which he basically accuses neuroscientists (not a neuroscientist or some neuroscientists, but all of us, collectively) of selling out by working with the US military. I’m guessing that the number of working neuroscientists who’ve ever received any sort of military funding is somewhere south of 10%, and is probably much smaller than the corresponding proportion in any number of other scientific disciplines, but why let data get in the way of a good anecdote or two. [via Peter Reiner]

Mark Liberman follows up his first critique of Louann Brizendine’s new “book” The Male Brain with a second one, now that he’s actually got his hands on a copy. Verdict: the book is still terrible. Mark was also kind enough to answer my question about what the mysterious “sexual pursuit area” is. Apparently it’s the medial preoptic area. And the claim that this area governs sexual behavior in humans and is 2.5 times larger in males is, once again, based entirely on work in the rat.

Commuting sucks. Jonah Lehrer discusses evidence from happiness studies (by way of David Brooks) suggesting that most people would be much happier living in a smaller house close to work than a larger house that requires a lengthy commute:

According to the calculations of Frey and Stutzer, a person with a one-hour commute has to earn 40 percent more money to be as satisfied with life as someone who walks to the office.

I’ve taken these findings to heart, and whenever my wife and I move now, we prioritize location over space. We’re currently paying through the nose to live in a 750 square foot apartment near downtown Boulder. It’s about half the size of our old place in St. Louis, but it’s close to everything, including our work, and we love living here.

The modern human brain is much bigger than it used to be, but we didn’t get that way overnight. John Hawks disputes Colin Blakemore’s claim that “the human brain got bigger by accident and not through evolution“.

Attitudes toward causal modeling of correlational (and even some experimental) data differ widely. Sanjay Srivastava leans (or maybe used to lean) toward the permissive side; Andrew Gelman is skeptical. There’s been a flurry of recent work suggesting that causal modeling techniques like mediation analysis and SEM suffer from a number of serious and underappreciated problems, and after reading this paper by Bullock, Green and Ha, I guess I’m inclined to agree.

A landmark ruling by a New York judge yesterday has the potential to invalidate existing patents on genes, which currently cover about 20% of the human genome in some form. Daniel MacArthur has an excellent summary.

what the general factor of intelligence is and isn’t, or why intuitive unitarianism is a lousy guide to the neurobiology of higher cognitive ability

This post shamelessly plagiarizes, er, liberally borrows ideas from a much longer, more detailed, and just generally better post by Cosma Shalizi. I’m not apologetic, since I’m a firm believer in the notion that good ideas should be repeated often and loudly. So I’m going to be often and loud here, though I’ll try to be (slightly) more succinct than Shalizi. Still, if you have the time to spare, you should read his longer and more mathematical take.

There’s a widely held view among intelligence researchers in particular, and psychologists more generally, that there’s a general factor of intelligence (often dubbed g) that accounts for a very large portion of the variance in a broad range of cognitive performance tasks. Which is to say, if you have a bunch of people do a bunch of different tasks, all of which we think tap different aspects of intellectual ability, and then you take all those scores and factor analyze them, you’ll almost invariably get a first factor that explains 50% or more of the variance in the zero-order scores. Or to put it differently, if you know a person’s relative standing on g, you can make a reasonable prediction about how that person will do on lots of different tasks–for example, digit symbol substitution, N-back, go/no-go, and so on and so forth. Virtually all tasks that we think reflect cognitive ability turn out, to varying extents, to reflect some underlying latent variable, and that latent variable is what we dub g.

In a trivial sense, no one really disputes that there’s such a thing as g. You can’t really dispute the existence of g, seeing as a general factor tends to fall out of virtually all factor analyses of cognitive tasks; it’s about as well-replicated a finding as you can get. To say that g exists, on the most basic reading, is simply to slap a name on the empirical fact that scores on different cognitive measures tend to intercorrelate positively to a considerable extent.

What’s not so clear is what the implications of g are for our understanding of how the human mind and brain work. If you take the presence of g at face value, all it really says is what we all pretty much already know: some people are smarter than others. People who do well in one intellectual domain will tend to do pretty well in others too, other things being equal. With the exception of some people who’ve tried to argue that there’s no such thing as general intelligence, but only “multiple intelligences” that totally fractionate across domains (not a compelling story, if you look at the evidence), it’s pretty clear that cognitive abilities tend to hang together pretty well.

The trouble really crops up when we try to say something interesting about the architecture of the human mind on the basis of the psychometric evidence for g. If someone tells you that there’s a single psychometric factor that explains at least 50% of the variance in a broad range of human cognitive abilities, it seems perfectly reasonable to suppose that that’s because there’s some unitary intelligence system in people’s heads, and that that system varies in capacity across individuals. In other words, the two intuitive models people have about intelligence seem to be that either (a) there’s some general cognitive system that corresponds to g, and supports a very large portion of the complex reasoning ability we call “intelligence” or (b) there are lots of different (and mostly unrelated) cognitive abilities, each of which contributes only to specific types of tasks and not others. Framed this way, it just seems obvious that the former view is the right one, and that the latter view has been discredited by the evidence.

The problem is that the psychometric evidence for g stems almost entirely from statistical procedures that aren’t really supposed to be used for causal inference. The primary weapon in the intelligence researcher’s toolbox has historically been principal components analysis (PCA) or exploratory factor analysis, which are really just data reduction techniques. PCA tells you how you can describe your data in a more compact way, but it doesn’t actually tell you what structure is in your data. A good analogy is the use of digital compression algorithms. If you take a directory full of .txt files and compress them into a single .zip file, you’ll almost certainly end up with a file that’s only a small fraction of the total size of the original texts. This works because certain patterns tend to repeat themselves over and over in .txt files, and a smart algorithm will store an abbreviated description of those patterns rather than the patterns themselves. Which, conceptually, is almost exactly what happens when you run a PCA on a dataset: you’re searching for consistent patterns in the way observations vary along multiple variables, and discarding any redundancy you come across in favor of a more compact description.

Now, in a very real sense, compression is impressive. It’s certainly nice to be able to email your friend a 140kb .zip of your 1200-page novel rather than a 2mb .doc. But note that you don’t actually learn much from the compression. It’s not like your friend can open up that 140k binary representation of your novel, read it, and spare herself the torture of the other 1860kb. If you want to understand what’s going on in a novel, you need to read the novel and think about the novel. And if you want to understand what’s going on in a set of correlations between different cognitive tasks, you need to carefully inspect those correlations and carefully think about those correlations. You can run a factor analysis if you like, and you might learn something, but you’re not going to get any deep insights into the “true” structure of the data. The “true” structure of the data is, by definition, what you started out with (give or take some error). When you run a PCA, you actually get a distorted (but simpler!) picture of the data.

To most people who use PCA, or other data reduction techniques, this isn’t a novel insight by any means. Most everyone who uses PCA knows that in an obvious sense you’re distorting the structure of the data when you reduce its dimensionality. But the use of data reduction is often defended by noting that there must be some reason why variables hang together in such a way that they can be reduced to a much smaller set of variables with relatively little loss of variance. In the context of intelligence, the intuition can be expressed as: if there wasn’t really a single factor underlying intelligence, why would we get such a strong first factor? After all, it didn’t have to turn out that way; we could have gotten lots of smaller factors that appear to reflect distinct types of ability, like verbal intelligence, spatial intelligence, perceptual speed, and so on. But it did turn out that way, so that tells us something important about the unitary nature of intelligence.

This is a strangely compelling argument, but it turns out to be only minimally true. What the presence of a strong first factor does tell you is that you have a lot of positively correlated variables in your data set. To be fair, that is informative. But it’s only minimally informative, because, assuming you eyeballed the correlation matrix in the original data, you already knew that.
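Here’s a small numerical sketch of that point, under a simplifying assumption that isn’t part of the argument above: every pair of tasks correlates at exactly the same value r. In that idealized case, the share of variance captured by the first principal component of p tasks is (1 + (p - 1)r) / p, so it’s fixed entirely by the correlations you could already see in the matrix. The helper names and the particular values of p and r are mine.

```python
import numpy as np


def first_factor_share(corr):
    """Fraction of total variance carried by the largest eigenvalue."""
    eigvals = np.linalg.eigvalsh(corr)  # ascending order for symmetric input
    return eigvals[-1] / eigvals.sum()


def equicorrelated(p, r):
    """p x p correlation matrix with 1 on the diagonal and r everywhere else."""
    return np.full((p, p), r) + (1.0 - r) * np.eye(p)


p, r = 11, 0.45  # illustrative values only
share = first_factor_share(equicorrelated(p, r))
closed_form = (1 + (p - 1) * r) / p
print(share, closed_form)
```

Both numbers come out at one half (up to floating-point error): eleven tasks that all intercorrelate at 0.45 give you a first factor explaining about 50% of the variance, and that’s all the PCA has added to what the correlation matrix already said.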

What you don’t know, and can’t know, on the basis of a PCA, is what underlying causal structure actually generated the observed positive correlations between your variables. It’s certainly possible that there’s really only one central intelligence system that contributes the bulk of the variance to lots of different cognitive tasks. That’s the g model, and it’s entirely consistent with the empirical data. Unfortunately, it’s not the only one. To the contrary, there are an infinite number of possible causal models that would be consistent with any given factor structure derived from a PCA, including a structure dominated by a strong first factor. In fact, you can have a causal structure with as many variables as you like be consistent with g-like data. So long as the variables in your model all make contributions in the same direction to the observed variables, you will tend to end up with an excessively strong first factor. So you could in principle have 3,000 distinct systems in the human brain, all completely independent of one another, and all of which contribute relatively modestly to a bunch of different cognitive tasks. And you could still get a first factor that accounts for 50% or more of the variance. No g required.

If you doubt this is true, go read Cosma Shalizi’s post, where he not only walks you through a more detailed explanation of the mathematical necessity of this claim, but also illustrates the point using some very simple simulations. Basically, he builds a toy model in which 11 different tasks each draw on several hundred underlying cognitive abilities, which are in turn drawn from a larger pool of 2,766 completely independent abilities. He then runs a PCA on the data and finds, lo and behold, a single factor that explains nearly 50% of the variance in scores. Using PCA, it turns out, you can get something huge from (almost) nothing.
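In the same spirit, here’s a rough sketch of that kind of simulation using NumPy. To be clear, this is not Shalizi’s code, and it won’t reproduce his exact numbers: the number of simulated people, the number of abilities each task samples, and the function name are all my own assumptions, and each task score is simply the sum of the abilities that task happens to draw on.

```python
import numpy as np


def first_factor_share(n_people=1000, n_abilities=2766, n_tests=11,
                       abilities_per_test=500, seed=0):
    """Share of variance on the first principal component of the task scores.

    Every ability is an independent standard normal variable; each task
    score is the sum of a random subset of those abilities (an assumed,
    deliberately simple scoring rule).
    """
    rng = np.random.default_rng(seed)
    abilities = rng.standard_normal((n_people, n_abilities))
    scores = np.empty((n_people, n_tests))
    for t in range(n_tests):
        picked = rng.choice(n_abilities, size=abilities_per_test, replace=False)
        scores[:, t] = abilities[:, picked].sum(axis=1)
    corr = np.corrcoef(scores, rowvar=False)  # tasks are columns
    eigvals = np.linalg.eigvalsh(corr)        # ascending order
    return eigvals[-1] / eigvals.sum()


# How the first factor grows as tasks sample more of the shared pool.
for k in (10, 500, 1200):
    print(k, round(first_factor_share(abilities_per_test=k), 2))
```

The more abilities each task samples from the common pool, the more any two tasks overlap, and the larger the first factor gets, even though no single ability is correlated with any other.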

Now, at this point a proponent of a unitary g might say, sure, it’s possible that there isn’t really a single cognitive system underlying variation in intelligence; but it’s not plausible, because it’s surely more parsimonious to posit a model with just one variable than a model with 2,766. But that’s only true if you think that our brains evolved in order to make life easier for psychometricians, which, last I checked, wasn’t the case. If you think even a little bit about what we know about the biological and genetic bases of human cognition, it starts to seem really unlikely that there really could be a single central intelligence system. For starters, the evidence just doesn’t support it. In the cognitive neuroscience literature, for example, biomarkers of intelligence abound, and they just don’t seem all that related. There’s a really nice paper in Nature Reviews Neuroscience this month by Deary, Penke, and Johnson that reviews a substantial portion of the literature on intelligence; the upshot is that intelligence has lots of different correlates. For example, people who score highly on intelligence tests tend to (a) have larger brains overall; (b) show regional differences in brain volume; (c) show differences in neural efficiency when performing cognitive tasks; (d) have greater white matter integrity; (e) have brains with more efficient network structures; and so on.

These phenomena may not all be completely independent, but it’s hard to believe there’s any plausible story you could tell that renders them all part of some unitary intelligence system, or subject to unitary genetic influence. And really, why should they be part of a unitary system? Is there really any reason to think there has to be a single rate-limiting factor on performance? It’s surely perfectly plausible (I’d argue, much more plausible) to think that almost any complex cognitive task you use as an index of intelligence is going to draw on many, many different cognitive abilities. Take a trivial example: individual differences in visual acuity probably make a (very) small contribution to performance on many different cognitive tasks. If you can’t see the minute details of the stimuli as well as the next person, you might perform slightly worse on the task. So some variance in putatively “cognitive” task performance undoubtedly reflects abilities that most intelligence researchers wouldn’t really consider properly reflective of higher cognition at all. And yet, that variance has to go somewhere when you run a factor analysis. Most likely, it’ll go straight into that first factor, or g, since it’s variance that’s common to multiple tasks (i.e., someone with poorer eyesight may tend to do very slightly worse on any task that requires visual attention). In fact, any ability that makes unidirectional contributions to task performance, no matter how relevant or irrelevant to the conceptual definition of intelligence, will inflate the so-called g factor.

If this still seems counter-intuitive to you, here’s an analogy that might, to borrow Dan Dennett’s phrase, prime your intuition pump (it isn’t as dirty as it sounds). Imagine that instead of studying the relationship between different cognitive tasks, we decided to study the relation between performance at different sports. So we went out and rounded up 500 healthy young adults and had them engage in 16 different sports, including basketball, soccer, hockey, long-distance running, short-distance running, swimming, and so on. We then took performance scores for all 16 tasks and submitted them to a PCA. What do you think would happen? I’d be willing to bet good money that you’d get a strong first factor, just like with cognitive tasks. In other words, just like with g, you’d have one latent variable that seemed to explain the bulk of the variance in lots of different sports-related abilities. And just like g, it would have an easy and parsimonious interpretation: a general factor of athleticism!

Of course, in a trivial sense, you’d be right to call it that. I doubt anyone’s going to deny that some people just are more athletic than others. But if you then ask, “well, what’s the mechanism that underlies athleticism,” it’s suddenly much less plausible to think that there’s a single physiological variable or pathway that supports athleticism. In fact, it seems flatly absurd. You can easily think of dozens if not hundreds of factors that should contribute a small amount of the variance to performance on multiple sports. To name just a few: height, jumping ability, running speed, oxygen capacity, fine motor control, gross motor control, perceptual speed, response time, balance, and so on and so forth. And most of these are individually still relatively high-level abilities that break down further at the physiological level (e.g., “balance” is itself a complex trait that at minimum reflects contributions of the vestibular, visual, and cerebellar systems, and so on). If you go down that road, it very quickly becomes obvious that you’re just not going to find a unitary mechanism that explains athletic ability. Because it doesn’t exist.

All of this isn’t to say that intelligence (or athleticism) isn’t “real”. Intelligence and athleticism are perfectly real; it makes complete sense, and is factually defensible, to talk about some people being smarter or more athletic than other people. But the point is that those judgments are based on superficial observations of behavior; knowing that people’s intelligence or athleticism may express itself in a (relatively) unitary fashion doesn’t tell you anything at all about the underlying causal mechanisms–how many of them there are, or how they interact.

As Cosma Shalizi notes, it also doesn’t tell you anything about heritability or malleability. The fact that we tend to think intelligence is highly heritable doesn’t provide any evidence in favor of a unitary underlying mechanism; it’s just as plausible to think that there are many, many individual abilities that contribute to complex cognitive behavior, all of which are also highly heritable individually. Similarly, there’s no reason to think our cognitive abilities would be any less or any more malleable depending on whether they reflect the operation of a single system or hundreds of variables. Regular physical exercise clearly improves people’s capacity to carry out all sorts of different activities, but that doesn’t mean you’re only training up a single physiological pathway when you exercise; a whole host of changes are taking place throughout your body.

So, assuming you buy the basic argument, where does that leave us? Depends. From a day-to-day standpoint, nothing changes. You can go on telling your friends that so-and-so is a terrific athlete but not the brightest crayon in the box, and your friends will go on understanding exactly what you meant. No one’s suggesting that intelligence isn’t stable and trait-like, just that, at the biological level, it isn’t really one stable trait.

The real impact of relaxing the view that g is a meaningful construct at the biological level, I think, will be in removing an artificial and overly restrictive constraint on researchers’ theorizing. The sense I get, having done some work on executive control, is that g is the 800-pound gorilla in the room: researchers interested in studying the neural bases of intelligence (or related constructs like executive or cognitive control) are always worrying about how their findings relate to g, and how to explain the fact that there might be dissociable neural correlates of different abilities (or even multiple independent contributions to fluid intelligence). To show you that I’m not making this concern up, and that it weighs heavily on many researchers, here’s a quote from the aforementioned and otherwise really excellent NRN paper by Deary et al reviewing recent findings on the neural bases of intelligence:

The neuroscience of intelligence is constrained by — and must explain — the following established facts about cognitive test performance: about half of the variance across varied cognitive tests is contained in general cognitive ability; much less variance is contained within broad domains of capability; there is some variance in specific abilities; and there are distinct ageing patterns for so-called fluid and crystallized aspects of cognitive ability.

The existence of g creates a complicated situation for neuroscience. The fact that g contributes substantial variance to all specific cognitive ability tests is generally thought to indicate that g contributes directly in some way to performance on those tests. That is, when domains of thinking skill (such as executive function and memory) or specific tasks (such as mental arithmetic and non-verbal reasoning on the Raven’s Progressive Matrices test) are studied, neuroscientists are observing brain activity related to g as well as the specific task activities. This undermines the ability to determine localized brain activities that are specific to the task at hand.

I hope I’ve convinced you by this point that the neuroscience of intelligence doesn’t have to explain why half of the variance is contained in general cognitive ability, because there’s no good evidence that there is such a thing as general cognitive ability (except in the descriptive psychometric sense, which carries no biological weight). Relaxing this artificial constraint would allow researchers to get on with the interesting and important business of identifying correlates (and potential causal determinants) of different cognitive abilities without having to worry about the relation of their findings to some Grand Theory of Intelligence. If you believe in g, you’re going to be at a complete loss to explain how researchers can continually identify new biological and genetic correlates of intelligence, and how the effect sizes could be so small (particularly at a genetic level, where no one’s identified a single polymorphism that accounts for more than a fraction of the observable variance in intelligence–the so-called problem of “missing heritability”). But once you discard the fiction of g, you can take such findings in stride, and can set about the business of building integrative models that allow for and explicitly model the presence of multiple independent contributions to intelligence. And if studying the brain has taught us anything at all, it’s that the truth is inevitably more complicated than what we’d like to believe.

in praise of (lab) rotation

I did my PhD in psychology, but in a department that had close ties and collaborations with neuroscience. One of the interesting things about psychology and neuroscience programs is that they seem to have quite different graduate training models, even in cases where the area of research substantively overlaps (e.g., in cognitive neuroscience). In psychology, there seem to be two general models (at least, at American and Canadian universities; I’m not really familiar with other systems). One is that graduate students are accepted into a specific lab and have ties to a specific advisor (or advisors); the other, more common at large state schools, is that graduate students are accepted into the program (or an area within the program) as a whole, and are then given the (relative) freedom to find an advisor they want to work with. There are pros and cons to either model: the former ensures that every student has a place in someone’s lab from the very beginning of training, so that no one falls through the cracks; but the downside is that beginning students often aren’t sure exactly what they want to work on, and there are occasional (and sometimes acrimonious) mentor-mentee divorces. The latter gives students more freedom to explore their research interests, but can make it more difficult for students to secure funding, and has more of a sink-or-swim flavor (i.e., there’s less institutional support for students).

Both of these models differ quite a bit from what I take to be the most common neuroscience model, which is that students spend all or part of their first year doing a series of rotations through various labs–usually for about 2 months at a time. The idea is to expose students to a variety of different lines of research so that they get a better sense of what people in different areas are doing, and can make a more informed judgment about what research they’d like to pursue. And there are obviously other benefits too: faculty get to evaluate students on a trial basis before making a long-term commitment, and conversely, students get to see the internal workings of the lab and have more contact with the lab head before signing on.

I’ve always thought the rotation model makes a lot of sense, and wonder why more psychology programs don’t try to implement one. I can’t complain about my own training, in that I had a really great experience on both personal and professional levels in the labs I worked in; but I recognize that this was almost entirely due to dumb luck. I didn’t really do my homework very well before entering graduate school, and I could easily have landed in a department or lab I didn’t mesh well with, and spent the next few years miserable and unproductive. I’ll freely admit that I was unusually clueless going into grad school (that’s a post for another time), but I think no matter how much research you do, there’s just no way to know for sure how well you’ll do in a particular lab until you’ve spent some time in it. And most first-year graduate students have kind of fickle interests anyway; it’s hard to know when you’re 22 or 23 exactly what problem you want to spend the rest of your life (or at least the next 4 – 7 years) working on. Having people do rotations in multiple labs seems like an ideal way to maximize the odds of students (and faculty) ending up in happy, productive working relationships.

A question, then, for people who’ve had experience on the administrative side of psychology (or neuroscience) departments: what keeps us from applying a rotation model in psychology too? Are there major disadvantages I’m missing? Is the problem one of financial support? Do we think that psychology students come into graduate programs with more focused interests? Or is it just a matter of convention? Inquiring minds (or at least one of them) want to know…