what the general factor of intelligence is and isn’t, or why intuitive unitarianism is a lousy guide to the neurobiology of higher cognitive ability

March 7th, 2010

This post shamelessly plagiarizes (er, liberally borrows) ideas from a much longer, more detailed, and just generally better post by Cosma Shalizi. I’m not apologetic, since I’m a firm believer in the notion that good ideas should be repeated often and loudly. So I’m going to be often and loud here, though I’ll try to be (slightly) more succinct than Shalizi. Still, if you have the time to spare, you should read his longer and more mathematical take.

There’s a widely held view among intelligence researchers in particular, and psychologists more generally, that there’s a general factor of intelligence (often dubbed g) that accounts for a very large portion of the variance in a broad range of cognitive performance tasks. Which is to say, if you have a bunch of people do a bunch of different tasks, all of which we think tap different aspects of intellectual ability, and then you take all those scores and factor analyze them, you’ll almost invariably get a first factor that explains 50% or more of the variance in the zero-order scores. Or to put it differently, if you know a person’s relative standing on g, you can make a reasonable prediction about how that person will do on lots of different tasks–for example, digit symbol substitution, N-back, go/no-go, and so on and so forth. Virtually all tasks that we think reflect cognitive ability turn out, to varying extents, to reflect some underlying latent variable, and that latent variable is what we dub g.

In a trivial sense, no one really disputes that there’s such a thing as g. You can’t really dispute the existence of g, seeing as a general factor tends to fall out of virtually all factor analyses of cognitive tasks; it’s about as well-replicated a finding as you can get. To say that g exists, on the most basic reading, is simply to slap a name on the empirical fact that scores on different cognitive measures tend to intercorrelate positively to a considerable extent.

What’s not so clear is what the implications of g are for our understanding of how the human mind and brain work. If you take the presence of g at face value, all it really says is what we all pretty much already know: some people are smarter than others. People who do well in one intellectual domain will tend to do pretty well in others too, other things being equal. With the exception of some people who’ve tried to argue that there’s no such thing as general intelligence, but only “multiple intelligences” that totally fractionate across domains (not a compelling story, if you look at the evidence), it’s pretty clear that cognitive abilities tend to hang together pretty well.

The trouble really crops up when we try to say something interesting about the architecture of the human mind on the basis of the psychometric evidence for g. If someone tells you that there’s a single psychometric factor that explains at least 50% of the variance in a broad range of human cognitive abilities, it seems perfectly reasonable to suppose that that’s because there’s some unitary intelligence system in people’s heads, and that that system varies in capacity across individuals. In other words, the two intuitive models people have about intelligence seem to be that either (a) there’s some general cognitive system that corresponds to g, and supports a very large portion of the complex reasoning ability we call “intelligence” or (b) there are lots of different (and mostly unrelated) cognitive abilities, each of which contributes only to specific types of tasks and not others. Framed this way, it just seems obvious that the former view is the right one, and that the latter view has been discredited by the evidence.

The problem is that the psychometric evidence for g stems almost entirely from statistical procedures that aren’t really supposed to be used for causal inference. The primary weapon in the intelligence researcher’s toolbox has historically been principal components analysis (PCA) or exploratory factor analysis, which are really just data reduction techniques. PCA tells you how you can describe your data in a more compact way, but it doesn’t actually tell you what structure underlies your data. A good analogy is the use of digital compression algorithms. If you take a directory full of .txt files and compress them into a single .zip file, you’ll almost certainly end up with a file that’s only a small fraction of the total size of the original texts. This works because certain patterns tend to repeat themselves over and over in .txt files, and a smart algorithm will store an abbreviated description of those patterns rather than the patterns themselves. Which, conceptually, is almost exactly what happens when you run a PCA on a dataset: you’re searching for consistent patterns in the way observations vary along multiple variables, and discarding any redundancy you come across in favor of a more compact description.
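To make the compression analogy concrete, here’s a minimal Python sketch (assuming NumPy; the function name pca_compress and the simulated data are purely illustrative) that keeps only the top k principal components of a standardized data matrix and then reconstructs the data from them. The reconstruction is only approximate: the variance thrown away is exactly the sum of the discarded eigenvalues. PCA gives you a shorter description of the data, not a more truthful one.

```python
import numpy as np

def pca_compress(data, k):
    """Compress standardized data onto its top-k principal components.

    Returns (share of variance kept, standardized data, reconstruction).
    """
    # Standardize each column (population SD) so PCA runs on the correlation matrix.
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    corr = (z.T @ z) / len(z)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # re-sort largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    top = eigvecs[:, :k]
    scores = z @ top                          # the compact description (n x k)
    reconstruction = scores @ top.T           # lossy decompression (n x p)
    kept = eigvals[:k].sum() / eigvals.sum()
    return kept, z, reconstruction

rng = np.random.default_rng(1)
# Hypothetical scores: 300 people on 6 tasks that share some common variance.
common = rng.standard_normal((300, 1))
data = common + rng.standard_normal((300, 6))
kept, z, rec = pca_compress(data, k=1)
# Squared reconstruction error per person, summed over tasks.
lost = ((z - rec) ** 2).sum() / len(z)
print(f"variance kept: {kept:.2f}, variance lost: {lost:.2f} of {z.shape[1]}")
```

Whatever the first component captures, the remaining variance is simply gone from the compressed version; you can’t recover it from the scores alone.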

Now, in a very real sense, compression is impressive. It’s certainly nice to be able to email your friend a 140 KB .zip of your 1,200-page novel rather than a 2 MB .doc. But note that you don’t actually learn much from the compression. It’s not like your friend can open up that 140 KB binary representation of your novel, read it, and spare herself the torture of the other 1,860 KB. If you want to understand what’s going on in a novel, you need to read the novel and think about the novel. And if you want to understand what’s going on in a set of correlations between different cognitive tasks, you need to carefully inspect those correlations and carefully think about those correlations. You can run a factor analysis if you like, and you might learn something, but you’re not going to get any deep insights into the “true” structure of the data. The “true” structure of the data is, by definition, what you started out with (give or take some error). When you run a PCA, you actually get a distorted (but simpler!) picture of the data.

To most people who use PCA, or other data reduction techniques, this isn’t a novel insight by any means. Most everyone who uses PCA knows that in an obvious sense you’re distorting the structure of the data when you reduce its dimensionality. But the use of data reduction is often defended by noting that there must be some reason why variables hang together in such a way that they can be reduced to a much smaller set of variables with relatively little loss of variance. In the context of intelligence, the intuition can be expressed as: if there weren’t really a single factor underlying intelligence, why would we get such a strong first factor? After all, it didn’t have to turn out that way; we could have gotten lots of smaller factors that appear to reflect distinct types of ability, like verbal intelligence, spatial intelligence, perceptual speed, and so on. But it did turn out that way, so that tells us something important about the unitary nature of intelligence.

This is a strangely compelling argument, but it turns out to be only minimally true. What the presence of a strong first factor does tell you is that you have a lot of positively correlated variables in your data set. To be fair, that is informative. But it’s only minimally informative, because, assuming you eyeballed the correlation matrix in the original data, you already knew that.
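A short worked example shows why a strong first factor follows from nothing more than positive correlations. It idealizes things by assuming that every pair of p measures correlates at the same value r > 0; real correlation matrices are messier, but the logic carries over:

```latex
R = (1 - r)\,I_p + r\,\mathbf{1}\mathbf{1}^{\top}
\;\Longrightarrow\;
\lambda_1 = 1 + (p - 1)\,r, \qquad \lambda_2 = \cdots = \lambda_p = 1 - r

\frac{\lambda_1}{\operatorname{tr} R} = \frac{1 + (p - 1)\,r}{p} = r + \frac{1 - r}{p}

p = 11,\; r = 0.5: \qquad \frac{1 + 10 \times 0.5}{11} = \frac{6}{11} \approx 0.55
```

So under this idealization, eleven measures that each correlate .5 with one another necessarily yield a first component carrying more than half the variance, whatever process produced those correlations.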

What you don’t know, and can’t know, on the basis of a PCA, is what underlying causal structure actually generated the observed positive correlations between your variables. It’s certainly possible that there’s really only one central intelligence system that contributes the bulk of the variance to lots of different cognitive tasks. That’s the g model, and it’s entirely consistent with the empirical data. Unfortunately, it’s not the only one. To the contrary, there are an infinite number of possible causal models that would be consistent with any given factor structure derived from a PCA, including a structure dominated by a strong first factor. In fact, a causal structure with as many variables as you like can be consistent with g-like data. So long as the variables in your model all make contributions in the same direction to the observed variables, you will tend to end up with a very strong first factor. So you could in principle have 3,000 distinct systems in the human brain, all completely independent of one another, and all of which contribute relatively modestly to a bunch of different cognitive tasks. And you could still get a first factor that accounts for 50% or more of the variance. No g required.

If you doubt this is true, go read Cosma Shalizi’s post, where he not only walks you through a more detailed explanation of the mathematical necessity of this claim, but also illustrates the point using some very simple simulations. Basically, he builds a toy model in which 11 different tasks each draw on several hundred underlying cognitive abilities, which are in turn drawn from a larger pool of 2,766 completely independent abilities. He then runs a PCA on the data and finds, lo and behold, a single factor that explains nearly 50% of the variance in scores. Using PCA, it turns out, you can get something huge from (almost) nothing.
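For anyone who wants to poke at this themselves, here’s a minimal Python sketch in the spirit of Shalizi’s simulation (not his actual code). It assumes NumPy, and the numbers are my own illustrative choices: 500 simulated people and 11 tests, with each test summing a random 1,200 of 2,766 independent, standard-normal abilities. With that much overlap, every pair of tests should correlate at roughly .43, and the first principal component should end up carrying somewhere around half the variance, even though nothing in the generating model resembles g.

```python
import numpy as np

def simulate_tests(n_people=500, n_tests=11, n_abilities=2766,
                   abilities_per_test=1200, seed=0):
    """Score people on tests built from many independent abilities (no g)."""
    rng = np.random.default_rng(seed)
    # Abilities are drawn independently: nothing in this model is a general factor.
    abilities = rng.standard_normal((n_people, n_abilities))
    scores = np.empty((n_people, n_tests))
    for t in range(n_tests):
        # Each test samples its own random subset of the ability pool...
        used = rng.choice(n_abilities, size=abilities_per_test, replace=False)
        # ...and its score is simply the sum of those abilities.
        scores[:, t] = abilities[:, used].sum(axis=1)
    return scores

def first_component_share(scores):
    """Proportion of total variance carried by the first principal component."""
    corr = np.corrcoef(scores, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)        # ascending, so the last is largest
    return eigvals[-1] / eigvals.sum()

scores = simulate_tests()
share = first_component_share(scores)
print(f"first component explains {share:.0%} of the variance")
```

Shrinking abilities_per_test reduces the overlap between tests and, with it, the size of the first component; what drives the “general factor” here is how much the tests share, not how many abilities there are.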

Now, at this point a proponent of a unitary g might say, sure, it’s possible that there isn’t really a single cognitive system underlying variation in intelligence; but it’s not plausible, because it’s surely more parsimonious to posit a model with just one variable than a model with 2,766. But that’s only true if you think that our brains evolved in order to make life easier for psychometricians, which, last I checked, wasn’t the case. If you think even a little bit about what we know about the biological and genetic bases of human cognition, it starts to seem very unlikely that there could be a single central intelligence system. For starters, the evidence just doesn’t support it. In the cognitive neuroscience literature, for example, biomarkers of intelligence abound, and they just don’t seem all that related. There’s a really nice paper in Nature Reviews Neuroscience this month by Deary, Penke, and Johnson that reviews a substantial portion of the literature on intelligence; the upshot is that intelligence has lots of different correlates. For example, people who score highly on intelligence tests tend to (a) have larger brains overall; (b) show regional differences in brain volume; (c) show differences in neural efficiency when performing cognitive tasks; (d) have greater white matter integrity; (e) have brains with more efficient network structures; and so on.

These phenomena may not all be completely independent, but it’s hard to believe there’s any plausible story you could tell that renders them all part of some unitary intelligence system, or subject to unitary genetic influence. And really, why should they be part of a unitary system? Is there really any reason to think there has to be a single rate-limiting factor on performance? It’s surely perfectly plausible (I’d argue, much more plausible) to think that almost any complex cognitive task you use as an index of intelligence is going to draw on many, many different cognitive abilities. Take a trivial example: individual differences in visual acuity probably make a (very) small contribution to performance on many different cognitive tasks. If you can’t see the minute details of the stimuli as well as the next person, you might perform slightly worse on the task. So some variance in putatively “cognitive” task performance undoubtedly reflects abilities that most intelligence researchers wouldn’t really consider properly reflective of higher cognition at all. And yet, that variance has to go somewhere when you run a factor analysis. Most likely, it’ll go straight into that first factor, or g, since it’s variance that’s common to multiple tasks (i.e., someone with poorer eyesight may tend to do very slightly worse on any task that requires visual attention). In fact, any ability that makes unidirectional contributions to task performance, no matter how relevant or irrelevant to the conceptual definition of intelligence, will inflate the so-called g factor.
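Here’s the visual-acuity point as a minimal Python sketch (assuming NumPy; the loading of 0.5 and the sample sizes are hypothetical). Eight tasks each depend on their own fully independent ability; in the second run, every task also picks up a small contribution from one shared, non-cognitive trait. That single nuisance variable is enough to roughly double the share of variance on the first component.

```python
import numpy as np

def first_component_share(scores):
    """Proportion of total variance carried by the first principal component."""
    corr = np.corrcoef(scores, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)        # ascending, so the last is largest
    return eigvals[-1] / eigvals.sum()

rng = np.random.default_rng(2)
n_people, n_tasks = 2000, 8
# Each task gets its own, fully independent "cognitive" ability...
specific = rng.standard_normal((n_people, n_tasks))
# ...plus, in the second run, a small contribution from one shared trait (e.g. acuity).
acuity = rng.standard_normal((n_people, 1))
loading = 0.5  # hypothetical: acuity adds a quarter as much variance as each task's own ability

without_acuity = first_component_share(specific)
with_acuity = first_component_share(specific + loading * acuity)
print(f"first component share without acuity: {without_acuity:.2f}, with: {with_acuity:.2f}")
```

Nothing about acuity is “intelligent”, but because it pushes every task in the same direction, PCA has nowhere to put its variance except the first component.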

If this still seems counter-intuitive to you, here’s an analogy that might, to borrow Dan Dennett’s phrase, prime your intuition pump (it isn’t as dirty as it sounds). Imagine that instead of studying the relationship between different cognitive tasks, we decided to study the relation between performance at different sports. So we went out and rounded up 500 healthy young adults and had them engage in 16 different sports, including basketball, soccer, hockey, long-distance running, short-distance running, swimming, and so on. We then took performance scores for all 16 tasks and submitted them to a PCA. What do you think would happen? I’d be willing to bet good money that you’d get a strong first factor, just like with cognitive tasks. In other words, just like with g, you’d have one latent variable that seemed to explain the bulk of the variance in lots of different sports-related abilities. And just like g, it would have an easy and parsimonious interpretation: a general factor of athleticism!

Of course, in a trivial sense, you’d be right to call it that. I doubt anyone’s going to deny that some people just are more athletic than others. But if you then ask, “well, what’s the mechanism that underlies athleticism,” it’s suddenly much less plausible to think that there’s a single physiological variable or pathway that supports athleticism. In fact, it seems flatly absurd. You can easily think of dozens if not hundreds of factors that should contribute a small amount of the variance to performance on multiple sports. To name just a few: height, jumping ability, running speed, oxygen capacity, fine motor control, gross motor control, perceptual speed, response time, balance, and so on and so forth. And most of these are individually still relatively high-level abilities that break down further at the physiological level (e.g., “balance” is itself a complex trait that at minimum reflects contributions of the vestibular, visual, and cerebellar systems). If you go down that road, it very quickly becomes obvious that you’re just not going to find a unitary mechanism that explains athletic ability. Because it doesn’t exist.

All of this isn’t to say that intelligence (or athleticism) isn’t “real”. Intelligence and athleticism are perfectly real; it makes complete sense, and is factually defensible, to talk about some people being smarter or more athletic than other people. But the point is that those judgments are based on superficial observations of behavior; knowing that people’s intelligence or athleticism may express itself in a (relatively) unitary fashion doesn’t tell you anything at all about the underlying causal mechanisms–how many of them there are, or how they interact.

As Cosma Shalizi notes, it also doesn’t tell you anything about heritability or malleability. The fact that we tend to think intelligence is highly heritable doesn’t provide any evidence in favor of a unitary underlying mechanism; it’s just as plausible to think that there are many, many individual abilities that contribute to complex cognitive behavior, all of which are also highly heritable individually. Similarly, there’s no reason to think our cognitive abilities would be any less or any more malleable depending on whether they reflect the operation of a single system or hundreds of variables. Regular physical exercise clearly improves people’s capacity to carry out all sorts of different activities, but that doesn’t mean you’re only training up a single physiological pathway when you exercise; a whole host of changes are taking place throughout your body.

So, assuming you buy the basic argument, where does that leave us? Depends. From a day-to-day standpoint, nothing changes. You can go on telling your friends that so-and-so is a terrific athlete but not the brightest crayon in the box, and your friends will go on understanding exactly what you meant. No one’s suggesting that intelligence isn’t stable and trait-like, just that, at the biological level, it isn’t really one stable trait.

The real impact of relaxing the view that g is a meaningful construct at the biological level, I think, will be in removing an artificial and overly restrictive constraint on researchers’ theorizing. The sense I get, having done some work on executive control, is that g is the 800-pound gorilla in the room: researchers interested in studying the neural bases of intelligence (or related constructs like executive or cognitive control) are always worrying about how their findings relate to g, and how to explain the fact that there might be dissociable neural correlates of different abilities (or even multiple independent contributions to fluid intelligence). To show you that I’m not making this concern up, and that it weighs heavily on many researchers, here’s a quote from the aforementioned and otherwise really excellent NRN paper by Deary et al. reviewing recent findings on the neural bases of intelligence:

The neuroscience of intelligence is constrained by — and must explain — the following established facts about cognitive test performance: about half of the variance across varied cognitive tests is contained in general cognitive ability; much less variance is contained within broad domains of capability; there is some variance in specific abilities; and there are distinct ageing patterns for so-called fluid and crystallized aspects of cognitive ability.

The existence of g creates a complicated situation for neuroscience. The fact that g contributes substantial variance to all specific cognitive ability tests is generally thought to indicate that g contributes directly in some way to performance on those tests. That is, when domains of thinking skill (such as executive function and memory) or specific tasks (such as mental arithmetic and non-verbal reasoning on the Raven’s Progressive Matrices test) are studied, neuroscientists are observing brain activity related to g as well as the specific task activities. This undermines the ability to determine localized brain activities that are specific to the task at hand.

I hope I’ve convinced you by this point that the neuroscience of intelligence doesn’t have to explain why half of the variance is contained in general cognitive ability, because there’s no good evidence that there is such a thing as general cognitive ability (except in the descriptive psychometric sense, which carries no biological weight). Relaxing this artificial constraint would allow researchers to get on with the interesting and important business of identifying correlates (and potential causal determinants) of different cognitive abilities without having to worry about the relation of their findings to some Grand Theory of Intelligence. If you believe in g, you’re going to be at a complete loss to explain how researchers can continually identify new biological and genetic correlates of intelligence, and how the effect sizes could be so small (particularly at a genetic level, where no one’s identified a single polymorphism that accounts for more than a fraction of the observable variance in intelligence–the so-called problem of “missing heritability”). But once you discard the fiction of g, you can take such findings in stride, and can set about the business of building integrative models that allow for and explicitly model the presence of multiple independent contributions to intelligence. And if studying the brain has taught us anything at all, it’s that the truth is inevitably more complicated than what we’d like to believe.

functional MRI and the many varieties of reliability

March 5th, 2010

Craig Bennett and Mike Miller have a new paper on the reliability of fMRI. It’s a nice review that I think most people who work with fMRI will want to read. Bennett and Miller discuss a number of issues related to reliability, including why we should care about the reliability of fMRI, what factors influence reliability, how to obtain estimates of fMRI reliability, and what previous studies suggest about the reliability of fMRI. Their bottom line is that the reliability of fMRI often leaves something to be desired:

One thing is abundantly clear: fMRI is an effective research tool that has opened broad new horizons of investigation to scientists around the world. However, the results from fMRI research may be somewhat less reliable than many researchers implicitly believe. While it may be frustrating to know that fMRI results are not perfectly replicable, it is beneficial to take a longer-term view regarding the scientific impact of these studies. In neuroimaging, as in other scientific fields, errors will be made and some results will not replicate.

I think this is a wholly appropriate conclusion, and strongly recommend reading the entire article. Because there’s already a nice write-up of the paper over at Mind Hacks, I’ll content myself with adding a number of points to B&M’s discussion (I talk about some of these same issues in a chapter I wrote with Todd Braver).

First, even though I agree enthusiastically with the gist of B&M’s conclusion, it’s worth noting that, strictly speaking, there’s actually no such thing as “the reliability of fMRI”. Reliability isn’t a property of a technique or instrument, it’s a property of a specific measurement. Because every measurement is made under slightly different conditions, reliability will inevitably vary on a case-by-case basis. But since it’s not really practical (or even possible) to estimate reliability for every single analysis, researchers take necessary short-cuts. The standard in the psychometric literature is to establish reliability on a per-measure (not per-method!) basis, so long as conditions don’t vary too dramatically across samples. For example, once someone “validates” a given self-report measure, it’s generally taken for granted that that measure is “reliable”, and most people feel comfortable administering it to new samples without having to go to the trouble of estimating reliability themselves. That’s a perfectly reasonable approach, but the critical point is that it’s done on a relatively specific basis. Supposing you made up a new self-report measure of depression from a set of items you cobbled together yourself, you wouldn’t be entitled to conclude that your measure was reliable simply because some other self-report measure of depression had already been psychometrically validated. You’d be using an entirely new set of items, so you’d have to go to the trouble of validating your instrument anew.

By the same token, the reliability of any given fMRI measurement is going to fluctuate wildly depending on the task used, the timing of events, and many other factors. That’s not just because some estimates of reliability are better than others; it’s because there just isn’t a fact of the matter about what the “true” reliability of fMRI is. Rather, there are facts about how reliable fMRI is for specific types of tasks with specific acquisition parameters and preprocessing streams in specific scanners, and so on (which can then be summarized by talking about the general distribution of fMRI reliabilities). B&M are well aware of this point, and discuss it in some detail, but I think it’s worth emphasizing that when they say that “the results from fMRI research may be somewhat less reliable than many researchers implicitly believe,” what they mean isn’t that the “true” reliability of fMRI is likely to be around .5; rather, it’s that if you look at reliability estimates across a bunch of different studies and analyses, the estimated reliability is often low. But it’s not really possible to generalize from this overall estimate to any particular study; ultimately, if you want to know whether your data were measured reliably, you need to quantify that yourself. So the take-away message shouldn’t be that fMRI is an inherently unreliable method (and I really hope that isn’t how B&M’s findings get reported by the mainstream media should they get picked up), but rather, that there’s a very good chance that the reliability of fMRI in any given situation is not particularly high. It’s a subtle difference, but an important one.
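If you do want to quantify reliability for your own data, one common option for test-retest designs is an intraclass correlation. Here’s a minimal Python sketch, assuming NumPy and a subjects × sessions matrix of, say, ROI activation estimates. It computes the ICC(3,1) “consistency” form from Shrout and Fleiss’s taxonomy, which ignores mean shifts between sessions; which ICC form is appropriate is itself a design decision, and this one is chosen only for illustration. The simulated data are hypothetical.

```python
import numpy as np

def icc_consistency(x):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    x is an (n_subjects, k_sessions) array.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_subjects = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_sessions = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((x - grand) ** 2).sum()
    ss_error = ss_total - ss_subjects - ss_sessions   # subject-by-session residual
    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)

rng = np.random.default_rng(3)
# Hypothetical data: stable individual differences plus equal-variance session noise,
# so the population ICC is 0.5 by construction.
stable = rng.standard_normal(1000)
sessions = np.column_stack([stable + rng.standard_normal(1000) for _ in range(2)])
print(f"ICC(3,1) = {icc_consistency(sessions):.2f}")
```

The point is less the particular formula than the habit: estimate the reliability of the measurement you actually analyzed, rather than borrowing a number from someone else’s task, scanner, and pipeline.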

Second, there’s a common misconception that reliability estimates impose an upper bound on the true detectable effect size. B&M make this point in their review, Vul et al. made it in their “voodoo correlations” paper, and in fact, I’ve made it myself before. But it’s actually not quite correct. It’s true that, for any given test, the true reliability of the variables involved limits the size of the effect you can actually observe. But there are many different types of reliability, and most will generally only be appropriate and informative for a subset of statistical procedures. Virtually all types of reliability estimates will underestimate the true reliability in some cases and overestimate it in others. And in extreme cases, there may be close to zero relationship between the estimate and the truth.

To see this, take the following example, which focuses on internal consistency. Suppose you have two completely uncorrelated items, and you decide to administer them together as a single scale by simply summing up their scores. For example, let’s say you have an item assessing shoelace-tying ability, and another assessing how well people like the color blue, and you decide to create a shoelace-tying-and-blue-preferring measure. Now, this measure is clearly nonsensical, in that it’s unlikely to predict anything you’d ever care about. More important for our purposes, its internal consistency would be zero, because its items are (by hypothesis) uncorrelated, so it’s not measuring anything coherent. But that doesn’t mean the measure is unreliable! So long as the constituent items are each individually measured reliably, the true reliability of the total score could potentially be quite high, and even perfect. In other words, if I can measure your shoelace-tying ability and your blueness-liking with perfect reliability, then by definition, I can measure any linear combination of those two things with perfect reliability as well. The result wouldn’t mean anything, and the measure would have no validity, but from a reliability standpoint, it’d be impeccable. This problem of underestimating reliability when items are heterogeneous has been discussed in the psychometric literature for at least 70 years, and yet you still very commonly see people do questionable things like “correcting for attenuation” based on dubious internal consistency estimates.
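The shoelace-and-blue example is easy to simulate in Python (assuming NumPy; the sample size and the small amount of measurement error are arbitrary choices). Cronbach’s alpha for the two-item total should hover around zero, while the test-retest correlation of that same total stays very high, because each item is measured well even though the two items have nothing to do with each other.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_people, k_items) array."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(4)
n = 2000
# Two uncorrelated true scores: shoelace-tying ability and liking for blue.
true_items = rng.standard_normal((n, 2))
# Each occasion measures both items with only a little error (item reliability ~0.96).
occasion1 = true_items + 0.2 * rng.standard_normal((n, 2))
occasion2 = true_items + 0.2 * rng.standard_normal((n, 2))

alpha = cronbach_alpha(occasion1)
retest = np.corrcoef(occasion1.sum(axis=1), occasion2.sum(axis=1))[0, 1]
print(f"alpha: {alpha:.2f}, test-retest r of the total: {retest:.2f}")
```

Correcting a correlation “for attenuation” with an alpha near zero would therefore be badly wrong here, even though the total score is measured almost perfectly.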

In their review, B&M mostly focus on test-retest reliability rather than internal consistency, but the same general point applies. Test-retest reliability is the degree to which people’s scores on some variable are consistent across multiple testing occasions. The intuition is that, if the rank-ordering of scores varies substantially across occasions (e.g., if the people who show the highest activation of visual cortex at Time 1 aren’t the same ones who show the highest activation at Time 2), the measurement must not have been reliable, so you can’t trust any effects that are larger than the estimated test-retest reliability coefficient. The problem with this intuition is that there can be any number of systematic yet session-specific influences on a person’s score on some variable (e.g., activation level). For example, let’s say you’re doing a study looking at the relation between performance on a difficult working memory task and frontoparietal activation during the same task. Suppose you do the exact same experiment with the same subjects on two separate occasions three weeks apart, and it turns out that the correlation of DLPFC activation across the two occasions is only .3. A simplistic view would be that this means that the reliability of DLPFC activation is only .3, so you couldn’t possibly detect any correlations between performance level and activation greater than .3 in DLPFC. But that’s simply not true. It could, for example, be that the DLPFC response during WM performance is perfectly reliable, but is heavily dependent on session-specific factors such as baseline fatigue levels, motivation, and so on. In other words, there might be a very strong and perfectly “real” correlation between WM performance and DLPFC activation on each of the two testing occasions, even though there’s very little consistency across the two occasions. Test-retest reliability estimates only tell you how much of the signal is reliably due to temporally stable variables, and not how much of the signal is reliable, period.
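Here’s a minimal Python simulation of the DLPFC scenario (assuming NumPy; the model and all its coefficients are hypothetical). Activation on each session is a mix of a stable trait and a session-specific state, and working memory performance tracks whatever the activation happens to be on that session. Test-retest reliability of activation should come out near .3, yet the within-session correlation with performance should be around .9, far above the supposed “ceiling”.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000
# Hypothetical model: activation = stable trait + session-specific state
# (fatigue, motivation, ...). The state is real signal, not measurement noise.
trait = rng.standard_normal(n)
state = rng.standard_normal((n, 2))            # independent across the two sessions
activation = np.sqrt(0.3) * trait[:, None] + np.sqrt(0.7) * state
# WM performance tracks each session's activation closely (r ~ 0.9 by construction).
performance = 0.9 * activation + np.sqrt(1 - 0.81) * rng.standard_normal((n, 2))

retest = np.corrcoef(activation[:, 0], activation[:, 1])[0, 1]
within = [np.corrcoef(activation[:, s], performance[:, s])[0, 1] for s in range(2)]
print(f"test-retest r of activation: {retest:.2f}")
print("within-session r with performance:", [f"{r:.2f}" for r in within])
```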

The general point is that you can’t just report any estimate of reliability that you like (or that’s easy to calculate) and assume that tells you anything meaningful about the likelihood of your analyses succeeding. You have to think hard about exactly what kind of reliability you care about, and then come up with an estimate to match that. There’s a reasonable argument to be made that most of the estimates of fMRI reliability reported to date are actually not all that relevant to many people’s analyses, because the majority of reliability analyses have focused on test-retest reliability, which is only an appropriate way to estimate reliability if you’re trying to relate fMRI activation to stable trait measures (e.g., personality or cognitive ability). If you’re interested in relating in-scanner task performance or state-dependent variables (e.g., mood) to brain activation (arguably the more common approach), or if you’re conducting within-subject analyses that focus on comparisons between conditions, using test-retest reliability isn’t particularly informative, and you really need to focus on other types of reliability (or reproducibility).

Third, and related to the above point, between-subject and within-subject reliability are often in statistical tension with one another. B&M don’t talk about this, as far as I can tell, but it’s an important point to remember when designing studies and/or conducting analyses. Essentially, the issue is that what counts as error depends on what effects you’re interested in. If you’re interested in individual differences, it’s within-subject variance that counts as error, so you want to minimize that. Conversely, if you’re interested in within-subject effects (the norm in fMRI), you want to minimize between-subject variance. But you generally can’t do both of these at the same time. If you use a very “strong” experimental manipulation (i.e., a task that produces a very large difference between conditions for virtually all subjects), you’re going to reduce the variability between individuals, and you may very well end up with very low test-retest reliability estimates. And that would actually be a good thing! Conversely, if you use a “weak” experimental manipulation, you might get no mean effect at all, because there’ll be much more variability between individuals. There’s no right or wrong here; the trick is to pick a design that matches the focus of your study. In the context of reliability, the essential point is that if all you’re interested in is the contrast between high and low working memory load, it shouldn’t necessarily bother you if someone tells you that the test-retest reliability of induced activation in your study is close to zero. Conversely, if you care about individual differences, it shouldn’t worry you if activations aren’t reproducible across studies at the group level. In some ways, those are actually the ideal situations for each of those two types of studies.
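A minimal Python sketch of that tension (assuming NumPy; the “strong” and “weak” parameter values are hypothetical). Each subject’s condition difference is a group mean plus a stable individual deviation plus session noise, measured on two sessions. The strong manipulation should yield a huge group-level t statistic but low test-retest reliability of the difference score; the weak one should yield essentially no group effect but high test-retest reliability.

```python
import numpy as np

def simulate(mean_effect, between_sd, session_sd, n_subjects=200, seed=6):
    """Condition-difference scores for each subject on two sessions (hypothetical model)."""
    rng = np.random.default_rng(seed)
    # Each subject's true effect = group mean + a stable individual deviation.
    true_effect = mean_effect + between_sd * rng.standard_normal(n_subjects)
    # Each session adds its own noise around that subject's true effect.
    return true_effect[:, None] + session_sd * rng.standard_normal((n_subjects, 2))

def one_sample_t(x):
    """t statistic for the mean of x against zero."""
    return x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))

# Parameters are (mean_effect, between_sd, session_sd).
for label, params in [("strong", (1.0, 0.1, 0.3)), ("weak", (0.0, 1.0, 0.3))]:
    effects = simulate(*params)
    t = one_sample_t(effects[:, 0])
    retest = np.corrcoef(effects[:, 0], effects[:, 1])[0, 1]
    print(f"{label}: group t = {t:.1f}, test-retest r = {retest:.2f}")
```

Same measurement noise in both cases; the only thing that changes is how the variance is split between the group mean and individual differences.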

Lastly, B&M raise a question as to what level of reliability we should consider “acceptable” for fMRI research:

There is no consensus value regarding what constitutes an acceptable level of reliability in fMRI. Is an ICC value of 0.50 enough? Should studies be required to achieve an ICC of 0.70? All of the studies in the review simply reported what the reliability values were. Few studies proposed any kind of criteria to be considered a ‘reliable’ result. Cicchetti and Sparrow did propose some qualitative descriptions of data based on the ICC-derived reliability of results (1981). They proposed that results with an ICC above 0.75 be considered ‘excellent’, results between 0.59 and 0.75 be considered ‘good’, results between .40 and .58 be considered ‘fair’, and results lower than 0.40 be considered ‘poor’. More specifically to neuroimaging, Eaton et al. (2008) used a threshold of ICC > 0.4 as the mask value for their study while Aron et al. (2006) used an ICC cutoff of ICC > 0.5 as the mask value.
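For readers who haven’t computed an ICC themselves, here is a minimal sketch of one common variant, ICC(3,1) in Shrout and Fleiss’s notation (two-way model, consistency, single measurement), together with a helper that applies the Cicchetti and Sparrow labels using the cutoffs as quoted above. Which ICC variant is appropriate depends on the design, and picking ICC(3,1) here is my assumption rather than something taken from B&M; the function names and the simulated two-session data are hypothetical.

```python
import numpy as np


def icc_3_1(scores):
    """ICC(3,1): two-way model, consistency, single measurement.

    `scores` is an (n_subjects, k_sessions) array, e.g. one region's activation
    estimate for each subject in each scanning session.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_subjects = k * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_sessions = n * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_error = np.sum((x - grand) ** 2) - ss_subjects - ss_sessions
    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return float((ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error))


def cicchetti_sparrow_label(icc):
    """Qualitative label using the cutoffs as quoted (values in the 0.58-0.59 gap count as 'fair')."""
    if icc > 0.75:
        return "excellent"
    if icc >= 0.59:
        return "good"
    if icc >= 0.40:
        return "fair"
    return "poor"


# Hypothetical test-retest data: a stable subject-level signal plus session-specific noise
rng = np.random.default_rng(0)
signal = rng.standard_normal(50)
sessions = signal[:, None] + 0.7 * rng.standard_normal((50, 2))
icc = icc_3_1(sessions)
print(round(icc, 2), cicchetti_sparrow_label(icc))
```

Note that ICC(3,1) ignores systematic shifts between sessions (e.g., everyone activating a bit less the second time); an absolute-agreement variant would penalize those.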

On this point, I don’t really see any reason to depart from psychometric convention just because we’re using fMRI rather than some other technique. Conventionally, reliability estimates of around .8 (or maybe .7, if you’re feeling generous) are considered adequate. Any lower and you start to run into problems, because effect sizes will shrivel up. So I think we should be striving to attain the same levels of reliability with fMRI as with any other measure. If it turns out that that’s not possible, we’ll have to live with that, but I don’t think the solution is to conclude that reliability estimates on the order of .5 are ok “for fMRI” (I’m not saying that’s what B&M say, just that that’s what we should be careful not to conclude). Rather, we should just accept that the odds of detecting certain kinds of effects with fMRI are probably going to be lower than with other techniques. And maybe we should minimize the use of fMRI for those types of analyses where reliability is generally not so good (e.g., using brain activation to predict trait variables over long intervals).
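The “shriveling” can be put in numbers with Spearman’s classic attenuation formula, under which the expected observed correlation between two measures equals the true correlation times the square root of the product of their reliabilities. The sketch below pairs that with the standard Fisher-z approximation for the sample size needed to detect a correlation with 80% power at α = .05 (two-sided). The true correlation of .5 and the trait-measure reliability of .8 are arbitrary illustrative values, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def attenuated_r(true_r, rel_x, rel_y):
    """Expected observed correlation given the true correlation and the
    reliabilities of the two measures (Spearman's attenuation formula)."""
    return true_r * math.sqrt(rel_x * rel_y)


def n_for_power(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect correlation r (two-sided test), via Fisher's z."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)


true_r = 0.5
for fmri_rel in (0.8, 0.5):
    r_obs = attenuated_r(true_r, rel_x=0.8, rel_y=fmri_rel)
    print(f"fMRI reliability {fmri_rel}: expected r = {r_obs:.2f}, N needed = {n_for_power(r_obs)}")
```

In this toy setup, dropping fMRI reliability from .8 to .5 shrinks the expected correlation from .40 to about .32 and raises the required sample from 47 to 77 subjects.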

I hasten to point out that none of this should be taken as a criticism of B&M’s paper; I think all of these points complement B&M’s discussion, and don’t detract in any way from its overall importance. Reliability is a big topic, and there’s no way Bennett and Miller could say everything there is to be said about it in one paper. I think they’ve done the field of cognitive neuroscience an important service by raising awareness and providing an accessible overview of some of the issues surrounding reliability, and it’s certainly a paper that’s going on my “essential readings in fMRI methods” list.

Bennett, C. M., & Miller, M. B. (2010). How reliable are the results from functional magnetic resonance imaging? Annals of the New York Academy of Sciences.

naught but a comforting illusion

March 3rd, 2010

A succinct summary of a perennial debate…

[via reddit]

Kahneman on happiness

March 1st, 2010

The latest TED talk is an instant favorite of mine. Daniel Kahneman talks about the striking differences in the way we experience versus remember events:

It’s an entertaining and profoundly insightful 20-minute talk, and worth watching even if you think you’ve heard these ideas before.

The fundamental problem Kahneman discusses is that we all experience our lives on a moment-by-moment basis, and yet we make decisions based on our memories of the past. Unfortunately, it turns out that the experiencing self and the remembering self don’t necessarily agree about what things make us happy, and so we often end up in situations where we voluntarily make choices that actually substantially reduce our experienced utility. I won’t give away the examples Kahneman talks about, other than to say that they beautifully illustrate the relevance of psychology (or at least some branches of psychology) to the real-world decisions we all make–both the trivial, day-to-day variety, and the rarer, life-or-death kind.

As an aside, Kahneman gave a talk at Brain Camp (or, officially, the annual Summer Institute in Cognitive Neuroscience, which may now be defunct–or perhaps only on hiatus?) the year I attended. There were a lot of great talks that year, but Kahneman’s really stood out for me, despite the fact that he hardly talked about research at all. It was more of a meditation on the scientific method–how to go about building and testing new theories. You don’t often hear a Nobel Prize winner tell an audience that the work that won the Nobel Prize was completely wrong, but that’s essentially what Kahneman claimed. Of course, his point wasn’t that Prospect Theory was useless, but rather, that many of the holes and limitations of the theory that people have gleefully pointed out over the last three decades were already well-recognized at the time the original findings were published. Kahneman and Tversky’s goal wasn’t to produce a perfect description or explanation of the mechanisms underlying human decision-making, but rather, an approximation that made certain important facts about human decision-making clear (e.g., the fact that people simply don’t follow the theory of Expected Utility), and opened the door to entirely new avenues of research. Kahneman seemed to think that ultimately what we really want isn’t a protracted series of incremental updates to Prospect Theory, but a more radical paradigm shift, and that in that sense, clinging to Prospect Theory might now actually be impeding progress.

You might think that’s a pretty pessimistic message–“hey, you can win a Nobel Prize for being completely wrong!”–but it really wasn’t; I actually found it quite uplifting (if Daniel Kahneman feels comfortable being mostly wrong about his ideas, why should the rest of us get attached to ours?). At least, that’s the way I remember it now. But that talk was nearly three years ago, you see, so my actual experience at the time may have been quite different. Turns out you can’t really trust my remembering self; it’ll tell you anything it thinks I want to hear.

in praise of (lab) rotation

February 27th, 2010

I did my PhD in psychology, but in a department that had close ties and collaborations with neuroscience. One of the interesting things about psychology and neuroscience programs is that they seem to have quite different graduate training models, even in cases where the area of research substantively overlaps (e.g., in cognitive neuroscience). In psychology, there seem to be two general models (at least, at American and Canadian universities; I’m not really familiar with other systems). One is that graduate students are accepted into a specific lab and have ties to a specific advisor (or advisors); the other, more common at large state schools, is that graduate students are accepted into the program (or an area within the program) as a whole, and are then given the (relative) freedom to find an advisor they want to work with. There are pros and cons to either model: the former ensures that every student has a place in someone’s lab from the very beginning of training, so that no one falls through the cracks; but the downside is that beginning students often aren’t sure exactly what they want to work on, and there are occasional (and sometimes acrimonious) mentor-mentee divorces. The latter gives students more freedom to explore their research interests, but can make it more difficult for students to secure funding, and has more of a sink-or-swim flavor (i.e., there’s less institutional support for students).

Both of these models differ quite a bit from what I take to be the most common neuroscience model, which is that students spend all or part of their first year doing a series of rotations through various labs–usually for about 2 months at a time. The idea is to expose students to a variety of different lines of research so that they get a better sense of what people in different areas are doing, and can make a more informed judgment about what research they’d like to pursue. And there are obviously other benefits too: faculty get to evaluate students on a trial basis before making a long-term commitment, and conversely, students get to see the internal workings of the lab and have more contact with the lab head before signing on.

I’ve always thought the rotation model makes a lot of sense, and wonder why more psychology programs don’t try to implement one. I can’t complain about my own training, in that I had a really great experience on both personal and professional levels in the labs I worked in; but I recognize that this was almost entirely due to dumb luck. I didn’t really do my homework very well before entering graduate school, and I could easily have landed in a department or lab I didn’t mesh well with, and spent the next few years miserable and unproductive. I’ll freely admit that I was unusually clueless going into grad school (that’s a post for another time), but I think no matter how much research you do, there’s just no way to know for sure how well you’ll do in a particular lab until you’ve spent some time in it. And most first-year graduate students have kind of fickle interests anyway; it’s hard to know when you’re 22 or 23 exactly what problem you want to spend the rest of your life (or at least the next 4 – 7 years) working on. Having people do rotations in multiple labs seems like an ideal way to maximize the odds of students (and faculty) ending up in happy, productive working relationships.

A question, then, for people who’ve had experience on the administrative side of psychology (or neuroscience) departments: what keeps us from applying a rotation model in psychology too? Are there major disadvantages I’m missing? Is the problem one of financial support? Do we think that psychology students come into graduate programs with more focused interests? Or is it just a matter of convention? Inquiring minds (or at least one of them) want to know…

best. video. ever.

February 27th, 2010

I know I’m easily amused, but this is hands-down the best thing I’ve seen since that Super Bowl Old Spice commercial:

what’s adaptive about depression?

February 26th, 2010

Jonah Lehrer has an interesting article in the NYT magazine about a recent Psych Review article by Paul Andrews and J. Anderson Thomson. The basic claim Andrews and Thomson make in their paper is that depression is “an adaptation that evolved as a response to complex problems and whose function is to minimize disruption of rumination and sustain analysis of complex problems”. Lehrer’s article is, as always, engaging, and he goes out of his way to obtain some critical perspectives from other researchers not affiliated with Andrews & Thomson’s work. It’s definitely worth a read.

In reading Lehrer’s article and the original paper, two things struck me. One is that I think Lehrer slightly exaggerates the novelty of Andrews and Thomson’s contribution. The novel suggestion of their paper isn’t that depression can be adaptive under the right circumstances (I think most people already believe that, and as Lehrer notes, the idea traces back a long way); it’s that the specific adaptive purpose of depression is to facilitate the solving of complex problems. I think Andrews and Thomson’s paper received a somewhat critical reception (which Lehrer discusses) not so much because people found the suggestion that depression might be adaptive objectionable, but because there are arguably more plausible things depression could have been selected for. Lehrer mentions a few:

Other scientists, including Randolph Nesse at the University of Michigan, say that complex psychiatric disorders like depression rarely have simple evolutionary explanations. In fact, the analytic-rumination hypothesis is merely the latest attempt to explain the prevalence of depression. There is, for example, the “plea for help” theory, which suggests that depression is a way of eliciting assistance from loved ones. There’s also the “signal of defeat” hypothesis, which argues that feelings of despair after a loss in social status help prevent unnecessary attacks; we’re too busy sulking to fight back. And then there’s “depressive realism”: several studies have found that people with depression have a more accurate view of reality and are better at predicting future outcomes. While each of these speculations has scientific support, none are sufficient to explain an illness that afflicts so many people. The moral, Nesse says, is that sadness, like happiness, has many functions.

Personally, I find these other suggestions more plausible than the Andrews and Thomson story (if still not terribly compelling). There are a variety of reasons for this (see Jerry Coyne’s twin posts for some of them, along with the many excellent comments), but one pretty big one is that they’re all at least somewhat more consistent with a continuity hypothesis under which many of the selection pressures that influenced the structure of the human mind have been at work in our lineage for millions of years. That’s to say, if you believe in a “signal of defeat” account, you don’t have to come up with complex explanations for why human depression is adaptive (the problem being that other mammals don’t seem to show an affinity for ruminating over complex analytical problems); you can just attribute depression to much more general selection pressures found in other animals as well.

One hypothesis I particularly like in this respect, related to the signal-of-defeat account, is that depression is essentially just a human manifestation of a general tendency toward low self-confidence and low aggression. The value of low self-confidence is pretty obvious: you don’t challenge the alpha male, so you don’t get into fights; you only chase prey you think you can safely catch; and so on. Now suppose humans inherited this basic architecture from our ancestral apes. In human societies there’s still a clear potential benefit to being subservient and non-confrontational; it’s a low-risk, low-reward strategy. If you don’t bother anyone, you’re probably not going to impress the opposite sex very much, but at least you won’t get clubbed over the head by a competitor very often. So there’s a sensible argument to be made for frequency-dependent selection for depression-related traits (the reason it’s likely to be frequency-dependent is that if you ever had a population made up entirely of self-doubting, non-aggressive individuals, being more aggressive would probably become highly advantageous, so at some point you’d reach a stable equilibrium between the two strategies).
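The frequency-dependence argument in the parenthetical can be illustrated with a standard textbook model that I’m swapping in here, Maynard Smith’s hawk-dove game, run under simple discrete replicator dynamics. It is not a model of depression: “dove” loosely stands in for the non-confrontational strategy and “hawk” for the aggressive one, and the payoff values (a resource worth 2, a fight costing 10) and the baseline fitness are arbitrary assumptions. When fighting costs more than the resource is worth, neither strategy takes over; the hawk frequency settles at value/cost whether hawks start out rare or common.

```python
def hawk_dove_equilibrium(value, cost, p0=0.01, generations=2000, baseline=10.0):
    """Iterate discrete replicator dynamics for the hawk-dove game and return the
    final frequency of the aggressive ("hawk") strategy.

    value: payoff for winning the contested resource; cost: cost of losing a fight.
    baseline: background fitness that keeps every fitness value positive.
    """
    p = p0
    for _ in range(generations):
        # Expected payoffs against a population in which a fraction p plays hawk
        hawk = p * (value - cost) / 2 + (1 - p) * value  # fights hawks, exploits doves
        dove = (1 - p) * value / 2                       # retreats from hawks, shares with doves
        w_hawk, w_dove = baseline + hawk, baseline + dove
        mean_w = p * w_hawk + (1 - p) * w_dove
        p = p * w_hawk / mean_w  # each strategy grows in proportion to its relative fitness
    return p


# With cost > value, theory predicts a stable mixture at p* = value / cost = 0.2
for start in (0.01, 0.99):
    print(start, round(hawk_dove_equilibrium(value=2.0, cost=10.0, p0=start), 3))
```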

So where does rumination–the main focus of the Andrews and Thomson paper–come into the picture? Well, I don’t know for sure, but here’s a pretty plausible just-so story: once you evolve the capacity to reason intelligently about yourself, you now have a higher cognitive system that’s naturally going to want to understand why it feels the way it does so often. If you’re someone who feels pretty upset about things much of the time, you’re going to think about those things a lot. So… you ruminate. And that’s really all you need! Saying that depression is adaptive doesn’t require you to think of every aspect of depression (e.g., rumination) as a complex and human-specific adaptation; it seems more parsimonious to see depressive rumination as a non-adaptive by-product of a more general and (potentially) adaptive disposition to experience negative affect. On this type of account, ruminating isn’t actually helping a depressed person solve any problems at all. In fact, you could even argue that rumination shouldn’t make you feel better, or it would defeat the very purpose of having a depressive nature in the first place. In other words, it’s entirely consistent with the basic argument that depression is adaptive under some circumstances that the very purpose of rumination might be to keep depressed people in a depressed state. I don’t have any direct evidence for this, of course; it’s a just-so story. But it’s one that is, in my opinion, (a) more plausible and (b) more consistent with indirect evidence (e.g., that rumination generally doesn’t seem to make people feel better!) than the Andrews and Thomson view.

The other thing that struck me about the Andrews and Thomson paper, and to a lesser extent, Lehrer’s article, is that the focus is (intentionally) squarely on whether and why depression is adaptive from an evolutionary standpoint. But it’s not clear that the average person suffering from depression really cares, or should care, about whether their depression exists for some distant evolutionary reason. What’s much more germane to someone suffering from depression is whether their depression is actually increasing their quality of life, and in that respect, it’s pretty difficult to make a positive case. The argument that rumination is adaptive because it helps you solve complex analytical problems is only compelling if you think that those problems are really worth mulling over deeply in the first place. For most of the things that depressed people tend to ruminate over (most of which aren’t life-changing decisions, but trivial things like whether your co-workers hate you because of the unfashionable shirt you wore to work yesterday), that just doesn’t seem to be the case. So the argument becomes circular: rumination helps you solve problems that a happier person probably wouldn’t have been bothered by in the first place. Now, that isn’t to say that there aren’t some very specific environments in which depression might still be adaptive today; it’s just that there don’t seem to be very many of them. If you look at the data, it’s quite clear that, on average, depression has very negative effects. People lose friends, jobs, and the joy of life because of their depression; it’s hard to see what monumental problem-solving insight could possibly compensate for that in most cases. By way of analogy, saying that depression is adaptive because it promotes rumination seems kind of like saying that cigarettes serve an adaptive purpose because they make nicotine withdrawal go away. Well, maybe. But wouldn’t you rather not have the withdrawal symptoms to begin with?

To be clear, I’m not suggesting that we should view depression solely in pathological terms, or that we should entirely write off the possibility that there are some potentially adaptive aspects to depression (or personality traits that go along with it). Rather, the point is that, if you’re suffering from depression, it’s not clear what good it’ll do you to learn that some of your ancestors may have benefited from their depressive natures. (By the same token, you wouldn’t expect a person suffering from sickle-cell anemia to gain much comfort from learning that they carry two copies of a mutation that, in a heterozygous carrier, would confer a strong resistance to malaria.) Conversely, there’s a very real danger here, in the sense that, if Andrews and Thomson are wrong about rumination being adaptive, they might be telling people it’s OK to ruminate when in fact excessive rumination could be encouraging further depression. My sense is that that’s actually the received wisdom right now (i.e., much of cognitive-behavioral therapy is focused on getting depressed individuals to recognize their ruminative cycles and break out of them). So the concern is that too much publicity might be a bad thing in this case, and, far from heralding the arrival of a new perspective on the conceptualization and treatment of depression, may actually be hurting some people. Ultimately, of course, it’s an empirical matter, and certainly not one I have any conclusive answers to. But what I can quite confidently assert in the meantime is that the Lehrer article is an enjoyable read, so long as you read it with a healthy dose of skepticism.

Andrews, P., & Thomson, J. (2009). The bright side of being blue: Depression as an adaptation for analyzing complex problems. Psychological Review, 116 (3), 620-654 DOI: 10.1037/a0016242

if natural selection goes, so does most everything else

February 23rd, 2010

Jerry Fodor and Massimo Piattelli-Palmarini have a new book out entitled What Darwin Got Wrong. The book has, to put it gently, not been very well received (well, the creationists love it). Its central thesis is that natural selection fails as a mechanism for explaining observable differences between species, because there’s ultimately no way to conclusively determine whether a given trait was actively selected for, or if it’s just a free-rider that happened to be correlated with another trait that truly was selected for. For example, we can’t really know why polar bears are white: it could be that natural selection favored white fur because it allows the bears to blend into their surroundings better (presumably improving their hunting success), or it could be that bears with sharper teeth happen to have white fur, or that smaller, less energetic bears who need to eat less often tend to have white fur, or that a mutant population of polar bears who happened to be white also happened to have a resistance to some deadly disease that wiped out all non-white polar bears, or… you get the idea.

If this sounds like pretty silly reasoning to you, you’re not alone. Virtually all of the reviews (or at least, those written by actual scientists) have resoundingly panned Fodor and Piattelli-Palmarini for writing a book about evolution with very little apparent understanding of evolution. Since I haven’t read the book, and can’t claim much knowledge of evolutionary biology, I’m not going to weigh in with a substantive opinion, except to say that, based on the reviews I’ve read, along with an older article of Fodor’s that makes much the same argument, I don’t see any reason to disagree with the critics. The most elegant critique I’ve come across is Block and Kitcher’s review of the book in the Boston Review:

The basic problem, according to Fodor and Piattelli-Palmarini, is that the distinction between free-riders and what they ride on is “invisible to natural selection.” Thus stated, their objection is obscure because it relies on an unfortunate metaphor, introduced by Darwin. In explaining natural selection, the Origin frequently resorts to personification: “natural selection is daily and hourly scrutinising, throughout the world, every variation, even the slightest” (emphasis added). When they talk of distinctions that are “invisible” to selection, they continue this personification, treating selection as if it were an observer able to choose among finely graded possibilities. Central to their case is the thesis that Darwinian evolutionary theory must suppose that natural selection can make the same finely graded discriminations available to a human (or divine?) observer.

Neither Darwin, nor any of his successors, believes in the literal scrutiny of variations. Natural selection, soberly presented, is about differential success in leaving descendants. If a variant trait (say, a long neck or reduced forelimbs) causes its bearer to have a greater number of offspring, and if the variant is heritable, then the proportion of organisms with the variant trait will increase in subsequent generations. To say that there is “selection for” a trait is thus to make a causal claim: having the trait causes greater reproductive success.

Causal claims are of course familiar in all sorts of fields. Doctors discover that obesity causes increased risk of cardiac disease; atmospheric scientists find out that various types of pollutants cause higher rates of global warming; political scientists argue that party identification is an important cause of voting behavior. In each of these fields, the causes have correlates: that is why causation is so hard to pin down. If Fodor and Piattelli-Palmarini believe that this sort of causal talk is “conceptually flawed” or “incoherent,” then they have a much larger opponent than Darwinism: their critique will sweep away much empirical inquiry.

This really seems to me to get at the essence of the claim, and why it’s silly. Fodor and Piattelli-Palmarini are essentially claiming that natural selection is bunk because you can never be absolutely sure that natural selection operated on the trait you think it operated on. But scientists don’t require absolute certainty to hold certain beliefs about the way the world works; we just require that those beliefs seem somewhat more plausible than other available alternatives. If you take absolute certainty as a necessary criterion for causal inference, you can’t do any kind of science, period.

It’s not just evolutionary biology that suffers; if you held psychologists to the same standards, for example, we’d be in just as much trouble, because there’s always some potential confound that might explain away a putative relation between an experimental manipulation and a behavioral difference. If nothing else, you can always blame sampling error: you might think that giving your subjects 200 mg of caffeine was what caused them to report decreased levels of subjective fatigue, but maybe you just happened to pick a particularly sleep-deprived control group. That’s surely no less plausible an explanation than some of the alternative accounts for the whiteness of the polar bear suggested above. But if you take this type of argument seriously, you can pretty much throw any type of causal inference (and hence, most science) out the window. So it’s hardly surprising that Fodor and Piattelli-Palmarini’s new book hasn’t received a particularly warm reception. Most of the critics are under the impression that science is a pretty valuable enterprise, and seems to work reasonably well most of the time, despite the rampant uncertainty that surrounds most causal inferences.
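The gap between “can’t be ruled out” and “equally plausible” is something we can actually quantify. Here is a minimal Monte Carlo sketch, assuming normally distributed scores and no true effect at all, that estimates how often two randomly sampled groups differ by at least a given number of standard deviations purely by chance. The group sizes and thresholds are hypothetical, the function name is mine, and nothing here models caffeine or fatigue specifically.

```python
import numpy as np


def chance_of_difference(n_per_group, min_diff_sd, n_sims=20_000, seed=1):
    """Fraction of simulated experiments with NO true effect in which the two
    group means differ (in either direction) by at least `min_diff_sd` SDs."""
    rng = np.random.default_rng(seed)
    group_a = rng.standard_normal((n_sims, n_per_group))
    group_b = rng.standard_normal((n_sims, n_per_group))
    diff = np.abs(group_a.mean(axis=1) - group_b.mean(axis=1))
    return float((diff >= min_diff_sd).mean())


for n in (20, 80):
    for d in (0.5, 1.0):
        print(f"n={n} per group, |difference| >= {d} SD: {chance_of_difference(n, d):.4f}")
```

The probability of a sampling-error explanation never reaches exactly zero, but with these toy settings a one-standard-deviation difference between groups of 20 almost never arises by chance, and even a half-standard-deviation difference becomes rare once groups grow to 80. That is the sense in which some explanations are better supported than others without anyone being certain.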

Lest you think there must be some subtlety to Fodor’s argument the critics have missed, or that there’s some knee-jerk defensiveness going on on the part of, well, damned near every biologist who’s cared to comment, I leave you with this gem, from a Salon interview with Fodor (via Jerry Coyne):

Creationism isn’t the only doctrine that’s heavily into post-hoc explanation. Darwinism is too. If a creature develops the capacity to spin a web, you could tell a story of why spinning a web was good in the context of evolution. That is why you should be as suspicious of Darwinism as of creationism. They have spurious consequence in common. And that should be enough to make you worry about either account.

I guess if you really believed that every story you could come up with about web-spinning was just as good as any other, and that there was no way to discriminate between them empirically (a notion Coyne debunks), this might seem reasonable. But then, you can always make up just-so stories to fit any set of facts. If you don’t allow for the fact that some stories have better evidential support than others, you indeed have no way to discriminate creationism from science. But I think it’s a sad day if Jerry Fodor–who’s made several seminal contributions to cognitive science and the philosophy of science–really believes that.

what do personality psychology and social psychology actually have in common?

February 18th, 2010

Is there a valid (i.e., non-historical) reason why personality psychology and social psychology are so often lumped together as one branch of psychology? There are PSP (personality and social psychology) journals, PSP conferences, PSP brownbags… the list goes on. It all seems kind of odd considering that, in some ways, personality psychologists and social psychologists have completely opposite focuses (foci?). Personality psychologists are all about the consistencies in people’s behavior, and classify situational variables under “measurement error”; social psychologists care not one whit for traits, and are all about how behavior is influenced by the situation. Also, aside from the conceptual tension, I’ve often gotten the sense that personality psychologists and social psychologists just don’t like each other very much. Which I guess would make sense if you think these are two relatively distinct branches of psychology that, for whatever reason, have been lumped together inextricably for several decades. It’s kind of like being randomly assigned a roommate in college, except that you have to live with that roommate for the rest of your life.

I’m not saying there aren’t ways in which the two disciplines overlap. There are plenty of similarities; for example, they both tend to heavily feature self-report, and both often involve the study of social behavior. But that’s not really a good enough reason to lump them together. You can take almost any two branches of psychology and find a healthy intersection. For example, the interface between social psychology and cognitive psychology is one of the hottest areas of research in psychology at the moment. There’s a journal called Social Cognition–which, not coincidentally, is published by the International Social Cognition Network. Lots of people are interested in applying cognitive psychology models to social psychological issues. But you’d probably be taking bullets from both sides of the hallway if you ever suggested that your department should combine their social psychology and cognitive psychology brown bag series. Sure, there’s an overlap, but there’s also far more content that’s unique to each discipline.

The same is true for personality psychology and social psychology, I’d argue. Many (most?) personality psychologists aren’t intrinsically interested in social aspects of personality (at least, no more so than in other, non-social aspects), and many social psychologists couldn’t give a rat’s ass about the individual differences that make each of us a unique and special flower. And yet there we sit, week after week, all together in the same seminar room, as one half of the audience experiences rapture at the speaker’s words, and the other half wishes they could be slicing blades of grass off their lawn with dental floss. What gives?

the OKCupid guide to dating older women

February 17th, 2010

Continuing along on their guided tour of Data I Wish I Had Access To, the OKCupid folks have posted another set of interesting figures on their blog. This time, they make the case for dating older women, suggesting that men might get more bang for their buck (in a literal sense, I suppose) by trying to contact women their age or older, rather than trying to hit on the young ‘uns. Men, it turns out, are creepy. Here’s how creepy:

Actually, that’s not so creepy. All it says is that men say they prefer to date younger women. That’s not going to shock anyone. This one is creepier:

The reason it’s creepy is that it basically says that, irrespective of what age ranges men say they find acceptable in a potential match, they’re actually all indiscriminately messaging 18-year-old women. So basically, if you’re a woman on OKCupid who’s searching for that one special, non-creepy guy, be warned: they don’t exist. They’re pretty much all going to be eyeing 18-year-olds for the rest of their lives. (To be fair, women also show a tendency to contact men below their lowest reported acceptable age. But it’s a much weaker effect; 40-year-old women only occasionally try to hit on 24-year-old guys, and tend to stay the hell away from the not-yet-of-drinking-age male population.)

Anyway, using this type of data, the OKCupid folks then generate this figure:

…which also will probably surprise no one, as it basically says women are most desirable when they’re young, and men when they’re (somewhat) older. But what the OKCupid folks then suggest is that it would be to men’s great advantage to broaden their horizons, because older women (which, in their range-restricted population, basically means anything over 30) self-report being much more interested in having sex more often, having casual sex, and using protection. I won’t bother hotlinking to all of those images, but here’s where they’re ultimately going with this:

I’m not going to comment on the appropriateness of trying to nudge one’s male userbase in the direction of more readily available casual sex (though I suspect they don’t need much nudging anyway). What I do wonder is to what extent these results reflect selection effects rather than a genuine age difference. The OKCupid folks suggest that women’s sexual interest increases as they age, which seems plausible given the conventional wisdom that women peak sexually in their 30s. But the effects in this case look pretty huge (unless the color scheme is misleading, which it might be; you’ll have to check out the post for the neat interactive flash animations), and it seems pretty plausible that much of the age effect could be driven by selection bias. Women with a more monogamous orientation are probably much more likely to be in committed, stable relationships by the time they turn 30 or 35, and probably aren’t scanning OKCupid for potential mates. Women who are in their 30s and 40s and still using online dating services are probably those who weren’t as interested in monogamous relationships to begin with. (Of course, the same is probably true of older men. Except that since men of all ages appear to be pretty interested in casual sex, there’s unlikely to be an obvious age differential.)
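To make the selection argument concrete, here’s a toy model in Python. Every number in it is an assumption I made up for illustration, not anything from OKCupid: 30% of women at every age prefer casual sex, and monogamously oriented women leave the dating pool (by pairing off) at 10% a year after age 20, versus 2% a year for everyone else. The underlying preference never changes with age, yet the share of casual-minded women among those still browsing the site climbs steadily, from 30% at 20 to roughly 70% by 40.

```python
# Hypothetical toy model: every number below is an illustrative assumption,
# not OKCupid data.
BASE_CASUAL_SHARE = 0.3  # share of women at EVERY age who prefer casual sex


def still_on_site(age: int, casual: bool) -> float:
    """Assumed probability that a woman of this age is still single and browsing."""
    years_past_20 = max(0, age - 20)
    # Assumption: monogamously oriented women pair off at 10% per year after 20,
    # casual-oriented women at only 2% per year.
    yearly_exit = 0.02 if casual else 0.10
    return (1 - yearly_exit) ** years_past_20


def observed_casual_share(age: int) -> float:
    """Share of casual-oriented women among those still on the site at this age."""
    casual = BASE_CASUAL_SHARE * still_on_site(age, casual=True)
    monogamous = (1 - BASE_CASUAL_SHARE) * still_on_site(age, casual=False)
    return casual / (casual + monogamous)


for age in (20, 25, 30, 35, 40):
    print(age, round(observed_casual_share(age), 2))
```

Nothing about any individual woman changes as she ages in this model; the apparent age effect comes entirely from who drops out of the pool.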

The other thing I’m not clear on is whether these analyses control for the fact that the userbase is heavily skewed toward younger users:

The people behind OKCupid are all mathematicians by training, so I’d be surprised if they hadn’t taken the underlying age distribution into consideration. But they don’t say anything about it in their post. The worry is that, if the base rate of different age groups isn’t taken into consideration, the heat map displayed above could be quite misleading. Given that there are many, many more 25-year-old women on OKCupid than 35-year-old women, failing to normalize properly would almost invariably make it look like there’s a heavy skew for men to message relatively younger women, irrespective of the male sender’s age. By the same token, it’s not clear that it’d be good advice to tell men to seek out older women, given that there are many fewer older women in the pool to begin with. As a thought experiment, suppose that the entire OKCupid male population suddenly started messaging women 5 years older than them, and entirely ignored their usual younger targets. The hit rate wouldn’t go up; it would probably actually fall precipitously, since there wouldn’t be enough older women to keep all the younger men entertained (at least, I certainly hope there wouldn’t). No doubt there’s a stable equilibrium point somewhere, where men and women are each targeting exactly the right age range to maximize their respective chances. I’m just not sure that it’s in OKCupid’s proposed “zone of greatness” for the men.
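Here’s a minimal sketch of why normalization matters, using made-up counts (these are hypothetical, not OKCupid figures) for messages sent by 35-year-old men to women in three age groups. The raw counts and the per-capita rates tell opposite stories.

```python
# Hypothetical counts for illustration only; these are not OKCupid figures.
women_on_site = {25: 10_000, 30: 5_000, 35: 2_000}
messages_from_35yo_men = {25: 4_000, 30: 2_500, 35: 1_200}


def per_capita_rate(messages: dict[int, int], pool: dict[int, int]) -> dict[int, float]:
    """Messages received per woman in each age group (normalizes by base rate)."""
    return {age: messages[age] / pool[age] for age in pool}


# Raw share of all messages going to each age group (ignores pool size).
total_sent = sum(messages_from_35yo_men.values())
raw_share = {age: n / total_sent for age, n in messages_from_35yo_men.items()}

# Messages per woman, which accounts for how many women of each age there are.
rates = per_capita_rate(messages_from_35yo_men, women_on_site)

print("raw share:", {age: round(s, 2) for age, s in raw_share.items()})
print("per woman:", rates)
```

Read the raw shares and it looks like these men overwhelmingly target 25-year-olds; divide by the size of each pool and each 35-year-old woman actually receives the most messages per head.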

It’s also a bit surprising that OKCupid didn’t break down the response rate to people of the opposite gender as a function of the sender and receiver’s age. They’ve done this in the past, and it seems like the most direct way of testing whether men are more likely to get lucky by messaging older or younger women. Without knowing whether older women are actually responding to younger men’s overtures, it’s kind of hard to say what it all means. Except that I’d still kill to have their data.
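For what it’s worth, the breakdown I’d want is easy to compute once you have the data. The sketch below assumes a hypothetical record format (sender age, receiver age, whether a reply came back) and 5-year age bins; both are my own choices, not OKCupid’s schema, and the rows are invented purely to show the shape of the calculation.

```python
from collections import defaultdict

# Each record: (sender_age, receiver_age, got_reply). Made-up example rows.
messages = [
    (25, 22, False), (25, 22, True), (25, 30, True), (25, 30, True),
    (35, 24, False), (35, 24, False), (35, 34, True), (35, 34, False),
]


def bin_age(age: int, width: int = 5) -> int:
    """Lower edge of the age bin (e.g. 27 -> 25 for 5-year bins)."""
    return age - age % width


def response_rates(records: list[tuple[int, int, bool]]) -> dict[tuple[int, int], float]:
    """Reply rate for each (sender age bin, receiver age bin) pair."""
    sent: defaultdict[tuple[int, int], int] = defaultdict(int)
    replied: defaultdict[tuple[int, int], int] = defaultdict(int)
    for sender, receiver, got_reply in records:
        key = (bin_age(sender), bin_age(receiver))
        sent[key] += 1
        replied[key] += int(got_reply)
    return {key: replied[key] / sent[key] for key in sent}


rates = response_rates(messages)
for (sender_bin, receiver_bin), rate in sorted(rates.items()):
    print(f"senders {sender_bin}-{sender_bin + 4} -> receivers {receiver_bin}-{receiver_bin + 4}: {rate:.0%}")
```

A grid of these reply rates, computed separately for each direction of contact, would answer directly whether older women are actually responding to younger men’s overtures.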