babygate blues: a neuromarketing tale

Cory Doctorow has a new short story (“Ghosts in my Head”) up on the Subterranean Press website about the undesirable consequences of neuromarketing run amok. I liked the story, but thought the premise was pretty unrealistic (and, yes, I do know it’s called science fiction for a reason–I’m just sayin’). So as a counterpoint, here’s an alternative neuromarketing future that I personally find much more plausible.

Deborah Stojko didn’t care much for Pockter and Gramble’s corporate headquarters. The building smelled of disinfectant and organization; the halogen corridors all blended together into one giant dimly-lit maze. Stojko had been visiting P&G regularly for several years now; it was never a pleasant experience, but it couldn’t be avoided. Communicating with major stakeholders was a large part of her job as director of the International Consortium for Neuromarketing Research. And P&G was by far the largest stakeholder, contributing over 70% of the money that supported the consortium’s work.

For several years now, ICNR had been pumping out first-class scientific research on the neural mechanisms of economic decision-making. The Richelieu effect, Preinforcement Learning, the neurometric satisficing theorem… ICNR was behind any number of recent discoveries; its members were continually in the news. And all of it was made possible only through the generosity of the marketing and R&D wings of P&G.

The generosity, or the naivete? Stojko asked herself as she reached her destination and knocked softly on an office door. Somehow, the executives at Pockter and Gramble had managed to convince themselves that the survival of P&G rested on their ability to mine the deep secrets of the brain. For years now, they’d been throwing sums of money at cognitive neuroscientists that would make European royalty blush. That streak of good fortune, Stojko suspected, was now about to end. Recent events had rendered P&G’s massive investment in ICNR something of a political liability; she had the feeling this was the last time she’d be making the trip to P&G headquarters.

And not a moment too soon, she thought, as the door opened in front of her.

*    *    *

“How long has Pockter and Gramble been funding you, Deborah?” Bob Ramsey, Chief Executive Officer, asked, once Stojko was seated and they’d gotten the standard pleasantries out of the way.

Stojko did the arithmetic in her head. The International Consortium for Neuromarketing Research had formed in 2013, following a massive infusion of P&G cash, so…

“Six years,” she said.

“Right. And do you know how much money Pockter and Gramble has given your consortium in those six years?”

“I’d put it somewhere between 251.8 and 251.9 million dollars.”

“Very clever. A quarter of a billion dollars. We’ve given you a quarter. Of a billion. Dollars.”

“Well, to be fair, that amount is spread out over 8 sites and 30 other investigators,” Stojko pointed out. “It’s not like you wrote me a check for 250 million. My institution only got about forty-five million.”

Ramsey didn’t say anything, but his expression bespoke a thinly-veiled irritation. He picked up a remote control on the desk and pushed a button. Behind Stojko, the wall turned translucent as the embedded display lit up.

“No doubt you’ll recognize this clip,” Ramsey said.

Stojko swiveled around to watch the giant screen. The camera faded in on a bright and comfortable-looking living room somewhere in America. Almost immediately, six or seven babies in diapers filed into the room and began dancing synchronously in a circle. After a few seconds of dancing, the babies started babbling an Eastern-sounding melody in a totally incomprehensible–and, Stojko suspected, nonexistent–language. And a few seconds after that, they started banging spoons on the tabletop in perfect unison, all the while still dancing and singing in tongues. The whole thing lasted exactly thirty seconds, and occupied a very narrow emotional niche between really adorable and utterly creepy.

Stojko did recognize the clip, of course; it was an ad for Dampers, a P&G-owned diaper brand. The consortium had selected the ad from over two dozen candidates that P&G had asked them to test. For reasons that remained unclear to Stojko–and to pretty much everyone else–singing, dancing, spoon-banging babies lit the brain up like a Christmas tree.

Stojko had had her reservations about declaring a ‘winner’; she’d written several long emails to the P&G marketing brain trust explaining that, brain activation notwithstanding, there really wasn’t any evidence yet that this particular ad was going to help sell more diapers, and many more studies were needed before the consortium could confidently interpret its own results. But marketing wasn’t into the whole waiting thing, and the ad was on the air within three months of the consortium’s initial report.

As it turned out, it didn’t do so well.

“That ad bombed,” Ramsey said, wagging his finger in the general direction of the screen. “According to you people, it was supposed to push all of the brain’s buttons at once. You spent three million dollars of our money just on that one testing program. Two dozen ads to choose from, and the one you pick completely tanked. It was an epic failure. At this very moment, people in living rooms all over America are laughing at Pockter and Gramble because of that ad.”

“I’m sure it’s not that bad,” said Stojko, smirking almost imperceptibly. She was well aware of the PR disaster P&G had on its hands, of course. But she couldn’t deny the warm feeling of schadenfreude that accompanied the knowledge that P&G was now paying many times over for disregarding just about every recommendation the consortium had made in its 480-page report. She was pretty sure the suits had never made it past the fifth or sixth page.

“It is that bad,” Ramsey shot back. “We blew half of our network budget for the year on this ad. Our initial focus groups were already pretty positive, and then we received your report saying things like–and I quote–“of all the ads tested, number seventeen elicited the largest response in brain areas associated with reward.” So we figured it was a sure thing, and started airing the ad in all the major markets. And then, out of nowhere, we get this massive backlash. Thousands of angry emails from people complaining that the ad was trite and we were shamefully “exploiting babies”. People saying they would never buy Dampers diapers again; that the CEO–that’s me, mind you–should resign; that someone should “just torch Pockter and Gramble headquarters”. And those were just the serious complaints. There were also the people who apparently thought the whole thing was just a big joke that gave them an opening to do their own thing. We had forty YouTube videos a day uploaded by people spoofing the ad. There was one clip of six guys in giraffe suits singing and doing our baby dance. Sixteen million hits.”

“All publicity is good publicity, right?”

“No. Not even close.”

Stojko chuckled just loudly enough for Ramsey to hear.

“Is this funny to you?” Ramsey asked. “We give you a quarter of a billion dollars for commercials designed to push the brain’s reward buttons, and we get grown men in giraffe suits?”

“Well, let me put it this way, Bob. If your goal was really to make commercials that light up the brain’s reward circuitry, you wouldn’t have needed to do any serious research in the first place; you could have just run 30-second clips of semi-nude women making out with each other, or couples giggling and cuddling in bed. That’d cover most of the bases. You’d have all the reward-related activation you could want. But how many deodorant sticks do you think commercials like that would sell?”

Ramsey stared at Stojko blankly.

“Porn, flashing lights, pictures of hundred-dollar bills, a basket of shiny fresh fruit… lots of things activate the brain’s reward centers,” Stojko continued. “What makes you think a commercial that tangentially elicits reward-related activation is going to make people buy any more of a product?”

“Well, can’t you tell that?”

“Can we?” asked Stojko rhetorically. “I don’t know. Can you tell that? You guys probably have labs full of people trying to figure out whether the fact that people tell you they like a commercial means they’re going to buy more of the product featured in that commercial. And what’s the answer?”

“I don’t know that myself,” Ramsey replied abruptly. “It’s not my job to know that. I can have marketing come up here and tell you the answer if you like.”

Stojko shook her head.

“Doesn’t matter. I mean, it can only go one of two ways. If marketing doesn’t know what makes a commercial good or bad, you can’t really expect us to tell you what it is about the brain that makes people buy things. We don’t track how well your products sell after different ads go into circulation; how the hell would we know which commercials have the largest impact on sales? I can tell you which commercials activate the nucleus accumbens more than others, but so what? How am I supposed to know if nucleus accumbens activation is a good predictor of actual purchases without actually knowing anything about real-world purchases?”

Ramsey had nothing to say to that; he stared down at his shoes.

“So clearly, that’s not going to help us,” Stojko continued. “But suppose instead we pretend that the people in your marketing department are smart cookies, and they do know what it is about commercials that makes people buy your products. Well, in that case, what the hell would you need us for? If you’ve figured out that people are more likely to buy your anti-dandruff shampoo after watching ads they rate ‘extremely interesting’, what is peering into the brain going to tell you?”

“Well, I guess you could use brain imaging to figure out what it is that people find extremely interesting, right?”

“Sure, Bob, we could do that. And you know how we’d do that? By asking people which commercials they found interesting, and then correlating their verbal responses with what their brains were doing while they watched those commercials. And you know what that means? It means we can never do any better than your people can do with your focus groups and spreadsheets. Because basically, we’re stuck trying to predict the same variables that you guys are using to predict people’s buying behavior. We’re just one step further removed.”

Ramsey listened quietly, but anger visibly colored his face as Stojko spoke.

“This is the kind of thing that might have been good to bring up, oh, say, five years ago,” he said.

“Oh, believe me, we did bring it up,” Stojko smiled bitterly. “Or at least, we tried to.”

She tapped a few keys on her holoboard.

“Here’s an email dated June 18th, 2014: “Dear Mr. Chauahan–I believe that’s your VP of marketing, right?–senior members of the consortium continue to express their frustration at Pockter and Gramble’s failure to provide us with the sales data we requested. As we indicated in our letter dated April 21st, it is not possible for us to properly evaluate the efficacy of our program without the use of real-world performance metrics. We understand your concerns about sharing private data with outside contractors; however…”

Stojko shot Ramsey a pointed look.

“I’ll spare you the rest; it goes on like that for three pages. See, we’ve been asking for the data we need for six years now–pretty much since we started. And every time we ask, you throw more money at us and tell us to go back to work, that you’re not going to share your numbers with us because they’re confidential and we shouldn’t need that information anyway.”

She tapped a few more keys.

“Here’s another similar one. September 30th: Dear Mr. Chauahan, the consortium is at a loss to understand…”

“Enough!” yelled Ramsey, slamming his fist down on the desk. “I get the point! We’ve spent a quarter of a billion buying you new toys to play with, and all the while you’ve been playing us for idiots. Well, you know what–enjoy your toys while they last, because we’re going to have Legal look at our options for recovering that money first thing Monday morning. Those fancy new scanners of yours are going away.”

He wheeled his chair away from Stojko and sat there fuming. Stojko took it as a sign the meeting was over; she shrugged and got up to leave.

The falling out was unfortunate, she thought as she walked down the long sterile corridor towards the elevator. But it had been a long time coming, and after the whole Babygate episode (as the scientists at ICNR had started calling it), no one at ICNR would be surprised to hear that P&G was pulling the plug.

Nor would most of them mind terribly much. Stojko had always planned for a six or seven-year run, and had stopped hiring people on short-term contracts a couple of years ago. There would be no massive lay-offs, no collective plunge into obscurity for the many researchers invested in the project. The data was already collected, and she and her colleagues would be kept busy analyzing and publishing the results for years to come.

As for Ramsey’s legal threats, Stojko wasn’t the least bit worried. Universities had lawyers too, and there wasn’t a judge in the country who’d award P&G a single nickel for breach of contract; not after reading the long series of emails from the consortium that already explained in excruciating detail exactly why P&G was never going to recoup its financial investment unless it fundamentally changed the way it did things. Which, of course, hadn’t happened–and probably never would.

Stojko left Pockter and Gramble headquarters with a clear conscience. At the end of the day, she thought as she walked to her car, all you could do was represent yourself honestly to the other party and let the chips fall where they may. And that was what she’d done. She’d told P&G all along exactly how the consortium was going to spend the money they received; the service agreements she signed were very clearly delineated in legalese that several lawyers on the institutional payroll had contributed to and pored over. Stojko and her colleagues had worked hard to ensure that no one at P&G was laboring under false pretenses about the likely outcome of ICNR’s work. As she’d once put it to a mid-level P&G executive over dinner, neuromarketing research was great for science, and (in her estimation) utterly useless for advertising. But if the suits were willing to pay for it, she was willing to do the research. That, after all, was her job; it was what she’d be doing with her time anyway, ICNR or no ICNR.

No, she thought, turning the key in the ignition. She’d been right to take the industry money; ICNR had conducted itself impeccably over the past six years. If someone insisted on filling your cup up with change even after you very carefully explained to them that you were only going to buy beer with it, who could blame you for paying a visit to the bar once panhandling hours were over?

elsewhere on the net, vacation edition

I’m hanging out in Boston for a few days, so blogging will probably be sporadic or nonexistent. Which is to say, you probably won’t notice any difference.

The last post on the Dunning-Kruger effect somehow managed to rack up 10,000 hits in 48 hours; but that was last week. Today I looked at my stats again, and the blog is back to a more normal 300 hits, so I feel like it’s safe to blog again. Here are some neat (and totally unrelated) links from the past week:

  • OKCupid has another one of those nifty posts showing off all the cool things they can learn from their gigantic userbase (who else gets to say things like “this analysis includes 1.51 million users’ data”???). Apparently, tall people (claim to) have more sex, attractive photos are more likely to be out of date, and most people who claim to be bisexual aren’t really bisexual.
  • After a few months off, my department-mate Chris Chatham is posting furiously again over at Developing Intelligence, with a series of excellent posts reviewing recent work on cognitive control and the perils of fMRI research. I’m not really sure what Chris spent his blogging break doing, but given the frequency with which he’s been posting lately, my suspicion is that he spent it secretly writing blog posts.
  • Mark Liberman points out a fundamental inconsistency in the way we view attributions of authorship: we get appropriately angry at academics who pass someone else’s work off as their own, but think it’s just fine for politicians to pay speechwriters to write for them. It’s an interesting question, and leads to an intimately related, and even more important question–namely, will anyone get mad at me if I pay someone else to write a blog post for me about someone else’s blog post discussing people getting angry at people paying or not paying other people to write material for other people that they do or don’t own the copyright on?
  • I like oohing and aahing over large datasets, and the Guardian’s Data Blog provides a nice interface to some of the most ooh- and aah-able datasets out there. [via R-Chart]
  • Ed Yong has a characteristically excellent write-up about recent work on the magnetic vision of birds. Yong also does link dump posts better than anyone else, so you should probably stop reading this one right now and read his instead.
  • You’ve probably heard about this already, but some time last week, the brain trust at ScienceBlogs made the amazingly clever decision to throw away their integrity by selling PepsiCo its very own “science” blog. Predictably, a lot of the bloggers weren’t happy with the decision, and many have now moved onto greener pastures; Carl Zimmer’s keeping score. Personally, I don’t have anything intelligent to add to everything that’s already been said; I’m literally dumbfounded.
  • Andrew Gelman takes apart an obnoxious letter from pollster John Zogby to Nate Silver of FiveThirtyEight. I guess now we know that Zogby didn’t get where he is by not being an ass to other people.
  • Vaughan Bell of Mind Hacks points out that neuroplasticity isn’t a new concept, and was discussed seriously in the literature as far back as the 1800s. Apparently our collective views about the malleability of mind are not, themselves, very plastic.
  • NPR ran a three-part story by Barbara Bradley Hagerty on the emerging and somewhat uneasy relationship between neuroscience and the law. The articles are pretty good, but much better, in my opinion, was the Talk of the Nation episode that featured Hagerty as a guest alongside Joshua Greene, Kent Kiehl, and Stephen Morse–people who’ve all contributed in various ways to the emerging discipline of NeuroLaw. It’s a really interesting set of interviews and discussions. For what it’s worth, I think I agree with just about everything Greene has to say about these issues–except that he says things much more eloquently than I think them.
  • Okay, this one’s totally frivolous, but does anyone want to buy me one of these things? I don’t even like dried food; I just think it would be fun to stick random things in there and watch them come out pale, dried husks of their former selves. Is it morbid to enjoy watching the life slowly being sucked out of apples and mushrooms?

what the Dunning-Kruger effect is and isn’t

If you regularly read cognitive science or psychology blogs (or even just the lowly New York Times!), you’ve probably heard of something called the Dunning-Kruger effect. The Dunning-Kruger effect refers to the seemingly pervasive tendency of poor performers to overestimate their abilities relative to other people–and, to a lesser extent, for high performers to underestimate their abilities. The explanation for this, according to Kruger and Dunning, who first reported the effect in an extremely influential 1999 article in the Journal of Personality and Social Psychology, is that incompetent people lack the skills they’d need in order to be able to distinguish good performers from bad performers:

…people who lack the knowledge or wisdom to perform well are often unaware of this fact. We attribute this lack of awareness to a deficit in metacognitive skill. That is, the same incompetence that leads them to make wrong choices also deprives them of the savvy necessary to recognize competence, be it their own or anyone else’s.

For reasons I’m not really clear on, the Dunning-Kruger effect seems to be experiencing something of a renaissance over the past few months; it’s everywhere in the blogosphere and media. For instance, here are just a few alleged Dunning-Krugerisms from the past few weeks:

So what does this mean in business? Well, it’s all over the place. Even the title of Dunning and Kruger’s paper, the part about inflated self-assessments, reminds me of a truism that was pointed out by a supervisor early in my career: The best employees will invariably be the hardest on themselves in self-evaluations, while the lowest performers can be counted on to think they are doing excellent work…

Heidi Montag and Spencer Pratt are great examples of the Dunning-Kruger effect. A whole industry of assholes are making a living off of encouraging two attractive yet untalented people they are actually genius auteurs. The bubble around them is so thick, they may never escape it. At this point, all of America (at least those who know who they are), is in on the joke – yet the two people in the center of this tragedy are completely unaware…

Not so fast there — the Dunning-Kruger effect comes into play here. People in the United States do not have a high level of understanding of evolution, and this survey did not measure actual competence. I’ve found that the people most likely to declare that they have a thorough knowledge of evolution are the creationists…but that a brief conversation is always sufficient to discover that all they’ve really got is a confused welter of misinformation…

As you can see, the findings reported by Kruger and Dunning are often interpreted to suggest that the less competent people are, the more competent they think they are. People who perform worst at a task tend to think they’re god’s gift to said task, and the people who can actually do said task often display excessive modesty. I suspect we find this sort of explanation compelling because it appeals to our implicit just-world theories: we’d like to believe that people who obnoxiously proclaim their excellence at X, Y, and Z must really not be so very good at X, Y, and Z at all, and must be (over)compensating for some actual deficiency; it’s much less pleasant to imagine that people who go around shoving their (alleged) superiority in our faces might really be better than us at what they do.

Unfortunately, Kruger and Dunning never actually provided any support for this type of just-world view; their studies categorically didn’t show that incompetent people are more confident or arrogant than competent people. What they did show is this:

This is one of the key figures from Kruger and Dunning’s 1999 paper (and the basic effect has been replicated many times since). The critical point to note is that there’s a clear positive correlation between actual performance (gray line) and perceived performance (black line): the people in the top quartile for actual performance think they perform better than the people in the second quartile, who in turn think they perform better than the people in the third quartile, and so on. So the bias is definitively not that incompetent people think they’re better than competent people. Rather, it’s that incompetent people think they’re much better than they actually are. But they typically still don’t think they’re quite as good as people who, you know, actually are good. (It’s important to note that Dunning and Kruger never claimed to show that the unskilled think they’re better than the skilled; that’s just the way the finding is often interpreted by others.)

That said, it’s clear that there is a very large discrepancy between the way incompetent people actually perform and the way they perceive their own performance level, whereas the discrepancy is much smaller for highly competent individuals. So the big question is why. Kruger and Dunning’s explanation, as I mentioned above, is that incompetent people lack the skills they’d need in order to know they’re incompetent. For example, if you’re not very good at learning languages, it might be hard for you to tell that you’re not very good, because the very skills that you’d need in order to distinguish someone who’s good from someone who’s not are the ones you lack. If you can’t hear the distinction between two different phonemes, how could you ever know who has native-like pronunciation ability and who doesn’t? If you don’t understand very many words in another language, how can you evaluate the size of your own vocabulary in relation to other people’s?

This appeal to people’s meta-cognitive abilities (i.e., their knowledge about their knowledge) has some intuitive plausibility, and Kruger, Dunning and their colleagues have provided quite a bit of evidence for it over the past decade. That said, it’s by no means the only explanation around; over the past few years, a fairly sizeable literature criticizing or extending Kruger and Dunning’s work has developed. I’ll mention just three plausible (and mutually compatible) alternative accounts people have proposed (but there are others!):

1. Regression toward the mean. Probably the most common criticism of the Dunning-Kruger effect is that it simply reflects regression to the mean–that is, it’s a statistical artifact. Regression to the mean refers to the fact that any time you select a group of individuals based on some criterion, and then measure the standing of those individuals on some other dimension, performance levels will tend to shift (or regress) toward the mean level. It’s a notoriously underappreciated problem, and probably explains many, many phenomena that people have tried to interpret substantively. For instance, in placebo-controlled clinical trials of SSRIs, depressed people tend to get better in both the drug and placebo conditions. Some of this is undoubtedly due to the placebo effect, but much of it is probably also due to what’s often referred to as “natural history”. Depression, like most things, tends to be cyclical: people get better or worse over time, often for no apparent rhyme or reason. But since people tend to seek help (and sign up for drug trials) primarily when they’re doing particularly badly, it follows that most people would get better to some extent even without any treatment. That’s regression to the mean (the Wikipedia entry has other nice examples–for example, the famous Sports Illustrated Cover Jinx).

In the context of the Dunning-Kruger effect, the argument is that incompetent people simply regress toward the mean when you ask them to evaluate their own performance. Since perceived performance is influenced not only by actual performance, but also by many other factors (e.g., one’s personality, meta-cognitive ability, measurement error, etc.), it follows that, on average, people with extreme levels of actual performance won’t be quite as extreme in terms of their perception of their performance. So, much of the Dunning-Kruger effect arguably doesn’t need to be explained at all, and in fact, it would be quite surprising if you didn’t see a pattern of results that looks at least somewhat like the figure above.
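To make the statistical argument concrete, here’s a minimal simulation sketch in Python. It is not Kruger and Dunning’s analysis; the correlation of 0.4 between actual and perceived performance, the sample size, and the function names are all assumptions I’ve picked purely for illustration. Perceived performance here is just actual performance plus noise, with no metacognitive deficit and no self-enhancement built in.

```python
import random
import statistics

def percentile_ranks(values):
    """Return each value's percentile rank (between 0 and 100) within the sample."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = 100.0 * (position + 0.5) / len(values)
    return ranks

def simulate_quartiles(n=20000, r=0.4, seed=1):
    """Mean (actual, perceived) percentile for each quartile of actual performance.

    r is an assumed correlation between actual and perceived performance.
    """
    rng = random.Random(seed)
    actual = [rng.gauss(0, 1) for _ in range(n)]
    noise_sd = (1 - r ** 2) ** 0.5  # keeps perceived scores at unit variance
    perceived = [r * a + rng.gauss(0, noise_sd) for a in actual]
    actual_pct = percentile_ranks(actual)
    perceived_pct = percentile_ranks(perceived)
    rows = []
    for q in range(4):
        lo, hi = 25 * q, 25 * (q + 1)
        members = [i for i in range(n) if lo <= actual_pct[i] < hi]
        rows.append((
            statistics.mean(actual_pct[i] for i in members),
            statistics.mean(perceived_pct[i] for i in members),
        ))
    return rows

rows = simulate_quartiles()
for q, (actual_mean, perceived_mean) in enumerate(rows, start=1):
    print(f"quartile {q}: actual {actual_mean:5.1f}, perceived {perceived_mean:5.1f}")
```

Under these assumptions, the bottom quartile’s average self-rating lands well above its true percentile and the top quartile’s lands about equally far below it, while perceived standing still rises from quartile to quartile. That symmetric squeeze toward the middle is the part of the pattern that noise alone produces.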

2. Regression to the mean plus better-than-average. Having said that, it’s clear that regression to the mean can’t explain everything about the Dunning-Kruger effect. One problem is that it doesn’t explain why the effect is greater at the low end than at the high end. That is, incompetent people tend to overestimate their performance to a much greater extent than competent people underestimate their performance. This asymmetry can’t be explained solely by regression to the mean. It can, however, be explained by a combination of RTM and a “better-than-average” (or self-enhancement) heuristic which says that, in general, most people have a tendency to view themselves excessively positively. This two-pronged explanation was proposed by Krueger and Mueller in a 2002 study (note that Krueger and Kruger are different people!), who argued that poor performers suffer from a double whammy: not only do their perceptions of their own performance regress toward the mean, but those perceptions are also further inflated by the self-enhancement bias. In contrast, for high performers, these two effects largely balance each other out: regression to the mean causes high performers to underestimate their performance, but to some extent that underestimation is offset by the self-enhancement bias. As a result, it looks as though high performers make more accurate judgments than low performers, when in reality the high performers are just lucky to be where they are in the distribution.
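The double whammy is easy to put into numbers. What follows is a back-of-the-envelope sketch, not Krueger and Mueller’s actual model: I’m assuming perceived percentile is a straight line that regresses toward 50 (slope 0.3) and then gets a uniform 15-point self-enhancement bump. Both values, and the function names, are made up for illustration.

```python
QUARTILE_MIDPOINTS = [12.5, 37.5, 62.5, 87.5]  # true percentile of each quartile's middle

def perceived_percentile(actual_pct, slope=0.3, bias=0.0):
    """Regress toward the 50th percentile, then add a uniform self-enhancement bias."""
    return 50.0 + slope * (actual_pct - 50.0) + bias

def miscalibration(bias):
    """Perceived minus actual percentile at each quartile midpoint."""
    return [perceived_percentile(a, bias=bias) - a for a in QUARTILE_MIDPOINTS]

rtm_only = miscalibration(bias=0.0)        # approximately [26.25, 8.75, -8.75, -26.25]
rtm_plus_bias = miscalibration(bias=15.0)  # approximately [41.25, 23.75, 6.25, -11.25]

print("regression only:      ", [round(x, 2) for x in rtm_only])
print("regression plus bias: ", [round(x, 2) for x in rtm_plus_bias])
```

With regression alone, the bottom and top quartiles are off by the same amount in opposite directions. Add the bias and the bottom quartile’s overestimate grows while the top quartile’s underestimate shrinks, which is exactly the asymmetry that makes high performers look better calibrated.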

3. The instrumental role of task difficulty. Consistent with the notion that the Dunning-Kruger effect is at least partly a statistical artifact, some studies have shown that the asymmetry reported by Kruger and Dunning (i.e., the smaller discrepancy for high performers than for low performers) actually goes away, and even reverses, when the ability tests given to participants are very difficult. For instance, Burson and colleagues (2006), writing in JPSP, showed that when University of Chicago undergraduates were asked moderately difficult trivia questions about their university, the subjects who performed best were just as poorly calibrated as the people who performed worst, in the sense that their estimates of how well they did relative to other people were wildly inaccurate. Here’s what that looks like:

Notice that this finding wasn’t anomalous with respect to the Kruger and Dunning findings; when participants were given easier trivia (the diamond-studded line), Burson et al observed the standard pattern, with poor performers seemingly showing worse calibration. Simply knocking about 10% off the accuracy rate on the trivia questions was enough to induce a large shift in the relative mismatch between perceptions of ability and actual ability. Burson et al then went on to replicate this pattern in two additional studies involving a number of different judgments and tasks, so this result isn’t specific to trivia questions. In fact, in the later studies, Burson et al showed that when the task was really difficult, poor performers were actually considerably better calibrated than high performers.

Looking at the figure above, it’s not hard to see why this would be. Since the slope of the line tends to be pretty constant in these types of experiments, lowering the line (i.e., shifting its intercept down the y-axis, as happens when a harder task drags down mean perceived performance) will necessarily result in a larger difference between actual and perceived performance at the high end. Conversely, if you raise the line, you maximize the difference between actual and perceived performance at the lower end.

To get an intuitive sense of what’s happening here, just think of it this way: if you’re performing a very difficult task, you’re probably going to find the experience subjectively demanding even if you’re at the high end relative to other people. Since people’s judgments about their own relative standing depend to a substantial extent on their subjective perception of their own performance (i.e., you use your sense of how easy a task was as a proxy of how good you must be at it), high performers are going to end up systematically underestimating how well they did. When a task is difficult, most people assume they must have done relatively poorly compared to other people. Conversely, when a task is relatively easy (and the tasks Dunning and Kruger studied were on the easier side), most people assume they must be pretty good compared to others. As a result, it’s going to look like the people who perform well are well-calibrated when the task is easy and poorly-calibrated when the task is difficult; less competent people are going to show exactly the opposite pattern. And note that this doesn’t require us to assume any relationship between actual performance and perceived performance. You would expect to get the Dunning-Kruger effect for easy tasks even if there was exactly zero correlation between how good people actually are at something and how good they think they are.
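The zero-correlation claim is worth checking for yourself. Here’s a hedged sketch (not Burson et al’s procedure) in which everyone’s guess about their own percentile is drawn around a single “felt standing” that depends only on how easy the task feels, and is completely independent of how they actually did. The values 65 and 35, the spread of 15, and the function name are assumptions chosen for illustration.

```python
import random
import statistics

def mean_error_by_quartile(felt_standing, n=10000, spread=15.0, seed=7):
    """Mean (guess - actual percentile) per quartile when guesses ignore actual skill."""
    rng = random.Random(seed)
    errors = [[] for _ in range(4)]
    for _ in range(n):
        actual = rng.uniform(0, 100)  # true percentile
        # The guess depends only on how easy the task felt, clipped to the 0-100 range.
        guess = min(100.0, max(0.0, rng.gauss(felt_standing, spread)))
        quartile = min(3, int(actual // 25))
        errors[quartile].append(guess - actual)
    return [statistics.mean(e) for e in errors]

easy = mean_error_by_quartile(felt_standing=65.0)  # easy task: "I must be pretty good"
hard = mean_error_by_quartile(felt_standing=35.0)  # hard task: "I must have done badly"

print("easy task:", [round(x, 1) for x in easy])
print("hard task:", [round(x, 1) for x in hard])
```

In the easy condition the bottom quartile is far more miscalibrated than the top; in the hard condition the pattern flips, even though nobody in this toy world has any insight into their own performance at all.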

Here’s how Burson et al summarized their findings:

Our studies replicate, eliminate, or reverse the association between task performance and judgment accuracy reported by Kruger and Dunning (1999) as a function of task difficulty. On easy tasks, where there is a positive bias, the best performers are also the most accurate in estimating their standing, but on difficult tasks, where there is a negative bias, the worst performers are the most accurate. This pattern is consistent with a combination of noisy estimates and overall bias, with no need to invoke differences in metacognitive abilities. In this regard, our findings support Krueger and Mueller’s (2002) reinterpretation of Kruger and Dunning’s (1999) findings. An association between task-related skills and metacognitive insight may indeed exist, and later we offer some suggestions for ways to test for it. However, our analyses indicate that the primary drivers of errors in judging relative standing are general inaccuracy and overall biases tied to task difficulty. Thus, it is important to know more about those sources of error in order to better understand and ameliorate them.

What should we conclude from these (and other) studies? I think the jury’s still out to some extent, but at minimum, I think it’s clear that much of the Dunning-Kruger effect reflects either statistical artifact (regression to the mean), or much more general cognitive biases (the tendency to self-enhance and/or to use one’s subjective experience as a guide to one’s standing in relation to others). This doesn’t mean that the meta-cognitive explanation preferred by Dunning, Kruger and colleagues can’t hold in some situations; it very well may be that in some cases, and to some extent, people’s lack of skill is really what prevents them from accurately determining their standing in relation to others. But I think our default position should be to prefer the alternative explanations I’ve discussed above, because they’re (a) simpler, (b) more general (they explain lots of other phenomena), and (c) necessary (frankly, it’d be amazing if regression to the mean didn’t explain at least part of the effect!).

We should also try to be aware of another very powerful cognitive bias whenever we use the Dunning-Kruger effect to explain the people or situations around us–namely, confirmation bias. If you believe that incompetent people don’t know enough to know they’re incompetent, it’s not hard to find anecdotal evidence for that; after all, we all know people who are both arrogant and not very good at what they do. But if you stop to look for it, it’s probably also not hard to find disconfirming evidence. After all, there are clearly plenty of people who are good at what they do, but not nearly as good as they think they are (i.e., they’re above average, and still totally miscalibrated in the positive direction). Likewise, there are plenty of people who are lousy at what they do and recognize their limitations (e.g., I don’t need to be a great runner in order to be able to tell that I’m not a great runner–I’m perfectly well aware that I have terrible endurance, precisely because I can’t finish runs that most other runners find trivial!). But the plural of anecdote is not data, and the data appear to be equivocal. Next time you’re inclined to chalk your obnoxious co-worker’s delusions of grandeur up to the Dunning-Kruger effect, consider the possibility that your co-worker’s simply a jerk–no meta-cognitive incompetence necessary.

Kruger J, & Dunning D (1999). Unskilled and unaware of it: how difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of personality and social psychology, 77 (6), 1121-34 PMID: 10626367
Krueger J, & Mueller RA (2002). Unskilled, unaware, or both? The better-than-average heuristic and statistical regression predict errors in estimates of own performance. Journal of personality and social psychology, 82 (2), 180-8 PMID: 11831408
Burson KA, Larrick RP, & Klayman J (2006). Skilled or unskilled, but still unaware of it: how perceptions of difficulty drive miscalibration in relative comparisons. Journal of personality and social psychology, 90 (1), 60-77 PMID: 16448310

this year, i backed new zealand to go all the way

Jerry Coyne ponders whether the best football/soccer team generally wins the World Cup. The answer is clearly no: any sporting event where games are settled on the basis of rare events (e.g., only one or two goals per match), and teams only play each other once to determine a winner, is going to be at the mercy of Lady Luck a good deal of the time. If we really wanted the best team to come out on top reliably, we’d probably need teams to play multiple games at every stage of the Cup, which isn’t very practical. Coyne discusses an (old) paper demonstrating that the occurrence of goals during World Cup matches is well fit by a Poisson distribution, allowing one to calculate the probabilities of various unjust outcomes taking place (which turn out to be surprisingly high).
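For a rough feel for how that calculation goes, here’s a minimal sketch. It assumes each team’s goal count is an independent Poisson variable, and the scoring rates (1.5 and 1.0 goals per match) are numbers I picked for illustration, not figures from the paper Coyne discusses.

```python
from math import exp, factorial


def poisson_pmf(k, rate):
    """Probability of exactly k goals when goals follow a Poisson(rate) law."""
    return exp(-rate) * rate**k / factorial(k)


def match_outcome_probs(rate_strong, rate_weak, max_goals=15):
    """Return (P(strong wins), P(draw), P(weak wins)) for a single match.

    Assumes the two teams score independently. Scorelines above max_goals
    are ignored, which is a negligible truncation for realistic rates.
    """
    p_strong = p_draw = p_weak = 0.0
    for s in range(max_goals + 1):
        for w in range(max_goals + 1):
            p = poisson_pmf(s, rate_strong) * poisson_pmf(w, rate_weak)
            if s > w:
                p_strong += p
            elif s == w:
                p_draw += p
            else:
                p_weak += p
    return p_strong, p_draw, p_weak


# Hypothetical rates: the stronger side averages 1.5 goals, the weaker 1.0.
strong, draw, weak = match_outcome_probs(1.5, 1.0)
print(f"strong wins: {strong:.3f}, draw: {draw:.3f}, weak wins: {weak:.3f}")
```

Even with a 50% edge in scoring rate, the stronger side in this toy setup wins outright slightly less than half the time, which is the sense in which low-scoring, single-match knockouts leave a lot of room for luck.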

The curious thing, I think, is that it’s not clear that sports fans really do want the best team to come out on top. We don’t want outcomes to be determined by a coin toss, of course; it would kind of suck if, say, New Zealand had as much chance of lifting the cup as Brazil did. But it would also be pretty boring if it were a foregone conclusion that Brazil was going to win it all every time around. We want events to make sense, but we don’t want them to be too predictable. I suppose you could tell an interesting prediction error story about this kind of thing–e.g., that maximally engaging stimuli may be ones that seem to occur systematically yet defy easy explanation–but it’s probably more fun to sit around and curse at the television set as the Netherlands make short work of the Samba Kings (I don’t know if anyone actually uses that nickname; I just picked it off Wikipedia to make it look like I know what I’m talking about). Go Oranje!

will trade two Methods sections for twenty-two subjects worth of data

The excellent and ever-candid Candid Engineer in Academia has an interesting post discussing the love-hate relationship many scientists who work in wet labs have with benchwork. She compares two very different perspectives:

She [a current student] then went on to say that, despite wanting to go to grad school, she is pretty sure she doesn’t want to continue in academia beyond the Ph.D. because she just loves doing the science so much and she can’t imagine ever not being at the bench.

Being young and into the benchwork, I remember once asking my grad advisor if he missed doing experiments. His response: “Hell no.” I didn’t understand it at the time, but now I do. So I wonder if my student will always feel the way she does now- possessing of that unbridled passion for the pipet, that unquenchable thirst for the cell culture hood.

Wet labs are pretty much nonexistent in psychology–I’ve never had to put on gloves or goggles to do anything that I’d consider an “experiment”, and I’ve certainly never run the risk of spilling dangerous chemicals all over myself–so I have no opinion at all about benchwork. Maybe I’d love it, maybe I’d hate it; I couldn’t tell you. But Candid Engineer’s post did get me thinking about opinions surrounding the psychological equivalent of benchwork–namely, collecting data from human subjects. My sense is that there’s somewhat more consensus among psychologists, in that most of us don’t seem to like data collection very much. But there are plenty of exceptions, and there certainly are strong feelings on both sides.

More generally, I’m perpetually amazed at the wide range of opinions people can hold about the various elements of scientific research, even when the people doing the different-opinion-holding all work in very similar domains. For instance, my favorite aspect of the research I do, hands down, is data analysis. I’d be ecstatic if I could analyze data all day and never have to worry about actually communicating the results to anyone (though I enjoy doing that too). After that, there are activities like writing and software development, which I spend a lot of time doing, and occasionally enjoy, but also frequently find very frustrating. And then, at the other end, there are aspects of research that I find have little redeeming value save for their instrumental value in supporting other, more pleasant, activities–nasty, evil activities like writing IRB proposals and, yes, collecting data.

To me, collecting data is something you do because you’re fundamentally interested in some deep (or maybe not so deep) question about how the mind works, and the only way to get an answer is to actually interrogate people while they do stuff in a controlled environment. It isn’t something I do for fun. Yet I know people who genuinely seem to love collecting data–or, for that matter, writing Methods sections or designing new experiments–even as they loathe perfectly pleasant activities like, say, sitting down to analyze the data they’ve collected, or writing a few lines of code that could save them hours’ worth of manual data entry. On a personal level, I find this almost incomprehensible: how could anyone possibly enjoy collecting data more than actually crunching the numbers and learning new things? But I know these people exist, because I’ve talked to them. And I recognize that, from their perspective, I’m the guy with the strange views. They’re sitting there thinking: what kind of joker actually likes to turn his data inside out several dozen times? What’s wrong with just running a simple t-test and writing up the results as fast as possible, so you can get back to the pleasure of designing and running new experiments?

This of course leads us directly to the care bears fucking tea party moment where I tell you how wonderful it is that we all have these different likes and dislikes. I’m not being sarcastic; it really is great. Ultimately, it works to everyone’s advantage that we enjoy different things, because it means we get to collaborate on projects and take advantage of complementary strengths and interests, instead of all having to fight over who gets to write the same part of the Methods section. It’s good that there are some people who love benchwork and some people who hate it, and it’s good that there are people who’re happy to write software that other people who hate writing software can use. We don’t all have to pretend we understand each other; it’s enough just to nod and smile and say “but of course you can write the Methods for that paper; I really don’t mind. And yes, I guess I can run some additional analyses for you, really, it’s not too much trouble at all.”