Tag Archives: data

deconstructing the turducken

This is fiction. Which means it’s entirely made up, and definitely not at all based on any real people or events.

 

Cornelius Kipling came over to our house for Thanksgiving. I didn’t invite him; I would never, ever invite him. He was guaranteed to show up slightly drunk and very belligerent, carrying a two-thirds empty bottle of cheap wine, which he’d then hand to us as if it had arrived unopened from some fancy French cellar.

Cornelius Kipling was never invited; he invited himself.

“Good to see you,” he said to me when we let him in. “Thanks for inviting me over. It’s very kind of you, seeing as how my other plans fell through at the last minute.”

“Hi Kip,” I said, knowing full well he’d never had any other plans.

“Ella,” Kip nodded in my wife’s general direction, taking care not to make direct eye contact. He’d learned from extended experience that once he made eye contact with people, it became much harder to ignore social cues.

“Cornelius,” she said, through a mouth as thin as a zipper.

“Just Kip is fine,” said Kip.

“Cornelius,” my wife repeated, louder this time.

“What are we having for dinner,” Kip asked, handing me a two-thirds empty bottle of Zinfandel.

“Well,” said Ella, “I was going to make a turducken. But now that you’re here, I figure I should make something special. So we’re having frozen chicken nuggets and mashed potatoes.”

“We spare no expense!” I added cheerfully.

“Funny you should mention turducken,” Kip said, ignoring our jabs. “My new business plan is based on the turducken.”

“Oh really,” I said. “Do pray tell.”

I wasn’t surprised Kip had a new business plan. If anything, I was surprised he’d managed to get as far as exchanging pleasantries before launching into a graphic description of his latest scheme.

“Well,” he said, “it’s not really based on the turducken. The turducken is more of an analogy. To illustrate what it is that my new startup does.”

“And what is it that your new startup does,” Ella’s mouth asked, though the rest of her face very clearly did not care to hear the answer.

“We miniaturize data,” Kip said. He waved his hands in the air with a flourish and looked at us expectantly. It made me think back to something my wife had said about Kip after the first time she ever met him: He thinks he’s a magician, and he acts like he’s a magician, but none of his tricks ever work.

“Prithee, do continue,” I said.

“We take big datasets,” he said. “Large datasets. Enormous datasets. Doesn’t matter what kind of data. You give it to us, and we miniaturize it. We give you back a much smaller dataset. And then you carry on your work with your wonderfully shrunken new spreadsheet, which keeps only the important trends and throws out all of the unnecessary details.”

“Interesting,” I nodded. On a scale of one-to-Kipsanity, this one was a solid five. “And the turducken figures into this how?”

“Weeeeeell, imagine someone hands you a turducken and asks you to figure out what’s in it,” said Kip. “I grant that this may not happen to you very often, but it happens all the time in KipLand. So, you know there’s a bunch of birds in there, all stuffed into each other’s–well, you know–but you don’t know which birds. All you see is this giant deep-fried bird collage, and you want to disassemble it into a set of discrete, identifiable fowls. Now, you hear a lot about how to construct a turducken. But if you think about it, deconstructing a turducken is a much more interesting engineering problem. And that’s what my new venture is all about. We take a complicated mass of data and pick out all the key elements that went into it. Deconstructing the turducken.”

He did the little flourish with his hands again. Again, Ella’s words rang out in my head. None of his tricks ever work.

“That’s quite possibly the craziest thing I’ve ever heard,” I observed. “This whole turducken analogy isn’t working so well for me. I hope you haven’t put it in your promotional materials.”

Kip stared at me unpleasantly for a good ten or fifteen seconds.

“Actually, I take that back,” I said. “That conversation we had about the shinbones on Isaac Newton’s coat of arms that time I ran into you at the dry cleaner’s… that was an order of magnitude more ridiculous.”

Maybe it was a mean thing to say, but you have to understand: my friendship with Kip is built entirely on mutual abuse. And he who flinches first, loses.

“Whatever,” Kip said. He looked annoyed, which filled me with schadenfreude. It wasn’t often he got to experience the full range of emotions he routinely visited on others.

“I didn’t come here to talk about turducken,” he continued. “You brought up the turducken, not me. I just wanted to get your opinion on something…”

Again the hand flourish. Again the voice.

“I’m trying to figure out what to call my new startup,” he said. “Which do you like better: ‘Small data’ or ‘little data’? Neither has the ring of ‘big data’, but I think both sound better than ‘Kipling Data Miniaturization Services’.”

“How about MiniData,” Ella offered. I noticed she was hitting the wine pretty hard, though we both knew it would do nothing to blunt the Kipling trauma.

“Or maybe NanoData,” I offered. “If you can make the data small enough. What level of compression are you aiming for?”

“Oh, sky’s the limit. Actually, that’s one of the unique features of my service. Most compression schemes have a fixed limit. Take a standard algorithm like bzip2. You compress text, you might get a file 10% of the size if you’re lucky. But binary data? You’ll be lucky if you shrink it by a factor of three. Now, with my NanoData compression service, you as the customer get to choose how much or how little you want. And you select the output format. You can hand me a terabyte of data and say, ‘Dr. Kipling, sir, I want you to distill this eight-dimensional MATLAB array down to a single Excel spreadsheet, no more than 10 rows by 10 columns.’ And that’s exactly what you’ll get.”

“And this miraculously distilled dataset that you give me… will it, by chance, have any passing resemblance to the original dataset I gave you?”

“Oh, sure, if you want it to,” said Kip. “But the fidelity service costs double.”

I resisted the overpowering urge to facepalm.

“Well, it’s certainly not the worst idea you’ve ever had,” I said diplomatically. “But I have to say, I’m amazed you keep launching new startups. A lesser man would have given up ten or twelve bankruptcies ago.”

“I guess I just have an uncanny sense for ideas ten years ahead of their time,” Kip smiled.

“Ten years ahead of anyone’s time,” Ella muttered.

“Right,” I said. “You’re a visionary. You have… the visions. Hey, what happened to that deli you were going to open? The one that was going to sell premium hay sandwiches? I thought that one was going to make it for sure.”

“Terrible shame. Turns out it’s very difficult to get sandwich-grade hay in Colorado. So, you know, it didn’t pan out. Very sad; I even had a name picked out: Hay Day Sandwiches. Get it?”

I didn’t really get it, but still nodded in mock sympathy.

“Anyway, since you brought up my new startup,” Kip said, oblivious to the death rays radiating towards him from Ella’s head, “let me take this opportunity to give the both of you the opportunity of your lifetime. I like you guys, so I’m going to cut you in as my very first angel investors. All I’m asking…”

And here he paused, looking at us. I knew what he was doing; he was trying to gauge our level of displeasure with him so he could pick a number that was sufficiently high, but not completely ridiculous.

“…is fifteen thousand,” he finished. “You get 5% of equity, and I’ll even throw in some nice swag. I’m having mugs and frisbees printed up as we speak.”

Around this time, Ella put her head down on her arms; she may or may not have been softly sobbing, I couldn’t really tell.

“That’s quite an offer, Kip,” I said. “And I’m really glad you like me enough to make it. It’s not like I’ve ever bought into your ideas before, but then, the thing I like best about you is how you never take repeated failure for an answer. Unfortunately, I just don’t have fifteen thousand right now. I just spent my last fifteen thousand souping up an old John Deere lawnmower so I can drive around the bike path blaring Ridin’ Dirty from three hundred watt speakers while glowing pink neon lights presage my arrival by five hundred feet. You should see it, it’s beautiful. But I swear, if I hadn’t done that, I’d be ready to sign on the dotted line right now.”

“That’s quite alright,” Kip said. “No harm, no foul. Your loss, my gain. It’s probably crazy of me to give up that much equity for so little anyway; this idea is going to make millions. No. Billions.”

He paused just long enough for some of the delusion to drip off; then I watched in real time as yet another unwise idea corkscrewed through his ear and crawled into his brain.

“Hey,” he said. “I’ve never thought of pimping out a John Deere lawnmower, but that’s a pretty good idea too. You sound like you have some experience with this now; want to go fifty-fifty on a startup? I’ll provide the salesmanship and take advantage of my many business contacts. You provide the technical knowledge. Ella, you can get in on this too; we’ll throw in a free turducken with every purchase.”

This time I definitely heard my wife sobbing, and just like that, it was time for Cornelius Kipling to leave.

see me flub my powerpoint slides on NIF tv!

 

UPDATE: the webcast is now archived here for posterity.

This is kind of late notice and probably of interest to few people, but I’m giving the NIF webinar tomorrow (or today, depending on your time zone–either way, we’re talking about November 1st). I’ll be talking about Neurosynth, and focusing in particular on the methods and data, since that’s what NIF (which stands for Neuroscience Information Framework) is all about. Assuming all goes well, the webinar should start at 11 am PST. But since I haven’t done a webcast of any kind before, and have a surprising knack for breaking audiovisual equipment at a distance, all may not go well. Which I suppose could make for a more interesting presentation. In any case, here’s the abstract:

The explosive growth of the human neuroimaging literature has led to major advances in understanding of human brain function, but has also made aggregation and synthesis of neuroimaging findings increasingly difficult. In this webinar, I will describe a highly automated brain mapping framework called NeuroSynth that uses text mining, meta-analysis and machine learning techniques to generate a large database of mappings between neural and cognitive states. The NeuroSynth framework can be used to automatically conduct large-scale, high-quality neuroimaging meta-analyses, address long-standing inferential problems in the neuroimaging literature (e.g., how to infer cognitive states from distributed activity patterns), and support accurate ‘decoding’ of broad cognitive states from brain activity in both entire studies and individual human subjects. This webinar will focus on (a) the methods used to extract the data, (b) the structure of the resulting (publicly available) datasets, and (c) some major limitations of the current implementation. If time allows, I’ll also provide a walk-through of the associated web interface (http://neurosynth.org) and will provide concrete examples of some potential applications of the framework.

There’s some more info (including details about how to connect, which might be important) here. And now I’m off to prepare my slides. And script some evasive and totally non-committal answers to deploy in case of difficult questions from the peanut gallery respected audience.

elsewhere on the net, vacation edition

I’m hanging out in Boston for a few days, so blogging will probably be sporadic or nonexistent. Which is to say, you probably won’t notice any difference.

The last post on the Dunning-Kruger effect somehow managed to rack up 10,000 hits in 48 hours; but that was last week. Today I looked at my stats again, and the blog is back to a more normal 300 hits, so I feel like it’s safe to blog again. Here are some neat (and totally unrelated) links from the past week:

  • OKCupid has another one of those nifty posts showing off all the cool things they can learn from their gigantic userbase (who else gets to say things like “this analysis includes 1.51 million users’ data”???). Apparently, tall people (claim to) have more sex, attractive photos are more likely to be out of date, and most people who claim to be bisexual aren’t really bisexual.
  • After a few months off, my department-mate Chris Chatham is posting furiously again over at Developing Intelligence, with a series of excellent posts reviewing recent work on cognitive control and the perils of fMRI research. I’m not really sure what Chris spent his blogging break doing, but given the frequency with which he’s been posting lately, my suspicion is that he spent it secretly writing blog posts.
  • Mark Liberman points out a fundamental inconsistency in the way we view attributions of authorship: we get appropriately angry at academics who pass someone else’s work off as their own, but think it’s just fine for politicians to pay speechwriters to write for them. It’s an interesting question, and leads to an intimately related, and even more important question–namely, will anyone get mad at me if I pay someone else to write a blog post for me about someone else’s blog post discussing people getting angry at people paying or not paying other people to write material for other people that they do or don’t own the copyright on?
  • I like oohing and aahing over large datasets, and the Guardian’s Data Blog provides a nice interface to some of the most ooh- and aah-able datasets out there. [via R-Chart]
  • Ed Yong has a characteristically excellent write-up about recent work on the magnetic vision of birds. Yong also does link dump posts better than anyone else, so you should probably stop reading this one right now and read his instead.
  • You’ve probably heard about this already, but some time last week, the brain trust at ScienceBlogs made the amazingly clever decision to throw away their integrity by selling PepsiCo its very own “science” blog. Predictably, a lot of the bloggers weren’t happy with the decision, and many have now moved onto greener pastures; Carl Zimmer’s keeping score. Personally, I don’t have anything intelligent to add to everything that’s already been said; I’m literally dumbfounded.
  • Andrew Gelman takes apart an obnoxious letter from pollster John Zogby to Nate Silver of fivethirtyeight.com. I guess now we know that Zogby didn’t get where he is by not being an ass to other people.
  • Vaughan Bell of Mind Hacks points out that neuroplasticity isn’t a new concept, and was discussed seriously in the literature as far back as the 1800s. Apparently our collective views about the malleability of mind are not, themselves, very plastic.
  • NPR ran a three-part story by Barbara Bradley Hagerty on the emerging and somewhat uneasy relationship between neuroscience and the law. The articles are pretty good, but much better, in my opinion, was the Talk of the Nation episode that featured Hagerty as a guest alongside Joshua Greene, Kent Kiehl, and Stephen Morse–people who’ve all contributed in various ways to the emerging discipline of NeuroLaw. It’s a really interesting set of interviews and discussions. For what it’s worth, I think I agree with just about everything Greene has to say about these issues–except that he says things much more eloquently than I think them.
  • Okay, this one’s totally frivolous, but does anyone want to buy me one of these things? I don’t even like dried food; I just think it would be fun to stick random things in there and watch them come out pale, dried husks of their former selves. Is it morbid to enjoy watching the life slowly being sucked out of apples and mushrooms?

elsewhere on the net

Some neat links from the past few weeks:

  • You Are Not So Smart: A celebration of self-delusion. An excellent blog by journalist David McRaney that deconstructs common myths about the way the mind works.
  • NPR has a great story by Jon Hamilton about the famous saga of Einstein’s brain and what it’s helped teach us about brain function. [via Carl Zimmer]
  • The Neuroskeptic has a characteristically excellent 1,000 word explanation of how fMRI works.
  • David Rock has an interesting post on some recent work from Baumeister’s group purportedly showing that it’s good to believe in free will (whether or not it exists). My own feeling about this is that Baumeister’s not really studying people’s philosophical views about free will, but rather a construct closely related to self-efficacy and locus of control. But it’s certainly an interesting line of research.
  • The Prodigal Academic is a great new blog about all things academic. I’ve found it particularly interesting since several of the posts so far have been about job searches and job-seeking–something I’ll be experiencing my fill of over the next few months.
  • Prof-like Substance has a great 5-part series (1, 2, 3, 4, 5) on how blogging helps him as an academic. My own (much less eloquent) thoughts on that are here.
  • Cameron Neylon makes a nice case for the development of social webs for data mining.
  • Speaking of data mining, Michael Driscoll of Dataspora has an interesting pair of posts extolling the virtues of Big Data.
  • And just to balance things out, there’s this article in the New York Times by John Allen Paulos that offers some cautionary words about the challenges of using empirical data to support policy decisions.
  • On a totally science-less note, some nifty drawings (or is that photos?) by Ben Heine (via Crooked Brains):

in defense of three of my favorite sayings

Seth Roberts takes issue with three popular maxims that (he argues) people use “to push away data that contradicts this or that approved view of the world”. He terms this preventive stupidity. I’m a frequent user of all three sayings, so I suppose that might make me preventively stupid; but I do feel like I have good reasons for using these sayings, and I confess to not really seeing Roberts’ point.

Here’s what Roberts has to say about the three sayings in question:

1. Absence of evidence is not evidence of absence. Øyhus explains why this is wrong. That such an Orwellian saying is popular in discussions of data suggests there are many ways we push away inconvenient data.

In my own experience, by far the biggest reason this saying is popular in discussions of data (and the primary reason I use it when reviewing papers) is that many people have a very strong tendency to interpret null results as an absence of any meaningful effect. That’s a very big problem, because the majority of studies in psychology tend to have relatively little power to detect small to moderate-sized effects. For instance, as I’ve discussed here, most whole-brain analyses in typical fMRI samples (of say, 15 – 20 subjects) have very little power to detect anything but massive effects. And yet people routinely interpret a failure to detect hypothesized effects as an indication that they must not exist at all. The simplest and most direct counter to this type of mistake is to note that one shouldn’t accept the null hypothesis unless one has very good reasons to think that power is very high and effect size estimates are consequently quite accurate. Which is just another way of saying that absence of evidence is not evidence of absence.
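To put rough numbers on the power problem, here is a minimal sketch. It is not taken from any of the analyses mentioned above: it uses a normal approximation to a two-sided one-sample test, and the function name `approx_power` and the effect sizes are purely illustrative assumptions. A real whole-brain analysis uses far stricter thresholds than an uncorrected .05, which pushes power lower still.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size_d: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided one-sample z-test.

    Normal approximation only (an assumption for illustration);
    a t-test at small n has somewhat lower power than this.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative
    shift = effect_size_d * sqrt(n)
    # Probability of landing in either rejection region
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Small, moderate, and large standardized effects in a 20-subject sample
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: power ~ {approx_power(d, n=20):.2f} at alpha = .05, "
          f"{approx_power(d, n=20, alpha=0.001):.2f} at alpha = .001")
```

Under these assumptions, a null result in a 20-subject sample says very little about whether a small or moderate effect exists.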

2. Correlation does not equal causation. In practice, this is used to mean that correlation is not evidence for causation. At UC Berkeley, a job candidate for a faculty position in psychology said this to me. I said, “Isn’t zero correlation evidence against causation?” She looked puzzled.

Again, Roberts’ experience clearly differs from mine; I’ve far more often seen this saying used as a way of suggesting that a researcher may be drawing overly strong causal conclusions from the data, not as a way of simply dismissing a correlation outright. A good example of this is found in the developmental literature, where many researchers have observed strong correlations between parents’ behavior and their children’s subsequent behavior. It is, of course, quite plausible to suppose that parenting behavior exerts a direct causal influence on children’s behavior, so that the children of negligent or abusive parents are more likely to exhibit delinquent behavior and grow up to perpetuate the “cycle of violence”. But this line of reasoning is substantially weakened by behavioral genetic studies indicating that very little of the correlation between parents’ and children’s personalities is explained by shared environmental factors, and that the vast majority reflects heritable influences and/or unique environmental influences. Given such findings, it’s a perfectly appropriate rebuttal to much of the developmental literature to note that correlation doesn’t imply causation.

It’s also worth pointing out that the anecdote Roberts provides isn’t exactly a refutation of the maxim; it’s actually an affirmation of the consequent. The fact that an absence of any correlation could potentially be strong evidence against causation (under the right circumstances) doesn’t mean that the presence of a correlation is strong evidence for causation. It may or may not be, but that’s something to be weighed on a case-by-case basis. There certainly are plenty of cases where it’s perfectly appropriate (and even called for) to remind someone that correlation doesn’t imply causation.
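The “under the right circumstances” qualifier matters, and a toy example shows why. In the sketch below (made-up numbers, not data from any study), y is completely determined by x, yet the linear correlation between them is exactly zero because the relationship is symmetric around zero. The helper `pearson_r` is written out by hand purely for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # y depends entirely on x, but not linearly

print(pearson_r(xs, ys))  # 0.0
```

So even a zero correlation has to be interpreted in light of what kind of relationship is plausible, which is the same case-by-case weighing the maxim asks for.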

3. The plural of anecdote is not data. How dare you try to learn from stories you are told or what you yourself observe!

I suspect this is something of a sore spot for Roberts, who’s been an avid proponent of self-experimentation and case studies. I imagine people often dismiss his work as mere anecdote rather than valuable data. Personally, I happen to think there’s tremendous value to self-experimentation (at least when done in as controlled a manner as possible), so I don’t doubt there are many cases where this saying is unfairly applied. That said, I think Roberts fails to appreciate that people who do his kind of research constitute a tiny fraction of the population. Most of the time, when someone says that “the plural of anecdote is not data,” they’re not talking to someone who does rigorous self-experimentation, but to people who, say, don’t believe they should give up smoking seeing as how their grandmother smoked till she was 88 and died in a bungee-jumping accident, or who are convinced that texting while driving is perfectly acceptable because they don’t personally know anyone who’s gotten in an accident. In such cases, it’s not only legitimate but arguably desirable to point out that personal anecdote is no substitute for hard data.

Orwell was right. People use these sayings — especially #1 and #3 — to push away data that contradicts this or that approved view of the world. Without any data at all, the world would be simpler: We would simply believe what authorities tell us. Data complicates things. These sayings help those who say them ignore data, thus restoring comforting certainty.

Maybe there should be a term (antiscientific method?) to describe the many ways people push away data. Or maybe preventive stupidity will do.

I’d like to be charitable here, since there very clearly are cases where Roberts’ point holds true: sometimes people do toss out these sayings as a way of not really contending with data they don’t like. But frankly, the general claim that these sayings are antiscientific and constitute an act of stupidity just seems silly. All three sayings are clearly applicable in a large number of situations; to deny that, you’d have to believe that (a) it’s always fine to accept the null hypothesis, (b) correlation is always a good indicator of a causal relationship, and (c) personal anecdotes are just as good as large, well-controlled studies. I take it that no one, including Roberts, really believes that. So then it becomes a matter of when to apply these sayings, and not whether or not to use them. After all, it’d be silly to think that the people who use these sayings are always on the side of darkness, and the people who wield null results, correlations, and anecdotes with reckless abandon are always on the side of light.

My own experience, for what it’s worth, is that the use of these sayings is justified far more often than not, and I don’t have any reservation applying them myself when I think they’re warranted (which is relatively often–particularly the first one). But I grant that that’s just my own personal experience talking, and no matter how many experiences I’ve had of people using these sayings appropriately, I’m well aware that the plural of anecdote…

in brief…

Some neat stuff from the past week or so:

  • If you’ve ever wondered how to go about getting a commentary on an article published in a peer-reviewed journal, wonder no longer… you can’t. Or rather, you can, but it may not be worth your trouble. Rick Trebino explains. [new to me via A.C. Thomas, though apparently this one's been around for a while.]
  • The data-driven life: A great article in the NYT magazine discusses the growing number of people who’re quantitatively recording the details of every aspect of their lives, from mood to glucose levels to movement patterns. I dabbled with this a few years ago, recording my mood, diet, and exercise levels for about 6 months. I’m not sure how much I learned that was actually useful, but if nothing else, it’s a fun exercise to play around with a giant matrix of correlations that are all about YOU.
  • Cameron Neylon has an excellent post up defending the viability (and superiority) of the author-pays model of publication.
  • In typical fashion, Carl Zimmer has a wonderful blog post up explaining why tapeworms in Madagascar tell us something important about human evolution.
  • The World Bank, as you might expect, has accumulated a lot of economic data. For years, they’ve been selling it at a premium, but as of 2010 the World Development Indicators are completely free to access. [via Flowing Data]
  • Ever tried Jew’s Ear Juice? No? In China, you can–but not for long, if the government has its way. The NYT reports on efforts to eradicate Chinglish in public. Money quote:

“The purpose of signage is to be useful, not to be amusing,” said Zhao Huimin, the former Chinese ambassador to the United States who, as director general of the capital’s Foreign Affairs Office, has been leading the fight for linguistic standardization and sobriety.

fMRI becomes big, big science

There are probably lots of criteria you could use to determine the relative importance of different scientific disciplines, but the one I like best is the Largest Number of Authors on a Paper. Physicists have long had their hundred-authored papers (see for example this individual here; be sure to click on the “show all authors/affiliations” link), and with the initial sequencing and analysis of the human genome, which involved contributions from 452 different persons, molecular geneticists also joined the ranks of Officially Big Science. Meanwhile, us cognitive neuroscientists have long had to content ourselves with silly little papers that have only four to seven authors (maybe a dozen on a really good day). Which means, despite the pretty pictures we get to put in our papers, we’ve long had this inferiority complex about our work, and a nagging suspicion that it doesn’t really qualify as big science (full disclosure: so when I say “we”, I probably just mean “I”).

UNTIL NOW.

Thanks to the efforts of Bharat Biswal and 53 collaborators (yes, I counted) reported in a recent paper in PNAS, fMRI is now officially Big, Big Science. Granted, 54 authors is still small potatoes in physics-and-biology-land. And for all I know, there could be other fMRI papers with even larger author lists out there that I’ve missed. BUT THAT’S NOT THE POINT. The point is, people like me now get to run around and say we do something important.

You might think I’m being insincere here, and that I’m really poking fun at ridiculously long author lists that couldn’t possibly reflect meaningful contributions from that many people. Well, I’m not. While I’m not seriously suggesting that the mark of good science is how many authors are on the paper, I really do think that the prevalence of long author lists in a discipline is an important sign of that discipline’s maturity, and that the fact that you can get several dozen contributors to a single paper means you’re seeing a level of collaboration across different labs that previously didn’t exist.

The importance of large-scale collaboration is one of the central elements of the new PNAS article, which is appropriately entitled Toward discovery science of human brain function. What Biswal et al have done is compile the largest publicly-accessible fMRI dataset on the planet, consisting of over 1,400 scans from 35 different centers. All of the data, along with some tools for analysis, are freely available for download from NITRC. Be warned though: you’re probably going to need a couple of terabytes of free space if you want to download the entire dataset.

You might be wondering why no one’s assembled an fMRI dataset of this scope until now; after all, fMRI isn’t that new a technique, having been around for about 20 years now. The answer (or at least, one answer) is that it’s not so easy–and often flatly impossible–to combine raw fMRI datasets in any straightforward way. The problem is that the results of any given fMRI study only really make sense in the context of a particular experimental design. Functional MRI typically measures the change in signal associated with some particular task, which means that you can’t really go about combining the results of studies of phonological processing with those of thermal pain and obtain anything meaningful (actually, this isn’t entirely true; there’s a movement afoot to create image-based centralized databases that will afford meta-analyses on an even more massive scale, but that’s a post for another time). You need to ensure that the tasks people performed across different sites are at least roughly in the same ballpark.

What allowed Biswal et al to consolidate datasets to such a degree is that they focused exclusively on one particular kind of cognitive task. Or rather, they focused on a non-task: all 1400+ scans in the 1000 Functional Connectomes Project (as they’re calling it) are from participants being scanned during the “resting state”. The resting state is just what it sounds like: participants are scanned while they’re just resting; usually they’re given no specific instructions other than to lie still, relax, and not fall asleep. The typical finding is that, when you contrast this resting state with activation during virtually any kind of goal-directed processing, you get widespread activation increases in a network that’s come to be referred to as the “default” or “task-negative” network (in reference to the fact that it’s maximally active when people are in their “default” state).

One of the main (and increasingly important) applications of resting state fMRI data is in functional connectivity analyses, which aim to identify patterns of coactivation across different regions rather than mean-level changes associated with some task. The fundamental idea is that you can get a lot of traction on how the brain operates by studying how different brain regions interact with one another spontaneously over time, without having to impose an external task set. The newly released data is ideal for this kind of exploration, since you have a simply massive dataset that includes participants from all over the world scanned in a range of different settings using different scanners. So if you want to explore the functional architecture of the human brain during the resting state, this should really be your one-stop shop. (In fact, I’m tempted to say that there’s going to be much less incentive for people to collect resting-state data from now on, since there really isn’t much you’re going to learn from one sample of 20 – 30 people that you can’t learn from 1,400 people from 35+ combined samples).
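For readers who haven’t seen a functional connectivity analysis, the core idea can be sketched in a few lines. This is a toy illustration on synthetic time series, not the pipeline used in the paper: the region names, noise levels, and the `pearson_r` helper are all assumptions made up for the example. One region is picked as the “seed”, and its signal over time is correlated with the signal from every other region.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length time series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


random.seed(0)
n_timepoints = 200

# A shared spontaneous fluctuation that drives the seed and one other region
shared = [random.gauss(0, 1) for _ in range(n_timepoints)]
seed_region = [s + random.gauss(0, 0.5) for s in shared]

# Hypothetical regions: one coupled to the seed, one independent of it
regions = {
    "coupled_region": [s + random.gauss(0, 0.5) for s in shared],
    "unrelated_region": [random.gauss(0, 1) for _ in range(n_timepoints)],
}

# "Connectivity" here is just the correlation with the seed's time series
connectivity = {name: pearson_r(seed_region, ts) for name, ts in regions.items()}
for name, r in connectivity.items():
    print(f"{name}: r = {r:.2f}")
```

Real analyses add preprocessing, nuisance regression, and thousands of voxels, but the quantity being mapped is still a correlation over time with no task involved.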

Aside from introducing the dataset to the literature, Biswal et al also report a number of new findings. One neat finding is that functional parcellation of the brain using seed-based connectivity (i.e., identifying brain regions that coactivate with a particular “seed” or target region) shows marked consistency across different sites, revealing what Biswal et al call a “universal architecture”. This type of approach by itself isn’t particularly novel, as similar techniques have been used before. But no one’s done it on anything approaching this scale. Here’s what the results look like:

You can see that different seeds produce different functional parcellations across the brain (the brighter areas denote ostensive boundaries).

Another interesting finding is the presence of gender and age differences in functional connectivity:

What this image shows is differences in functional connectivity with specific seed regions (the black dots) as a function of age (left) or gender (right). (The three rows reflect different techniques for producing the maps, with the upshot being that the results are very similar regardless of exactly how you do the analysis.) It isn’t often you get to see scatterplots with 1,400+ points in cognitive neuroscience, so this is a welcome sight. Although it’s also worth pointing out the inevitable downside of having huge sample sizes, which is that even tiny effects attain statistical significance. Which is to say, while the above findings are undoubtedly more representative of gender and age differences in functional connectivity than anything else you’re going to see for a long time, notice that they’re very small effects (e.g., in the right panels, you can see that the differences between men and women are only a fraction of a standard deviation in size, despite the fact that these regions are probably selected because they show some of the “strongest” effects). That’s not meant as a criticism; it’s actually a very good thing, in that these modest effects are probably much closer to the truth than what previous studies have reported. Such findings should serve as an important reminder that most of the effects identified by fMRI studies are almost certainly massively inflated by small sample size (as I’ve discussed before here and in this paper).
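If the inflation point seems abstract, a quick simulation shows the logic. This is a toy under stated assumptions, not a reanalysis of anyone's data: I'm assuming a true group difference of 0.2 standard deviations, 700 people per group for the big study, and 15 per group for the small ones. The value 2.048 is the two-tailed .05 critical t for 28 degrees of freedom.

```python
import numpy as np

def cohens_d_and_t(a, b):
    """Return (Cohen's d, t statistic) for two equal-sized independent groups."""
    n = len(a)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    d = (a.mean() - b.mean()) / pooled_sd
    return d, d * np.sqrt(n / 2)  # with equal group sizes, t = d * sqrt(n / 2)

rng = np.random.default_rng(1)
TRUE_D = 0.2  # assumed true effect size, in standard deviation units

# One big study: 700 per group. The estimate should land near the truth.
big_d, big_t = cohens_d_and_t(rng.normal(TRUE_D, 1, 700), rng.normal(0, 1, 700))

# Many small studies: 15 per group, keeping only the "significant" ones.
T_CRIT_SMALL = 2.048  # two-tailed .05 critical t for df = 28
significant_ds = []
for _ in range(2000):
    d, t = cohens_d_and_t(rng.normal(TRUE_D, 1, 15), rng.normal(0, 1, 15))
    if abs(t) > T_CRIT_SMALL:
        significant_ds.append(abs(d))

print(f"big study: d = {big_d:.2f}, t = {big_t:.2f}")
print(f"small studies reaching p < .05: {len(significant_ds)} of 2000")
print(f"mean |d| among those: {np.mean(significant_ds):.2f}")
```

The small studies can't report a modest effect even in principle: with 15 per group, nothing smaller than about three-quarters of a standard deviation reaches significance, so every "hit" overstates a true effect of 0.2 several times over.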

Anyway, the bottom line is that if you’ve ever thought to yourself, “gee, I wish I could do cutting-edge fMRI research, but I really don’t want to leave my house to get a PhD; it’s almost lunchtime,” this is your big chance. You can download the data, rejoice in the magic that is the resting state, and bathe yourself freely in functional connectivity. The Biswal et al paper bills itself as “a watershed event in functional imaging,” and it’s hard to argue otherwise. Researchers now have a definitive data set to use for analyses of functional connectivity and the resting state, as well as a model for what other similar data sets might look like in the future.

More importantly, with 54 authors on the paper, fMRI is now officially big science. Prepare to suck it, Human Genome Project!

Biswal, B., Mennes, M., Zuo, X., Gohel, S., Kelly, C., Smith, S., Beckmann, C., Adelstein, J., Buckner, R., Colcombe, S., Dogonowski, A., Ernst, M., Fair, D., Hampson, M., Hoptman, M., Hyde, J., Kiviniemi, V., Kotter, R., Li, S., Lin, C., Lowe, M., Mackay, C., Madden, D., Madsen, K., Margulies, D., Mayberg, H., McMahon, K., Monk, C., Mostofsky, S., Nagel, B., Pekar, J., Peltier, S., Petersen, S., Riedl, V., Rombouts, S., Rypma, B., Schlaggar, B., Schmidt, S., Seidler, R., Siegle, G., Sorg, C., Teng, G., Veijola, J., Villringer, A., Walter, M., Wang, L., Weng, X., Whitfield-Gabrieli, S., Williamson, P., Windischberger, C., Zang, Y., Zhang, H., Castellanos, F., & Milham, M. (2010). Toward discovery science of human brain function. Proceedings of the National Academy of Sciences, 107 (10), 4734-4739. DOI: 10.1073/pnas.0911855107

the OKCupid guide to dating older women

Continuing along on their guided tour of Data I Wish I Had Access To, the OKCupid folks have posted another set of interesting figures on their blog. This time, they make the case for dating older women, suggesting that men might get more bang for their buck (in a literal sense, I suppose) by trying to contact women their age or older, rather than trying to hit on the young ‘uns. Men, it turns out, are creepy. Here’s how creepy:

Actually, that’s not so creepy. All it says is that men say they prefer to date younger women. That’s not going to shock anyone. This one is creepier:

The reason it’s creepy is that it basically says that, irrespective of what age ranges men say they find acceptable in a potential match, they’re actually all indiscriminately messaging 18-year old women. So basically, if you’re a woman on OKCupid who’s searching for that one special, non-creepy guy, be warned: they don’t exist. They’re pretty much all going to be eying 18-year olds for the rest of their lives. (To be fair, women also show a tendency to contact men below their lowest reported acceptable age. But it’s a much weaker effect; 40-year old women only occasionally try to hit on 24-year old guys, and tend to stay the hell away from the not-yet-of-drinking-age male population.)

Anyway, using this type of data, the OKCupid folks then generate this figure:

…which also will probably surprise no one, as it basically says women are most desirable when they’re young, and men when they’re (somewhat) older. But what the OKCupid folks then suggest is that it would be to men’s great advantage to broaden their horizons, because older women (which, in their range-restricted population, basically means anything over 30) self-report being much more interested in having sex more often, having casual sex, and using protection. I won’t bother hotlinking to all of those images, but here’s where they’re ultimately going with this:

I’m not going to comment on the appropriateness of trying to nudge one’s male userbase in the direction of more readily available casual sex (though I suspect they don’t need much nudging anyway). What I do wonder is to what extent these results reflect selection effects rather than a genuine age difference. The OKCupid folks suggest that women’s sexual interest increases as they age, which seems plausible given the conventional wisdom that women peak sexually in their 30s. But the effects in this case look pretty huge (unless the color scheme is misleading, which it might be; you’ll have to check out the post for the neat interactive flash animations), and it seems pretty plausible that much of the age effect could be driven by selection bias. Women with a more monogamous orientation are probably much more likely to be in committed, stable relationships by the time they turn 30 or 35, and probably aren’t scanning OKCupid for potential mates. Women who are in their 30s and 40s and still using online dating services are probably those who weren’t as interested in monogamous relationships to begin with. (Of course, the same is probably true of older men. Except that since men of all ages appear to be pretty interested in casual sex, there’s unlikely to be an obvious age differential.)

The other thing I’m not clear on is whether these analyses control for the fact that the userbase is heavily skewed toward younger users:

The people behind OKCupid are all mathematicians by training, so I’d be surprised if they hadn’t taken the underlying age distribution into consideration. But they don’t say anything about it in their post. The worry is that, if the base rate of different age groups isn’t taken into consideration, the heat map displayed above could be quite misleading. Given that there are many, many more 25-year old women on OKCupid than 35-year old women, failing to normalize properly would almost invariably make it look like there’s a heavy skew for men to message relatively younger women, irrespective of the male sender’s age. By the same token, it’s not clear that it’d be good advice to tell men to seek out older women, given that there are many fewer older women in the pool to begin with. As a thought experiment, suppose that the entire OKCupid male population suddenly started messaging women 5 years older than them, and entirely ignored their usual younger targets. The hit rate wouldn’t go up; it would probably actually fall precipitously, since there wouldn’t be enough older women to keep all the younger men entertained (at least, I certainly hope there wouldn’t). No doubt there’s a stable equilibrium point somewhere, where men and women are each targeting exactly the right age range to maximize their respective chances. I’m just not sure that it’s in OKCupid’s proposed “zone of greatness” for the men.
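Here's the base-rate worry in miniature. The numbers below are entirely invented for illustration (they are not OKCupid's data): I've rigged them so that hypothetical 35-year-old men message women of every age at exactly the same per-woman rate, i.e., they have no age preference whatsoever.

```python
# Invented numbers for illustration only; not OKCupid's data.
women_on_site = {25: 10000, 30: 5000, 35: 2000}  # women available at each age
messages_sent = {25: 5000, 30: 2500, 35: 1000}   # messages from hypothetical 35-year-old men

total_messages = sum(messages_sent.values())

# Raw share: what fraction of all messages goes to each age group.
raw_share = {age: n / total_messages for age, n in messages_sent.items()}

# Normalized rate: messages received per woman in each age group.
per_woman = {age: messages_sent[age] / women_on_site[age] for age in women_on_site}

for age in women_on_site:
    print(f"age {age}: {raw_share[age]:.0%} of messages, {per_woman[age]:.2f} per woman")
```

On raw counts, 25-year-olds get the majority of the messages and it looks like a strong skew toward younger women. Divide by how many women of each age are actually on the site, and the "preference" vanishes. Whether the real heat map was normalized along these lines is exactly what the post doesn't say.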

It’s also a bit surprising that OKCupid didn’t break down the response rate to people of the opposite gender as a function of the sender and receiver’s age. They’ve done this in the past, and it seems like the most direct way of testing whether men are more likely to get lucky by messaging older or younger women. Without knowing whether older women are actually responding to younger men’s overtures, it’s kind of hard to say what it all means. Except that I’d still kill to have their data.

elsewhere on the internets…

The good people over at OKCupid, the best dating site on Earth (their words, not mine! I’m happily married!), just released a new slew of data on their OKTrends blog. Apparently men like women with smiley, flirty profile photos, and women like dismissive, unsmiling men. It’s pretty neat stuff, and definitely worth a read. Mating rituals aside, though, what I really like to think about whenever I see a new OKTrends post is how many people I’d be willing to kill to get my hands on their data.

Genetic Future covers the emergence of Counsyl, a new player in the field of personal genomics. Unlike existing outfits like 23andme and deCODEme.com, Counsyl focuses on rare Mendelian disorders, with an eye to helping prospective parents evaluate their genetic liabilities. What’s really interesting about Counsyl is its business model; if you have health insurance provided by Aetna or Blue Cross, you could potentially get a free test. Of course, the catch is that Aetna or Blue Cross get access to your results. In theory, this shouldn’t matter, since health insurers can’t use genetic information as grounds for discrimination. But then, on paper, employers can’t use race, gender, or sexual orientation as grounds for discrimination either, and yet we know it’s easier to get hired if your name is John than Jamal. That said, I’d probably go ahead and take Aetna up on its generous offer, except that my wife and I have no plans for kids, and the Counsyl test looks like it stays away from the garden-variety SNPs the other services cover…

The UK has banned the export of dowsing rods. In 2010! This would be kind of funny if not for the fact that dozens if not hundreds of Iraqis have probably died horrible deaths as a result of the Iraqi police force trying to detect roadside bombs using magic. [via Why Evolution is True].

Over at Freakonomics, regular contributor Ryan Hagen interviews psychologist, magician, and author Richard Wiseman, who just published a new empirically-based self-help book (can such a thing exist?). I haven’t read the book, but the interview is pretty good. Favorite quote:

What would I want to do? I quite like the idea of the random giving of animals. There’s a study where they took two groups of people and randomly gave people in one group a dog. But I’d quite like to replicate that with a much wider range of animals — including those that should be in zoos. I like the idea of signing up for a study, and you get home and find you’ve got to look after a wolf … .

On a professional note, Professor in Training has a really great two part series (1, 2) on what new tenure-track faculty need to know before starting the job. I’ve placed both posts inside Google Reader’s golden-starred vault, and fully expect to come back to them next Fall when I’m on the job market. Which means if you’re reading this and you’re thinking of hiring me, be warned: I will demand that a life-size bobble-head doll of Hans Eysenck be installed in my office, and thanks to PiT, I do now have the awesome negotiating powers needed to make it happen.

tuesday at 3 pm works for me

Apparently, Tuesday at 3 pm is the best time to suggest as a meeting time–that’s when people have the most flexibility available in their schedule. At least, that’s the conclusion drawn by a study based on data from WhenIsGood, a free service that helps with meeting scheduling. There’s not much to the study beyond the conclusion I just gave away; not surprisingly, people don’t like to meet before 10 or 11 am or after 4 pm, and there’s very little difference in availability across different days of the week.

What I find neat about this isn’t so much the results of the study itself as the fact that it was done at all. I’m a big proponent of using commercial website data for research purposes–I’m about to submit a paper that relies almost entirely on content pulled using the Blogger API, and am working on another project that makes extensive use of the Twitter API. The scope of the datasets one can assemble via these APIs is simply unparalleled; for example, there’s no way I could ever realistically collect writing samples of 50,000+ words from 500+ participants in a laboratory setting, yet the ability to programmatically access blogspot.com blog contents makes the task trivial. And of course, many websites collect data of a kind that just isn’t available off-line. For example, the folks at OKCupid are able to continuously pump out interesting data on people’s online dating habits because they have comprehensive data on interactions between literally millions of prospective dating partners. If you want to try to generate that sort of data off-line, I hope you have a really large lab.

Of course, I recognize that in this case, the WhenIsGood study really just amounts to a glorified press release. You can tell that’s what it is from the URL, which literally includes the “press/” directory in its path. So I’m certainly not naive enough to think that Web 2.0 companies are publishing interesting research based on their proprietary data solely out of the goodness of their hearts. Quite the opposite. But I think in this case the desire for publicity works in researchers’ favor: It’s precisely because virtually any press is considered good press that many of these websites would probably be happy to let researchers play with their massive (de-identified) datasets. It’s just that, so far, hardly anyone’s asked. The Web 2.0 world is a largely untapped resource that researchers (or at least, psychologists) are only just beginning to take advantage of.

I suspect that this will change in the relatively near future. Five or ten years from now, I imagine that a relatively large chunk of the research conducted in many areas of psychology (particularly social and personality psychology) will rely heavily on massive datasets derived from commercial websites. And then we’ll all wonder in amazement at how we ever put up with the tediousness of collecting real-world data from two or three hundred college students at a time, when all of this online data was just lying around waiting for someone to come take a peek at it.