to each their own addiction

An only slightly fictionalized story, for my long-suffering wife.

“It’s happening again,” I tell my wife from the couch. “I’m having that soul-crushing experience again.”

“Too much work?” she asks, expecting the answer to be yes, since no matter what quantity of work I’m actually burdened with at any given moment, the way I describe it to other people when they ask is always “too much.”

“No,” I say. “Work is fine right now.”

“Had a paper rejected?”

“Pfft, no,” I say. “Like that ever happens to me!” (I don’t tell her it’s happened to me twice in the past week.)

“Then what?”

“The blog posts,” I tell her, motioning to my laptop screen. “There’s just too many of them in my Reader. I can’t keep up! I’m drowning in RSS feeds!”

My wife has learned not to believe anything I say, ever; we’ve lived together long enough that her modal response to my complaints is an arched eyebrow. So I flip my laptop around and point at the gigantic bolded text in the corner that says All Items (118). Emotionally gigantic, I mean; physically, I think it’s only like 12 point font.

“One hundred and eighteen blog posts!” I yell at absolutely no one. “I’m going to be here all night!”

“That’s because you live here,” she helpfully points out.

I’m not sure exactly when I became enslaved by my blog feeds. I know it was sometime after Carl Zimmer’s amazing post about the man-eating fireflies of Sri Lanka, and sometime before the Neuroskeptic self-published his momentous report introducing three entirely new mental health diagnoses. But that’s as much as I can tell you; the rest is lost in a haze of rapid-scrolling text, retweeted links, and never-ending comment threads. There’s no alarm bell that sounds out loud to indicate that you’ve stomped all over the line that separates occasional indulgence from outright “I can quit any time, honest!” abuse. No one shows up at your door, hands you a bucket of Skittles, and says, “congratulations! You’re hooked on feeds!”

The thought of all those unread posts piling up causes me to hyperventilate. My wife, who sits unperturbed in her chair as 1,000+ unread articles pile up in her Reader, stares at me with a mixture of bemusement and horror.

“Let’s go for a walk,” she suggests, making a completely transparent effort to distract me from my immense problems.

Going for a walk is, of course, completely out of the question; I still have 118 blog posts to read before I can do anything else. So I read all 118 posts, which turns out not to take all night, but more like 15 minutes (I have a very loose definition of reading; it’s closer to what other people call ‘seeing’). By the time I’ve done that, the internet has written another 8 new articles, so now I feel compelled to read those too. So I do that, and then I hit refresh again, and lo and behold, there are 2 MORE articles. So I grudgingly read those as well, and then I quickly shut my laptop so that no new blog posts can sneak up on me while I’m off hanging out in Microsoft Word pretending to do work.

Screw this, I think after a few seconds, and run to find my wife.

“Come on, let’s go for that walk,” I say, running as fast as I can towards my sandals.

“What’s the big rush,” she asks. “I want to go walking, not jogging; I already went to the gym today.”

“No choice,” I say. “We have to get back before the posts pile up again.”

“What?”

“I said, I have a lot of work to do.”

So we go out walking, and it’s nice and all that; the temperature is probably around 70 degrees; it’s cool and dry and the sun’s just going down; the ice cream carts are out in force on the Pearl Street mall; the jugglers juggle and the fire eaters eat fire and give themselves cancer; a little kid falls down and skins his knee but gets up and laughs like it didn’t even hurt, which it probably didn’t, because everyone knows children under seven years of age don’t have a central nervous system and can’t feel pain. It’s a really nice walk, and I’m happy we’re on it, but the whole time I keep thinking, How many dozens of posts has PZ Myers put up while I’ve been gone? Are Razib Khan and Ed Yong posting their link dumps as I think this? And what’s the over-under on the number of posts in my ‘cog blogs’ folder?

She sees me doing all this of course, and she’s not happy about it. So she lets me know it.

“I’m not happy about this,” she says.

When we get back, we each go back to our respective computer screens. I’m relieved to note that the internet’s only made 11 more deliveries, which I promptly review and discharge. I star two posts for later re-consideration and let the rest disappear into the ether of spent words. Then I open up a manuscript I’ve been working on for a while and pretend to do some real work for a couple of hours. With periodic edutainment breaks, of course.

Around 11:30 pm I decide to close up shop for the night. No one really blogs after about 9 pm, which is fortunate, or I’d never get any sleep. It’s also the reason I avoid subscribing to European blogs if I can help it. Europeans have no respect for Mountain Time.

“Are you coming to bed,” I ask my wife.

“Not yet,” she says, looking guilty and avoiding eye contact.

“Why not? You have work to do?”

“Nope, no work.”

“Cooking? Are you making a fancy meal for dinner tomorrow?”

“No, it’s your turn to cook tomorrow,” she says, knowing full well that my idea of cooking consists of a take-out menu and telephone.

“Then what?”

She opens her mouth, but nothing comes out. The words are all jammed tightly in between her vocal cords.

Then I see it, poking out on the couch from under a pillow: green cover, 9 by 6 inches, 300 pages long. It’s that damn book!

“You’re reading Pride and Prejudice again,” I say. It’s an observation, not a question.

“No I’m not.”

“Yes you are. You’re reading that damn book again. I know it. I can see it. It’s right there.” I point at it, just so that there can’t possibly be any ambiguity about which book I’m talking about.

She gazes around innocently, looking at everything but the book.

“What is that, like the fourteenth time this year you’ve read it?”

“Twelfth,” she says, looking guilty. “But really, go to bed without me; I might be up for a while still. I have another fifty pages or so I need to finish before I can go to sleep. I just have to find out if Elizabeth Bennet and Mr. Darcy end up together.”

I look at her mournfully, quietly shut my laptop’s lid, and bid the both of them–wife and laptop–good night. My wife grudgingly nods, but doesn’t look away from Jane Austen’s pages. My RSS feeds don’t say anything either.

“Yes,” I mumble to no one in particular, as I slowly climb up the stairs and head for my toothbrush.

“Yes, they do end up together.”

repost: narrative tips from a grad school applicant

Since it’s grad school application season for undergraduates, I thought I’d repost some narrative tips about how to go about writing a personal statement for graduate programs in psychology. This is an old, old post from a long-deceased blog; it’s from way back in 2002 when I was applying to grad school. It’s kind of a serious piece; if I were to rewrite it today, the tone would be substantially lighter. I can’t guarantee that following these tips will get you into grad school, but I can promise that you’ll be amazed at the results.

The first draft of my personal statement was an effortful attempt to succinctly sum up my motivation for attending graduate school. I wanted to make my rationale for applying absolutely clear, so I slaved over the statement for three or four days, stopping only for the occasional bite of food and hour or two of sleep every night. I was pretty pleased with the result. For a first draft, I thought it showed great promise. Here’s how it started:

I want to go to,o grajit skool cuz my frend steve is in grajit and he says its ez and im good at ez stuff

When I showed this to my advisor he said, “I don’t know if humor is the way to go for this thing.”

I said, “What do you mean, humor?”

After that I took a three month break from writing my personal statement while I completed a grade 12 English equivalency exam and read a few of the classics to build up my vocabulary. My advisor said that even clever people like me needed help sometimes. I read Ulysses, The Odyssey, and a few other Greek sounding books, and a book called The Cat in the Hat which was by the same author as the others, but published posthumously. Satisfied that I was able to write a letter that would impress every graduate admissions committee in the world, I set about writing a second version of my personal statement. Here’s how that went:

Dear Dirty Admissions Committee,
Solemn I came forward and mounted the round gunrest. I faced about and blessed gravely thrice the Ivory Tower, the surrounding country, and all the Profs. Then catching sight of the fMRI machine, I bent towards it and made rapid crosses in the air, gurgling in my throat and shaking my head.

“Too literary,” said my advisor when I showed him.

“Mud,” I said, and went back to the drawing board.

The third effort was much better. I had weaned myself off the classics and resolved to write a personal statement that fully expressed what a unique human being I was and why I would be an asset to the program. I talked about how I could juggle three bean bags and almost four, I was working on four, and how I’d stopped biting my fingernails last year so I had lots of free time to do psychology now. To show that I was good at following through on things that I started, I said,

p.s. when I can juggle four bean bags ( any day now) I will write you to let you know so you can update your file.

Satisfied that I had written the final copy of my statement, I showed it to my advisor. He was wild-eyed about it.

“You just don’t get it, do you,” he said, ripping my statement in two and throwing it into the wastepaper basket. “Tell you what. Why don’t I write a statement for you. And then you can go through it and make small changes to personalize it. Ok?”

“Sure,” I said. So the next day my advisor gave me a two-page personal statement he had written for me. Now I won’t bore you with all of the details, but I have to say, it was pretty bad. Here’s how it started:

After studying psychology for nearly four years at the undergraduate level, I have decided to pursue a Ph.D. in the field. I have developed a keen interest in [list your areas of interest here] and believe [university name here] will offer me outstanding opportunities.

“Now go make minor changes,” said my advisor.

“Mud,” I said, and went to make minor changes.

I came back with the final version a week later. It was truly a masterpiece; co-operating with my advisor had really helped. At first I had been skeptical because what he wrote was so bad the way he gave it to me, but with a judicious sprinkling of helpful clarifications, it turned into something really good. It was sort of like an ugly cocoon (his draft) bursting into a beautiful rainbow (my version). It went like this:

After studying psychology (and juggling!) for nearly four years at the undergraduate level (of university), I have decided to pursue a Ph.D. in the field. Cause I need it to become a Prof. I have developed a keen interest in [list your areas of interest here Vision, Language, Memory, Brain] and believe [university name hereStanford Princeton Mishigan] will offer me outstanding opportunities in psychology and for the juggling society.

“Brilliant,” said my advisor when I showed it to him. “You’ve truly outdone yourself.”

“Mud,” I said, and went to print six more copies.

the perils of digging too deep

Another in a series of posts supposedly at the intersection of fiction and research methods, but mostly just an excuse to write ridiculous stories and pretend they have some sort of moral.


Dr. Rickles the postdoc looked a bit startled when I walked into his office. He was eating a cheese sandwich and watching a chimp on a motorbike on his laptop screen.

“YouTube again?” I asked.

“Yes,” he said. “It’s lunch.”

“It’s 2:30 pm,” I said, pointing to my watch.

“Still my lunch hours.”

Lunch hours for Rickles were anywhere from 11 am to 4 pm. It depended on exactly when you walked in on him doing something he wasn’t supposed to; that was the event that marked the onset of Lunch.

“Fair enough,” I said. “I just stopped by to see how things were going.”

“Oh, quite well,” said Rickles. “Things are going well. I just found a video of a chimp and a squirrel riding a motorbike together. They aren’t even wearing helmets! I’ll send you the link.”

“Please don’t. I don’t like squirrels. But I meant with work. How’s the data looking.”

He shot me a pained look, like I’d just caught him stealing video game money from his grandmother.

“The data are TERRIBLE,” he said in all capital letters.

I wasn’t terribly surprised at the revelation; I’d handed Rickles the dataset only three days prior, taking care not to tell him it was the dataset from hell. Rickles was the fourth or fifth person in the line of succession; the data had been handed down from postdoc to graduate student to postdoc for several years now. Everyone in the lab wanted to take a crack at it when they first heard about it, and no one in the lab wanted anything to do with it once they’d taken a peek. I’d given it to Rickles in part to teach him a lesson; he’d been in the lab for several weeks now and somehow still seemed happy and self-assured.

“Haven’t found anything interesting yet?” I asked. “I thought maybe if you ran the Flimflan test on the A-trax, you might get an effect. Or maybe if you jimmied the cryptos on the Borgatron…”

“No, no,” Rickles interrupted, waving me off. “The problem isn’t that there’s nothing interesting in the data; it’s that there’s too MUCH stuff. There are too MANY results. The story is too COMPLEX.”

That didn’t compute for me, so I just stared at him blankly. No one ever found COMPLEX effects in my lab. We usually stopped once we found SIMPLE effects.

Rickles was unimpressed.

“You follow what I’m saying, Guy? There are TOO-MANY-EFFECTS. There’s too much going on in the data.”

“I don’t see how that’s possible,” I said. “Keith, Maria, and Lakshmi each spent weeks on this data and found nothing.”

“That,” said Rickles, “is because Keith, Maria, and Lakshmi never thought to apply the Epistocene Zulu transform to the data.”

The Epistocene Zulu transform! It made perfect sense when you thought about it; so why hadn’t I ever thought about it? Who was Rickles cribbing analysis notes from?

“Pull up the data,” I said excitedly. “I want to see what you’re talking about.”

“Alright, alright. Lunch hours are over now anyway.”

He grudgingly clicked on the little X on his browser. Then he pulled up a spreadsheet that must have had a million columns in it. I don’t know where they’d all come from; it had only had sixteen thousand or so when I’d had the hard drives delivered to his office.

“Here,” said Rickles, showing me the output of the Pear-sampled Tea test. “There’s the A-trax, and there’s its Nuffton index, and there’s the Zimming Range. Look at that effect. It’s bigger than the zifflon correlation Yehudah’s group reported in Nature last year.”

“Impressive,” I said, trying to look calm and collected. But in my head, I was already trying to figure out how I’d ask the department chair for a raise once this finding was published. Each point on that Zimming Range is worth at least $500, I thought.

“Are there any secondary analyses we could publish alongside that,” I asked.

“Oh, I don’t think you want to publish that,” Rickles laughed.

“Why the hell not? It could be big! You just said yourself it was a giant effect!”

“Oh sure. It’s a big effect. But I don’t believe it for one second.”

“Why not? What’s not to like? This finding makes Yehudah’s paper look like a corn dog!”

I recognized, in the course of uttering those words, that they did not constitute the finest simile ever produced.

“Well, there are two massive outliers, for one. If you eliminate them, the effect is much smaller. And if you take into consideration the Gupta skew because the data were collected with the old reverberator, there’s nothing left at all.”

“Okay, fine,” I muttered. “Is there anything else in the data?”

“Sure, tons of things. Like, for example, there’s a statistically significant gamma reduction.”

“A gamma reduction? Are you sure? Or do you mean beta,” I asked.

“Definitely gamma,” said Rickles. “There’s nothing in the betas, deltas, or thetas. I checked.”

“Okay. That sounds potentially interesting and publishable. But I bet you’re going to tell me why we shouldn’t believe that result, either, right?”

“Well,” said Rickles, looking a bit self-conscious, “it’s just that it’s a pretty fine-grained analysis; you’re not really leaving a lot of observations when you slice it up that thin. And the weird thing about the gamma reduction is that it is essentially tantamount to accepting a null effect; this was Jayaraman’s point in that article in Statistica Splenda last month.”

“Sure, the Gerryman article, right. I read that. Forget the gamma reduction. What else?”

“There are quite a few schweizels,” Rickles offered, twisting the cap off a beer that had appeared out of the minibar under his desk.

I looked at him suspiciously. I suspected it was a trap; Rickles knew how much I loved Schweizel units. But I still couldn’t resist. I had to know.

“How many schweizels are there,” I asked, my hand clutching at the back of a nearby chair to help keep me steady.

“Fourteen,” Rickles said matter-of-factly.

“Fourteen!” I gasped. “That’s a lot of schweizels!”

“It’s not bad,” said Rickles. “But the problem is, if you look at the B-trax, they also have a lot of schweizels. Seventeen of them, actually.”

“Seventeen schweizels!” I exclaimed. “That’s impossible! How can there be so many Schweizel units in one dataset!”

“I’m not sure. But… I can tell you that if you normalize the variables based on the Smith-Gill ratio, the effect goes away completely.”

There it was; the sound of the other shoe dropping. My heart gave a little cough–not unlike the sound your car engine makes in the morning when it’s cold and it wants you to stop provoking it and go back to bed. It was aggravating, but I understood what Rickles was saying. You couldn’t really say much about the Zimming Range unless your schweizel count was properly weighted. Still, I didn’t want to just give up on the schweizels entirely. I’d spent too much of my career delicately massaging schweizels to give up without one last tug.

“Maybe we can just say that the A-trax/Nuffton relationship is non-linear?” I suggested.

“Non-linear?” Rickles snorted. “Only if by non-linear you mean non-real! If it doesn’t survive Smith-Gill, it’s not worth reporting!”

I grudgingly conceded the point.

“What about the zifflons? Have you looked at them at all? It wouldn’t be so novel given Yehudah’s work, but we might still be able to get it into some place like Acta Ziffletica if there was an effect…”

“Tried it. There isn’t really any A-trax influence on zifflons. Or a B-trax effect, for that matter. There is a modest effect if you generate the Mish component for all the trax combined and look only at that. But that’s a lot of trax, and we’re not correcting for multiple Mishing, so I don’t really trust it…”

I saw that point too, and was now nearing despondency. Rickles had shot down all my best ideas one after the other. I wondered how I’d convince the department chair to let me keep my job.

Then it came to me in a near-blinding flash of insight. Near blinding, because I smashed my forehead on the overhead chandelier jumping out of my chair. An inch lower, and I’d have lost both eyes.

“We need to get that chandelier replaced,” I said, clutching my head in my hands. “It has no business hanging around in an office like this.”

“We need to get it replaced,” Rickles agreed. “I’ll do it tomorrow during my lunch hours.”

I knew that meant the chandelier would be there forever–or at least as long as Rickles inhabited the office.

“Have you tried counting the Dunams,” I suggested, rubbing my forehead delicately and getting back to my brilliant idea.

“No,” he said, leaning forward in his chair slightly. “I didn’t count Dunams.”

Ah-hah! I thought to myself. Not so smart are we now! The old boy’s still got some tricks up his sleeve.

“I think you should count the Dunams,” I offered sagely. “That always works for me. I do believe it might shed some light on this problem.”

“Well…” said Rickles, shaking his head slightly, “maaaaaybe. But Li published a paper in Psykometrika last year showing that Dunam counting is just a special case of Klein’s occidental protrusion method. And Klein’s method is more robust to violations of normality. So I used that. But I don’t really know how to interpret the results, because the residual is negative.”

I really had no idea either. I’d never come across a negative Dunam residual, and I’d never even heard of occidental protrusion. As far as I was concerned, it sounded like a made-up method.

“Okay,” I said, sinking back into my chair, ready to give up. “You’re right. This data… I don’t know. I don’t know what it means.”

I should have expected it, really; it was, after all, the dataset from hell. I was pretty sure my old RA had taken a quick jaunt through purgatory every morning before settling into the bench to run some experiments.

“I told you so,” said Rickles, putting his feet up on the desk and handing me a beer I didn’t ask for. “But don’t worry about it too much. I’m sure we’ll figure it out eventually. We probably just haven’t picked the right transformation yet. There’s Nordstrom, El-Kabir, inverse Zulu…”

He turned to his laptop and double-clicked an icon on the desktop that said “YouTube”.

“…or maybe you can just give the data to your new graduate student when she starts in a couple of weeks,” he said as an afterthought.

In the background, a video of a chimp and a puppy driving a Jeep started playing on a discolored laptop screen.

I mulled it over. Should I give the data to Josephine? Well, why not? She couldn’t really do any worse with it, and it would be a good way to break her will quickly.

“That’s not a bad idea, Rickles,” I said. “In fact, I think it might be the best idea you’ve had all week. Boy, that chimp is a really aggressive driver. Don’t drive angry, chimp! You’ll have an accid–ouch, that can’t be good.”


the capricious nature of p < .05, or why data peeking is evil

There’s a time-honored tradition in the social sciences–or at least psychology–that goes something like this. You decide on some provisional number of subjects you’d like to run in your study; usually it’s a nice round number like twenty or sixty, or some number that just happens to coincide with the sample size of the last successful study you ran. Or maybe it just happens to be your favorite number (which of course is forty-four). You get your graduate student to start running the study, and promptly forget about it for a couple of weeks while you go about writing up journal reviews that are three weeks overdue and chapters that are six months overdue.

A few weeks later, you decide you’d like to know how that Amazing New Experiment you’re running is going. You summon your RA and ask him, in magisterial tones, “how’s that Amazing New Experiment we’re running going?” To which he falteringly replies that he’s been very busy with all the other data entry and analysis chores you assigned him, so he’s only managed to collect data from eighteen subjects so far. But he promises to have the other eighty-two subjects done any day now.

“Not to worry,” you say. “We’ll just take a peek at the data now and see what it looks like; with any luck, you won’t even need to run any more subjects! By the way, here are my car keys; see if you can’t have it washed by 5 pm. Your job depends on it. Ha ha.”

Once your RA’s gone to soil himself somewhere, you gleefully plunge into the task of peeking at your data. You pivot your tables, plyr your data frame, and bravely sort your columns. Then you extract two of the more juicy variables for analysis, and after some careful surgery and a t-test or six, you arrive at the conclusion that your hypothesis is… “marginally” supported. Which is to say, the magical p value is somewhere north of .05 and somewhere south of .10, and now it’s just parked by the curb waiting for you to give it better directions.

You briefly contemplate reporting your result as a one-tailed test–since it’s in the direction you predicted, right?–but ultimately decide against that. You recall the way your old Research Methods professor used to rail at length against the evils of one-tailed tests, and even if you don’t remember exactly why they’re so evil, you’re not willing to take any chances. So you decide it can’t be helped; you need to collect some more data.

You summon your RA again. “Is my car washed yet?” you ask.

“No,” says your RA in a squeaky voice. “You just asked me to do that fifteen minutes ago.”

“Right, right,” you say. “I knew that.”

You then explain to your RA that he should suspend all other assigned duties for the next few days and prioritize running subjects in the Amazing New Experiment. “Abandon all other tasks!” you decree. “If it doesn’t involve collecting new data, it’s unimportant! Your job is to eat, sleep, and breathe new subjects! But not literally!”

Being quite clever, your RA sees an opening. “I guess you’ll want your car keys back, then,” he suggests.

“Nice try, Poindexter,” you say. “Abandon all other tasks… starting tomorrow.”

You also give your RA very careful instructions to email you the new data after every single subject, so that you can toss it into your spreadsheet and inspect the p value at every step. After all, there’s no sense in wasting perfectly good data; once your p value is below .05, you can just funnel the rest of the participants over to the Equally Amazing And Even Newer Experiment you’ve been planning to run as a follow-up. It’s a win-win proposition for everyone involved. Except maybe your RA, who’s still expected to return triumphant with a squeaky clean vehicle by 5 pm.

Twenty-six months and four rounds of review later, you publish the results of the Amazing New Experiment as Study 2 in a six-study paper in the Journal of Ambiguous Results. The reviewers raked you over the coals for everything from the suggested running head of the paper to the ratio between the abscissa and the ordinate in Figure 3. But what they couldn’t argue with was the p value in Study 2, which clocked in at just under p < .05, with only 21 subjects’ worth of data (compare that to the 80 you had to run in Study 4 to get a statistically significant result!). Suck on that, Reviewers!, you think to yourself pleasantly while driving yourself home from work in your shiny, shiny Honda Civic.

So ends our short parable, which has at least two subtle points to teach us. One is that it takes a really long time to publish anything; who has time to wait twenty-six months and go through four rounds of review?

The other, more important point, is that the desire to peek at one’s data, which often seems innocuous enough–and possibly even advisable (quality control is important, right?)–can actually be quite harmful. At least if you believe that the goal of doing research is to arrive at the truth, and not necessarily to publish statistically significant results.

The basic problem is that peeking at your data is rarely a passive process; most often, it’s done in the context of a decision-making process, where the goal is to determine whether or not you need to keep collecting data. There are two possible peeking outcomes that might lead you to decide to halt data collection: a very low p value (i.e., p < .05), in which case your hypothesis is supported and you may as well stop gathering evidence; or a very high p value, in which case you might decide that it’s unlikely you’re ever going to successfully reject the null, so you may as well throw in the towel. Either way, you’re making the decision to terminate the study based on the results you find in a provisional sample.

A complementary situation, which also happens not infrequently, occurs when you collect data from exactly as many participants as you decided ahead of time, only to find that your results aren’t quite what you’d like them to be (e.g., a marginally significant hypothesis test). In that case, it may be quite tempting to keep collecting data even though you’ve already hit your predetermined target. I can count on more than one hand the number of times I’ve overheard people say (often without any hint of guilt) something to the effect of “my p value’s at .06 right now, so I just need to collect data from a few more subjects.”

Here’s the problem with either (a) collecting more data in an effort to turn p < .06 into p < .05, or (b) ceasing data collection because you’ve already hit p < .05: any time you add another subject to your sample, there’s a fairly large probability the p value will go down purely by chance, even if there’s no effect. So there you are sitting at p < .06 with twenty-four subjects, and you decide to run a twenty-fifth subject. Well, let’s suppose that there actually isn’t a meaningful effect in the population, and that p < .06 value you’ve got is a (near) false positive. Adding that twenty-fifth subject can only do one of two things: it can raise your p value, or it can lower it. The exact probabilities of these two outcomes depend on the current effect size in your sample before adding the new subject; but generally speaking, they’ll rarely be very far from 50-50. So now you can see the problem: if you stop collecting data as soon as you get a significant result, you may well be capitalizing on chance. It could be that if you’d collected data from a twenty-sixth and twenty-seventh subject, the p value would reverse its trajectory and start rising. It could even be that if you’d collected data from two hundred subjects, the effect size would stabilize near zero. But you’d never know that if you stopped the study as soon as you got the results you were looking for.
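
For the curious, here’s a rough sketch of that twenty-fifth-subject calculation in Python. It rests on simplifying assumptions that aren’t in the scenario above: a one-sample z-test with a known standard deviation of 1 (rather than a t-test), and a true effect of exactly zero. The function name prob_p_drops is made up for illustration.

```python
import math
from statistics import NormalDist


def prob_p_drops(p_now, n):
    """Chance that one more null-effect subject lowers a two-sided p value.

    Assumes a one-sample z-test with known SD of 1 and a true mean of 0;
    this is a simplification, not the exact test from the story.
    """
    std = NormalDist()
    # |z| implied by the current two-sided p value
    z = std.inv_cdf(1 - p_now / 2)
    # With running sum S = z * sqrt(n), the new |z| exceeds the old one if the
    # new score x pushes (S + x) / sqrt(n + 1) above z ...
    gap = z * (math.sqrt(n + 1) - math.sqrt(n))
    # ... or (very improbably) drags it all the way below -z.
    reach = z * (math.sqrt(n + 1) + math.sqrt(n))
    return (1 - std.cdf(gap)) + std.cdf(-reach)


if __name__ == "__main__":
    print(round(prob_p_drops(0.06, 24), 3))
```

Under these assumptions, the chance that subject twenty-five nudges p = .06 downward comes out a little over 40%: somewhat worse than a coin flip, but in the same neighborhood, and that’s with no effect in the population at all.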

Lest you think I’m exaggerating, and think that this problem falls into the famous class of things-statisticians-and-methodologists-get-all-anal-about-but-that-don’t-really-matter-in-the-real-world, here’s a sobering figure (taken from this chapter):

[Figure: data_peeking]

The figure shows the results of a simulation quantifying the increase in false positives associated with data peeking. The assumptions here are that (a) data peeking begins after about 10 subjects (starting earlier would further increase false positives, and starting later would decrease false positives somewhat), (b) the researcher stops as soon as a peek at the data reveals a result significant at p < .05, and (c) data peeking occurs at incremental steps of either 1 or 5 subjects. Given these assumptions, you can see that there’s a fairly monstrous rise in the actual Type I error rate (relative to the nominal rate of 5%). For instance, if the researcher initially plans to collect 60 subjects, but peeks at the data after every 5 subjects, there’s approximately a 17% chance that the threshold of p < .05 will be reached before the full sample of 60 subjects is collected. When data peeking occurs even more frequently (as might happen if a researcher is actively trying to turn p < .07 into p < .05, and is monitoring the results after each incremental participant), Type I error inflation is even worse. So unless you think there’s no practical difference between a 5% false positive rate and a 15 – 20% false positive rate, you should be concerned about data peeking; it’s not the kind of thing you just brush off as needless pedantry.
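
If you’d rather watch the inflation happen than take the figure’s word for it, a simulation along these lines is easy to sketch. To be clear, this is not the code behind the figure; it’s a minimal stand-in that assumes a one-sample z-test with known standard deviation of 1 (so no t distribution is needed), a true effect of zero, and a researcher who stops at the first peek showing p < .05. The names p_value and false_positive_rate, the seed, and the number of simulated studies are all arbitrary choices of mine.

```python
import math
import random


def p_value(total, n):
    """Two-sided p value for a z-test of mean 0, assuming known SD of 1."""
    z = total / math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(max_n, first_peek, step, n_sims=2000, alpha=0.05, seed=1):
    """Share of null-effect studies that look 'significant' at any peek.

    Peeks happen at first_peek, first_peek + step, ... up to max_n.
    """
    rng = random.Random(seed)
    peeks = set(range(first_peek, max_n + 1, step))
    hits = 0
    for _ in range(n_sims):
        total = 0.0
        significant = False
        for n in range(1, max_n + 1):
            total += rng.gauss(0.0, 1.0)  # one more subject, no true effect
            if n in peeks and p_value(total, n) < alpha:
                significant = True  # the researcher would have stopped here
        hits += significant
    return hits / n_sims


if __name__ == "__main__":
    print("no peeking:       ", false_positive_rate(60, 60, 1))
    print("peek every 5:     ", false_positive_rate(60, 10, 5))
    print("peek every subject:", false_positive_rate(60, 10, 1))
```

The exact numbers will wobble with the seed and with the z-test shortcut, but the ordering shouldn’t: a single look at the full sample of 60 stays near the nominal 5%, and every extra peek can only add false positives on top of that.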

How do we stop ourselves from capitalizing on chance by looking at the data? Broadly speaking, there are two reasonable solutions. One is to just pick a number up front and stick with it. If you commit yourself to collecting data from exactly as many subjects as you said you would (you can proclaim the exact number loudly to anyone who’ll listen, if you find it helps), you’re then free to peek at the data all you want. After all, it’s not the act of observing the data that creates the problem; it’s the decision to terminate data collection based on your observation that matters.

The other alternative is to explicitly correct for data peeking. This is a common approach in large clinical trials, where data peeking is often ethically mandated, because you don’t want to either (a) harm people in the treatment group if the treatment turns out to have clear and dangerous side effects, or (b) prevent the control group from capitalizing on the treatment too if it seems very efficacious. In either event, you’d want to terminate the trial early. What researchers often do, then, is pick predetermined intervals at which to peek at the data, and then apply a correction to the p values that takes into account the number of, and interval between, peeking occasions. Provided you do things systematically in that way, peeking then becomes perfectly legitimate. Of course, the downside is that having to account for those extra inspections of the data makes your statistical tests more conservative. So if there aren’t any ethical issues that necessitate peeking, and you’re not worried about quality control issues that might be revealed by eyeballing the data, your best bet is usually to just pick a reasonable sample size (ideally, one based on power calculations) and stick with it.
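
As a toy illustration of what “correcting for peeking” buys you, here’s a sketch that swaps in the bluntest correction available: a Bonferroni-style rule that divides the .05 threshold by the number of planned peeks. Real clinical trials typically use purpose-built group-sequential boundaries (Pocock and O’Brien-Fleming are the classic ones), which are less conservative; I’m using Bonferroni only because it fits in a few lines. As before, the z-test with known standard deviation of 1, the zero true effect, the function name overall_error, and the seed are all assumptions of the sketch.

```python
import math
import random


def overall_error(peeks, per_look_alpha, n_sims=2000, seed=2):
    """Share of null-effect studies stopped early for 'significance'.

    peeks is an increasing list of sample sizes at which the data are inspected;
    the study stops at the first peek whose p value is below per_look_alpha.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        total, n, stopped = 0.0, 0, False
        for target in peeks:
            while n < target:
                total += rng.gauss(0.0, 1.0)  # collect up to the next peek
                n += 1
            # two-sided z-test p value, assuming known SD of 1
            p = math.erfc(abs(total) / math.sqrt(2 * n))
            if p < per_look_alpha:
                stopped = True
                break
        hits += stopped
    return hits / n_sims


if __name__ == "__main__":
    peeks = list(range(10, 61, 5))  # planned looks at 10, 15, ..., 60 subjects
    print("uncorrected:", overall_error(peeks, 0.05))
    print("Bonferroni: ", overall_error(peeks, 0.05 / len(peeks)))
```

With the corrected threshold, the overall false positive rate should drop back to 5% or below, at the price the paragraph above describes: each individual look now has to clear a much stricter bar.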

Oh, and also, don’t make your RAs wash your car for you; that’s not their job.

the fifty percent sleeper

That’s the title of a short fiction piece I have up at lablit.com today; it’s about brain scanning and beef jerky, among other things. It starts like this:

Day 1, 6 a.m.

Ok, I’m locked into this place now. I’ve got ten pounds of beef jerky, fifty dollars for the vending machine, and a flash drive full of experiments to run. If I can get eighteen usable subjects’ worth of data in five days, Yezerski mows my lawn, does my dishes for a week, and walks my dog three times a week for two months. If I don’t get eighteen subjects done, I mow his lawn, do his dishes, and drive his disabled grandmother to physiotherapy once a week for six months. Also: if I don’t get any subjects scanned, I have to tattoo Yezerski’s grandmother’s name on my back in 50-point font. We both know it’s not going to come to that, but Yezerski insisted we make it a part of the bet anyway.

And then goes on in a similar vein. You might enjoy it if you like MRI machines and cerebellums. If you don’t care for brains, you’ll probably just find it silly.

the parable of zoltan and his twelve sheep, or why a little skepticism goes a long way

What follows is a fictional piece about sheep and statistics. I wrote it about two years ago, intending it to serve as a preface to an article on the dangers of inadvertent data fudging. But then I decided that no journal editor in his or her right mind would accept an article that started out talking about thinking sheep. And anyway, the rest of the article wasn’t very good. So instead, I post this parable here for your ovine amusement. There’s a moral to the story, but I’m too lazy to write about it at the moment.

A shepherd named Zoltan lived in a small village in the foothills of the Carpathian Mountains. He tended to a flock of twelve sheep: Soffia, Krystyna, Anastasia, Orsolya, Marianna, Zigana, Julinka, Rozalia, Zsa Zsa, Franciska, Erzsebet, and Agi. Zoltan was a keen observer of animal nature, and would often point out the idiosyncrasies of his sheep’s behavior to other shepherds whenever they got together.

“Anastasia and Orsolya are BFFs. Whatever one does, the other one does too. If Anastasia starts licking her face, Orsolya will too; if Orsolya starts bleating, Anastasia will start harmonizing along with her.”

“Julinka has a limp in her left leg that makes her ornery. She doesn’t want your pity, only your delicious clovers.”

“Agi is stubborn but logical. You know that old saying, spare the rod and spoil the sheep? Well, it doesn’t work for Agi. You need calculus and rhetoric with Agi.”

Zoltan’s colleagues were so impressed by these insights that they began to encourage him to record his observations for posterity.

“Just think, Zoltan,” young Gergely once confided. “If something bad happened to you, the world would lose all of your knowledge. You should write a book about sheep and give it to the rest of us. I hear you only need to know six or seven related things to publish a book.”

On such occasions, Zoltan would hem and haw solemnly, mumbling that he didn’t know enough to write a book, and that anyway, nothing he said was really very important. It was false modesty, of course; in reality, he was deeply flattered, and very much concerned that his vast body of sheep knowledge would disappear along with him one day. So one day, Zoltan packed up his knapsack, asked Gergely to look after his sheep for the day, and went off to consult with the wise old woman who lived in the next village.

The old woman listened to Zoltan’s story with a good deal of interest, nodding sagely at all the right moments. When Zoltan was done, the old woman mulled her thoughts over for a while.

“If you want to be taken seriously, you must publish your findings in a peer-reviewed journal,” she said finally.

“What’s Pier Evew?” asked Zoltan.

“One moment,” said the old woman, disappearing into her bedroom. She returned clutching a dusty magazine. “Here,” she said, handing the magazine to Zoltan. “This is peer review.”

That night, after his sheep had gone to bed, Zoltan stayed up late poring over Vol. IV, Issue 5 of Domesticated Animal Behavior Quarterly. Since he couldn’t understand the figures in the magazine, he read it purely for the articles. By the time he put the magazine down and leaned over to turn off the light, the first glimmerings of an empirical research program had begun to dance around in his head. Just like fireflies, he thought. No, wait, those really were fireflies. He swatted them away.

“I like this… science,” he mumbled to himself as he fell asleep.

In the morning, Zoltan went down to the local library to find a book or two about science. He checked out a volume entitled Principia Scientifica Buccolica—a masterful derivation from first principles of all of the most common research methods, with special applications to animal behavior. By lunchtime, Zoltan had covered t-tests, and by bedtime, he had mastered Mordenkainen’s correction for inestimable herds.

In the morning, Zoltan made his first real scientific decision.

“Today I’ll collect some pilot data,” he thought to himself, “and tomorrow I’ll apply for an R01.”

His first set of studies tested the provocative hypothesis that sheep communicate with one another by moving their ears back and forth in Morse code. Study 1 tested the idea observationally. Zoltan and two other raters (his younger cousins), both blind to the hypothesis, studied sheep in pairs, coding one sheep’s ear movements and the other sheep’s behavioral responses. Studies 2 through 4 manipulated the sheep’s behavior experimentally. In Study 2, Zoltan taped the sheep’s ears to their head; in Study 3, he covered their eyes with opaque goggles so that they couldn’t see each other’s ears moving. In Study 4, he split the twelve sheep into three groups of four in order to determine whether smaller groups might promote increased sociability.

That night, Zoltan minded the data. “It’s a lot like minding sheep,” Zoltan explained to his cousin Griga the next day. “You need to always be vigilant, so that a significant result doesn’t get away from you.”

Zoltan had been vigilant, and the first 4 studies produced a number of significant results. In Study 1, Zoltan found that sheep appeared to coordinate ear twitches: if one sheep twitched an ear several times in a row, it was a safe bet that other sheep would start to do the same shortly thereafter (p < .01). There was, however, no coordination of licking, headbutting, stamping, or bleating behaviors, no matter how you sliced and diced it. “It’s a highly selective effect,” Zoltan concluded happily. After all, when you thought about it, it made sense. If you were going to pick just one channel for sheep to communicate through, ear twitching was surely a good one. One could make a very good evolutionary argument that more obvious methods of communication (e.g., bleating loudly) would have been detected by humans long ago, and that would be no good at all for the sheep.

Studies 2 and 3 further supported Zoltan’s story. Study 2 demonstrated that when you taped sheep’s ears to their heads, they ceased to communicate entirely. You could put Rozalia and Erzsebet in adjacent enclosures and show Rozalia the Jack of Spades for three or four minutes at a time, and when you went to test Erzsebet, she still wouldn’t know the Jack of Spades from the Three of Diamonds. It was as if the sheep were blind! Except they weren’t blind, they were dumb. Zoltan knew; he had made them that way by taping their ears to their heads.

In Study 3, Zoltan found that when the sheep’s eyes were covered, they no longer coordinated ear twitching. Instead, they now coordinated their bleating—but only if you excluded bleats that were produced when the sheep’s heads were oriented downwards. “Fantastic,” he thought. “When you cover their eyes, they can’t see each other’s ears any more. So they use a vocal channel. This, again, makes good adaptive sense: communication is too important to eliminate entirely just because your eyes happen to be covered. Much better to incur a small risk of being detected and make yourself known in other, less subtle, ways.”

But the real clincher was Study 4, which confirmed that ear twitching occurred at a higher rate in smaller groups than larger groups, and was particularly common in dyads of well-adjusted sheep (like Anastasia and Orsolya, and definitely not like Zsa Zsa and Marianna).

“Sheep are like everyday people,” Zoltan told his sister on the phone. “They won’t say anything to your face in public, but get them one-on-one, and they won’t stop gossiping about each other.”

It was a compelling story, Zoltan conceded to himself. The only problem was the F test. The difference in twitch rates as a function of group size wasn’t quite statistically significant. Instead, it hovered around p = .07, which the textbooks told Zoltan meant that he was almost right. Almost right was the same thing as potentially wrong, which wasn’t good enough. So the next morning, Zoltan asked Gergely to lend him four sheep so he could increase his sample size.

“Absolutely not,” said Gergely. “I don’t want your sheep filling my sheep’s heads with all of your crazy new ideas.”

“Look,” said Zoltan. “If you lend me four sheep, I’ll let you drive my Cadillac down to the village on weekends after I get famous.”

“Deal,” said Gergely.

So Zoltan borrowed the sheep. But it turned out that four sheep weren’t quite enough; after adding Gergely’s sheep to the sample, the effect only went from p < .07 to p < .06. So Zoltan cut a deal with his other neighbor, Yuri: four of Yuri’s sheep for two days, in return for three days with Zoltan’s new Lexus (once he bought it). That did the trick. Once Zoltan repeated the experiment with Yuri’s sheep, the p-value for Study 4 now came to .046, which the textbooks assured Zoltan meant he was going to be famous.

Data in hand, Zoltan spent the next two weeks writing up his very first journal article. He titled it “Baa baa baa, or not: Sheep communicate via non-verbal channels”—a decidedly modest title for the first empirical work to demonstrate that sheep are capable of sophisticated propositional thought. The article was published to widespread media attention and scientific acclaim, and Zoltan went on to have a productive few years in animal behavioral research, studying topics as interesting and varied as giraffe calisthenics and displays of affection in the common leech.

Much later, it turned out that no one was able to directly replicate his original findings with sheep (though some other researchers did manage to come up with conceptual replications). But that didn’t really matter to Zoltan, because by then he’d decided science was too demanding a career anyway; it was way more fun to lie under trees counting his sheep. Counting sheep, and occasionally, on Saturdays, driving down to the village in his new Lexus, just to impress all the young cowgirls.