the perils of digging too deep

Another in a series of posts supposedly at the intersection of fiction and research methods, but mostly just an excuse to write ridiculous stories and pretend they have some sort of moral.


Dr. Rickles the postdoc looked a bit startled when I walked into his office. He was eating a cheese sandwich and watching a chimp on a motorbike on his laptop screen.

“YouTube again?” I asked.

“Yes,” he said. “It’s lunch.”

“It’s 2:30 pm,” I said, pointing to my watch.

“Still my lunch hours.”

Lunch hours for Rickles were anywhere from 11 am to 4 pm. It depended on exactly when you walked in on him doing something he wasn’t supposed to; that was the event that marked the onset of Lunch.

“Fair enough,” I said. “I just stopped by to see how things were going.”

“Oh, quite well,” said Rickles. “Things are going well. I just found a video of a chimp and a squirrel riding a motorbike together. They aren’t even wearing helmets! I’ll send you the link.”

“Please don’t. I don’t like squirrels. But I meant with work. How’s the data looking?”

He shot me a pained look, like I’d just caught him stealing video game money from his grandmother.

“The data are TERRIBLE,” he said in all capital letters.

I wasn’t terribly surprised at the revelation; I’d handed Rickles the dataset only three days prior, taking care not to tell him it was the dataset from hell. Rickles was the fourth or fifth person in the line of succession; the data had been handed down from postdoc to graduate student to postdoc for several years now. Everyone in the lab wanted to take a crack at it when they first heard about it, and no one in the lab wanted anything to do with it once they’d taken a peek. I’d given it to Rickles in part to teach him a lesson; he’d been in the lab for several weeks now and somehow still seemed happy and self-assured.

“Haven’t found anything interesting yet?” I asked. “I thought maybe if you ran the Flimflan test on the A-trax, you might get an effect. Or maybe if you jimmied the cryptos on the Borgatron…”

“No, no,” Rickles interrupted, waving me off. “The problem isn’t that there’s nothing interesting in the data; it’s that there’s too MUCH stuff. There are too MANY results. The story is too COMPLEX.”

That didn’t compute for me, so I just stared at him blankly. No one ever found COMPLEX effects in my lab. We usually stopped once we found SIMPLE effects.

Rickles was unimpressed.

“You follow what I’m saying, Guy? There are TOO-MANY-EFFECTS. There’s too much going on in the data.”

“I don’t see how that’s possible,” I said. “Keith, Maria, and Lakshmi each spent weeks on this data and found nothing.”

“That,” said Rickles, “is because Keith, Maria, and Lakshmi never thought to apply the Epistocene Zulu transform to the data.”

The Epistocene Zulu transform! It made perfect sense when you thought about it; so why hadn’t I ever thought about it? Who was Rickles cribbing analysis notes from?

“Pull up the data,” I said excitedly. “I want to see what you’re talking about.”

“Alright, alright. Lunch hours are over now anyway.”

He grudgingly clicked on the little X on his browser. Then he pulled up a spreadsheet that must have had a million columns in it. I don’t know where they’d all come from; it had only had sixteen thousand or so when I’d had the hard drives delivered to his office.

“Here,” said Rickles, showing me the output of the Pear-sampled Tea test. “There’s the A-trax, and there’s its Nuffton index, and there’s the Zimming Range. Look at that effect. It’s bigger than the zifflon correlation Yehudah’s group reported in Nature last year.”

“Impressive,” I said, trying to look calm and collected. But in my head, I was already trying to figure out how I’d ask the department chair for a raise once this finding was published. Each point on that Zimming Range is worth at least $500, I thought.

“Are there any secondary analyses we could publish alongside that,” I asked.

“Oh, I don’t think you want to publish that,” Rickles laughed.

“Why the hell not? It could be big! You just said yourself it was a giant effect!”

“Oh sure. It’s a big effect. But I don’t believe it for one second.”

“Why not? What’s not to like? This finding makes Yehudah’s paper look like a corn dog!”

I recognized, in the course of uttering those words, that they did not constitute the finest simile ever produced.

“Well, there are two massive outliers, for one. If you eliminate them, the effect is much smaller. And if you take into consideration the Gupta skew because the data were collected with the old reverberator, there’s nothing left at all.”

“Okay, fine,” I muttered. “Is there anything else in the data?”

“Sure, tons of things. Like, for example, there’s a statistically significant gamma reduction.”

“A gamma reduction? Are you sure? Or do you mean beta,” I asked.

“Definitely gamma,” said Rickles. “There’s nothing in the betas, deltas, or thetas. I checked.”

“Okay. That sounds potentially interesting and publishable. But I bet you’re going to tell me why we shouldn’t believe that result, either, right?”

“Well,” said Rickles, looking a bit self-conscious, “it’s just that it’s a pretty fine-grained analysis; you’re not really leaving a lot of observations when you slice it up that thin. And the weird thing about the gamma reduction is that it is essentially tantamount to accepting a null effect; this was Jayaraman’s point in that article in Statistica Splenda last month.”

“Sure, the Gerryman article, right. I read that. Forget the gamma reduction. What else?”

“There are quite a few schweizels,” Rickles offered, twisting the cap off a beer that had appeared out of the minibar under his desk.

I looked at him suspiciously. I suspected it was a trap; Rickles knew how much I loved Schweizel units. But I still couldn’t resist. I had to know.

“How many schweizels are there,” I asked, my hand clutching at the back of a nearby chair to help keep me steady.

“Fourteen,” Rickles said matter-of-factly.

“Fourteen!” I gasped. “That’s a lot of schweizels!”

“It’s not bad,” said Rickles. “But the problem is, if you look at the B-trax, they also have a lot of schweizels. Seventeen of them, actually.”

“Seventeen schweizels!” I exclaimed. “That’s impossible! How can there be so many Schweizel units in one dataset!”

“I’m not sure. But… I can tell you that if you normalize the variables based on the Smith-Gill ratio, the effect goes away completely.”

There it was; the sound of the other shoe dropping. My heart gave a little cough–not unlike the sound your car engine makes in the morning when it’s cold and it wants you to stop provoking it and go back to bed. It was aggravating, but I understood what Rickles was saying. You couldn’t really say much about the Zimming Range unless your schweizel count was properly weighted. Still, I didn’t want to just give up on the schweizels entirely. I’d spent too much of my career delicately massaging schweizels to give up without one last tug.

“Maybe we can just say that the A-trax/Nuffton relationship is non-linear?” I suggested.

“Non-linear?” Rickles snorted. “Only if by non-linear you mean non-real! If it doesn’t survive Smith-Gill, it’s not worth reporting!”

I grudgingly conceded the point.

“What about the zifflons? Have you looked at them at all? It wouldn’t be so novel given Yehudah’s work, but we might still be able to get it into some place like Acta Ziffletica if there was an effect…”

“Tried it. There isn’t really any A-trax influence on zifflons. Or a B-trax effect, for that matter. There is a modest effect if you generate the Mish component for all the trax combined and look only at that. But that’s a lot of trax, and we’re not correcting for multiple Mishing, so I don’t really trust it…”

I saw that point too, and was now nearing despondency. Rickles had shot down all my best ideas one after the other. I wondered how I’d convince the department chair to let me keep my job.

Then it came to me in a near-blinding flash of insight. Near blinding, because I smashed my forehead on the overhead chandelier jumping out of my chair. An inch lower, and I’d have lost both eyes.

“We need to get that chandelier replaced,” I said, clutching my head in my hands. “It has no business hanging around in an office like this.”

“We need to get it replaced,” Rickles agreed. “I’ll do it tomorrow during my lunch hours.”

I knew that meant the chandelier would be there forever–or at least as long as Rickles inhabited the office.

“Have you tried counting the Dunams,” I suggested, rubbing my forehead delicately and getting back to my brilliant idea.

“No,” he said, leaning forward in his chair slightly. “I didn’t count Dunams.”

Ah-hah! I thought to myself. Not so smart are we now! The old boy’s still got some tricks up his sleeve.

“I think you should count the Dunams,” I offered sagely. “That always works for me. I do believe it might shed some light on this problem.”

“Well…” said Rickles, shaking his head slightly, “maaaaaybe. But Li published a paper in Psychometrika last year showing that Dunam counting is just a special case of Klein’s occidental protrusion method. And Klein’s method is more robust to violations of normality. So I used that. But I don’t really know how to interpret the results, because the residual is negative.”

I really had no idea either. I’d never come across a negative Dunam residual, and I’d never even heard of occidental protrusion. As far as I was concerned, it sounded like a made-up method.

“Okay,” I said, sinking back into my chair, ready to give up. “You’re right. This data… I don’t know. I don’t know what it means.”

I should have expected it, really; it was, after all, the dataset from hell. I was pretty sure my old RA had taken a quick jaunt through purgatory every morning before settling into the bench to run some experiments.

“I told you so,” said Rickles, putting his feet up on the desk and handing me a beer I didn’t ask for. “But don’t worry about it too much. I’m sure we’ll figure it out eventually. We probably just haven’t picked the right transformation yet. There’s Nordstrom, El-Kabir, inverse Zulu…”

He turned to his laptop and double-clicked an icon on the desktop that said “YouTube”.

“…or maybe you can just give the data to your new graduate student when she starts in a couple of weeks,” he said as an afterthought.

In the background, a video of a chimp and a puppy driving a Jeep started playing on a discolored laptop screen.

I mulled it over. Should I give the data to Josephine? Well, why not? She couldn’t really do any worse with it, and it would be a good way to break her will quickly.

“That’s not a bad idea, Rickles,” I said. “In fact, I think it might be the best idea you’ve had all week. Boy, that chimp is a really aggressive driver. Don’t drive angry, chimp! You’ll have an accid–ouch, that can’t be good.”
