no free lunch in statistics

Simon and Tibshirani recently posted a short comment on the Reshef et al MIC data mining paper I blogged about a while back:

The proposal of Reshef et. al. (“MIC“) is an interesting new approach for discovering non-linear dependencies among pairs of measurements in exploratory data mining. However, it has a potentially serious drawback. The authors laud the fact that MIC has no preference for some alternatives over others, but as the authors know, there is no free lunch in Statistics: tests which strive to have high power against all alternatives can have low power in many important situations.

They then report some simulation results clearly demonstrating that MIC is (very) underpowered relative to Pearson correlation in most situations, and performs even worse relative to Székely & Rizzo’s distance correlation (which I hadn’t heard about, but will have to look into now). I mentioned low power as a potential concern in my own post, but figured it would be an issue under relatively specific circumstances (i.e., only for certain kinds of associations in relatively small samples). Simon & Tibshirani’s simulations pretty clearly demonstrate that isn’t so. Which, needless to say, rather dampens the enthusiasm for the MIC statistic.

One thought on “no free lunch in statistics”

Leave a Reply