The proposal of Reshef et al. (“MIC”) is an interesting new approach for discovering non-linear dependencies among pairs of measurements in exploratory data mining. However, it has a potentially serious drawback. The authors laud the fact that MIC has no preference for some alternatives over others, but as the authors know, there is no free lunch in Statistics: tests which strive to have high power against all alternatives can have low power in many important situations.
They then report some simulation results clearly demonstrating that MIC is (very) underpowered relative to Pearson correlation in most situations, and performs even worse relative to Székely & Rizzo’s distance correlation (which I hadn’t heard about, but will have to look into now). I mentioned low power as a potential concern in my own post, but figured it would be an issue under relatively specific circumstances (i.e., only for certain kinds of associations in relatively small samples). Simon & Tibshirani’s simulations pretty clearly demonstrate that isn’t so. Which, needless to say, rather dampens the enthusiasm for the MIC statistic.
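For anyone who wants to poke at this kind of power comparison themselves, here is a minimal sketch in Python. To be clear, this is not Simon & Tibshirani's code: it assumes a permutation test to calibrate each statistic (their procedure may differ), the function names (`distance_correlation`, `estimate_power`, and so on) are made up for illustration, and the sample size, noise level, and functional forms are arbitrary choices of mine. MIC is left out because it needs a third-party implementation; any function taking `(x, y)` and returning a number could be passed in as `stat` in its place.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation for two 1-D arrays (V-statistic form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise absolute distances within each variable.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center each distance matrix (subtract row and column means, add grand mean).
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()  # squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def abs_pearson(x, y):
    """Absolute Pearson correlation, so larger always means 'more dependent'."""
    return abs(np.corrcoef(x, y)[0, 1])


def permutation_pvalue(stat, x, y, n_perm, rng):
    """P-value of stat(x, y) against a null built by shuffling y."""
    observed = stat(x, y)
    null = np.array([stat(x, rng.permutation(y)) for _ in range(n_perm)])
    return (1 + np.sum(null >= observed)) / (1 + n_perm)


def estimate_power(stat, make_y, n=50, noise=1.0, n_sim=100,
                   n_perm=99, alpha=0.05, seed=0):
    """Fraction of simulated data sets in which the test rejects independence."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        x = rng.uniform(-1, 1, n)
        y = make_y(x) + noise * rng.normal(size=n)  # signal plus Gaussian noise
        if permutation_pvalue(stat, x, y, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    # Illustrative relationships only; not the set used in the original simulations.
    scenarios = {"linear": lambda x: x, "quadratic": lambda x: 4 * x ** 2}
    for name, make_y in scenarios.items():
        for label, stat in [("pearson", abs_pearson), ("dcor", distance_correlation)]:
            power = estimate_power(stat, make_y, n_sim=50)
            print(f"{name:10s} {label:8s} power={power:.2f}")
```

Cranking up `noise` or shrinking `n` is the quickest way to see how fast each statistic loses power for a given kind of association.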