The experiment consists of drawing green and red balls at random, with the probabilities rigged so that greens occur 75 percent of the time. The subject is asked to watch for a while and then predict whether the next ball will be green or red. The rats followed the optimal strategy of always predicting green (I am a little unclear how the rats communicated, but never mind). But the human subjects did not always predict green, they usually want to do better and predict when red will come up too, engaging in reasoning like “after three straight greens, we are due for a red.” As Mlodinow says, “humans usually try to guess the pattern, and in the process we allow ourselves to be outperformed by a rat."
Unfortunately, spurious patterns show up in some important real world settings, like research on the effect of foreign aid on growth.research looks for an association between economic growth and some measure of foreign aid, controlling for other likely determinants of economic growth. Of course, since there is some random variation in both growth and aid, there is always the possibility that an association appears by pure chance. The usual statistical procedures are designed to keep this possibility small. The convention is that we believe a result if there is only a 1 in 20 chance that the result arose at random. So if a researcher does a study that finds a positive effect of aid on growth and it passes this “1 in 20” test (referred to as a “statistically significant” result), we are fine, right? Alas, not so fast. A researcher is very eager to find a result, and such eagerness usually involves running many statistical exercises (known as “regressions”). But the 1 in 20 safeguard only applies if you only did ONE regression. What if you did 20 regressions? Even if there is no relationship between growth and aid whatsoever, on average you will get one “significant result” out of 20 by design. Suppose you only report the one significant result and don’t mention the other 19 unsuccessful attempts. You can do twenty different regressions by varying the definition of aid, the time periods, and the control variables.
This practice is known as “data mining.” It is NOT acceptable practice, but this is very hard to enforce since nobody is watching when a researcher runs multiple regressions. It is seldom intentional dishonesty by the researcher. Because of our non-rat-like propensity to see patterns everywhere, it is easy for researchers to convince themselves that the failed exercises were just done incorrectly, and that they finally found the “real result” when they get the “significant” one. Even more insidious, the 20 regressions could be spread across 20 different researchers. Each of these obediently does only one pre-specified regression, 19 of whom do not publish a paper since they had no significant results, but the 20th one does publish their spuriously “significant” finding (this is known as “publication bias.”)So, there would be 20 researchers multiplied per 20 regressions each. That's 400 regression, of which only 1 was statistically significant and 399 were not. So then we have a published paper stating a significant relation and 399 unfairly unpublished.
But don’t give up on all damned lies and statistics, there ARE ways to catch data mining. A “significant result” that is really spurious will only hold in the original data sample, with the original time periods, with the original specification. If new data becomes available as time passes you can test the result with the new data, where it will vanish if it was spurious “data mining”. You can also try different time periods, or slightly different but equally plausible definitions of aid and the control variables.Unfortunately, journals are not keen to publish reviewing papers.