I have just started reading Nate Silver's book The Signal and the Noise. The book is divided roughly into halves: the first seven chapters diagnose some prediction problems, while the final six explore and apply Bayes’s solution. It is always nice to see Bayesian statistics being applied to real-life problems, showing that its success goes beyond academia. As pointed out by Nicolas Chopin on his G+ account, 8 out of 10 papers in the Applications and Case Studies section of the latest issue of JASA were Bayesian.
Below are some interesting points from the Introduction of the book.
“You can see the computer age everywhere but in the productivity statistics.” – Robert Solow
The productivity paradox refers to the gap between the time we experience a growth in information and the time we start to benefit from the progress that this extra information makes possible. There are even indications that we can regress during this period. One possible explanation for this phenomenon is that the amount of new information outpaces our ability to process it. For example,
- The explosion in information produced by the printing press brought us a lot of good, but it took 330 years for those advantages to take hold. There is a nice figure in the book illustrating this point. The printing press also changed the way in which we made mistakes. Routine errors of transcription became less common, but when there was a mistake, it would be reproduced many times over, as in the case of the Wicked Bible.
- Computers began to be used more commonly in laboratories and academic settings around 1970. It didn’t take 330 years for the increase in information technology to produce tangible benefits, but it still took 15 to 20 years. As Paul Krugman put it, “The 1970s were the high point for vast amounts of theory applied to extremely small amounts of data”. The book shows, for example, that the amount of money per patent spent by the U.S. government actually increased during the 1970-1990 period. Capitalism and the Internet, both of which are incredibly efficient at propagating information, create the potential for bad ideas as well as good ones to spread, and the bad ideas may produce disproportionate effects.
The current trend is “Big Data”. It seems quite obvious that we are going to get a lot of benefit from it, but it may take a while. Working in the field of approximate inference, it is easy to notice that most of us still don’t know how to properly use these large amounts of data. A trade-off between computational time and accuracy needs to be taken into account, and this is hard. To be honest, I believe that we are still lost even with moderate amounts of data. Variable selection, model selection, prediction, prior distributions, etc. are still very hard topics. The implications of many of the practices used today are far from being completely understood, and adding more data will not necessarily help, at least not right away. It turns out that we have not been as successful in applied statistics as we like to think, which takes us to prediction failures.
Nate Silver talks about our love of predicting things, and how we are not very good at it. For Popper, a hypothesis was not scientific unless it was falsifiable, meaning that it could be tested in the real world by means of a prediction. The book goes on to show that the few ideas we have tested aren’t doing so well, and many of our ideas have not been or cannot be tested at all. Even though calling those ideas unscientific might be a little too much, the fact that the few theories we can test have produced quite poor results suggests that many of the ideas we haven’t tested are very wrong as well. One good example that illustrates our weakness in extracting signal from noise is the paper “Why Most Published Research Findings Are False” by John P. A. Ioannidis (2005).
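To make the Ioannidis argument a bit more concrete, here is a minimal sketch of the positive predictive value (PPV) calculation from that paper in its simplest, bias-free form: PPV = (1 − β)R / (R − βR + α), where R is the pre-study odds that a tested relationship is true, α is the type I error rate and 1 − β is the power. The function name and the default values (α = 0.05, power = 0.8) are my own illustrative assumptions, not numbers taken from the book.

```python
def positive_predictive_value(prior_odds, alpha=0.05, power=0.8):
    """Probability that a statistically significant finding is actually true.

    prior_odds: pre-study odds R of true to false relationships being tested.
    alpha:      type I error rate (illustrative default).
    power:      1 - beta, the probability of detecting a true relationship
                (illustrative default).
    """
    true_positives = power * prior_odds   # proportional to (1 - beta) * R
    false_positives = alpha               # proportional to alpha * 1
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical pre-study odds, from well-motivated to exploratory research.
    for odds in (1.0, 0.1, 0.01):
        ppv = positive_predictive_value(odds)
        print(f"prior odds {odds:>5}: PPV = {ppv:.3f}")
```

Under these assumptions the PPV drops from about 0.94 when the odds are even to about 0.14 when only one in a hundred tested relationships is true, which is the Bayesian intuition behind the claim that most findings in low-prior fields are likely to be false.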
– Silver, N. 2012. The Signal and the Noise. The Penguin Press. (Introduction)
– Brynjolfsson, E. 1993. The productivity paradox of information technology. Communications of the ACM 36(12): 66–77.
– Ioannidis, J. P. A. 2005. Why most published research findings are false. PLoS Medicine 2(8): e124.
– Popper, K. 1934. The Logic of Scientific Discovery.