Hotz, Robert Lee. “The Really Smart Phone.” Wall Street Journal, April 22, 2011 (http://online.wsj.com/article/SB10001424052748704547604576263261679848814.html).
Hutchins, John. “The First Public Demonstration of Machine Translation: The Georgetown-IBM System, 7th January 1954.” November 2005 (http://www.hutchinsweb.me.uk/GU-IBM-2005.pdf).
Inglehart, R., and H. D. Klingemann. Genes, Culture and Happiness. MIT Press, 2000.
Isaacson, Walter. Steve Jobs. Simon and Schuster, 2011.
Kahneman, Daniel. Thinking, Fast and Slow. Farrar, Straus and Giroux, 2011.
Kaplan, Robert S., and David P. Norton. Strategy Maps: Converting Intangible Assets into Tangible Outcomes. Harvard Business Review Press, 2004.
Karnitschnig, Matthew, and Mylene Mangalindan. “AOL Fires Technology Chief After Web-Search Data Scandal.” Wall Street Journal, August 21, 2006.
Keefe, Patrick Radden. “Can Network Theory Thwart Terrorists?” New York Times, March 12, 2006 (http://www.nytimes.com/2006/03/12/magazine/312wwln_essay.html).
Kinnard, Douglas. The War Managers. University Press of New England, 1977.
Kirwan, Peter. “This Car Drives Itself.” Wired UK, January 2012 (http://www.wired.co.uk/magazine/archive/2012/01/features/this-car-drives-itself).
Kliff, Sarah. “A Database That Could Revolutionize Health Care.” Washington Post, May 21, 2012.
Kruskal, William, and Frederick Mosteller. “Representative Sampling, IV: The History of the Concept in Statistics, 1895–1939.” International Statistical Review 48 (1980), pp. 169–195.
Laney, Doug. “To Facebook You’re Worth $80.95.” Wall Street Journal, May 3, 2012 (http://blogs.wsj.com/cio/2012/05/03/to-facebook-youre-worth-80-95/).
Latour, Bruno. The Pasteurization of France. Harvard University Press, 1993.
Levitt, Steven D., and Stephen J. Dubner. Freakonomics: A Rogue Economist Explores the Hidden Side of Everything. William Morrow, 2009.
Levy, Steven. In the Plex. Simon and Schuster, 2011.
Lewis, Charles Lee. Matthew Fontaine Maury: The Pathfinder of the Seas. U.S. Naval Institute, 1927.
Lohr, Steve. “Can Apple Find More Hits Without Its Tastemaker?” New York Times, January 18, 2011, p. B1 (http://www.nytimes.com/2011/01/19/technology/companies/19innovate.html).
Lowrey, Annie. “Economists’ Programs Are Beating U.S. at Tracking Inflation.” Washington Post, December 25, 2010 (http://www.washingtonpost.com/wp-dyn/content/article/2010/12/25/AR2010122502600.html).
Macrakis, Kristie. Seduced by Secrets: Inside the Stasi’s Spy-Tech World. Cambridge University Press, 2008.
Manyika, James, et al. “Big Data: The Next Frontier for Innovation, Competition, and Productivity.” McKinsey Global Institute, May 2011 (http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/big_data_the_next_frontier_for_innovation).
Marcus, James. Amazonia: Five Years at the Epicenter of the Dot.Com Juggernaut. The New Press, 2004.
Margolis, Joel M. “When Smart Grids Grow Smart Enough to Solve Crimes.” Neustar, March 18, 2010 (http://energy.gov/sites/prod/files/gcprod/documents/Neustar_Comments_DataExhibitA.pdf).
Maury, Matthew Fontaine. The Physical Geography of the Sea. Harper, 1855.
Mayer-Schönberger, Viktor. “Beyond Privacy, Beyond Rights: Towards a ‘Systems’ Theory of Information Governance.” 98 California Law Review 1853 (2010).
———. Delete: The Virtue of Forgetting in the Digital Age. Princeton University Press, 2nd ed., 2011.
McGregor, Carolyn, Christina Catley, Andrew James, and James Padbury. “Next Generation Neonatal Health Informatics with Artemis.” In European Federation for Medical Informatics, User Centred Networked Health Care, ed. A. Moen et al. (IOS Press, 2011), p. 117 et seq.
McNamara, Robert S., with Brian VanDeMark. In Retrospect: The Tragedy and Lessons of Vietnam. Random House, 1995.
Mehta, Abhishek. “Big Data: Powering the Next Industrial Revolution.” Tableau Software White Paper, 2011.
Michel, Jean-Baptiste, et al. “Quantitative Analysis of Culture Using Millions of Digitized Books.” Science 331 (January 14, 2011), pp. 176–182 (http://www.sciencemag.org/content/331/6014/176.abstract).
Miller, Claire Cain. “U.S. Clears Google Acquisition of Travel Software.” New York Times, April 8, 2011 (http://www.nytimes.com/2011/04/09/technology/09google.html?_r=0).
Mills, Howard. “Analytics: Turning Data into Dollars.” Forward Focus, December 2011 (http://www.deloitte.com/assets/Dcom-UnitedStates/Local%20Assets/Documents/FSI/US_FSI_Forward%20Focus_Analytics_Turning%20data%20into%20dollars_120711.pdf).
Mindell, David A. Digital Apollo: Human and Machine in Spaceflight. MIT Press, 2008.
Minkel, J. R. “The U.S. Census Bureau Gave Up Names of Japanese-Americans in WW II.” Scientific American, March 30, 2007 (http://www.scientificamerican.com/article.cfm?id=confirmed-the-us-census-b).
Murray, Alexander. Reason and Society in the Middle Ages. Oxford University Press, 1978.
Nalimov, E. V., G. McC. Haworth, and E. A. Heinz. “Space-Efficient Indexing of Chess Endgame Tables.” ICGA Journal 23, no. 3 (2000), pp. 148–162.
Narayanan, Arvind, and Vitaly Shmatikov. “How to Break the Anonymity of the Netflix Prize Dataset.” October 18, 2006, arXiv:cs/0610105 (http://arxiv.org/abs/cs/0610105).
———. “Robust De-Anonymization of Large Sparse Datasets.” Proceedings of the 2008 IEEE Symposium on Security and Privacy, p. 111 (http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf).
Nazareth, Rita, and Julia Leite. “Stock Trading in U.S. Falls to Lowest Level Since 2008.” Bloomberg, August 13, 2012 (http://www.bloomberg.com/news/2012-08-13/stock-trading-in-u-s-hits-lowest-level-since-2008-as-vix-falls.html).
Negroponte, Nicholas. Being Digital. Alfred Knopf, 1995.
Neyman, Jerzy. “On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection.” Journal of the Royal Statistical Society 97, no. 4 (1934), pp. 558–625.
Ohm, Paul. “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization.” 57 UCLA Law Review 1701 (2010).
Onnela, J. P., et al. “Structure and Tie Strengths in Mobile Communication Networks.” Proceedings of the National Academy of Sciences of the United States of America (PNAS) 104 (May 2007), pp. 7332–36 (http://nd.edu/~dddas/Papers/PNAS0610245104v1.pdf).
Palfrey, John, and Urs Gasser. Interop: The Promise and Perils of Highly Interconnected Systems. Basic Books, 2012.
Pearl, Judea. Causality: Models, Reasoning and Inference, 2nd ed. Cambridge University Press, 2009.
President’s Council of Advisors on Science and Technology. “Report to the President and Congress, Designing a Digital Future: Federally Funded Research and Development in Networking and Information Technology.” December 2010 (http://www.whitehouse.gov/sites/default/files/microsites/ostp/pcast-nitrd-report-2010.pdf).
Priest, Dana, and William Arkin. “A Hidden World, Growing Beyond Control.” Washington Post, July 19, 2010 (http://projects.washingtonpost.com/top-secret-america/articles/a-hidden-world-growing-beyond-control/print/).
Query, Tim. “Grade Inflation and the Good-Student Discount.” Contingencies Magazine, American Academy of Actuaries, May-June 2007 (http://www.contingencies.org/mayjun07/tradecraft.pdf).
Quinn, Elias Leake. “Smart Metering and Privacy: Existing Law and Competing Policies; A Report for the Colorado Public Utility Commission.” Spring 2009 (http://www.w4ar.com/Danger_of_Smart_Meters_Colorado_Report.pdf).
Reshef, David, et al. “Detecting Novel Associations in Large Data Sets.” Science (2011), pp. 1518–24.
Rosenthal, Jonathan. “Banking Special Report.” The Economist, May 19, 2012, pp. 7–8.
Rosenzweig, Phil. “Robert S. McNamara and the Evolution of Modern Management.” Harvard Business Review, December 2010, pp. 87–93 (http://hbr.org/2010/12/robert-s-mcnamara-and-the-evolution-of-modern-management/ar/pr).
Rudin, Cynthia, et al. “21st-Century Data Miners Meet 19th-Century Electrical Cables.” Computer, June 2011, pp. 103–105.
———. “Machine Learning for the New York City Power Grid.” IEEE Transactions on Pattern Analysis and Machine Intelligence 34.2 (2012), pp. 328–345 (http://hdl.handle.net/1721.1/68634).
Rys, Michael. “Scalable SQL.” Communications of the ACM, June 2011, 48, pp. 48–53.
Salathé, Marcel, and Shashank Khandelwal. “Assessing Vaccination Sentiments with Online Social Media: Implications for Infectious Disease Dynamics and Control.” PLoS Computational Biology 7, no. 10 (October 2011).
Savage, Mike, and Roger Burrows. “The Coming Crisis of Empirical Sociology.” Sociology 41 (2007), pp. 885–899.
Schlie, Erik, Jörg Rheinboldt, and Niko Waesche. Simply Seven: Seven Ways to Create a Sustainable Internet Business. Palgrave Macmillan, 2011.
Scanlon, Jessie. “Luis von Ahn: The Pioneer of ‘Human Computation.’” Businessweek, November 3, 2008 (http://www.businessweek.com/stories/2008-11-03/luis-von-ahn-the-pioneer-of-human-computation-businessweek-business-news-stock-market-and-financial-advice).
Scism, Leslie, and Mark Maremont. “Inside Deloitte’s Life-Insurance Assessment Technology.” Wall Street Journal, November 19, 2010 (http://online.wsj.com/article/SB10001424052748704104104575622531084755588.html).
———. “Insurers Test Data Profiles to Identify Risky Clients.” Wall Street Journal, November 19, 2010 (http://online.wsj.com/article/SB10001424052748704648604575620750998072986.html).
Scott, James. Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed. Yale University Press, 1998.
Seltzer, William, and Margo Anderson. “The Dark Side of Numbers: The Role of Population Data Systems in Human Rights Abuses.” Social Research 68 (2001), pp. 481–513.
Silver, Nate. The Signal and the Noise: Why So Many Predictions Fail—But Some Don’t. Penguin, 2012.
Singel, Ryan. “Netflix Spilled Your Brokeback Mountain Secret, Lawsuit Claims.” Wired, December 17, 2009 (http://www.wired.com/threatlevel/2009/12/netflix-privacy-lawsuit/).
Smith, Adam. The Wealth of Nations (1776). Reprinted Bantam Classics, 2003. A free electronic version is available (http://www2.hn.psu.edu/faculty/jmanis/adam-smith/Wealth-Nations.pdf).
Solove, Daniel J. The Digital Person: Technology and Privacy in the Information Age. NYU Press, 2004.
Surowiecki, James. “A Billion Prices Now.” New Yorker, May 30, 2011 (http://www.newyorker.com/talk/financial/2011/05/30/110530ta_talk_surowiecki).
Taleb, Nassim Nicholas. Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets. Random House, 2008.
———. The Black Swan: The Impact of the Highly Improbable. 2nd ed., Random House, 2010.
Thompson, Clive. “For Certain Tasks, the Cortex Still Beats the CPU.” Wired, June 25, 2007 (http://www.wired.com/techbiz/it/magazine/15-07/ff_humancomp?currentPage=all).
Thurm, Scott. “Next Frontier in Credit Scores: Predicting Personal Behavior.” Wall Street Journal, October 27, 2011 (http://online.wsj.com/article/SB10001424052970203687504576655182086300912.html).
Tsotsis, Alexia. “Twitter Is at 250 Million Tweets per Day, iOS 5 Integration Made Signups Increase 3x.” TechCrunch, October 17, 2011 (http://techcrunch.com/2011/10/17/twitter-is-at-250-million-tweets-per-day/).
Valery, Nick. “Tech.View: Cars and Software Bugs.” The Economist, May 16, 2010 (http://www.economist.com/blogs/babbage/2010/05/techview_cars_and_software_bugs).
Vlahos, James. “The Department Of Pre-Crime.” Scientific American 306 (January 2012), pp. 62–67.
Von Baeyer, Hans Christian. Information: The New Language of Science. Harvard University Press, 2005.
von Ahn, Luis, et al. “reCAPTCHA: Human-Based Character Recognition via Web Security Measures.” Science 321 (September 12, 2008), pp. 1465–68 (http://www.sciencemag.org/content/321/5895/1465.abstract).
Watts, Duncan. Everything Is Obvious Once You Know the Answer: How Common Sense Fails Us. Atlantic, 2011.
Weinberger, David. Everything Is Miscellaneous: The Power of the New Digital Disorder. Times, 2007.
Weinberger, Sharon. “Intent to Deceive.” Nature 465 (May 2010), pp. 412–415 (http://www.nature.com/news/2010/100526/full/465412a.html).
———. “Terrorist ‘Pre-crime’ Detector Field Tested in United States.” Nature, May 27, 2011 (http://www.nature.com/news/2011/110527/full/news.2011.323.html).
Whitehouse, David. “UK Science Shows Cave Art Developed Early.” BBC News Online, October 3, 2001 (http://news.bbc.co.uk/1/hi/sci/tech/1577421.stm).
Wigner, Eugene. “The Unreasonable Effectiveness of Mathematics in the Natural Sciences.” Communications on Pure and Applied Mathematics 13, no. 1 (1960), pp. 1–14.
Wilks, Yorick. Machine Translation: Its Scope and Limits. Springer, 2008.
Wingfield, Nick. “Virtual Products, Real Profits: Players Spend on Zynga’s Games, but Quality Turns Some Off.” Wall Street Journal, September 9, 2011 (http://online.wsj.com/article/SB10001424053111904823804576502442835413446.html).
Acknowledgments
We both have been fortunate to work with and learn from an early giant in the field of information networks and innovation, Lewis M. Branscomb. His intellect, eloquence, energy, professionalism, wit, and never-ending curiosity continue to inspire us. And to his congenial and wise partner, Connie Mullin, we apologize for not heeding her suggestion to call the book “Superdata.”
Momin Malik has been an excellent research assistant with his exceptional intellect and industriousness. We have the privilege of being represented by Lisa Adams and David Miller of Garamond Agency, who have simply been superb in every aspect. Eamon Dolan, our editor, has been phenomenal—a representative of the rare breed of editors who have an almost perfect sense of how to edit text and challenge our thinking, so that the result is much better than we ever could have hoped for. We thank everyone at Houghton Mifflin Harcourt, in particular Beth Burleigh Fuller and Ben Hyman. Also, Camille Smith for her expert copyediting. We are grateful to James Fransham of The Economist for his excellent fact-checking and shrewd criticisms of the manuscript.
We are especially thankful to all those big-data practitioners who spent time explaining their work, notably Oren Etzioni, Cynthia Rudin, Carolyn McGregor, and Mike Flowers.
For Viktor’s individual acknowledgments: I thank Philip Evans, who is always thinking two steps ahead and expressing his ideas with precision and eloquence, for conversations spanning more than a decade.
I am also grateful to my former colleague David Lazer, who has been an early and strong big-data academic, and whose counsel I have sought many times.
I thank the participants of the 2011 Oxford Digital Data Dialogue (which focused on big data), and especially its co-chair Fred Cate, for most valuable discussions.
The Oxford Internet Institute, where I work, offered just the right environment for this book, with so many of my colleagues engaged in big-data research. I could not think of a better place to have written it. I also acknowledge with gratitude the support of Keble College, where I am a professorial fellow. Without that support, I would not have gotten access to some of the important primary sources used in the book.
The family always pays the biggest toll when one is writing a book. It is not only the many hours I have spent in front of the computer screen, away in the office, but also the many, many hours I have been physically present but lost in thought for which I need to ask forgiveness from my wife Birgit and from little Viktor. I promise I will try harder.
As for Kenn’s individual acknowledgments: I am grateful to many great data scientists who helped, in particular Jeff Hammerbacher, Amr Awadallah, DJ Patil, Michael Driscoll, Michael Freed, and many folks at Google over the years (including Hal Varian, Jeremy Ginsberg, Peter Norvig, and Udi Manber, among others, while all-too-brief chats with Eric Schmidt and Larry Page were invaluable).
My thinking has been enriched by Tim O’Reilly, a savant of the Internet age. Also by Marc Benioff of Salesforce.com, who has been a teacher. Matthew Hindman’s insights were immeasurable, as always. James Guszcza of Deloitte was incredibly helpful, as was Geoff Hyatt, an old friend and serial data entrepreneur. Special thanks go to Pete Warden, who is both a philosopher and a practitioner of big data.
Many friends offered ideas and advice, including John Turner, Angelika Wolf, Niko Waesche, Katia Verresen, David Wishart, Anna Petherick, Blaine Harden and Jessica Kowal. Others who inspired themes in the book include Blaise Aguera y Arcas, Eric Horvitz, David Auerbach, Gil Elbaz, Tyler Bell, Andrew Wyckoff and many others at the OECD, Stephen Brobst and the team at Teradata, Anthony Goldbloom and Jeremy Howard at Kaggle, Edd Dumbill, Roger Magoulas and the team at O’Reilly Media, and Edward Lazowska. James Cortada is pantheonic. Thanks also to Ping Li of Accel Partners and Roger Ehrenberg of IA Ventures.
At The Economist, my colleagues offered tremendous ideas and support. I particularly thank my editors Tom Standage, Daniel Franklin, and John Micklethwait, as well as Barbara Beck, who edited the special report “Data, Data Everywhere,” which was the genesis of this book. Henry Tricks and Dominic Ziegler, my colleagues in Tokyo, were role models for always seeking out the novel and expressing it beautifully. Oliver Morton provided his customary wisdom when it was most needed.
The Salzburg Global Seminar in Austria offered the perfect combination of idyllic repose and intellectual inquisition that helped me write and think. An Aspen Institute roundtable in July 2011 sparked many ideas, for which I thank the participants and the organizer, Charlie Firestone. Also, my appreciation goes to Teri Elniski for her tremendous support.
Frances Cairncross, the Rector of Exeter College, Oxford, offered a tranquil place to stay and great encouragement. It is humbling to fix one’s mind upon questions of technology and society that build on those she raised a decade and a half earlier in The Death of Distance, a work that inspired me as a young journalist. It was satisfying to cross the Exeter courtyard each morning knowing that I might pass along a torch she carried, though the flame burned so much more brightly in her hands.