Big Data: A Revolution That Will Transform How We Live, Work, and Think
My deepest appreciation goes to my family, who put up with me—or more commonly, with my absence. My parents, sister, and other relatives deserve thanks, but I reserve most of my gratitude for my wife Heather and our children Charlotte and Kaz, without whose support, encouragement, and ideas this book would not have been possible.
Both of us are grateful to the many people who discussed the theme of big data with us, long before the term was even popularized. On that note, we reserve our particular thanks for the participants over the years at the Rueschlikon Conference on Information Policy, which Viktor co-organized and where Kenn was the rapporteur. We especially thank Joseph Alhadeff, Bernard Benhamou, John Seely Brown, Herbert Burkert (who introduced us to Commodore Maury), Peter Cullen, Ed Felten, Urs Gasser, Joi Ito, Jeff Jonas, Nicklas Lundblad, Douglas Merrill, Rick Murray, Cory Ondrejka, and Paul Schwartz.
Oxford/London, August 2012
abacuses, [>]
Accenture, [>], [>]
accountability, individual: big data and, [>]–[>], [>]
Acxiom, [>], [>]
airlines: Etzioni analyzes fare pricing patterns, [>]–[>], [>], [>], [>], [>], [>], [>], [>], [>]
flight delay predictions, [>]–[>], [>]–[>]
AirSage, [>], [>]
“algorithmists,” [>]–[>]
algorithms, computer, [>]–[>], [>]
improvements in, [>]–[>]
transparency of, [>]
Alta Vista, [>]
Amazon, [>], [>], [>], [>], [>], [>], [>]
as big-data company, [>]
book reviews at, [>]–[>], [>]
collaborative filtering at, [>]–[>], [>]
data-reuse by, [>], [>]
and e-books, [>]–[>]
Amazonia (Marcus), [>]
ancient world: record-keeping in, [>]–[>]
Anderson, Chris: on “end of theory,” [>]–[>]
anonymization: big data defeats, [>]–[>], [>]
of data, [>], [>]–[>]
privacy and, [>]–[>]
antitrust regulation: big data and, [>]–[>]
AOL: fails to understand data-reuse, [>]
releases personal data, [>]–[>]
Apple, [>], [>]
and cell phone data, [>]
Arabic numerals, [>]–[>]
Arnold, Thelma, [>]–[>]
artificial intelligence: big data and, [>]–[>]
at Google, [>]
Asthmapolis, [>]
astronomy: big data in, [>]
automobiles: anti-theft systems, [>]
data-gathering by, [>]–[>], [>]–[>], [>], [>], [>]–[>]
automobiles, electric: big data and, [>]–[>]
IBM and, [>]–[>]
automobiles, self-driving, [>], [>], [>], [>]
Aviva, [>]–[>]
Ayres, Ian: Super Crunchers, [>]
Bacon, Francis, [>]
Banko, Michele, [>], [>]
Barabási, Albert-László, [>]–[>]
Barnes & Noble, [>]–[>]
Basis, [>]
Beane, Billy, [>]–[>]
Being Digital (Negroponte), [>]
Bell Labs, [>]
Berners-Lee, Tim, [>]
Bezos, Jeff, [>], [>], [>], [>]
big data. See also data; information; open data
and antitrust regulation, [>]–[>]
and artificial intelligence, [>]–[>]
in astronomy, [>]
as based on theory, [>]–[>]
and business models, [>], [>]–[>], [>], [>]–[>]
and calculation of inflation, [>]–[>]
and causality, [>], [>], [>], [>], [>], [>]–[>], [>]
changes human perceptions, [>]
and climate change, [>]
collection and management of, [>]–[>]
in consumer loans, [>]–[>]
correlation analysis and, [>]–[>]
and creation of economic value, [>], [>], [>], [>], [>]–[>], [>]–[>], [>]–[>], [>]–[>]
and credit card fraud, [>]–[>], [>]–[>]
“dark side” of, [>], [>]–[>], [>]
defeats anonymization, [>]–[>], [>]
in DNA sequencing, [>]
in e-commerce, [>]–[>]
in economic development, [>]–[>]
and electric automobiles, [>]–[>]
ethics of, [>]–[>]
exactitude and, [>]–[>], [>], [>], [>], [>]
and explainability, [>]–[>]
in finance, [>]–[>], [>], [>]
and government regulation, [>]–[>], [>]
in health care, [>]–[>], [>], [>]
imprecision as positive feature of, [>]–[>], [>]–[>], [>]–[>], [>], [>], [>]
and individual accountability, [>]–[>], [>]
laws against misuse of, [>], [>]–[>]
“mindset.” See data applications
nature of, [>]–[>], [>]–[>], [>], [>], [>], [>], [>]–[>]
in network theory, [>]–[>]
in oil refining, [>]
potential misuse of, [>]–[>], [>]–[>]
and predictive analytics, [>]–[>], [>], [>]–[>]
privacy and, [>]–[>], [>], [>], [>]
psychological effects, [>]–[>]
replaces statistical sampling, [>]–[>], [>], [>]–[>], [>]–[>]
role of subject-area expertise in, [>]–[>]
social & economic effects of, [>]–[>], [>]–[>], [>]–[>], [>]–[>], [>], [>], [>], [>]–[>], [>]–[>]
as source of competitive advantage, [>]–[>]
value chain, [>], [>], [>]–[>], [>]–[>]
“big-data companies,” [>]–[>], [>]
Billion Prices Project, [>]–[>]
Bing, [>]
Binney, William, [>]
births, premature: McGregor and, [>]–[>], [>], [>]
Bloomberg, Michael, [>]–[>]
bookkeeping, double-entry: history of, [>]–[>]
Pacioli and, [>]–[>]
books. See also e-books
digitization & datafication of, [>]–[>]
reviews at Amazon, [>]–[>], [>]
Boston Consulting Group, [>]
Bowman, Douglas, [>]–[>]
boyd, danah, [>]
Brahe, Tycho, [>]
Brill, Eric, [>], [>]
Brin, Sergey, [>]
British Petroleum, [>]
Brynjolfsson, Erik, [>], [>]
business, online. See e-commerce
business models: big data and, [>], [>]–[>], [>], [>]–[>]
cancer: cell phones and, [>]–[>]
Captcha & ReCaptcha: von Ahn invents, [>]–[>]
Cate, Fred, [>]
categorization: vs. tagging, [>]–[>]
causality: big data and, [>], [>], [>], [>], [>], [>], [>]–[>], [>]
vs. correlation, [>], [>], [>], [>], [>], [>], [>]–[>], [>]–[>], [>], [>], [>], [>]–[>]
intuitive preference for, [>]–[>], [>]
Kahneman on, [>]
nature of, [>]–[>]
Cavallo, Alberto, [>]
cell phone data: Apple and, [>]
and geospatial location, [>]–[>], [>]–[>]
in health care, [>], [>]–[>]
predicts spread of flu, [>]–[>]
privacy and, [>], [>]
reuse of, [>]–[>], [>]–[>]
cell phones: and cancer, [>]–[>]
censuses: data-gathering for, [>]–[>]
CERN laboratory, [>]
Chagall, Marc, [>]
ClearForest, [>]
climate change: big data and, [>]
Code for America, [>]
collaborative filtering: at Amazon, [>]–[>], [>]
at Netflix, [>]
consumer loans: big data in, [>]–[>]
consumer price index (CPI), [>]–[>]
consumer product prices: prediction of, [>]–[>], [>]
correlation: vs. causality, [>], [>], [>], [>], [>], [>], [>]–[>], [>]–[>], [>], [>], [>], [>]–[>]
nature of, [>]–[>]
correlation analysis. See also data analysis; predictive analytics
and big data, [>]–[>]
and credit scores, [>]
as driven by hypotheses, [>]–[>], [>], [>]
and “end of theory,” [>]–[>]
of information, [>]–[>], [>]
in marine navigation, [>]–[>]
of medical records, [>], [>]–[>], [>]
non-linear, [>]–[>]
proxies in, [>]–[>], [>], [>]
of sales data, [>]
vs. scientific method, [>]–[>]
and subprime mortgage scandal (2009), [>]
of text, [>]–[>]
in video game design, [>]–[>]
Coursera, [>], [>]
Craigslist, [>]
Crawford, Kate, [>]
credit card fraud: big data and, [>]–[>], [>]–[>]
Kunze on, [>]
credit scores: correlation analysis and, [>]
datafication and, [>]
credit transactions: analysis of, [>]
crime prevention: predictive policing and, [>]–[>]
Crosby, Alfred, [>], [>]
Cross, Bradford, [>]–[>]
“culturomics,” [>]–[>]
data. See also big data; information; open data
aggregation of, [>], [>], [>], [>], [>]–[>], [>]–[>], [>], [>], [>], [>], [>], [>]–[>], [>], [>]–[>]
anonymization of, [>], [>]–[>]
brokering, [>]
compared to energy, [>]
decision-making driven by, [>]–[>]
depreciating value of, [>]–[>]
“dictatorship” of, [>], [>]–[>], [>]–[>]
economic value of reusing, [>]–[>], [>]–[>], [>]–[>], [>]–[>], [>], [>]
extensibility of, [>]–[>]
fallibility of, [>]–[>]
fetishizing of, [>], [>], [>]–[>], [>]
imprecision in processing, [>]–[>]
mining, [>]–[>], [>], [>], [>]
misuse of, [>], [>], [>]–[>]
nature of, [>]–[>]
option value of, [>], [>]–[>], [>], [>]
recombining of, [>]–[>], [>]
scale in, [>]–[>]
storage costs of, [>], [>], [>]–[>]
as truth, [>], [>]
valuation of, [>]–[>]
data analysis. See also correlation analysis; predictive analytics
companies focusing on, [>], [>]–[>], [>]
vs. intuition, [>], [>], [>]–[>], [>], [>], [>], [>]
McNamara and, [>]–[>], [>]–[>], [>]
data applications: companies focusing on innovations in, [>], [>]–[>], [>]–[>], [>]
“data exhaust,” [>]–[>]
data-gathering, [>], [>]–[>]
by automobiles, [>]–[>], [>]–[>], [>]
by cell phones, [>]–[>], [>]–[>]
for censuses, [>]–[>]
companies focusing on, [>], [>]–[>], [>]–[>]
in election of 2008, [>]
by electrical meters, [>]–[>]
innovations by U.S. Census Bureau, [>]–[>]
notice & consent in, [>], [>], [>]–[>]
by NSA, [>]–[>]
opting out in, [>], [>]
in social sciences, [>], [>]
data intermediaries: companies functioning as, [>]–[>], [>]–[>]
data, personal: corporate release of, [>]–[>]
ownership of, [>]–[>]
privacy and, [>]–[>], [>]–[>], [>], [>], [>]
data scientists, [>]
“data tombs,” [>]
database design: exactitude in, [>]–[>], [>]
datafication: of books, [>]–[>]
and credit scores, [>]
vs. digitization, [>]–[>], [>]–[>]
e-books and, [>]–[>]
by Facebook, [>], [>]
of geospatial location, [>]–[>]
and human behavior, [>]–[>], [>]–[>]
as infrastructure project, [>]
measurement in, [>]
metadata in, [>]–[>]
nature of, [>], [>], [>]–[>]
by social media, [>]–[>]
in stock market investment, [>]–[>]
of text, [>], [>]
touch-sensitive floor covering and, [>]
by Twitter, [>]–[>]
DataMarket, [>]
DataSift, [>]
Davenport, Thomas, [>], [>], [>]–[>], [>]
decision-making: driven by data, [>]–[>], [>]
Delano, Robert, [>]
Deloitte Consulting, [>]
Derawi Biometrics, [>]
Derwent Capital, [>]
digitization: vs. datafication, [>]–[>], [>]–[>]
revolution in, [>], [>], [>]
DNA sequencing: big data in, [>]
cost of, [>]
Steve Jobs and, [>]–[>], [>]
Domesday Book, [>]–[>]
Dostert, Leon, [>]
Duhigg, Charles: The Power of Habit, [>]–[>]
Eagle, Nathan, [>]–[>]
eBay, [>], [>]
e-books. See also books
Amazon and, [>]–[>]
and datafication, [>]–[>]
and data-reuse, [>]–[>], [>]–[>]
e-commerce: big data in, [>]–[>]
economic development: big data in, [>]–[>]
education: misuse of data in, [>]
online, [>]
edX, [>]
Eisenstein, Elizabeth, [>]
Elbaz, Gil, [>]
election of 2008: data-gathering in, [>]
electrical meters: data-gathering by, [>]–[>]
energy: data compared to, [>]
Equifax, [>], [>], [>]
Eratosthenes, [>], [>]
ergonomic data: Koshimizu analyzes, [>], [>], [>], [>]–[>]
ethics: of big data, [>]–[>]
Etzioni, Oren, [>], [>], [>], [>]
analyzes airline fare pricing patterns, [>]–[>], [>], [>], [>], [>], [>], [>], [>], [>]
Euclid, [>]
European Union: open data in, [>]
Evans, Philip, [>]
exactitude. See also imprecision
and big data, [>]–[>], [>], [>], [>], [>]
in database design, [>]–[>], [>]
and measurement, [>]–[>], [>]
necessary in sampling, [>], [>]–[>]
Excite, [>]
Experian, [>], [>], [>], [>], [>]
expertise, subject-area: role in big data, [>]–[>]
explainability: big data and, [>]–[>]
Facebook, [>], [>], [>]–[>], [>]–[>], [>], [>], [>], [>]
data processing by, [>]
datafication by, [>], [>]
IPO by, [>]–[>]
market valuation of, [>]–[>]
uses “data exhaust,” [>]
Factual, [>]
Fair Isaac Corporation (FICO), [>], [>]
Farecast, [>]–[>], [>], [>], [>], [>], [>], [>], [>], [>]
finance: big data in, [>]–[>], [>], [>]
Fitbit, [>]
Flickr, [>]–[>], [>]–[>]
floor covering, touch-sensitive: and datafication, [>]
Flowers, Mike: and government use of big data, [>]–[>], [>]
flu: cell phone data predicts spread of, [>]–[>]
Google predicts spread of, [>]–[>], [>], [>], [>], [>], [>], [>], [>]
vaccine shots, [>]–[>], [>]–[>], [>]–[>]
Ford, Henry, [>]
Ford Motor Company, [>]–[>]
Foursquare, [>], [>]
Freakonomics (Levitt), [>]–[>]
free will: justice based on, [>]–[>]
vs. predictive analytics, [>], [>], [>], [>]–[>]
Galton, Sir Francis, [>]
Gasser, Urs, [>]
Gates, Bill, [>]
Geographia (Ptolemy), [>]
geospatial location: cell phone data and, [>]–[>], [>]–[>]
commercial data applications, [>]–[>]
datafication of, [>]–[>]
insurance industry uses data, [>]
UPS uses data, [>]–[>]
Germany, East: as police state, [>], [>], [>]
Global Positioning System (GPS) satellites, [>]–[>], [>], [>], [>]
Gnip, [>]
Goldbloom, Anthony, [>]
Google, [>], [>], [>], [>], [>], [>], [>], [>]
artificial intelligence at, [>]
as big-data company, [>]
Books project, [>]–[>]
data processing by, [>]
data-reuse by, [>]–[>], [>], [>]
Flu Trends, [>], [>], [>], [>], [>], [>]
gathers GPS data, [>], [>], [>]
Gmail, [>], [>]
Google Docs, [>]
and language translation, [>]–[>], [>], [>], [>], [>]
MapReduce, [>], [>]
maps, [>]
PageRank, [>]
page-ranking by, [>]
predicts spread of flu, [>]–[>], [>], [>], [>], [>], [>], [>], [>]
and privacy, [>]–[>]
search-term analytics by, [>], [>], [>], [>], [>], [>]
speech-recognition at, [>]–[>]
spell-checking system, [>]–[>]
Street View vehicles, [>], [>]–[>], [>], [>]
uses “data exhaust,” [>]–[>]
uses mathematical models, [>]–[>], [>]
government: and open data, [>]–[>]
regulation and big data, [>]–[>], [>]
surveillance by, [>]–[>], [>]–[>]
Graunt, John: and sampling, [>]
Great Britain: open data in, [>]
guilt by association: profiling and, [>]–[>]
Gutenberg, Johannes, [>]
Hadoop, [>], [>]
Hammerbacher, Jeff, [>]
Harcourt, Bernard, [>]
health care: big data in, [>]–[>], [>], [>]
cell phone data in, [>], [>]–[>]
predictive analytics in, [>]–[>], [>]
Health Care Cost Institute, [>]
Helland, Pat: “If You Have Too Much Data, Then ‘Good Enough’ Is Good Enough,” [>]
Hilbert, Martin: attempts to measure information, [>]–[>]
Hitwise, [>], [>]
Hollerith, Herman: and punch cards, [>], [>]
Hollywood films: profits predicted, [>]–[>]
Honda, [>]
Huberman, Bernardo: and social networking analysis, [>]
human behavior: datafication and, [>]–[>], [>]–[>]