Chapter 10. BIGGER DATA
Hiding within those mounds of data is knowledge that could change the life of a patient, or change the world.
Atul Butte, MD, PhD, Professor
Summary
Blockchains could one day open a broader, deeper data layer than has ever been available. However, because it would no longer be proprietary, organizations would have to work harder than ever to extract competitive advantage from that data.
Blockchain technology holds the potential to unleash more comprehensive data—data that could drive deeper insight, and help to realize big data’s promise to advance business and society. However, it comes with a key tradeoff. “Ownership” of the data shifts from organizational silos to the individual contributor, who, in the vision of blockchain entrepreneurs, will be able to safely “vault” a great range of data, including, potentially, new datasets that they had not previously been willing to make available.
Decisions to use or sell that data would also be increasingly controlled by the contributor. Those organizations that learn to operate successfully within this new paradigm could access a new breadth and depth of data to improve products, market smarter, and discover insights that may very well change our world. However, when fewer datasets are proprietary, businesses will have to work harder than ever to sustain competitive advantage.
The Setting
Businesses Fueled by Big Data
We operate in a business landscape obsessed with big data. For years, a diverse array of companies has worked relentlessly to build out data stores and predictive analytics capabilities. It’s not just internet giants, but companies in industries as varied as retail, agriculture, health care, education, and transportation. They’ve accelerated data collection, from credit card swipes to location tracking, entertainment preferences, biometrics, and social media connections. Data fuels decisions in the C-suite, marketing, product development, finance, production, human resources, and research and development.
Some businesses are even investing billions in strategies to get as much data as they can about consumers’ health, location, preferences, and behaviors. Google bought Nest, Facebook acquired WhatsApp, and Under Armour has made multiple acquisitions to become one of the largest digital fitness companies in the world. These investments have been driven in part by an assumption that the organization collecting the data owns it and can use it without restriction, and that the one with the most data will win.
Falling Short of the Big Data Promise
Companies are doing amazing things with data. T-Mobile cuts customer churn, Walmart stocks stores based on predictions of what will sell, and Kaiser has improved patient outcomes. But today’s insights are only as powerful as the data silos in which they sit. Each business works to develop its own data view, drawn from customers, acquisitions, partnerships, and data brokers. But the resulting pool still offers an incomplete picture of a customer or condition, with data of variable quality and cleanliness.
The real promise of big data is realized when data is broken out of corporate fiefdoms. When data scientists talk about the most promising contribution big data could make to our world—things like fighting hunger, slowing climate change, and developing personalized medicines—they are talking about data that’s bigger, better, cleaner, and richer than what we have today. They are talking about breaking data out of silos so they can discover the insights hidden within this vaster data ocean.
Even companies with much more modest goals of building better products, deepening understanding of their customers, and identifying new markets crave much more holistic—and more accurate—data than they have access to today. Their holy grail is a unified data profile on customers.
Fighting the Idea of Ownership
But the very idea of a corporation owning our data is under attack. Certainly, we’ve granted them permission, in the reams of legalese we rarely review when signing up with a new company, and we’ve expanded that by ignoring the legal updates filling our email inboxes. And we’ve given freely: Facebook, the largest online social network, has collected 300 petabytes of personal data since its inception136—a hundred times the amount the Library of Congress has collected in over 200 years.137
But it is becoming increasingly obvious that these organizations are poor stewards of our data, and it’s hurting us. An employee at an Anthem health insurance subsidiary opened a phishing email and, because the company had failed to patch a known vulnerability, 80 million member and employee records were stolen. Equifax traced the theft of Social Security numbers and other sensitive information for 143 million Americans to a software flaw that could have been fixed. And the misuse of Facebook data by Cambridge Analytica is surely not going to be the last scandal that draws attention to the danger of allowing companies unfettered access to, and permissive use of, consumers’ data.
The public’s awareness of this threat is growing. But they have little choice but to hand over their data if they want access to modern basics like credit, health care, and social networks. They may not trust these companies to be good guardians, but they don’t have an attractive alternative.
Regulators are starting to step in with power the public doesn’t have. The General Data Protection Regulation (GDPR) is a new European Union law with strict requirements on data protection and privacy that corporations have been scrambling to meet. GDPR focuses on ensuring that people who use online services know not only exactly what data those companies will take, but also how they will put it to use.
After the Cambridge Analytica scandal, Facebook CEO Mark Zuckerberg was summoned to Congress, where he was grilled for 10 hours by nearly 100 legislators in the House and Senate. It’s still unclear what regulation will come of the growing data privacy inquisition, but it is clear the pressure for change is intense. Representative Billy Long of Missouri told Zuckerberg, “You’re the guy to fix this. You need to save your ship.” While even Zuckerberg said regulation of his industry is “inevitable,” David Vladeck, former director of the US Federal Trade Commission’s Bureau of Consumer Protection, said, “We do not have an omnibus privacy legislation at the federal level. We don’t have a statute that recognizes generally that privacy is a right that’s secured by federal law. And that puts us at the opposite end of the spectrum from some of the other major economies in the world.”138
This same movement to protect citizens’ privacy can also make companies reluctant to try to find a way to share data, even if doing so could lead to a positive outcome for an individual patient, or for broader society. We can see this quite clearly in health care, where laws like the United States’ Health Insurance Portability and Accountability Act (HIPAA) provide important protections, but make it harder to realize the promise of such lofty initiatives as personalized medicine, in which a person’s genetics, environment, and lifestyle can help determine the best approach to prevent or treat disease.
Andy Coravos is passionate about how advancements in cryptography—of which, she emphasizes, blockchains are just one aspect—combine with advancements in data science. She is the CEO and cofounder of Elektra Labs, a company working at the forefront of digital medicine (specifically, biomarkers and sensors). In a recent conversation, Andy emphasized the tension that exists today between the two arguments of “we need privacy,” and “we need sharing.”
“What matters most,” she explains, “is data governance. If I am sick, I want my doctor to have access to my medical records—I don’t want them to be private. What we need now is more secure sharing, more control, and more governance.”139
Across the spectrum, it is clear that pressure is building.
IoT Fuels the Fire
Much of the scrutiny from both consumers and regulators so far has focused on data created, retyped, or reposted by humans. We are about to see an explosion in new classes and types of data coming from the billions of IoT devices that will be surrounding us, many of them measuring and reporting on everything from how we move to our biometrics and even our emotional state. As mentioned previously, Gartner estimates that more than 20 billion connected “things” will be in use by 2020.140
Is It Time for a New Deal on Data?
In 2007, Sandy Pentland first proposed a “New Deal on Data” to the World Economic Forum (WEF). Sandy is one of the preeminent thinkers on data. He was named one of the world’s most powerful data scientists by Forbes, is a founding member of Google’s advisory board, helped create and direct the MIT Media Lab, and is on the board of the UN Global Partnership for Sustainable Development Data.
Through multiple discussions at the WEF, the New Deal took shape. The key insight, he said, is that, “Our data are worth more when shared. Aggregate data—averaged, combined across populations, and often distilled to high-level features—can be used to inform improvements in systems such as public health, transportation, and government.”141
The cornerstone of the proposal was that to have a successful data-driven society, we must have the guarantee that our data will not be abused: ownership of data must be returned to the people. The New Deal puts people at the center, not corporations: “The current personal data ecosystem is feudal, fragmented, and inefficient. Too much leverage is currently accorded to [organizations] that enroll and register end users. Their siloed repositories . . . contain data of varying qualities; some are attributes of persons that are unverified . . . For many individuals, the risks and liabilities of the current ecosystem exceed the economic returns . . . Personal privacy concerns are thus addressed inadequately at best, or simply overlooked in the majority of cases.”142
Ultimately, the Chairman of the Federal Trade Commission (who was part of the working group) put forward the US “Consumer Data Bill of Rights” (which has not yet resulted in actual controls), and the EU Justice Commissioner declared a version of the New Deal to be a basic human right.
But the New Deal considered ownership the minimal guideline. “There needs to be one more principle . . . to adopt policies that encourage the combination of massive amounts of anonymous data to promote the Common Good.” The New Deal continued: “to enable the sharing of personal data and experiences, we need secure technology and regulation that allows individuals to safely and conveniently share personal information with each other, with corporations, and with government . . . we must promote greater idea flow among individuals, not just within corporations or government departments . . . the entities who should be empowered to share and make decisions about their data are the people themselves: users, participants, citizens.”143
The New Deal envisioned a more highly functioning digital economy: a world in which people could choose to share data—thus making big data safer and more transparent, as well as more liquid and available. It anticipated an environment in which true data-driven breakthroughs would be possible, by breaking data out of silos where nobody even knows it’s there, and making it available to data scientists while concurrently protecting the person who provided it.
What the New Deal on Data did not include was a solution for how to make this possible.
What’s Coming Next Could Change Everything
Although progress has been slow, it’s clear that the call to give data ownership back to the people has been building. Consumers are more aware, and regulators have begun to mobilize. But there hasn’t been a technology that could enable a real alternative. Until now.
With blockchains, an answer is possible.
What Blockchains Make Possible
Who Really Owns the Data?
Blockchains set loose a tectonic shift in the idea of data ownership. This will be a two-edged sword for companies fueled by consumer data. One edge sharply limits the access businesses have to data. The other could give some organizations bigger and better data than they’ve ever had before.
It starts by putting the consumer at the center.
Your Very Own Black Box
Imagine this: one day, you could safely place an anonymized, encrypted version of your identity data in a “black box” on a blockchain. You could choose to have a trusted third party verify that your identity is a “real” person. You could also choose to tie different datasets to this “prime identity.” For example, the data flowing out of your connected home could be gathered in a “home” subidentity, and you could create subidentities for your connected car, commerce, banking, academic accreditations, health care records, social networks, and so on. You have essentially created a personal data vault. Your data has not only become yours, but it has become much more secure—and much more useful to you.
Many emerging companies are seeking to become trusted protectors of user and identity data. In a banking relationship, the bank holds your money but does not spend it at its own discretion; it is yours, and you decide how it is spent. With an “identity” bank, you could safely store your verified identity (or an anonymous proxy of it) on a blockchain and choose how it is used. The vault could function as a safe, universal “key” for identity, and you could approve specific applications to access and use it.
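The vault idea can be made concrete with a small data-structure sketch. The Python below is purely illustrative: the class name `PersonalVault`, its methods, and the sample subidentity names are assumptions made for this example, not any product’s design. It models only the bookkeeping (which application may read which subidentity), not the encryption or the blockchain itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class PersonalVault:
    """Hypothetical in-memory model of a personal data vault."""

    prime_identity: str  # anonymized identifier, not a real name
    # subidentity name (e.g. "home", "health") -> that subidentity's data
    subidentities: Dict[str, Dict[str, object]] = field(default_factory=dict)
    # application name -> subidentities the owner has approved it to read
    approvals: Dict[str, Set[str]] = field(default_factory=dict)

    def store(self, subidentity: str, key: str, value: object) -> None:
        """Owner adds a data point under one subidentity."""
        self.subidentities.setdefault(subidentity, {})[key] = value

    def approve(self, application: str, subidentity: str) -> None:
        """Owner approves an application for a single subidentity."""
        self.approvals.setdefault(application, set()).add(subidentity)

    def revoke(self, application: str, subidentity: str) -> None:
        """Owner withdraws an earlier approval."""
        self.approvals.get(application, set()).discard(subidentity)

    def read(self, application: str, subidentity: str) -> Dict[str, object]:
        """Return a copy of the data only if the owner approved this access."""
        if subidentity not in self.approvals.get(application, set()):
            raise PermissionError(f"{application} is not approved for {subidentity}")
        return dict(self.subidentities.get(subidentity, {}))
```

In this sketch an application approved for the “home” subidentity can read thermostat data but gets a `PermissionError` if it asks for health data, and loses access entirely once the owner calls `revoke`.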
Consumer at the Controls
In this vision, you choose who sees and uses your data, granting access only as necessary. Whether this party can attribute the data you share to your actual physical-world identity is up to you. And you can even choose to sell your data to a brand, or donate it to a research study you support. Blockchain entrepreneurs call this self-sovereign identity because it lives outside the sovereignty of any central organization or state: the sovereignty comes from the individual.
Only what’s needed for a specific purpose would be shared. Today, when you go out for a drink with a friend, you may be asked for an ID. You hand a stranger, some guy leaning against the door of a bar, a lot of personal information: your full name, home address, license number, birth date, height, weight, and eye color. But all they really need is a single data point: that you meet the drinking age requirement, yes or no. With a blockchain-driven black box, you could divulge only what is required for a specific digital interaction. And users could grant—or revoke—access to a specific company on a need-to-know basis.
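The bar example can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any particular identity product works: the function names and attribute names are hypothetical, and salted hash commitments stand in for the digital signatures and zero-knowledge proofs that real credential systems typically rely on. An issuer publishes one commitment per attribute; the holder later reveals a single attribute and its salt, and the verifier checks it against the published commitment without learning anything else.

```python
import hashlib
import hmac
import secrets


def commit(name: str, value: str, salt: bytes) -> str:
    """Salted SHA-256 commitment to a single attribute."""
    return hashlib.sha256(salt + f"{name}={value}".encode()).hexdigest()


def issue_credential(attributes: dict) -> tuple:
    """Issuer side: return (public commitments, private salts kept by the holder)."""
    salts = {name: secrets.token_bytes(16) for name in attributes}
    commitments = {
        name: commit(name, str(value), salts[name])
        for name, value in attributes.items()
    }
    return commitments, salts


def disclose(attributes: dict, salts: dict, name: str) -> dict:
    """Holder side: reveal exactly one attribute and the salt that opens it."""
    return {"name": name, "value": str(attributes[name]), "salt": salts[name]}


def verify(commitments: dict, disclosure: dict) -> bool:
    """Verifier side: check the revealed attribute against its commitment."""
    expected = commitments.get(disclosure["name"])
    if expected is None:
        return False
    actual = commit(disclosure["name"], disclosure["value"], disclosure["salt"])
    return hmac.compare_digest(expected, actual)
```

Because each attribute has its own random salt, the person at the door who receives only the `over_21` disclosure cannot work backward from the other commitments to a name, address, or birth date.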
For convenience, you may choose to cluster the data in your black box into different profiles that you share with different organizations. Just as you may act differently with your friends on the weekend than you do with colleagues in a meeting, you may have what some blockchain identity companies are calling “avatars.” You might have a private avatar for managing private affairs, family, and wealth; a business avatar for managing bank accounts and loans; and a social media avatar for managing the data that you share with your social networks. There are several pioneering companies developing a vision for making this an accessible, easy experience for users. There is a long road ahead, however, to develop both the technology and governance that truly protects users. As Meltem Demirors has said, “Self-sovereign identity sounds good, but the implementation will likely be the battle of our lifetime.”144
Securing and Protecting the Data
This vision is also more secure than today’s corporate “honeypots” filled with millions of records that invite continuous cyberattacks. Not only is your data in its own vault, it is kept private through various cryptography approaches.
The spread of data is controlled as well. Today, data you don’t even realize has been gathered about you is scattered across databases around the world, owned by companies you can’t name. With this blockchain vision, individuals could audit what data has been collected, and trace how it has been used. Only the relevant portion of your data would be made available to a third party. In addition, technologists have made great progress on solutions that let a third party identify patterns in encrypted data without decrypting it first (essentially, enabling data science to unearth insights without “seeing” the underlying data or compromising the privacy of those who contributed it). This is a powerful evolution in what is possible with data.
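The text does not name a specific technique for working on encrypted data. One well-known family is homomorphic encryption, and the Python below is a toy sketch of the additively homomorphic Paillier scheme, chosen here only as an illustration. The function names are hypothetical and the primes are far too small for real security. It shows an aggregator combining three encrypted readings into an encrypted total without ever seeing the individual values.

```python
import math
import secrets

# Toy parameters: two Mersenne primes. Real deployments use much larger random primes.
P = 2**31 - 1
Q = 2**61 - 1


def keygen(p: int = P, q: int = Q) -> tuple:
    """Return (public modulus n, private key (lam, mu))."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+)
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt integer m (0 <= m < n) with generator g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random blinding factor in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 equals 1 + m*n, so no exponentiation is needed for g^m
    return (1 + m * n) * pow(r, n, n2) % n2


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts yields an encryption of the sum of the plaintexts."""
    return c1 * c2 % (n * n)


def decrypt(n: int, private_key: tuple, c: int) -> int:
    """Recover the plaintext using the private key."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n
```

A typical use would have each contributor encrypt a reading with the public modulus, an untrusted aggregator fold the ciphertexts together with `add_encrypted`, and only the key holder decrypt the final total.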
The Potential of New, Powerful Data Layers
As we discussed earlier, big data holds the promise of great advancements in business and society. However, with organizations considering data they “own” as proprietary, data scientists are typically limited to insights only as powerful as the data in their organization. Today, breaking data out of these silos would violate tightening regulations and privacy norms, and increase the risk of compromise and attack. But as users control more of their data, the concept of a new universal “data layer” arises. In this vision, the data lives in the data layer but is accessible across applications and digital interfaces, as needed and if the contributor of the data grants permission. Over time, this layer could become a trusted repository and source for the data of millions, or billions, of users and devices.
Many blockchain natives working in this space hope that the data layer will be interoperable across platforms and protocols. Because the data would not be exclusive to a particular application, this could, if realized, have a significant impact on companies whose products use data to lock in their customers. You may have invested a great deal of effort, for example, in developing your LinkedIn profile. While it is easy to cut and paste the text from a colleague’s recommendation, you cannot similarly transfer the veracity of that recommendation: that it came from a real person who wrote it about you. That is something that only LinkedIn can provide. However, in the blockchain future, elements of reputation like reviews, endorsements, and networks could theoretically be made portable, changing the drivers of competitive advantage for some companies (and there are blockchain entrepreneurs who have already introduced early versions of products that do this).