by Jaron Lanier
The spread of a flu outbreak can be tracked online faster than it can be tracked through the traditional medical system.[12] A research project at Google found that flu outbreaks could be tracked well by noting relevant searches in geographical zones. If there’s a sudden lift in concern about flu symptoms in a particular place, for instance, there is probably flu there. The signal is observable even before doctors receive the first wave of complaints.
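To make the idea concrete, here is a minimal sketch of how a "sudden lift" in flu-related searches might be flagged per region. It is not the method the Google project used; the function name, the threshold, and the sample counts are all hypothetical, chosen only to illustrate comparing the latest week against a region's own recent history.

```python
from statistics import mean, stdev


def flag_spikes(weekly_counts, threshold=3.0):
    """Return regions whose latest weekly count is unusually high.

    weekly_counts maps a region name to a list of weekly counts of
    flu-related queries, oldest first. The last entry is compared
    against the earlier weeks for the same region.
    """
    flagged = []
    for region, counts in weekly_counts.items():
        history, latest = counts[:-1], counts[-1]
        if len(history) < 2:
            continue  # not enough history to estimate a baseline
        baseline = mean(history)
        spread = stdev(history)
        if spread == 0:
            # Perfectly flat history: treat any increase as a spike.
            if latest > baseline:
                flagged.append(region)
            continue
        # How many standard deviations above its own baseline is this week?
        if (latest - baseline) / spread > threshold:
            flagged.append(region)
    return flagged


# Hypothetical data, invented for illustration only.
HYPOTHETICAL_COUNTS = {
    "north": [100, 104, 98, 101, 180],
    "south": [200, 195, 205, 198, 202],
}

print(flag_spikes(HYPOTHETICAL_COUNTS))
```

With the invented numbers above, only "north" is flagged. Note that the sketch cannot tell a real outbreak from a surge of queries caused by something else, which is exactly why the scrutiny described next is needed.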
Tracking the flu online is science. That means it isn’t automatic. Scientists must scrutinize the analysis. Maybe a rise in flu-related queries is actually in response to a popular movie in which the lead character has a bad flu. Without scrutiny, data isn’t trusted.
However, even in the world of big scientific data, magical-seeming results can come before the understanding. Big data can occasionally reverse the sequence and confuse the incentives that have driven science and commerce since the beginnings of each.
A spectacular recent example is the dawn of mind reading. In the first decade of the century there was a sequence of increasingly impressive examples of “brain reading.” This might involve a person learning to control a robotic arm through direct brain measurement. But would it be possible to measure what a person was seeing or imagining from reading the brain? That would be more properly described as “mind reading.”
Results started to appear early in the second decade of our century. Psychologist Jack Gallant and other researchers at UC Berkeley showed they could approximately determine what a person was watching simply by analyzing brain activity. It was as if computers became psychic, though a better way to understand the work is as an example of the challenges of scientific big data.
In Gallant’s experiment, a movie was computed of what someone was seeing, based on nothing but fMRI* scans of the activity of the person’s brain. The images looked blurry and otherworldly, but did conform to what was actually seen.
*fMRI, or functional MRI, is a higher-power version of the familiar MRI scanner. fMRI is usually used to detect blood flow in the brain, which reveals which parts of the brain are most activated moment to moment.
The way it worked was approximately this: Each subject was shown a batch of movie clips. Their brain activation patterns were recorded each time. Then, when the person watched a new, previously unseen clip, activation patterns were once again recorded. Then the original clips were mixed into a new clip proportionally, according to how similar the activation pattern for the new clip was to each original clip. With enough previously seen clips mixed together, a fuzzy new clip emerges that does look like what the subject is watching.
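The mixing step described above can be sketched in a few lines. This follows only the simplified account given here, not the published analysis, which was considerably more elaborate. Cosine similarity as the measure of how alike two activation patterns are, the clipping of negative similarities to zero, and all of the names are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two activation patterns, from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def reconstruct(new_pattern, library):
    """Blend previously seen clips in proportion to pattern similarity.

    library is a list of (activation_pattern, clip_pixels) pairs recorded
    while the subject watched known clips. Each clip is a flat list of
    pixel intensities, and all clips have the same length.
    """
    # Dissimilar patterns (negative similarity) contribute nothing.
    weights = [max(0.0, cosine_similarity(new_pattern, p)) for p, _ in library]
    total = sum(weights)
    if total == 0:
        raise ValueError("no library clip resembles the new pattern")
    blended = [0.0] * len(library[0][1])
    for weight, (_, clip) in zip(weights, library):
        for i, pixel in enumerate(clip):
            blended[i] += (weight / total) * pixel
    return blended
```

A toy usage example: with two library clips whose patterns are `[1, 0]` and `[0, 1]`, a new pattern of `[1, 1]` is equally similar to both, so `reconstruct` returns an even blend of the two clips. The blurriness of the real results follows from the same logic: the output is an average of things already seen, never a sharp new image.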
This was a remarkable result, of great importance, but it was only the first step of scientific inquiry. It didn’t reveal how the brain codes visual memories. It did achieve something very important, which was that researchers had found a way to measure the brain that was relevant to specific visual cognition. Furthermore, similar techniques turn out to work for sound, speech, and other domains of experience and action. The age of high-tech mind reading has begun.
Jack Gallant is the first to point out that as spectacular as it is, the achievement is a beginning, not an end. The full cycle of scientific understanding will hopefully include additional attainments of insight and theory.
A Method in Waiting
You never know how long it will take for scientific conclusions about big data to form. Science gives up the best punch lines ever, but delivers them with the most inconsistent timing.
Big business data happens fast, as fast as people can take it in, or usually faster. Faster feedback loops make big business data ever more influential. We have become used to treating big business data as legitimate, even though it might really only seem so because of its special position in a network. Such data is valid by dint of tautology to an unknowable degree.
Science demands a different approach to big data, but we don’t know as much about that approach as we will soon. Scientific method for big data is not yet entirely codified. Once practices are established for big data science, there will be uncontroversial answers to questions like:
• What standard would have to be met to allow for the publication of replication of a result? To what degree must replication require the gathering of different, but similar big data, and not just the reuse of the same data with different algorithms?
• What is publication? Is it just a description of the code used? The code itself? The code in some standardized form or framework that makes it reusable and tweakable?
• Must analysis be performed in a way that anticipates standard practices of meta-analysis?
• What documentation of the chain of custody of data must be standardized?
• Must there be new practices established, analogous to double-blind tests or placebos, that help prevent big data scientists from fooling themselves? Should there be multiple groups developing code to analyze big data that remain completely insulated from each other in order to arrive at independent results?
Before long, all these questions will be answered, but for now, practices are still in flux. Though the details need to mature, the core commitment to testing hypotheses unites all scientists whether their data is big or small.
Wise or Feared?
In the world of business, big data often works whether it’s true or not. People pay for dating services even though, on examination, the algorithms purporting to pair perfect mates probably don’t work. It doesn’t matter if the science is right so long as customers will pay for it, and they do.
Therefore, there is no need to distinguish whether statistics were valid in an a priori scientific sense, or if they were made valid because of social engineering. An example of social engineering is when two people meet through a dating site because they both expect the algorithms to be valid. People adapt to the presence of information systems, whether the adaptation is conscious or not, and whether the information system is functioning as expected or not. The science of it becomes moot.
This is a modern reflection of an ancient conundrum: It’s hard to tell if a king is wise or feared. Either explanation suffices, on those occasions when what the king predicts is what turns out to happen.
Suppose a book vendor pitches an eBook on a tablet and the user clicks to pay for it. To a degree that might be because the vendor has cloud software that includes a scientifically valid prediction algorithm that has modeled the user correctly. Or it might be because users have been told the algorithms are smart, or maybe the user’s attention is monopolized through a proprietary tablet. Perhaps the user would have equally been ready to buy any number of other books. It’s not easy to tell which cause is more important.
Engineers will tend to assume it’s the smartness of the software, and engineers are very good at fooling themselves into believing this is always so. In my previous book I described how it’s empirically difficult to distinguish an artificial-intelligence success from people adjusting themselves to make a program look smart.
When the runners of a Siren Server are convinced it is providing a scientifically genuine computational service—that it is analyzing and predicting events that enlighten the human world—while it is actually just proving it has accumulated power, then nothing useful has been accomplished.
Occasionally an objective test of big business data reveals that the castles in the clouds were never real. For instance, there is no end to the braggadocio of a social network trying to sell advertising. The salespeople trumpet their system’s ability to minutely model and target consumers as if they were Taliban in the crosshairs of a military drone. And yet, the same service, when it must simply detect if a user is underage, will turn out to be unable to counter the deceptions of children.
Yet the fantasy of precision persists. In that moment of fervor when you launch a Siren Server, you can practically taste the luscious swell of power. You will have information superiority because of your listening post on the ’net. This is one of the great illusions of our times: that you can game without being gamed.
The Nature of Big Data Defies Intuition
On a simplistic level, it is true that there are two versions of you on Facebook: the one you obsessively tend, and the hidden, deepest secret in the world, which is the data about you that is used to sell access to you to third parties like advertisers. You will never see that second kind of data about you.
But it isn’t as if that secret version could be sent to you for review anyway. It wouldn’t make sense by itself. It isn’t separable from the rest of the global data that Facebook collects. The most precious and protected data, given the way we are doing things these days, are statistical correlations that are used by algorithms but are rarely seen or understood by people.
It might be a truth that people with bushy eyebrows who like purple toadstools in autumn are more likely to try hot sauce on their mashed potatoes in the spring. That might even turn out to be a truth with commercial value, but there would never be a purpose to explicitly revealing that such a correlation had been detected. Instead, a hot sauce vendor will in theory be able to automatically place a link in front of someone’s eyes and increase the chance it will be well targeted, and no one need ever know why.
Big data commercial correlations are almost always eternally hidden; they are no more than tiny atoms of mathematics in the programs that spit out profits or power for certain kinds of cloud-based concerns. If a particular unexpected correlation were isolated, articulated, and revealed, what use would it be? Unlike an atom of scientific data, it is not rooted in an articulate framework and is not necessarily meaningful in isolation at all.
The Problem with Magic
To the degree big data can seem magical it can also be spectacularly misleading. Is this not clear? Perceiving magic is precisely the same thing as perceiving the limits of your own understanding.
When correlation is mistaken for understanding, we pay a heavy price. An example of this type of failure was the string of early 21st century financial crises in which correlations created gigantic investment packages that turned out to be duds in aggregate, bringing the world to indebtedness and austerity. Yet few financiers were blamed, at least in part because the schemes were complex and automated to such a high degree.
Naturally, one might ask why big business data is still so often used on faith, even after it has failed spectacularly. The answer is of course that big business data happens to facilitate superquick and vast near-term accumulations of wealth and influence.
Game On
Why is big business data often flawed? The unreliability of big business data is a collective project we all participate in. Blame the hive mind.
A wannabe Siren Server might enjoy honest access to data at first, as if it were an invisible observer, but if it becomes successful enough to become a real Siren Server, then everything changes. A tide of manipulation rises, and the data gathered becomes suspect.
If the server is based on reviews, many of them will suddenly start to be fakes. If it’s based on people trying to be popular, then suddenly there will be fictitious fawning multitudes inflating illusions of popularity. If the server is trying to identify the most creditworthy or datable individuals, expect the profiles of those individuals to be mostly phony. Such illusions might be erected by clever third parties trying to get a little of the action, or they might be wielded by individuals trying to get some small personal advantage out of the online world.
In either case, once a Siren Server starts to get fooled by phony data, a dance begins. The Server hires mathematicians and Artificial Intelligence experts who try to use pure logic at a distance to filter out the lies. But to lie is not to be dumb. An arms race inevitably ensues, in which the hive mind of fakers attempts to outsmart a few clever programmers, and the balance of power shifts day to day.
What is remarkable is not that the same old games people have always played continue to be played over digital networks, but that smart entrepreneurs continue to be drawn into the illusion that this time they’ll be the only ones playing the game, while everyone else will passively accept being studied for the profit of a distant observer. It is never so simple.
The Kicker
Since I have long been concerned that the Internet has killed more jobs than it has created, I have been keenly interested in ventures that might reverse the trend. Kickstarter is a relevant experiment. Its original motivation was to make philanthropy more efficient, but here the focus will be on the way Kickstarter facilitates finance for new business ventures. Entrepreneurs raise money from the multitudes in advance of doing something they propose to do, but in a way that bypasses traditional ideas about finance.* Early supporters don’t get equity, but they do often get something concrete, like a “first edition” of a new product. Is this not a sterling example of how the ’net can make capital available to unconventional innovators in nontraditional ways? What’s not to like?
*Kickstarter is just one example of many. The idea is trendy, and is promoted in recent legislation such as the 2012 JOBS Act. See http://www.forbes.com/sites/work-in-progress/2012/09/21/the-jobs-act-what-startups-and-small-businesses-need-to-know-infographic/.
Indeed I like it, and I especially like that my friend Keith McMillen was able to launch an innovative music controller using it. Keith has been a celebrated musical instrument designer for years, and he had an idea for a new kind of digital musical device called the QuNeo. Instead of going the usual route of pitching investors, he used Kickstarter to pitch his future customers directly. They loved the idea, and his QuNeo controller became one of Kickstarter’s fine early success stories. Hordes of customers lined up and prepaid for a device that didn’t exist yet, turning into pseudo-investors and customers at the same time.
Kickstarter as a tool for funding product development isn’t perfect. It would be even better if it supported the creation of risk pools for multiple projects, and an insurance or risk management system for customers. Siren Servers suffer the delusion that someone else can always take all the risk, that ignored risk will never come around to bite you. Even so, what a lovely case of the Internet making capitalism broader than it used to be.
But wait, all is not well. The same month that QuNeo units were shipped to the earliest adopters, the tech blog Gizmodo announced a boycott of coverage of Kickstarter proposals.* The reason was that the site was so flooded with poor-quality proposals that it had become impractical to dig through so many fakes and flakes to find a few true gems.
*Perhaps Gizmodo is not a definitive source of criticism, but I choose to link there since it and its parent network were the victims of a link boycott from parts of Reddit over an appalling issue while this book was being finalized. Subreddits gathered men who took surreptitious photos of women, or compiled suggestive pictures of underage girls. These men wanted to be able to enjoy the information advantage of being able to do these things to strangers while remaining anonymous. A Gawker reporter (in the parent organization of Gizmodo) revealed a ringleader, and that was considered unforgivable. The desire to manipulate others while remaining invulnerable is just the ordinary person’s way of pretending to be a Siren Server for a moment. The ringleader, once revealed, turned out to be a rather vulnerable working-class fellow. Whenever you see a den of iniquity on the Web, look closer and you’ll find a den of inequity. See http://www.newstatesman.com/blogs/internet/2012/10/reddit-blocks-gawker-defence-its-right-be-really-really-creepy, and http://gawker.com/5950981/unmasking-reddits-violentacrez-the-biggest-troll-on-the-web.
This is an instance in which a classic problem in pre-digital markets should have been put to rest to a significant degree by digital designs. The supposed transparency of the way we have structured our present information economy turned out to be unusable.
The problem in question is known as the “Market for Lemons,” after the title of the famous paper, which helped earn its author, George Akerlof, a Nobel Prize[13] in Economics. The lemons in the paper were not from the lemonade stand we encountered earlier, but were instead crummy used cars for sale. The paper detailed how a prevalence of bad used cars distorted markets through the mechanism of information asymmetry.
Buyers worried that sellers knew more about a used car’s problems than they were letting on, which put a pervasive burden on the market, stunted it, and made it less efficient. A truly transparent form of digital market might perhaps offer a reduced occurrence of this sort of degradation. At least that was the hope in the air in the early years of network research, before the advent of Siren Servers.
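A toy simulation shows how that stunting can happen. It is a deliberately crude sketch in the spirit of the lemons argument, not a reproduction of Akerlof's model: the assumption that buyers will pay 1.5 times the average quality of whatever is on offer, the car values, and the function name are all hypothetical.

```python
def unravel(qualities, buyer_premium=1.5, rounds=20):
    """Simulate a market in which only sellers know each car's quality.

    Buyers cannot tell cars apart, so they offer one price based on the
    average quality still on offer. Sellers whose cars are worth more
    than that price withdraw, which lowers the average for the next round.
    Returns a list of (price, cars_on_offer) pairs, one per round.
    """
    on_offer = list(qualities)
    history = []
    for _ in range(rounds):
        if not on_offer:
            break
        average = sum(on_offer) / len(on_offer)
        price = buyer_premium * average
        history.append((price, len(on_offer)))
        remaining = [q for q in on_offer if q <= price]
        if len(remaining) == len(on_offer):
            break  # nobody else withdraws; the market has settled
        on_offer = remaining
    return history


# Ten hypothetical cars worth 1000 through 10000 to their sellers.
for price, count in unravel(range(1000, 11000, 1000)):
    print(f"price {price:>7.0f}  cars on offer {count}")
```

With these invented numbers the market shrinks round by round from ten cars to the three worst ones, even though buyers would happily have paid more than every seller's asking value had they been able to see quality directly.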
In fact, digital networks have been helpful in reducing the fear of lemons in the physical used car market. You can now get instant information about a car’s history, for example.[14] But that sort of improvement has been avoided by Siren Servers. Instead, the need of Siren Servers to radiate risk to everyone but themselves has the perverse effect of reinstating the lemon dilemma.
Every QuNeo provides cover for lousy projects that gradually tarnish the prospects of the next QuNeo. What happens if a project isn’t completed? What if a supporter never receives a gadget that was supposed to be manufactured? Is there any recourse? Can an innovation hub really radiate all risk away from itself?
Kickstarter has experimented with changing the rules to reduce the risks taken by supporters of projects. For instance, inventors were at one point suddenly forbidden from showing realistic renderings of what an end product might look like. That rule supposedly reduced the risk that a supporter would perceive a project to be closer to fruition than it really was. Even if the rule had the desired effect, is it not absurd to deny inventors the ability to show pictures of what they intend to create? But it’s the sort of strategy a Siren Server must resort to in order to retain an arm’s-length, risk-free state of being. Here is the question and answer about the policy from the Kickstarter website: