Most of this data – searches, software downloads, music purchases, tweets, Skype calls, et cetera – comes from ordinary people going about their ordinary lives. In 2011, 200 million tweets were posted every day (and over 30 billion have been written and sent since Twitter’s launch in 2006). Every sixty seconds, 168 million emails were sent, nearly 700,000 Google searches and Facebook status updates made, 375,000 Skype calls initiated, and 13,000 new iPhone apps downloaded.
Mobile forms of connectivity, including smartphones and tablets, have massively increased this volume of data. Being untethered from a fixed location allows us to be always on, always connected, always communicating. According to the multinational telecommunications company Cisco Systems, in 2012, and for the fifth year in a row, mobile data traffic more than doubled. The number of mobile-connected devices is expected to exceed the world’s population by 2013; that is, there will be at least one operative mobile device for every human on the planet, and people will be constantly searching, texting, linking, networking, sharing, photographing, recording, purchasing. High-end handsets, tablets, and laptops on mobile networks – all major generators of data owing to the richer information experiences they support – will make up a growing proportion of the market. Says Cisco, a “single smartphone can generate as much traffic as 35 basic-feature phones; a tablet as much traffic as 121 basic-feature phones; and a single laptop can generate as much traffic as 498 basic-feature phones.” Mobile data traffic is likely to grow at a compound annual rate of nearly 80 percent, reaching some eleven exabytes per month by 2016. We are immersed in a weightless but dense cloud of bits and bytes, percolating everywhere.
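For readers who want to see how that forecast compounds, here is a minimal sketch. The 2011 starting point of roughly 0.6 exabytes per month is an assumption chosen to be consistent with the figures cited above; only the compounding arithmetic is the point.

```python
# Minimal sketch: compound growth of mobile data traffic.
# The 2011 baseline (~0.6 exabytes/month) is an assumed starting point;
# the growth rate reflects the "nearly 80 percent" figure cited above.
baseline_eb_per_month = 0.6      # assumed 2011 traffic, exabytes per month
annual_growth = 0.78             # compound annual growth rate

traffic = baseline_eb_per_month
for year in range(2012, 2017):
    traffic *= 1 + annual_growth
    print(year, round(traffic, 1), "EB/month")

# By 2016 this compounds to roughly eleven exabytes per month.
```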
• • •
The data we create contains not just the information we send or interact with, but data about the data, or metadata. We rarely experience metadata directly, as it is buried in instructions and communications several layers below our interactions with our devices. But we can see it when we download photographs onto our computers or upload them to a photo-sharing site like Flickr. When we do so, we might notice that embedded in each digital photograph is data on the model of camera used to take the picture, the exact time it was taken, and the longitude and latitude of where it was taken (should such settings be activated by the user). Digital music and movie files typically contain metadata on the artist, album, date of the recording, and copyright information. Metadata on a mobile phone can include the user’s number, the receiver’s number, the geographic coordinates of where a message was sent, the date and time of the message, the duration of any particular call, the amount of data transmitted, and the cost of the transmission. Average users may have thousands of data points like these collected from them every day as they communicate through cyberspace. A typical smartphone emits a signal every few seconds, a “beacon” to nearby cellphone towers or wifi hotspots, in order to triangulate the most efficient connection for the device. (It was this automatic beaconing that led Mark Wuergler to identify the security vulnerabilities in Apple products described in the previous chapter.) Every call, text, or email we send via mobile phone yields space-time coordinates, accurate to within metres, of where we are, along with a record of with whom we communicate. This information is stored on a server somewhere, or in multiple places – on “clouds” of computers – spread out across the physical infrastructure of cyberspace. It is an embodiment of us, a kind of cyberspace biography and activity chart, and we have little control over it.
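To make this concrete, here is a minimal Python sketch of how easily that embedded photo metadata can be read out. It assumes the Pillow imaging library and a hypothetical file named holiday.jpg; the tags printed are the camera model, the exact capture time, and the GPS information described above.

```python
# Minimal sketch: reading EXIF metadata from a digital photograph.
# Assumes the Pillow library (pip install Pillow) and a hypothetical
# example file, "holiday.jpg".
from PIL import Image
from PIL.ExifTags import TAGS

img = Image.open("holiday.jpg")
exif = img._getexif() or {}              # raw EXIF data: numeric tag IDs -> values

for tag_id, value in exif.items():
    name = TAGS.get(tag_id, tag_id)      # translate tag IDs to readable names
    if name in ("Model", "DateTimeOriginal", "GPSInfo"):
        # camera model, exact capture time, latitude/longitude (if enabled)
        print(name, value)
```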
Enthusiasts say that this world of big data is a gift. Google engineers, for instance, show us through their Flu Trends project how they can harness the information collected from millions of real-time queries to predict the location and timing of disease outbreaks. Simply by collating the number, location, and frequency of search queries for symptoms, insight of planetary proportions is gained. If enough users in Chicago or San Francisco simultaneously search for information about a fever, Google can spot a virus before it spreads, and with greater accuracy than the early-warning tools employed by the U.S. Centers for Disease Control.
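The underlying idea is simple aggregation. The sketch below, with entirely made-up log entries and an arbitrary threshold, shows the kind of collation described above; it illustrates the principle, not Google’s actual method.

```python
# Minimal sketch: flag cities where symptom-related searches spike.
# The query log, keywords, and threshold are illustrative assumptions.
from collections import defaultdict

query_log = [
    ("Chicago", "2013-01-10", "fever and chills"),
    ("Chicago", "2013-01-10", "flu symptoms"),
    ("Chicago", "2013-01-10", "how to treat a fever"),
    ("San Francisco", "2013-01-10", "fever"),
]

counts = defaultdict(int)
for city, day, text in query_log:
    if "fever" in text or "flu" in text:      # crude symptom matching
        counts[(city, day)] += 1

baseline = 1    # assumed normal daily query volume per city
for (city, day), n in counts.items():
    if n > 2 * baseline:                      # crude spike detection
        print(f"Possible outbreak signal in {city} on {day}: {n} queries")
```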
This ability to identify large-scale patterns can lead to new opportunities for humanitarian aid and development assistance, even in the most impoverished and dangerous of environments. In Haiti, for example, researchers used mobile-data patterns to monitor the movement of refugees and health risks following the massive earthquake that struck that small island country in 2010. Crowd-sourcing data through the Ushahidi platform – a free and open-source software tool for information collection, visualization, and interactive mapping, developed in the aftermath of Kenya’s disputed 2007 election – is used to monitor elections, conflicts, and numerous other issues around the world. The LRA Crisis Tracker uses crowd-sourced data plotted on Ushahidi, gathered from radios distributed to local communities and by other means, to monitor atrocities committed by the Lord’s Resistance Army (LRA), responsible for one of the most ruthless insurgencies in Africa. Each LRA-related incident is plotted on a map by type – civilian death, injury, abduction, looting – and, once consolidated, the map shows the movements of the LRA across the region and the scope, scale, and frequency of its actions. Incidents captured by cellphone cameras are linked to specific events on the website as corresponding evidence.
In Kibera, Nairobi, Kenya’s largest slum, an experiment in crowd-sourcing data may revolutionize access to basic health care and sanitary services. Conditions in Kibera are dire: most residents are illegal squatters, and local officials regularly withhold basic services, including electricity, sewage treatment, and garbage collection. The most important commodity, water, is extremely scarce – turned on and off by capricious officials, and grossly overpriced by private dealers. Despite the poverty, over 70 percent of Kiberans have mobile phones. They are cheap, plentiful, and can save lives. Researchers at Stanford University are testing an app called M-Maji (“mobile water” in Swahili), which sends users text messages with up-to-date information on the location, price, and quality of water available from different vendors. They believe that this project can be replicated in impoverished communities around the world.
There are countless examples of big data being used to achieve such social goods, but so rapid a transformation of the global communications environment rarely comes without unanticipated negative consequences. To understand these, we first need to understand the political economy of big data, which boils down to a simple question: Why are we able to use Gmail, Facebook, Twitter, and other cyber services for free?
• • •
“There is no free lunch,” the old saying goes, and to that we should add “and no free tweet, either.” The business model of big data rests on the repurposing of that which all of us routinely give away. Not surprisingly, the market to harvest the digital grains of sand on that constantly expanding beach has exploded: companies of all shapes and sizes systematically pick through our digital droppings, collating them, passing them around, inspecting them, and feeding them back to us. And this market shows no sign of slowing. In 2012, the open-source analyst firm Wikibon reported that the big-data market stood at just over $5 billion and predicted that it would grow to $50 billion by 2017. ISPs, web-hosting companies, cloud and mobile providers, massive telecommunications and financial companies, and a host of other new digital market organisms digest and process unimaginably large volumes of information about each and every one of us, each and every day, and it is then sold back to us as “value-added” products, services, or advertisements for yet more products and services!
Social networks may seem like secure, even cozy, playgrounds, but they are more like vacuum cleaners that hoover up every click and shared link, every status change, every tag and piece of personal history. As Facebook states frankly in its data-use policy, the company uses “the information we receive to deliver ads and to make them more relevant to you. This includes all of the things you share and do on Facebook, such as the Pages you like or key words from your stories, and the things we infer from your use of Facebook.” Facebook “likes” are translated into customized dating and vacation ads; geolocation data is used to advertise local products. Not a single bit or byte is ignored: the companies involved reap what we sow. Freedom in cyberspace is just another word for nothing left unused.
Many network service companies stress the protections they put in place around customers’ data. They insist that what is “theirs” is “yours” and use “I” and “my” as descriptors of their products and services. In practice, however, they treat our data as proprietary business records that they can retain, manipulate, and repurpose indefinitely. They see our habits (and us) as resources in the same way energy companies see untapped reserves of oil, for one simple reason: the online advertising industry is worth $30 billion annually. Whenever we surf the Internet today, depending on the browser we use and the settings we apply to it, we give away pieces of ourselves. A tracking-awareness project, Collusion, has developed a browser plug-in that demonstrates how often such “sharing” takes place, usually without our knowledge. If I visit, say, http://www.washingtonpost.com, the Collusion plug-in shows that the site shares information about my visit with twenty-one other websites. One of those sites, Scorecardresearch.com, sells beacons to participating websites (like washingtonpost.com) that place a cookie in visitors’ browsers. Cookies are small bits of text deposited in your browser that act as “unique identifiers,” signatures that give website owners details about visitors to their sites: their browsing histories, their locations (based on IP addresses), and so on.
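For readers curious about the mechanics, here is a minimal sketch, with hypothetical function names, of how such a third-party beacon could assign and reuse a “unique identifier.” It illustrates the technique rather than any real ad network’s code: on a browser’s first visit the tracker mints an ID and sets it as a long-lived cookie; every subsequent request from any participating site carries the ID back, letting the tracker stitch those visits into one profile.

```python
# Minimal sketch: a third-party tracker assigning a unique-identifier cookie.
# Hypothetical code for illustration only.
import uuid
from http.cookies import SimpleCookie

def handle_request(request_headers):
    """Handle one request to the tracker's beacon, returning response headers."""
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    if "uid" in cookie:
        visitor_id = cookie["uid"].value      # returning browser: reuse its ID
    else:
        visitor_id = uuid.uuid4().hex         # first visit: mint a new ID
    log_visit(visitor_id, request_headers.get("Referer", "unknown"))
    return {
        # long-lived cookie, sent back to this tracker from every site it watches
        "Set-Cookie": f"uid={visitor_id}; Max-Age=31536000; Path=/"
    }

def log_visit(visitor_id, referring_site):
    # Each (visitor, site) pair can later be joined into a browsing profile.
    print(visitor_id, "was seen on", referring_site)
```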
In 2012, the Wall Street Journal conducted a study of one of the “fastest-growing businesses on the Internet”: spying on Internet users. In its look at the surveillance technologies that companies use to track consumers, the Journal examined fifty of the most popular websites in the U.S., analyzed all the tracking files and programs those websites downloaded onto its test computers, and found that on average each website installed sixty-four tracking files, generally without warning. The site that installed the most tracking software, http://www.dictionary.com, placed 234 files on the Journal’s test computer. A Dictionary.com spokesperson said, “Whether it’s one or ten cookies, it doesn’t have any impact on the customer experience, and we disclose that we do it. So what’s the beef?” Users concerned about leaving digital traces of themselves all over the Internet might disagree.
The small print included with many applications and service contracts provides a window into the underlying reality of this market. By agreeing to terms and conditions contained in documents that scroll by on the way to the “I agree” button, users give the companies involved nearly unlimited permission to handle their data. In many cases involving mobile apps, users even give developers the right to collect whatever images the camera happens to be focused on, as well as the phone’s location. For example, the Facebook app developed for the Google Android smartphone, which has been downloaded more than 100 million times, has written into its terms of service the right for Facebook “to read SMS messages stored on your device or SIM card.” The Flickr app can access location data, text messages, contact books, online account IDs, who a person is calling, and even the device’s camera. In fact, the Flickr, Facebook, Badoo, Yahoo! Messenger, My Fitness Pal, and My Remote Lock apps can all access a user’s entire contacts book and record who that user is calling. To repeat, the reason behind this data collection is advertising. As Daniel Rosenfield, director of the app company Sun Products, testified in 2012: “The revenue you get from selling your apps doesn’t touch the revenue you get from giving your apps away for free and just loading them with advertisements.”
• • •
Few users realize how quickly big data about their communications accumulates in the hands of third-party operators. Malte Spitz is an exception. Max Schrems, an Austrian law student, is another. In 2011, Schrems asked Facebook to send him all of the data the company had stored on him. As he is European and Facebook’s European headquarters is in Dublin, Ireland, Schrems had the right to make such a request. Facebook dutifully sent him a CD containing 1,222 individual PDFs it had collected about him. The company had stored information on all of his logins, “pokes,” chat messages, and postings, even those he had deleted. It had also stored, plotted on a detailed map, the precise geographical coordinates of all the holiday pictures (in which Schrems was tagged) that a friend of Schrems had taken and posted using her iPhone.
Schrems discovered that Facebook stores dozens of categories of data about its users so that it can accurately commodify its customers’ digital personas for targeted advertisements. Some examples: the exact latitude, longitude, and altitude of every check-in to Facebook, each given a unique ID number and a time stamp; every Facebook event to which a user has been invited, including all invitations ignored or rejected; and data on the machines used to connect to Facebook, so that Facebook can connect individuals to the hardware and software they use. Schrems eventually formed an activist group, Europe vs. Facebook, to launch complaints. This led to an inquiry by Irish privacy regulators and widespread media attention to the company’s privacy policies. The battles continue.
This relentless drive for personal information leads to extraordinary encroachments on privacy by social networking companies and ISPs. Over the years, Facebook’s default privacy settings have been continuously adjusted downwards, mostly in increments but sometimes dramatically. In 2005, only you and your friends could see your contact information and other profile data. Only your personal networks could see your wall posts and photographs, and nothing about you was shared by Facebook with the wider Internet. In 2007, an adjustment was made such that your personal networks could see more of your profile data. And then, in early 2009, a major shift took place: suddenly, all Facebook users were permitted to see your full list of friends, and the entire Internet could see your gender, name, networks, and profile picture. Another dramatic change took place in December 2009: Facebook’s settings were modified such that users’ “likes” went from something seen exclusively by friends and friends-of-friends to something visible to the entire Internet. Months later, the same exposure to “all of the Internet” was extended to users’ photos, wall posts, and friends lists. Like a giant python that has consumed a rat, Facebook captures, swallows, and slowly digests its users.
The search for new sources of personal information has led down other frightening paths. In 2010, the Sleep Cycle app was thrown onto the market. It monitors the sleep patterns of users from their mobile phones and works when the phone is placed on the user’s bed. The app tracks movements and other patterns that indicate periods of deep sleep, dreaming, and light sleep. Thirty minutes before the alarm is set to go off, it begins watching for the lightest phase of the sleep cycle and then gently nudges users awake with soothing sounds instead of honking alarm bells. Data about the night’s sleep is recorded and stored on the app’s servers. (Naturally, the app also has an option to “share on Facebook.”) Perhaps our dreams will be next, and then, worse, our nightmares.
The desire for big data is relentless, the temptations irresistibly strong, and in their lust for information about us many companies have disregarded basic privacy protections. Path, a popular social network, was caught uploading members’ mobile phone contacts to its servers without permission. Twitter has admitted that it copied lists of email addresses and phone numbers from people who used its smartphone application. (And, again, the information was stored on its servers without users’ permission.) A 2012 study by the mobile security company Lookout found that 11 percent of the free applications in Apple’s iTunes Store could access users’ contacts. In 2012, a class action lawsuit was launched against more than a dozen companies for selling mobile apps that uploaded users’ contact lists without their knowledge or consent. Facebook announced in December 2011 that it would post archived user information, making old posts available under new, downgraded privacy settings as part of its new Timeline feature. Users were given just one week to clean up their histories before Timeline went live. The extraordinary (and brazen) announcement came only a few short weeks after the U.S. Federal Trade Commission found that Facebook had engaged in “unfair and deceptive” trade practices when it changed the privacy settings of its users without properly notifying them.
Google’s 2010 collection of private wifi data (described in the last chapter) was but one of several concerns users have had about the company’s ambitious data collection practices. If a user employs the full range of Google products – from Search to the Android mobile operating system to Gmail, Google Docs, Google Calendar, Google Hangout, and others (all of which are free) – Google holds consolidated, precise, and detailed information about that user’s movements, social relations, habits, and even private thoughts. The scope and scale are truly frightening, especially in the event that any of these capabilities is abused, compromised in some way, or subjected to external control and manipulation. Such a scenario is not far-fetched. In the 2009 Operation Aurora attacks, Google’s networks – including many Gmail accounts and some of the company’s source code – were compromised by China-based attackers. After the attacks, Google entered into a secret agreement with the NSA to review its security. “The company pinkie-swears that its agreement with the NSA won’t violate the company’s privacy policies or compromise user data,” wrote Wired’s Noah Shachtman, adding: “Those promises are a little hard to believe, given the NSA’s track record of getting private enterprises to co-operate, and Google’s willingness to take this first step.” Critics were hardly mollified when the U.S. Electronic Privacy Information Center’s (EPIC) freedom of information request to find out more about the secret agreement was rejected in May 2012 by a U.S. federal appeals court, which said that the NSA need neither “confirm nor deny” the existence of any relationship with Google. The world’s largest data collection company secretly partnered with the world’s most powerful spy agency, and no one outside of either institution knows the full details? It would be hard to conjure up a more frightening scenario.