Half the people in the room were there on purpose, like me—tech enthusiasts, privacy advocates, civil servants working on open-data initiatives. The other half had simply wandered in on a cold Friday evening, and found themselves interested enough to stay. That’s who was sitting next to me in the back of the room, listening to the staff explain how to disentangle your online identity from all those trackers and data brokers.
Before I left, I was handed a “Data Detox Kit”: a thick envelope stuffed with eight days’ worth of activities designed to help people see how they’re being tracked, and change both their security settings and their technology habits to minimize it in the future. On the train home, I cracked open the kit. Day one was “discovery”: clearing out browser cookies, that sort of thing. I moved on to day two: “being social.”
“Is Facebook your BFF?” the headline asked. The kit prompted me to find out by downloading a browser extension: What Facebook Thinks You Like. The extension trawls Facebook’s ad-serving settings, and spits out a list of keywords the site thinks you’re interested in, and why. There’s the expected stuff: travel, lipstick, Danish design. I’ve probably clicked on ads for those things before. Then there’s a host of just plain bizarre connections: “Neighbors (1981 Film),” a film I’ve never seen and don’t know anything about. A host of no-context nouns: “Girlfriend,” “Brand,” “Wall,” “Extended essay,” “Eternity.” I have no idea where any of this comes from—or what sort of advertising it would make me a target for. Then it gets creepy: “returned from trip 1 week ago,” “frequent international travelers.” I rarely post anything to Facebook, but it knows where I go, and how often.
As of December 2016, ProPublica—the nonprofit news organization that made the What Facebook Thinks You Like browser extension—said its project had collected 52,000 unique attributes like these. But it also revealed something else: while Facebook will show you what it has inferred about you from your activity on the site, it won’t show you all the other information it has about you—information purchased from third-party data brokers.1 That additional data set might include all sorts of things: where you shop (from loyalty card usage), whether you own a car or a home (public records), which publications you get (subscriber lists).
In 2012, one of the largest of these brokers, Acxiom, bragged that it had an average of 1,500 data points for each of the 500 million consumers in its database—a group that includes most of the adult population of the United States.2 Think about that: 1,500 individual tidbits of information about you, all stored in a database somewhere and handed out to whoever will pay the price. Then there’s the practice of segmenting: grouping you into one of Acxiom’s seventy marketing clusters, each with a branded name and a peppy description.
Judging by the information readily available about me, I’m almost certain Acxiom’s database has me pegged as segment 6, “Casual Comfort”: a city-living, upwardly mobile type who enjoys “socializing, attending concerts and participating in fantasy sports leagues, as well as adventurous outdoor recreation. This cluster also appreciates fine dining and fitness.” 3 Minus the fantasy sports, this is all pretty accurate. And why wouldn’t it be? After all, Acxiom knows which credit cards I have, how much I paid for my house, and where I spend my money. When this data is assembled into a dossier and matched up with my Facebook habits, boom: you get a detailed view of not just how much I make and what I like to do, but also where I am at almost any given moment, what I believe in and care about, and who my friends are.
Data collection may be creepy, but it’s certainly not just tech companies that are doing it: everyone from your grocery store to your bank to your insurance company is neck-deep in detailed information about you and your habits. In many ways, this is simply our modern reality, particularly in the United States, where regulations are piecemeal. It’s a reality that we’d be smart to acknowledge, if we hope to stop its abuses or regulate its effects.4 But while our digital products aren’t solely to blame, they do enable this data collection—and maybe more important, they normalize it in our daily lives.
As we’ll see in this chapter, every bit of the design process—from the default settings we talked about in Chapter 3, to the form fields in Chapter 4, to the cute features and clever copy in Chapter 5—creates an environment where we’re patronized, pushed, and misled into providing data; where the data collected is often incorrect or based on assumptions; and where it’s almost impossible for us to understand what’s being done by whom.
DEFAULTING TO DECEPTION
In November of 2016, ride-hailing company Uber released a new update to its iPhone app—and with it, a new detail you might have glossed right over while you were tapping “accept” and “continue”: rather than tracking your location while you’re using the app (which makes sense, since GPS location helps your driver find you easily), Uber now defaults to tracking your location all the time, even when the app is closed. Did that make your eyes widen? It should: this means that if you’re an iPhone user and an Uber customer, you’ve probably given Uber permission to track you wherever you go, whenever the company wants.
Uber promises it won’t do anything malevolent, of course. The company says its new data collection practices are all about improving service (which is the stock PR line). In this case, Uber says the changes are designed to make pickups and drop-offs safer and easier. For example, the company says it wants to know how often riders cross the street directly after leaving their driver’s vehicle, which it thinks could indicate that drivers are dropping passengers off in unsafe locations. Uber also wants to do a better job of tracking people’s locations while they’re waiting for their car to arrive, because the top reason drivers and customers contact each other before pickup is to confirm precise location. To do all this, Uber says it wants to use a rider’s location only from the moment they request a car until five minutes after they’ve been dropped off—not all the time. Phew, right?
But iPhone settings don’t actually work that way. There are only three options: you can allow an app to use your location at all times, only while you’re using the app, or never. This Uber update purposely disables the middle option, so all you’re left with are the extremes. And while you can select “never,” Uber strongly discourages it, even misleading you along the way: doing so triggers a pop-up in the app with a large headline, “Uber needs your location to continue,” followed by a big button that says “enable location services.” By all appearances, you don’t have a choice: Uber needs it, after all. But just below that, if you’re paying close attention (something I’m sure we all do when we’re tapping through screens, right? Right?), you’ll see a much smaller text link: “enter pickup address.” That little link bypasses the location-based option, and instead allows you to just type in the street address where you want to be picked up. Because, it turns out, you certainly can use Uber without location data. The company just doesn’t want you to.
When we look at these choices together, a clear picture emerges: Uber designed its application to default to the most permissive data collection settings. It disabled the option that would have allowed customers to use the app in the most convenient way, while still retaining some control over how much of their data Uber has permission to access. And it created a screen that is designed expressly to deceive you into thinking you have to allow Uber to track your location in order to use the service, even though that’s not true. The result is a false dichotomy: all or nothing, in or out. But that’s the thing about defaults: they’re designed to achieve a desired outcome. Just not yours.
Uber claims it’s not out to track your every move; it only wants just a bit more data after your ride is over. Should we take the company at its word? No, one anonymous user told tech publication the Verge: “I simply don’t trust Uber to limit their location tracking to ‘five minutes after the trip ends.’ There’s nothing I can do to enforce this as an end user.” 5
That’s true: Apple’s settings just say “always.” And while Uber might not be doing anything else with our data today, I’m not sure the “just trust us” approach is a great long-term plan when you’re talking about aggressively growing startups backed by millions in venture capital—money they’ll need to show investors that they can recoup in the not-too-distant future. Particularly when we’re talking about Uber—a company with a truly abysmal privacy record.
In 2014, an executive used the software’s “god view”—which shows customers’ real-time movements, as well as their ride history—to track a journalist who was critical of the company. Other sources have claimed that most corporate employees have unfettered access to customers’ ride data—and have, at times, tracked individual users just for fun.6 In 2012, Uber even published an official blog post bragging about how it can tell when users have had a one-night stand—dubbing them, somewhat grossly, “rides of glory.” (Uber deleted the post a couple of years later, when major stories started breaking about its disregard for privacy.)7 All told, Uber is well known for having a male-dominated workplace that sees no problem playing fast and loose with ethics (a problem we’ll look at more closely in Chapter 9). And as we’ve seen, that culture has made its way right into the default design of its app.
Default settings—particularly those in the apps on our always-with-us smartphones—not only encode bias, as we saw earlier. They also play a massive role in the data that can be collected about us. And often we don’t even notice it’s happening, because each little screen is designed to encourage us along: yes, I agree, continue. Tap, tap, tap. It all feels simple, inevitable almost—until you open your phone’s settings and realize that dozens of apps are quietly collecting your location data.
Back in 2010, technologist Tim Jones, then of the Electronic Frontier Foundation, wrote that he had asked his Twitter followers to help him come up with a name for this method of using “deliberately confusing jargon and user-interfaces” to “trick your users into sharing more info about themselves than they really want to.” 8 One of the most popular suggestions was Zuckering. The term, of course, refers to Mark Zuckerberg, Facebook’s founder—who, a few months before, had dramatically altered Facebook’s default privacy settings. Facebook insisted that the changes were empowering—that its new features would give the 350 million users it had at the time the tools to “personalize their privacy,” 9 by offering more granular controls over who can see what.
The way Facebook implemented this change isn’t quite as inspirational, though. During the transition, users were given only two options: if they had customized their settings in the past, they could preserve them; otherwise, they would default to a “recommended” setting from Facebook. Only, the new default settings weren’t anything like the old ones. In the past, profile information like gender and relationship status was visible to friends only by default. The new settings made that information public. Perhaps even more worrisome, anything you posted to Facebook, which had previously been viewable by “Your Networks and Friends” only, was also made public. None of this was explained in the transition tool either; it was instead presented as a simple default choice—just click yes and be done with it. Finding out what you were really agreeing to took extra clicks out of the transition tool, and into Facebook’s complicated privacy settings page. Because, ultimately, Facebook didn’t want you to actually customize your settings. It wanted to Zucker you out of as much data as possible.
As I write this, seven years later, nothing has changed—except that as mobile usage has skyrocketed, the amount of data that can be harvested from us has exploded. And companies like Uber? They’re taking Zuckering further than ever.
APPROXIMATING YOU
Back in 2012, Google thought I was a man.
Let me back up. In January of that year, the search giant released a new privacy policy that, for the first time, sought to aggregate your usage data from across its array of products—including Google Search, Gmail, Google Calendar, YouTube, and others—into a single profile.10 This change caused quite a stir, both inside and outside of tech circles, and as a result, users flocked to the “ad preferences” section of their profiles, where Google had listed the categories that a user seemed to be interested in, as inferred from their web usage patterns—like “Computers & Electronics,” or “Parenting.” But in addition to those categories, Google listed the age range and gender it thought you were. It thought I was a man, and somewhere between thirty-five and forty-four. I was twenty-eight.
Pretty soon, I realized it wasn’t just me: tons of women in my professional circle were buzzing about it on Twitter—all labeled as men. So were female writers at Mashable, a tech media site; the Mary Sue, which covers geek pop culture from a feminist perspective; and Forbes, the business magazine. So, what did all of us have in common? Our search histories were littered with topics like web development, finance, and sci-fi. In other words, we searched like men. Or at least, that’s what Google thought.11
What Google was doing is something that’s now commonplace for tech products: it was using proxies. A proxy is a stand-in for real knowledge—similar to the personas that designers use as a stand-in for their real audience. But in this case, we’re talking about proxy data: when you don’t have a piece of information about a user that you want, you use data you do have to infer that information. Here, Google wanted to track my age and gender, because advertisers place a high value on this information. But since Google didn’t have demographic data at the time, it tried to infer those facts from something it had lots of: my behavioral data.
The problem with this kind of proxy, though, is that it relies on assumptions—and those assumptions get embedded more deeply over time. So if your model assumes, from what it has seen and heard in the past, that most people interested in technology are men, it will learn to code users who visit tech websites as more likely to be male. Once that assumption is baked in, it skews the results: the more often women are incorrectly labeled as men, the more it looks like men dominate tech websites—and the more strongly the system starts to correlate tech website usage with men.
In short, proxy data can actually make a system less accurate over time, not more, without you even realizing it. Yet much of the data stored about us is proxy data, from ZIP codes being used to predict creditworthiness, to SAT scores being used to predict teens’ driving habits.
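The dynamic is easy to see in miniature. Here’s a deliberately simplified sketch (in Python; every user, number, and rule below is invented purely for illustration—it’s not how Google or anyone else actually models gender): a toy classifier guesses “male” whenever someone’s share of tech-site visits crosses a threshold, then re-derives that threshold from its own guesses. Because mislabeled women get counted as male tech readers, each round makes the original assumption look a little more true.

```python
# Hypothetical toy example -- not any real company's model.
# A proxy rule guesses gender from tech-site visits, then "learns"
# from its own labels, reinforcing its starting assumption.

def guess_gender(tech_share, threshold):
    """Proxy rule: label anyone above the threshold as 'male'."""
    return "male" if tech_share >= threshold else "female"

def retrain_threshold(shares, labels):
    """Re-derive the threshold from the model's own output:
    the average tech-share of everyone it just called 'female'."""
    female_shares = [s for s, label in zip(shares, labels) if label == "female"]
    return sum(female_shares) / len(female_shares)

# Each number is one (made-up) user's share of visits that go to tech sites.
# Several of these users are women who read a lot about web development.
shares = [0.9, 0.8, 0.75, 0.7, 0.4, 0.3, 0.2, 0.1]

threshold = 0.5
for round_num in range(3):
    labels = [guess_gender(s, threshold) for s in shares]
    threshold = retrain_threshold(shares, labels)
    print(f"round {round_num}: new threshold {threshold:.2f}, labels {labels}")

# The threshold drifts downward every round, so more and more heavy tech
# readers get coded as "male" -- the proxy starts to define its own reality.
```

The point isn’t the arithmetic; it’s that the system’s mistakes quietly become its training data.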
It’s easy to say it doesn’t really matter that Google often gets gender wrong; after all, it’s just going to use that information to serve up more “relevant” advertising. If most of us would rather ignore advertising anyway, who cares? But consider the potential ramifications: if, for example, Google frequently coded women who worked in technology in 2012 as men, then it could have skewed data about the readership of tech publications to look more male than it actually was. People who run media sites pay close attention to their audience data, and use it to make decisions. If they believed their audiences were more male than they were, they might think, “Well, maybe women do just care less about technology”—an argument they’ve no doubt heard before. That might skew publications’ reporting on the gender gap in tech companies to focus more on the “pipeline,” and less on structural and cultural problems that keep women out. After all, if women interested in technology don’t exist, how could employers hire them?
This is theoretical, sure: I don’t know how often Google got gender wrong back then, and I don’t know how much that affected the way the tech industry continued to be perceived. But that’s the problem: neither does Google. Proxies are naturally inexact, writes data scientist Cathy O’Neil in Weapons of Math Destruction. Even worse, they’re self-perpetuating: they “define their own reality and use it to justify their results.” 12
Now, Google doesn’t think I’m a man anymore. Sometime in the last five years, it sorted that out (not surprising, since Google now knows a lot more about me, including how often I shop for dresses and search for haircut ideas). But that doesn’t stop other tech companies from relying on proxies—including Facebook. In the fall of 2016, journalists at ProPublica found that Facebook was allowing advertisers to target customers according to their race, even when they were advertising housing—something that’s been blatantly illegal since the federal Fair Housing Act of 1968. To test the system, ProPublica posted an ad with a $50 budget, and chose to target users who were tagged as “likely to move” or as having an interest in topics like “buying a house” (some of those zillions of attributes we talked about earlier), while excluding users who were African American, Asian American, and Hispanic. The ad was approved right away. Then they showed the result to civil rights lawyer John Relman. He gasped. “This is horrifying,” he told them. “This is massively illegal.” 13
But hold up: Facebook doesn’t actually let us put our race on our profile. So how can it allow advertisers to segment that way? By proxies, of course. See, what Facebook offers advertisers isn’t really the ability to target by race and ethnicity. It targets by ethnic affinity. In other words, if you’ve liked posts or pages that, according to Facebook’s algorithm, suggest you’re interested in content about a particular racial or ethnic group, then you might be included. Except Facebook didn’t really position it that way for advertisers: when ProPublica created its ad, Facebook had placed the ethnic-affinity menu in the “demographics” section—a crystal-clear sign that this selection wasn’t just about interests, but about identity.
There are legitimate reasons for Facebook to offer ethnicity-based targeting—for example, so that a hair product designed for black women is actually targeted at black women, or so that a Hispanic community group reaches Hispanic people. That makes sense. And since ProPublica’s report, Facebook has started excluding certain types of ads, such as those for housing, credit, and employment, from using ethnic-affinity targeting. But by using proxy data, Facebook didn’t just open the door for discriminatory ads; it also opened a potential legal loophole: it can deny that it was operating illegally, because it wasn’t filtering users by race, but only by interest in race-related content. Sure.