
Data Versus Democracy


by Kris Shaffer


  unintended—are more difficult to regulate, as the groups of people who see

  the same advertisements are far smaller and more diffuse than TV markets or

  subscribers to a particular magazine. Finally, what happens when the (private)

  data collected to make this hyper-targeted advertising effective is sold, leaked,

  or hacked?

  We’ll unpack these questions in Chapter 4 and beyond. But first, let’s get our

  heads around how the attention economy, our cognitive limitations, and

  personalized, targeted content work together in the increasingly central

  feature of modern human society: the algorithmic news feed.

  Chapter 3

  Swimming Upstream

  How Content Recommendation Engines Impact Information and Manipulate Our Attention

  As we have shifted from an information economy to an attention economy in

  the past two decades, we have almost simultaneously shifted from mass media

  to social media. The drastic increase in available media necessitates a way for

  individuals to sift through the media that is literally at their fingertips. Content

  recommendation systems have emerged as the technological solution to this

  social/informational problem. Understanding how recommendation system

  algorithms work, and how they reinforce (and even exaggerate) unconscious

  human bias, is essential to understanding the way data influences opinion.

  What’s New?

  Online advertising can be uncanny.

  My wife was on Facebook a while back and noticed an odd advertisement.

  Facebook was suggesting that she might be interested in a set of patio seat

  cushions. That’s funny, she thought, I was just on the phone with my mom, and

  she told me she was out looking for just such a cushion! A couple weeks later, the


  same thing happened with a table lamp from a store she hadn’t shopped at in

  years, but her parents had just been to. I asked her if she checked Facebook

  from her parents’ computer when we visited the previous month, and, of

  course, the answer was yes. She was sure to log out when she was done, but

  the cookie Facebook installed into her parents’ web browser was still there,

  phoning home, while her parents shopped online for seat cushions and lamps.

  A few web searches (and credit card purchases) later, her parents’ shopping

  had been synched to her profile on Facebook’s servers. So as her parents

  perused household items, she saw ads for similar items in her feed.

  In that case, Facebook made a mistake, conflating her online identity with that

  of her parents. But sometimes these platforms get it a little too right.

  You may have heard the story in the news a few years ago. A father was livid

  to receive advertisements from Target in the mail, addressed to his teenage

  daughter, encouraging her to buy diapers, cribs, baby clothes, and other

  pregnancy-related items. “Are you trying to encourage her to get pregnant?”

  he reportedly asked the store manager. A few days later, the manager phoned

  the father to reiterate his apology. But the father told him, “It turns out

  there’s been some activities in my house I haven’t been completely aware of.

  [My daughter]’s due in August. I owe you an apology.”1

  It turns out that Target, like many retailers today, was using customers’

  purchasing history to predict future purchases and send them advertisements

  and individualized coupons based on those predictions, all in an effort to

  convince existing customers to turn Target into their one-stop shop.

  But it’s not just advertisers who feed user data into predictive models that

  generate hyper-targeted results. In her book Algorithms of Oppression: How

  Search Engines Reinforce Racism, Safiya Umoja Noble tells the story of a harrowing

  Google search. Looking for resources on growing up as a young woman of color

  in rural America to share with her daughter and her friends, Noble picked up

  her laptop and searched for “black girls” on Google. She writes:

  I had almost inadvertently exposed them to one of the most graphic

  and overt illustrations of what the advertisers already thought of

  them: Black girls were still the fodder of porn sites, dehumanizing

  them as commodities, as products and as objects of sexual gratification.

  I closed the laptop and redirected our attention to fun things

  we might do, such as see a movie down the street.2

  1 Charles Duhigg, “How Companies Learn Your Secrets,” The New York Times Magazine, published February 16, 2012, www.nytimes.com/2012/02/19/magazine/shopping-habits.html.

  2 Safiya Umoja Noble, Algorithms of Oppression: How Search Engines Reinforce Racism (New York University Press, 2018), p. 18.


  Noble’s book is full of other example searches that, like the ones we examined

  in Chapter 1, return predictive results based on the biases and prejudices of

  society—image searches for “gorillas” that return pictures of African

  Americans, searches for “girls” that return older and more sexualized results

  than searches for “boys,” etc. In these cases, the results are not “personalized”—

  Noble was clearly not looking for the pornography that Google’s search

  engine provided her in the first page of search results. Rather, the results

  reflect both the biases of the programmers who created the algorithm and the

  biases of the users who search for “black girls” and, apparently, click on links

  to pornography more often than on links to web sites containing resources meant

  to empower young women of color. Those combined biases yield results predictive

  of what most users “want” to see.

  Google has since addressed problems like this. Now, a search for something

  neutral like “black girls” on my computer (with SafeSearch off) returns sites

  like “Black Girls Rock!” and “Black Girls Code,” as well as images of black

  women fully dressed. (Note that only a minority of the images included actual

  girls, as opposed to grown women. Progress is progress, but we still have work

  to do…)

  But what happens when the search itself isn’t neutral? What if the search is

  explicitly racist?

  Before terrorist Dylann Roof committed mass murder in Charleston, South

  Carolina, in 2015, he left behind a “manifesto.” In that manifesto he claims

  (and let me be clear, I take the claims made in any terrorist’s manifesto with a

  humungous grain of salt) that he was motivated to pursue his “race war” by a

  Google search. According to Roof, he searched for “black on white crime,”

  and the resulting pages provided him with all the information he needed to be

  self-radicalized.3 Google has since made updates that correct searches like

  this, as well, to the point that three of the top ten results for that search on

  my computer today lead me to official U.S. government crime statistics. But

  in 2015, people looking for government crime statistics broken down by

  demographic rarely searched using terms like “black on white crime” or “black

  on black crime.” Those were phrases most often used by racists, both in their

  searches and on their web sites. That very search reflects the likely racism of

  the search engine user, and even a “neutral” algorithm would likely predict

  racist web sites to be the most desirable results.

  Now, let me be clear. When someone who clearly has the capacity to kill nine

  perfect strangers in a church at a Bible study searches for “black on white

  crime,” that person is already well on the route to radicalization. Dylann Roof

  was not radicalized by a Google search. But given all search engines’ ability to

  return biased results for a “neutral” search like “black girls,” it’s important to

  3 “Google and the Miseducation of Dylann Roof,” Southern Poverty Law Center, published January 18, 2017, www.splcenter.org/20170118/google-and-miseducation-dylann-roof.


  note how biased the predictive results can be when the input itself is biased,

  even hateful.

  The bias-multiplication effect of search engine algorithms is perhaps seen

  most starkly in what happened to The Guardian reporter, Carole Cadwalladr,

  a few years ago. She writes:

  One week ago, I typed “did the hol” into a Google search box and

  clicked on its autocomplete suggestion, “Did the Holocaust happen?”

  And there, at the top of the list, was a link to Stormfront, a neo-Nazi

  white supremacist web site, and an article entitled “Top 10 reasons

  why the Holocaust didn’t happen.”4

  Perhaps even more scandalous than the anti-Semitic search results was

  Google’s initial response. Google issued a statement, saying, “We are saddened

  to see that hate organizations still exist. The fact that hate sites appear in

  search results does not mean that Google endorses these views.” But they

  declined at the time to remove the neo-Nazi sites from their search results.

  Eventually, Google did step in and correct the results for that search,5 so that

  today a Google search for “Did the holocaust happen?” returns an entire page

  of results confirming that it did, in fact, happen and explaining the phenomenon

  of Holocaust denial. In my search just now, there was even an article in the

  first page of results from The Guardian about the history of this search term

  scandal.

  Google’s initial response—that the algorithm was neutral, and that it would

  be inappropriate to alter the search results manually, regardless of how

  reprehensible they may be—hits at the core of what we will unpack in this

  chapter. This idea of algorithmic neutrality is a data science fallacy. Developing

  a machine learning model always means refining the algorithm when the

  outputs do not match the expectations or demands placed on the model. That’s

  because mistakes and biases always creep into the system, and sometimes

  they can only be detected once the algorithm is put to work at a large scale.

  In the case of search results, though, the model is constantly evolving, as

  millions of searches are performed each day, each one altering the dataset the

  model is “trained” on in real time. Not only will a search engine algorithm

  propagate the conscious and unconscious bias of its programmers, but it is also

  open to being gamed, consciously or unconsciously, by its users.

  4 Carole Cadwalladr, “How to bump Holocaust deniers off Google’s top spot? Pay Google,” The Observer, published December 17, 2016, www.theguardian.com/technology/2016/dec/17/holocaust-deniers-google-search-top-spot.

  5 Jeff John Roberts, “Google Demotes Holocaust Denial and Hate Sites in Update to Algorithm,” Fortune, published December 20, 2016, http://fortune.com/2016/12/20/google-algorithm-update/.


  Left unchecked, a content recommendation algorithm is a bias amplifier. That

  is as true for social network feeds as for search engines. Understanding how

  they work on a general level can help us better understand online self-

  radicalization, partisan polarization, and how actors can manipulate platforms

  to advance their agendas on a massive scale. It can also help us think more

  critically about the information we encounter in our own social feeds and

  search results, and give us the foothold needed to effect positive change in the

  online information space.

  So let’s dive in…

  How the Stream Works

  Social media and search algorithms are important trade secrets. Outside of

  the developers who work on those algorithms directly, no one can know with

  100% certainty how they work in all the gory details. However, equipped with

  a knowledge of how machine learning algorithms work in general, alongside

  experience observing the inputs and outputs of these models, we can reverse

  engineer a fair amount of the details. Add that to research papers published

  by the data scientists behind these models, and we can paint a fairly good

  picture of how they work and the effects they have on society.

  Let’s use a search engine as an example. When someone performs an image

  search, the search algorithm, or model, considers a number of inputs alongside

  the search term in order to determine the output: result images ordered so

  that the ones most likely to be clicked appear at the top of the list.

  In addition to the search query itself, the model considers information from

  my user profile and past activity.6 For example, if my profile tells it that I

  currently live in the United States, I’ll likely see different results than if my

  profile tells the model I live in India or Venezuela. My profile for most platforms

  also contains information about my gender, age, education, marital status,

  race, etc., either because I provided that information or because the platform

  inferred it from my activity.7 Inferred data is, of course, imperfect—one of my

  Google accounts thinks I’m male, another thinks I’m female, and both have

  wildly misidentified my taste in music and sports—but it is used to personalize

  content alongside user-provided data.
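  To make the role of profile data concrete, here is a minimal, hypothetical sketch of how a ranking might be nudged by a searcher’s country and language. The field names, boost values, and example results are invented for illustration and do not reflect any real search engine’s code.

# Hypothetical sketch only: the fields, weights, and numbers below are invented
# for illustration and are not any real search engine's algorithm.

def rerank(results, profile):
    """Re-order results, boosting those that match the searcher's profile."""
    def personalized_score(result):
        score = result["base_relevance"]              # relevance to the query text alone
        if result["country"] == profile.get("country"):
            score += 0.2                              # assumed boost for local content
        if result["language"] == profile.get("language"):
            score += 0.1                              # assumed boost for language match
        return score
    return sorted(results, key=personalized_score, reverse=True)

results = [
    {"url": "example.com/a", "base_relevance": 0.70, "country": "IN", "language": "en"},
    {"url": "example.com/b", "base_relevance": 0.65, "country": "US", "language": "en"},
]
us_profile = {"country": "US", "language": "en"}      # provided or inferred attributes
print([r["url"] for r in rerank(results, us_profile)])
# ['example.com/b', 'example.com/a'] -- the same query, ranked differently for a US profile

  The point is not these particular numbers but that the same query, scored against two different profiles, can produce two different orderings.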

  Depending on the platform, various kinds of past activity are weighed to

  determine the content recommended by the algorithm. For a search engine,

  that may be past searches and the results we clicked. For advertising platforms,

  it may include purchases made at partner stores, products viewed on partner

  e-commerce web sites, or retailers in our area frequented by many of our

  social media contacts. For a social network content feed, it likely includes the

  kinds of content we post and the kinds of content we “engage” with most, measured

  by likes/favorites, comments, clicks, and in some cases, the amount of time we

  spend looking at it before we scroll by.8

  6 “How search algorithms work,” Google, www.google.com/search/howsearchworks/algorithms/.

  7 You can find some of the information Google has inferred about you from your online activity at https://adssettings.google.com/.
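  As a rough illustration, a feed might collapse engagement signals like likes, comments, clicks, and dwell time into a single number per post, weighting each kind of interaction differently. The following is a hypothetical sketch; the signal names and weights are invented here, and real platforms learn such weightings from data rather than hard-coding them.

# Hypothetical sketch: invented signal names and weights, not any platform's real formula.

ENGAGEMENT_WEIGHTS = {
    "likes": 1.0,
    "comments": 2.0,        # assume a comment signals more interest than a like
    "clicks": 0.5,
    "dwell_seconds": 0.05,  # time spent on similar content before scrolling past
}

def engagement_score(signals):
    """Collapse a user's past interactions with similar content into one number."""
    return sum(ENGAGEMENT_WEIGHTS[name] * value for name, value in signals.items())

# A post resembling content the user has liked, discussed, and lingered on scores
# higher, so the feed is more likely to surface it.
print(engagement_score({"likes": 3, "comments": 1, "clicks": 4, "dwell_seconds": 40}))  # 9.0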

  But there’s a third, and probably the most important, category of inputs that

  help the model determine the content we see: data that has nothing to do with us.

  This, of course, includes information about the content under consideration—

  the data and metadata that makes that content unique, general information

  about its popularity, etc. But it also contains information about the activity of

  other users.9

  Data from other users is key for any content recommendation engine.

  Without that data, the recommendation engine would have only the user’s own past activity as

  the basis for determining the best content to surface next. Because every

  person and situation is unique, that small amount of data would provide very

  little context in which to make a decision. In many cases, it would be flying

  blind, with no relevant data to draw on to make a prediction. As a result, the

  algorithm’s recommendations would be rather poor.

  Think about it this way. In many respects, a content recommendation engine

  is like a dating app, except instead of matching a person with another person,

  it matches a person with content. Only in the case of a dating app is it possible

  to require a minimum amount of data from a user before proceeding, ensuring

  that user matches can be filtered and ranked according to complete profiles.10

  The same is true for the algorithms that match medical school graduates with

  residencies.11 But in the case of search engines, social networks, and even

  “personalized” educational apps, the algorithm needs to filter and rank content

  that’s unlike any the user has engaged with before. The solution is to join the

  user’s data with that of other users in a process called collaborative filtering.
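  Here is a minimal sketch of one common flavor of collaborative filtering, nearest-neighbor recommendation over a tiny, made-up table of music ratings. The users, albums, and numbers are invented; production systems operate on millions of users with far more sophisticated models, but the core move is the same: find users whose tastes resemble yours and surface what they liked that you haven’t encountered yet.

# Minimal collaborative-filtering sketch. The users, albums, and ratings are
# made up for illustration; real systems use millions of users and richer models.
import math

ratings = {  # user -> {item: rating on a 1-5 scale}
    "alice": {"jazz_album": 5, "metal_album": 1, "folk_album": 4},
    "bob":   {"jazz_album": 4, "metal_album": 2, "folk_album": 5, "ambient_album": 5},
    "carol": {"jazz_album": 1, "metal_album": 5},
}

def cosine_similarity(a, b):
    """How alike two users' tastes are, judged only on items both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def recommend(user):
    """Surface items the most similar other user rated highly but this user hasn't seen."""
    similarity, nearest = max(
        (cosine_similarity(ratings[user], ratings[other]), other)
        for other in ratings if other != user
    )
    return [item for item, rating in ratings[nearest].items()
            if item not in ratings[user] and rating >= 4]

print(recommend("alice"))  # ['ambient_album'] -- borrowed from Bob, her closest taste match

  Notice that the engine never needs Alice to have heard the ambient album before; the overlap between her ratings and Bob’s supplies the missing context, which is what makes it possible to rank content a user has never engaged with.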

  To understand collaborative filtering, let’s consider a simplistic profile for

  musical taste. Suppose there were a number of musical features that were

  8 I don’t want to call out any single developer or company here, but a web search for “track user scrolls on a page” returns a number of solutions for web developers who want to track what web site visitors scroll by and what they don’t.

  9 This is even true for “personalized” education apps. See “Knewton Adaptive Learning: Building the world’s most powerful education recommendation engine,” Knewton, accessed July 24, 2017, https://cdn.tc-library.org/Edlab/Knewton-adaptive-learning-white-paper-1.pdf.

 
