Data Versus Democracy


by Kris Shaffer




  present or absent in any given musical track: distortion guitar, grand piano,

  operatic soprano vocals, major key, minor key, fast tempo, slow tempo,

  improvised instrumental solos, etc. Some of these are binary (either there are

  bagpipes or there aren’t), and others exist on a scale (the relative volume of

  that distortion guitar, the actual tempo measurement in beats per minute,

  etc.). The more features the model contains, the more refined predictions it

  can make. But the more features the model contains, the more data it needs

  to make those predictions. And the more likely it is that at least some of that

  data is missing from a user’s profile.
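To make the idea concrete, here is a minimal sketch of what such a profile might look like in code. The feature names and values are invented for illustration; a real service would track far more features than this.

```python
# A hypothetical taste profile over track features: binary features are
# True/False, scaled features are numbers, and None marks data the service
# simply doesn't have for this user yet. (Feature names are invented.)
user_profile = {
    "distortion_guitar": 0.8,   # relative volume, on a 0.0-1.0 scale
    "bagpipes": False,          # binary: either there are bagpipes or there aren't
    "tempo_bpm": 142,           # actual tempo measurement in beats per minute
    "operatic_soprano": None,   # missing: no listening data yet
    "improvised_solos": None,   # missing
}

missing = [f for f, v in user_profile.items() if v is None]
print(f"{len(missing)} of {len(user_profile)} features are unknown: {missing}")
```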

  This leads to a paradox: to ensure a good user experience, especially when

  trying to hook new users, the app needs to collect as much relevant data as

  possible before choosing the song. On the other hand, to ensure a good user

  experience, the app needs to serve up good songs with as little delay as

  possible—no time for onboarding.

  In order to provide a quality user experience, the algorithm needs a way to

  make good predictions without a complete profile. That’s where collaborative

  filtering comes in.

  Collaborative filtering provides a way to fill in the gaps of a user’s profile by

  comparing them with other users.12 The theory behind it is this: if user A and

  user B have similar tastes for the features they both have data on, they are

  likely to have similar tastes for the features where one of them is missing data.

In other words, if a friend and I both like distortion guitar and fast tempos and

  both dislike jazz, then my tastes about various classical music features will be used

  to make music recommendations for that friend, and their taste about country

  music will be used to inform my music recommendations. Our incomplete but

  overlapping profiles will “collaborate” to “filter” each other’s musical

  recommendations—hence the name.
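The matrix factorization tutorial cited in note 12 shows one common implementation; the simpler neighbor-based sketch below captures the same intuition. It estimates a user's missing taste scores as a similarity-weighted average of other users' scores, where similarity is computed only over the features both users have data on. All user names, feature names, and numbers are invented for illustration.

```python
from math import sqrt

# Invented taste scores (0-1) over a handful of features; None = no data.
profiles = {
    "me":     {"distortion_guitar": 0.9, "fast_tempo": 0.8, "jazz": 0.1, "classical": 0.7, "country": None},
    "friend": {"distortion_guitar": 0.8, "fast_tempo": 0.9, "jazz": 0.2, "classical": None, "country": 0.3},
    "other":  {"distortion_guitar": 0.1, "fast_tempo": 0.2, "jazz": 0.9, "classical": 0.4, "country": 0.8},
}

def similarity(a, b):
    """Cosine similarity computed only over features both users have data for."""
    shared = [f for f in a if a[f] is not None and b.get(f) is not None]
    if not shared:
        return 0.0
    dot = sum(a[f] * b[f] for f in shared)
    na = sqrt(sum(a[f] ** 2 for f in shared))
    nb = sqrt(sum(b[f] ** 2 for f in shared))
    return dot / (na * nb) if na and nb else 0.0

def fill_gaps(user, profiles):
    """Estimate a user's missing features from the tastes of similar users."""
    filled = dict(profiles[user])
    for feature, value in filled.items():
        if value is not None:
            continue
        # Weight every other user's score by how similar they are to this user.
        weights = [(similarity(profiles[user], p), p[feature])
                   for name, p in profiles.items()
                   if name != user and p.get(feature) is not None]
        total = sum(w for w, _ in weights)
        if total:
            filled[feature] = sum(w * v for w, v in weights) / total
    return filled

print(fill_gaps("me", profiles))      # "country" estimated mostly from the similar friend
print(fill_gaps("friend", profiles))  # "classical" estimated mostly from me
```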

  With millions (or, in Facebook’s case, billions) of users contributing data to the

  same model, the algorithm can theoretically cluster all of those users into

tens or hundreds of thousands of “collaborative” groups, and the profiles within

  each group will be combined into one super-profile. That super-profile can be used to filter

  and rank potential content for all of those hundreds or thousands of users in

  the group, and each one of them will encounter a “personalized” experience—

  one that is different from anyone else they know.
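A rough sketch of that grouping step, assuming the gaps have already been filled as above: cluster users with similar profiles, treat each cluster centroid as the group's super-profile, and score candidate tracks against it. The cluster count, feature names, and scores are all invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented taste profiles (rows = users, columns = features), already gap-filled.
features = ["distortion_guitar", "fast_tempo", "jazz", "classical"]
users = np.array([
    [0.9, 0.8, 0.1, 0.6],   # rock-leaning listeners...
    [0.8, 0.9, 0.2, 0.5],
    [0.1, 0.2, 0.9, 0.4],   # ...and jazz-leaning listeners
    [0.2, 0.3, 0.8, 0.5],
])

# Group users into a small number of clusters of similar taste.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(users)

# Each cluster centroid acts as a "super-profile": the average taste of everyone
# in the group, used to score candidate tracks for all of them at once.
tracks = {
    "loud fast song": np.array([0.9, 0.9, 0.0, 0.1]),
    "slow jazz song": np.array([0.0, 0.1, 1.0, 0.3]),
}
for cluster_id, super_profile in enumerate(kmeans.cluster_centers_):
    scores = {name: float(vec @ super_profile) for name, vec in tracks.items()}
    best = max(scores, key=scores.get)
    print(f"cluster {cluster_id}: recommend '{best}' (scores: {scores})")
```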

  The clusters can operate at various levels of detail. When I join Pandora and

  select the preloaded New Wave station as my first listening experience, it

  serves up songs based on the taste of other users with a New Wave station

  in their library. But as I give a thumbs up to The Cure, Depeche Mode, and

A Flock of Seagulls and a thumbs down to The Smiths and most songs by

  Duran Duran, it starts to align me with a smaller cluster of listeners who

  prefer “Space Age Love Song” to “Girls on Film.”

  12Albert Au Yeung, “Matrix Factorization: A Simple Tutorial and Implementation in

  Python,” quuxlabs, published September 16, 2010, www.quuxlabs.com/blog/2010/09/matrix-factorization-a-simple-tutorial-and-implementation-in-python/.

  To summarize, models make better predictions when they have access to

  more data—both more unique observations and more features for each

  observation. However, the more features a model takes into account, the

  more likely it is that each user’s profile will be missing critical features. So an

  additional algorithmic model will cluster users together according to their

  similarity of known features, so that a super-profile can be created which will

  provide data to fill in the unknown features. The predictive model then uses

  these newly complete profiles to generate content recommendations. This

leads to a “personalized” experience that, in many ways, amounts to a cluster-based

  experience. Each user has a similar experience to many others in their cluster;

  they simply don’t interact with those other users, so the experience feels unique.

  Bias Amplifier

Think back to the image searches we performed in the Introduction: doctor, nurse, professor, teacher, etc. As discussed there, the feedback loop between

  the algorithm and the humans that use it takes already existing human biases

  and amplifies them. With a bit more understanding of how collaborative

  filtering works, we can now add some nuance to that feedback loop.

  Figure 3-1 illustrates the feedback loop(s) by which human biases are amplified

  and propagated through unchecked algorithmic content delivery. When a

  user performs a search, the model takes their search terms and any metadata

  around the search (location, timing, etc.) as inputs, along with data about the

  user from their profile and activity history, and other information from the

  platform’s database, like content features and the profiles and preferences of

  other similar users. Based on this data, the model delivers results—filtered

  and ranked content, according to predictions made about what the user is

most likely to engage with.13

  13Note that I didn’t say “most likely to be satisfied with.” Attention is the commodity, and

  engagement the currency, in this new economy. Taste is much harder to quantify, and thus

  to charge advertisers for.


Figure 3-1. The feedback loop of human-algorithm interaction. (Diagram labels: content database, media creation, user profile, model, search & activity, results, search (meta)data, our brains, media consumption, social stereotypes.)
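As a hedged sketch of the “model” box at the center of that loop, the toy ranking function below combines a few of those signals (query match, the user’s own profile, a similar-users super-profile, and overall engagement) and sorts candidates by predicted engagement. The signal names and weights are invented; real ranking models are vastly more complex.

```python
# A toy stand-in for the ranking model in Figure 3-1: combine what the platform
# knows about the searcher, about similar users, and about each piece of content,
# then sort by predicted *engagement*. Signals and weights are invented.
def score(item, user_profile, similar_users_profile):
    relevance = item["match_to_query"]                  # from search terms and metadata
    personal = item["match_to_user"] * user_profile["engagement_rate"]
    collaborative = item["match_to_similar_users"] * similar_users_profile["engagement_rate"]
    popularity = item["overall_engagement"]             # platform-wide signal
    return 0.3 * relevance + 0.3 * personal + 0.25 * collaborative + 0.15 * popularity

def rank(candidates, user_profile, similar_users_profile):
    return sorted(candidates,
                  key=lambda item: score(item, user_profile, similar_users_profile),
                  reverse=True)

# Hypothetical candidates for an image search.
candidates = [
    {"id": "stock_photo_A", "match_to_query": 0.9, "match_to_user": 0.4,
     "match_to_similar_users": 0.3, "overall_engagement": 0.5},
    {"id": "stock_photo_B", "match_to_query": 0.7, "match_to_user": 0.9,
     "match_to_similar_users": 0.8, "overall_engagement": 0.9},
]
user = {"engagement_rate": 0.8}
similar = {"engagement_rate": 0.7}
for item in rank(candidates, user, similar):
    print(item["id"])   # photo B outranks A despite matching the query less well
```

Note that in this toy run the top result is not the one that best matches the query; it is the one the model predicts will be engaged with most.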

  But that’s not the entire process. When we perform a search or we open up

  our Facebook or Twitter app, we do something with that algorithmically

selected content. Consider the image searches from the Introduction—doctor,

  nurse, professor, teacher, etc.—and let’s assume I want to create a web page for a health-related event. I search for stock images of doctors and nurses to

  include on that page. When I search for an image of a doctor, the search

results will be filtered according to what the search engine knows (and guesses)

  about me, what it knows (and guesses) about users assessed to have similar

  tastes, what content it has in its database, and what general information it

  knows about that content and general engagement with it. As we know from

  the Introduction, the biases about what a doctor does/should look like that

  are present in the world will influence the search results, and the results will

  in turn influence our perception of the world and our biases, which will

  influence further search results, etc.

But we can add some nuance to that understanding. First, because of the

  processes of collaborative filtering, the biases I already experience and the

  biases of people already similar to me are the ones that will most strongly

  influence the output for my search. This is most starkly seen in Dylann Roof’s

alleged search for “black on white crime.” Any objective crime statistics that

  might have offered at least a small check on his hateful extremism were masked

  by the fact that searches for the specific phrase “black on white crime” from

  users with similar internet usage patterns to Roof’s were likely to filter out

  the more objective and moderate content from his results.

  Bias amplification can be even stronger on social media platforms like Facebook

  or Twitter. There the content of a user’s feed is already filtered by the people


  they are friends with and the pages they “like.” Since we are already prone to

  associate more with people like us in some way than those who are not, that

  already represents a significant potential filter bubble. When our past

  engagement data and the results of the collaborative filtering process are also

  taken into account, the content we see can be extremely narrow. Intervening

  by following pages and befriending people who represent a wider range of

  perspectives can only help so much, as it affects the first filter, but not the

  collaborative engagement-based filter. This is why close friends or family

  members who have many friends in common may still see radically different

  content in their feeds. And since both the networks of friends/pages/groups

  we have curated and the posts we “like” and otherwise engage with tend to

  reflect our personal biases and limits of perspective, the content we encounter

  on the platform will tend to reflect those biases and limited perspectives as

  well.
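Those two layers of filtering can be sketched as a two-stage pipeline: the first stage keeps only posts from accounts the user follows, and the second ranks what remains by predicted engagement. Everything here (account names, engagement scores, the top_n cutoff) is invented for illustration; the point is only that widening the first stage does not change the second.

```python
def network_filter(posts, followed_accounts):
    """Stage 1: only posts from the friends, pages, and groups the user follows."""
    return [p for p in posts if p["author"] in followed_accounts]

def engagement_filter(posts, predicted_engagement, top_n=2):
    """Stage 2: rank by predicted engagement (from past behavior and similar
    users), then keep only the top few slots in the feed."""
    ranked = sorted(posts, key=lambda p: predicted_engagement[p["id"]], reverse=True)
    return ranked[:top_n]

posts = [
    {"id": 1, "author": "close_friend", "topic": "agrees with me"},
    {"id": 2, "author": "close_friend", "topic": "challenges me"},
    {"id": 3, "author": "diverse_page", "topic": "agrees with me"},
    {"id": 4, "author": "diverse_page", "topic": "challenges me"},
    {"id": 5, "author": "unknown_page", "topic": "challenges me"},
]
followed = {"close_friend", "diverse_page"}               # widening this only affects stage 1...
predicted_engagement = {1: 0.9, 2: 0.2, 3: 0.8, 4: 0.3, 5: 0.4}  # ...stage 2 still favors what I already engage with

feed = engagement_filter(network_filter(posts, followed), predicted_engagement)
print([p["topic"] for p in feed])   # both feed slots go to "agrees with me" posts
```

In this toy run, both feed slots go to agreeable posts even though the user follows a more diverse page.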

  That leads to a second point: if the content served up to me by algorithmic

  recommendation is biased in ways that reflect how I already think about the

  world, I am not only more likely to engage with that bias, I am more likely to

  spread it. An increasing number of people are finding their news on social

  media platforms.14 But if it’s easier to find information in a one-stop shop like Facebook or Twitter, just think of how much easier it is to share information

  found on that platform. With just a tap or two, I can repropagate an article,

  photo, or video I encounter—without necessarily even reading the article or

  watching the entire video, if the previewed content gets me excited enough.

  And this is true for algorithmic feeds like Twitter and Facebook in a way that

  isn’t true for expertly curated content like that found in a print newspaper or

  a college textbook.

  This sharing optimization compounds the filter bubble effect. Because it is

  easier to find information that reflects my existing biases and easier to share it,

  my contributions to others’ social feeds will reflect my biases even more than

  if I only shared content that I found elsewhere on the internet. And, of course,

  the same is true for their contributions to my feed. This creates a feedback

  loop of bias amplification: I see things in accordance with my bias, I share a

  subset of that content that is chosen in accordance with that bias, and that

  feeds into the biased content the people in my network consume, from which

  they choose a subset in accordance with their bias to share with me, and so

  on. Just like the image search results in the Introduction (but likely more

  extreme), left unchecked this feedback loop will continue to amplify the biases

  already present among users, and the process will accelerate the more people

  find their news via social media feeds and the more targeted the algorithm

becomes. And given the way that phenomena like clickbait can dominate our

  attention, not only will the things that reflect our own bias propagate faster

  in an algorithmically driven content stream, but so will content engineered to

  manipulate our attention. Put together, clickbait that confirms our preexisting

  biases should propagate at disproportionately high speeds. And that, in fact, is

  what we’ve seen happen in critical times like the lead-up to the 2016 U.S.

  presidential election.15 But, perhaps most importantly, the skewed media

  consumption that results will feed into our personal and social stereotypes

  about the world, influencing our behavior and relationships both online and in

  person.

  14Kevin Curry, “More and more people get their news via social media. Is that good or

  bad?,” Monkey Cage, The Washington Post, published September 30, 2016, www.washingtonpost.com/news/monkey-cage/wp/2016/09/30/more-and-more-people-get-their-news-via-social-media-is-that-good-or-bad/.
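The compounding effect described in the last few paragraphs can be illustrated with a toy simulation: each round, the feed over-represents content matching the user's current bias, and the user's bias then drifts toward the mix they just consumed and shared. The update rule and numbers are invented purely to show the shape of the loop, not to model any real platform.

```python
import random

random.seed(1)

def simulate(rounds=10, feed_size=20, bias=0.55, amplification=1.2, drift=0.5):
    """Toy loop: 'bias' is the share of one viewpoint the user already prefers.
    The feed over-represents that viewpoint (amplification > 1), and the user's
    bias then drifts toward the mix of content they just consumed and shared."""
    history = [round(bias, 3)]
    for _ in range(rounds):
        p_agreeable = min(1.0, bias * amplification)    # algorithmic over-serving
        feed = [1 if random.random() < p_agreeable else 0 for _ in range(feed_size)]
        seen_share = sum(feed) / feed_size
        bias = (1 - drift) * bias + drift * seen_share  # belief drifts toward the feed
        history.append(round(bias, 3))
    return history

print(simulate())   # starts near 0.55 and tends to ratchet upward toward 1.0
```

With an amplification factor of 1.0 the bias would only wander randomly; any factor above 1.0 makes the drift systematic.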

  Sometimes bias amplification works one-way. In cases like gender, racial, and

  other demographic representation we considered in the Introduction, the

  dominant group has been dominant long enough that algorithms tend to

  amplify the pervasive bias in that same, singular direction. But this is not

always the case. When it comes to politics, where (especially in the United

  States) we are divided into two roughly equal groups, the amplification of

  bias is not one-sided but two-sided or multisided. The result, then, is

  polarization.

  Polarization is easy enough to grasp. It simply means the increase in ideological

difference and/or animosity between two or more opposing groups.16 Digital

  polarization is in large part a result of the bias-amplification feedback loop

  applied to already like-minded groups. As biases get amplified within a group,

  it becomes more and more of a “filter bubble” or “echo chamber,” where

  content uncritically promotes in-group thinking and uncritically vilifies the

  opposition. Adding fuel to the fire, we also know (as discussed in Chapter 2)

  that engagement increases when content is emotionally evocative—both

positive and negative, and especially anger. This means not only more content

  supporting your view and discounting others, but also more content shared and

  reshared that encourages anger toward those outside the group. This makes

  it harder to listen to the other side even when more diverse content does

  make it through the filter.

  Letting Your Guard Down

  There’s another major problem that, left unchecked, causes algorithmically

  selected content to increase bias, polarization, and even the proliferation of

  “fake news.” Social platforms are designed for optimal engagement and primed for

believability. Or, in the words of Renee DiResta, “Our political conversations

  are happening on an infrastructure built for viral advertising, and we are only

  beginning to adapt.”17

  15Craig Silverman, “This Analysis Shows How Viral Fake Election News Stories

  Outperformed Real News On Facebook,” BuzzFeed News, published November 16, 2016,

  www.buzzfeednews.com/article/craigsilverman/viral-fake-election-news-outperformed-real-news-on-facebook.

  16“Political Polarization in the American Public,” Pew Research Center, published June 12, 2014,

  www.people-press.org/2014/06/12/political-polarization-in-the-american-public/.

  There are several facets to this. First, social media encourages a relaxed

  posture toward information consumption and evaluation. By putting important

  news and policy debates alongside cat GIFs, baby pictures, commercial

  advertisements, and party invitations, social media puts us in a very different—

  and less critical—posture than a book, newspaper, or even magazine. Many

  users check social media when they are waiting in line at the store, riding the

  bus or train to work, even lying in bed. The relaxed posture can be great for

social interactions, but that relaxation of our inhibitions, combined with the

  “attentional blink” that comes from shifting between cute cats and neo-Nazi

  counterprotests, can make it difficult to think critically about what we believe

  and what we share.

  Second, social media platforms are designed to promote engagement, even to

the point of addiction. The change from a star to a heart for a Twitter “favorite”

  and the expanded set of emotion-based reactions on Facebook were both measures

  taken to increase user engagement. And they worked.18 Meanwhile,

  former employees of Google, Twitter, and Facebook have gone public with ways

  the platforms have been designed to promote addictive behavior,19 and an

  increasing number of Silicon Valley tech employees have announced that they

  severely limit—or even prohibit—screen time for their own children.20 Promoting

engagement, even addictive behavior, alongside a relaxed posture is mostly

  harmless when it comes to baby pictures and cute cat videos, but it is not a

  recipe for careful, critical thinking around the major issues of the day.

Adding fuel to this fire, a recent study suggests that people judge the veracity

  of content on social media not by the source of the content but by the

  credibility of the person who shared it.21 This means that even when we exercise

  17Renee DiResta, “Free Speech in the Age of Algorithmic Megaphones,” WIRED, published October 12, 2018, www.wired.com/story/facebook-domestic-disinformation-algorithmic-megaphones/.

  18Drew Olanoff, “Twitter Sees 6% Increase In ‘Like’ Activity After First Week Of Hearts,”

  TechCrunch, published November 10, 2015, https://techcrunch.com/2015/11/10/
