present or absent in any given musical track: distortion guitar, grand piano,
operatic soprano vocals, major key, minor key, fast tempo, slow tempo,
improvised instrumental solos, etc. Some of these are binary (either there are
bagpipes or there aren’t), and others exist on a scale (the relative volume of
that distortion guitar, the actual tempo measurement in beats per minute,
etc.). The more features the model contains, the more refined predictions it
can make. But the more features the model contains, the more data it needs
to make those predictions. And the more likely it is that at least some of that
data is missing from a user’s profile.
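To make that concrete, here is a minimal sketch of what such a profile might look like in code. The feature names and numbers are invented for illustration and are not drawn from any actual music service:

```python
import numpy as np

# A hypothetical feature vector for one track: some features are binary
# (present or absent), others are scalar (a measurement or a 0-1 rating).
track = {
    "distortion_guitar": 1.0,   # binary: present
    "bagpipes": 0.0,            # binary: absent
    "distortion_volume": 0.8,   # scalar: relative volume of that guitar
    "tempo_bpm": 168.0,         # scalar: actual tempo in beats per minute
    "minor_key": 1.0,
}

# A user's taste profile over the same features. NaN marks preferences the
# service has no data on yet -- the missing data the text describes.
user_profile = {
    "distortion_guitar": 0.9,   # strong positive preference
    "bagpipes": np.nan,         # unknown
    "distortion_volume": 0.7,
    "tempo_bpm": np.nan,        # unknown
    "minor_key": 0.4,
}

missing = [name for name, value in user_profile.items() if np.isnan(value)]
print("No data yet for:", missing)   # the gaps collaborative filtering must fill
```

The more features the model tracks, the longer these vectors get, and the more entries in a new user's profile start out unknown.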
This leads to a paradox: to ensure a good user experience, especially when
trying to hook new users, the app needs to collect as much relevant data as
possible before choosing the song. On the other hand, to ensure a good user
experience, the app needs to serve up good songs with as little delay as
possible—no time for onboarding.
In order to provide a quality user experience, the algorithm needs a way to
make good predictions without a complete profile. That’s where collaborative
filtering comes in.
Collaborative filtering provides a way to fill in the gaps of a user’s profile by
comparing them with other users.12 The theory behind it is this: if user A and
user B have similar tastes for the features they both have data on, they are
likely to have similar tastes for the features where one of them is missing data.
In other words, if a friend and I both like distortion guitar and fast tempos and both dislike jazz, then my tastes regarding various classical music features will be used to make music recommendations for that friend, and their tastes regarding country music will be used to inform my recommendations. Our incomplete but
overlapping profiles will “collaborate” to “filter” each other’s musical
recommendations—hence the name.
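As a rough illustration of that idea, here is a sketch of user-based collaborative filtering. It is my own simplified stand-in, not the matrix-factorization approach described in the footnoted tutorial or any platform's production algorithm, and the users, features, and numbers are invented:

```python
import numpy as np

# Toy ratings matrix: rows are users, columns are musical features, and
# NaN means "no data yet."
#                 guitar tempo jazz classical country
ratings = np.array([
    [0.9, 0.8, 0.1, np.nan, 0.2],    # me
    [0.8, 0.9, 0.2, 0.7, np.nan],    # my friend
])

def similarity(u, v):
    """Cosine similarity computed over only the features both users have rated."""
    mask = ~np.isnan(u) & ~np.isnan(v)
    if not mask.any():
        return 0.0
    u, v = u[mask], v[mask]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def fill_missing(ratings):
    """Fill each user's unknown entries with a similarity-weighted average of
    other users' known ratings for that feature."""
    filled = ratings.copy()
    n_users, n_features = ratings.shape
    for i in range(n_users):
        for j in range(n_features):
            if np.isnan(ratings[i, j]):
                others = [k for k in range(n_users)
                          if k != i and not np.isnan(ratings[k, j])]
                weights = [similarity(ratings[i], ratings[k]) for k in others]
                if others and sum(weights) > 0:
                    filled[i, j] = np.average(
                        [ratings[k, j] for k in others], weights=weights)
    return filled

# My friend's classical rating stands in for my missing one, and my country
# rating stands in for theirs -- the profiles "collaborate" to "filter."
print(fill_missing(ratings))
```

Production systems work with millions of users and typically learn latent factors (matrix factorization or embeddings) rather than averaging pairwise, but the intuition is the same: overlap on known features licenses predictions about unknown ones.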
With millions (or, in Facebook’s case, billions) of users contributing data to the
same model, the algorithm can theoretically cluster all of those users into tens or hundreds of thousands of “collaborative” groups, combining each group’s member profiles into one super-profile. That super-profile can be used to filter and rank potential content for all of the hundreds or thousands of users in
the group, and each one of them will encounter a “personalized” experience—
one that is different from anyone else they know.
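One way to picture that clustering step is the sketch below. It is written under my own assumptions, with an off-the-shelf k-means standing in for whatever proprietary grouping method a given platform actually uses:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical user taste profiles (already gap-filled as above). In practice
# there would be millions of rows and thousands of feature columns.
profiles = np.array([
    [0.9, 0.8, 0.1, 0.7, 0.2],
    [0.8, 0.9, 0.2, 0.7, 0.1],
    [0.1, 0.3, 0.9, 0.8, 0.2],
    [0.2, 0.2, 0.8, 0.9, 0.3],
])

# Group similar users, then average each group into one "super-profile."
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
super_profiles = np.array([
    profiles[kmeans.labels_ == c].mean(axis=0)
    for c in range(kmeans.n_clusters)
])

print(kmeans.labels_)     # which cluster each user landed in
print(super_profiles)     # one averaged profile per cluster; content is then
                          # filtered and ranked against these, not against each
                          # user's individual (incomplete) profile
```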
The clusters can operate at various levels of detail. When I join Pandora and
select the preloaded New Wave station as my first listening experience, it
serves up songs based on the taste of other users with a New Wave station
in their library. But as I give a thumbs up to The Cure, Depeche Mode, and A Flock of Seagulls and a thumbs down to The Smiths and most songs by Duran Duran, it starts to align me with a smaller cluster of listeners who prefer “Space Age Love Song” to “Girls on Film.”

12. Albert Au Yueng, “Matrix Factorization: A Simple Tutorial and Implementation in Python,” quuxlabs, published September 16, 2010, www.quuxlabs.com/blog/2010/09/matrix-factorization-a-simple-tutorial-and-implementation-in-python/.
To summarize, models make better predictions when they have access to
more data—both more unique observations and more features for each
observation. However, the more features a model takes into account, the
more likely it is that each user’s profile will be missing critical features. So an
additional algorithmic model will cluster users together according to their
similarity of known features, so that a super-profile can be created which will
provide data to fill in the unknown features. The predictive model then uses
these newly complete profiles to generate content recommendations. This
leads to a “personalized” experience that, in many ways, amounts to a cluster-based experience. Each user has a similar experience to many other users in their cluster; they simply don’t interact with those other users, so the experience feels unique.
Bias Amplifier
Think back to the image searches we performed in the Introduction: doctor, nurse, professor, teacher, etc. As discussed there, the feedback loop between
the algorithm and the humans that use it takes already existing human biases
and amplifies them. With a bit more understanding of how collaborative
filtering works, we can now add some nuance to that feedback loop.
Figure 3-1 illustrates the feedback loop(s) by which human biases are amplified
and propagated through unchecked algorithmic content delivery. When a
user performs a search, the model takes their search terms and any metadata
around the search (location, timing, etc.) as inputs, along with data about the
user from their profile and activity history, and other information from the
platform’s database, like content features and the profiles and preferences of
other similar users. Based on this data, the model delivers results—filtered
and ranked content, according to predictions made about what the user is
most likely to engage with.13
13. Note that I didn’t say “most likely to be satisfied with.” Attention is the commodity, and engagement the currency, in this new economy. Taste is much harder to quantify, and thus to charge advertisers for.
[Figure 3-1 is a diagram whose labeled elements include: search (meta)data, user profile, search & activity, content database, media creation, model, results, media consumption, and our brains & social stereotypes.]

Figure 3-1. The feedback loop of human-algorithm interaction
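In code-sketch form, the model at the center of Figure 3-1 amounts to a scoring function over candidate content. The signals, weights, and names below are placeholders of my own, not any platform's documented ranking formula:

```python
import numpy as np

def rank_results(candidates, user_vector, engagement_rates):
    """Illustrative scoring: blend how well each candidate item matches the
    user's (gap-filled or cluster-based) profile with how much engagement
    that item already earns across the platform."""
    scores = []
    for item_vector, engagement in zip(candidates, engagement_rates):
        profile_match = float(np.dot(user_vector, item_vector))
        scores.append(0.7 * profile_match + 0.3 * engagement)
    return np.argsort(scores)[::-1]   # best-scoring content first

user_vector = np.array([0.9, 0.1, 0.4])          # what the model thinks I like
candidates = np.array([[0.8, 0.2, 0.3],          # feature vectors of content
                       [0.1, 0.9, 0.5],          # matching the search terms
                       [0.7, 0.1, 0.9]])
engagement_rates = np.array([0.5, 0.9, 0.2])     # how much others engage with each
print(rank_results(candidates, user_vector, engagement_rates))
```

Note that the quantity being optimized here is predicted engagement, which is exactly the point made in the footnote above.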
But that’s not the entire process. When we perform a search or we open up
our Facebook or Twitter app, we do something with that algorithmically
selected content. Consider the image searches from the Introduction—doctor, nurse, professor, teacher, etc.—and let’s assume I want to create a web page for a health-related event. I search for stock images of doctors and nurses to include on that page. When I search for an image of a doctor, the search
results will be filtered according to what the search engine knows (and guesses)
about me, what it knows (and guesses) about users assessed to have similar
tastes, what content it has in its database, and what general information it
knows about that content and general engagement with it. As we know from
the Introduction, the biases about what a doctor does/should look like that
are present in the world will influence the search results, and the results will
in turn influence our perception of the world and our biases, which will
influence further search results, etc.
But we can add some nuance to that understanding. First, because of the
processes of collaborative filtering, the biases I already experience and the
biases of people already similar to me are the ones that will most strongly
influence the output for my search. This is most starkly seen in Dylann Roof’s
alleged search for “black on white crime.” Any objective crime statistics that
might have offered at least a small check on his hateful extremism were masked
by the fact that searches for the specific phrase “black on white crime” from
users with similar internet usage patterns to Roof’s were likely to filter out
the more objective and moderate content from his results.
Bias amplification can be even stronger on social media platforms like Facebook
or Twitter. There the content of a user’s feed is already filtered by the people
they are friends with and the pages they “like.” Since we are already prone to
associate more with people who are like us in some way than with those who are not, that
already represents a significant potential filter bubble. When our past
engagement data and the results of the collaborative filtering process are also
taken into account, the content we see can be extremely narrow. Intervening
by following pages and befriending people who represent a wider range of
perspectives can only help so much, as it affects the first filter, but not the
collaborative engagement-based filter. This is why close friends or family
members who have many friends in common may still see radically different
content in their feeds. And since both the networks of friends/pages/groups
we have curated and the posts we “like” and otherwise engage with tend to
reflect our personal biases and limits of perspective, the content we encounter
on the platform will tend to reflect those biases and limited perspectives as
well.
That leads to a second point: if the content served up to me by algorithmic
recommendation is biased in ways that reflect how I already think about the
world, I am not only more likely to engage with that bias, I am more likely to
spread it. An increasing number of people are finding their news on social
media platforms.14 But if it’s easier to find information in a one-stop shop like Facebook or Twitter, just think of how much easier it is to share information
found on that platform. With just a tap or two, I can repropagate an article,
photo, or video I encounter—without necessarily even reading the article or
watching the entire video, if the previewed content gets me excited enough.
And this is true for algorithmic feeds like Twitter and Facebook in a way that
isn’t true for expertly curated content like that found in a print newspaper or
a college textbook.
This sharing optimization compounds the filter bubble effect. Because it is
easier to find information that reflects my existing biases and easier to share it,
my contributions to others’ social feeds will reflect my biases even more than
if I only shared content that I found elsewhere on the internet. And, of course,
the same is true for their contributions to my feed. This creates a feedback
loop of bias amplification: I see things in accordance with my bias, I share a
subset of that content that is chosen in accordance with that bias, and that
feeds into the biased content the people in my network consume, from which
they choose a subset in accordance with their bias to share with me, and so
on. Just like the image search results in the Introduction (but likely more
extreme), left unchecked this feedback loop will continue to amplify the biases
already present among users, and the process will accelerate the more people
find their news via social media feeds and the more targeted the algorithm becomes.
14. Kevin Curry, “More and more people get their news via social media. Is that good or bad?,” Monkey Cage, The Washington Post, published September 30, 2016, www.washingtonpost.com/news/monkey-cage/wp/2016/09/30/more-and-more-people-get-their-news-via-social-media-is-that-good-or-bad/.
And given the way that phenomena like clickbait can dominate our
attention, not only will the things that reflect our own bias propagate faster
in an algorithmically driven content stream, but so will content engineered to
manipulate our attention. Put together, clickbait that confirms our preexisting
biases should propagate at disproportionately high speeds. And that, in fact, is
what we’ve seen happen in critical times like the lead-up to the 2016 U.S.
presidential election.15 But, perhaps most importantly, the skewed media
consumption that results will feed into our personal and social stereotypes
about the world, influencing our behavior and relationships both online and in
person.
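The direction of that feedback loop can be shown with a toy simulation. The numbers and the sharing rule are assumptions of mine, chosen only to demonstrate the qualitative effect, not to model any real platform:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: each user slightly prefers in-group content ("A") over
# out-group content ("B"). A bias of 0.55 means a 55% chance of engaging
# with (and therefore sharing) an item of type A.
user_bias = rng.normal(0.55, 0.05, size=1000).clip(0, 1)

feed_share_A = 0.5   # the pool of content starts out balanced
for round_num in range(5):
    # Each user shares what they engage with; engagement with A-type content
    # is proportional to their bias, engagement with B to (1 - bias).
    shared_A = feed_share_A * user_bias
    shared_B = (1 - feed_share_A) * (1 - user_bias)
    # The next round's feed is built from what the network just shared.
    feed_share_A = shared_A.sum() / (shared_A.sum() + shared_B.sum())
    print(f"round {round_num + 1}: in-group share of the feed = {feed_share_A:.2f}")
```

Even a modest average lean compounds round over round, because each round's feed is assembled from the previous round's already skewed sharing.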
Sometimes bias amplification works one-way. In cases like gender, racial, and
other demographic representation we considered in the Introduction, the
dominant group has been dominant long enough that algorithms tend to
amplify the pervasive bias in that same, singular direction. But this is not
always the case. When it comes to politics, where (especially in the United States) we are divided into two roughly equal groups, the amplification of bias is not one-sided but two-sided or multisided. The result, then, is
polarization.
Polarization is easy enough to grasp. It simply means the increase in ideological
difference and/or animosity between two or more opposing groups.16 Digital
polarization is in large part a result of the bias-amplification feedback loop
applied to already like-minded groups. As biases get amplified within a group,
it becomes more and more of a “filter bubble” or “echo chamber,” where
content uncritically promotes in-group thinking and uncritically vilifies the
opposition. Adding fuel to the fire, we also know (as discussed in Chapter 2)
that engagement increases when content is emotionally evocative—both
positive and negative, and especially anger. This means that not only is more content shared that supports your view and discounts others’, but more content is shared and reshared that encourages anger toward those outside the group. This makes
it harder to listen to the other side even when more diverse content does
make it through the filter.
Letting Your Guard Down
There’s another major problem that, left unchecked, causes algorithmically
selected content to increase bias, polarization, and even the proliferation of
“fake news.” Social platforms are designed for optimal engagement and primed for
believability. Or, in the words of Renee DiResta, “Our political conversations are happening on an infrastructure built for viral advertising, and we are only beginning to adapt.”17

15. Craig Silverman, “This Analysis Shows How Viral Fake Election News Stories Outperformed Real News On Facebook,” BuzzFeed News, published November 16, 2016, www.buzzfeednews.com/article/craigsilverman/viral-fake-election-news-outperformed-real-news-on-facebook.

16. “Political Polarization in the American Public,” Pew Research Center, published June 12, 2014, www.people-press.org/2014/06/12/political-polarization-in-the-american-public/.
There are several facets to this. First, social media encourages a relaxed
posture toward information consumption and evaluation. By putting important
news and policy debates alongside cat GIFs, baby pictures, commercial
advertisements, and party invitations, social media puts us in a very different—
and less critical—posture than a book, newspaper, or even magazine. Many
users check social media when they are waiting in line at the store, riding the
bus or train to work, even lying in bed. The relaxed posture can be great for
social interactions, but that lowering of inhibitions, combined with the “attentional blink” that comes from shifting between cute cats and neo-Nazi counterprotests, can make it difficult to think critically about what we believe
and what we share.
Second, social media platforms are designed to promote engagement, even to
the point of addiction. The change from a star to a heart for a Twitter “favorite,” the expanded emotion-based reactions on Facebook: all of these measures were taken in order to increase user engagement. And they worked.18 Meanwhile,
former employees of Google, Twitter, and Facebook have gone public with ways
the platforms have been designed to promote addictive behavior,19 and an
increasing number of Silicon Valley tech employees have announced that they
severely limit—or even prohibit—screen time for their own children.20 Promoting
engagement, even addictive behavior, alongside a relaxed posture is mostly
harmless when it comes to baby pictures and cute cat videos, but it is not a
recipe for careful, critical thinking around the major issues of the day.
Adding fuel to this fire, a recent study suggests that people judge the veracity
of content on social media not by the source of the content but by the
credibility of the person who shared it.21 This means that even when we exercise

17. Renee DiResta, “Free Speech in the Age of Algorithmic Megaphones,” WIRED, published October 12, 2018, www.wired.com/story/facebook-domestic-disinformation-algorithmic-megaphones/.
18. Drew Olanoff, “Twitter Sees 6% Increase In ‘Like’ Activity After First Week Of Hearts,” TechCrunch, published November 10, 2015, https://techcrunch.com/2015/11/10/