present or absent in any given musical track: distortion guitar, grand piano,
operatic soprano vocals, major key, minor key, fast tempo, slow tempo,
improvised instrumental solos, etc. Some of these are binary (either there are
bagpipes or there aren’t), and others exist on a scale (the relative volume of
that distortion guitar, the actual tempo measurement in beats per minute,
etc.). The more features the model contains, the more refined predictions it
can make. But the more features the model contains, the more data it needs
to make those predictions. And the more likely it is that at least some of that
data is missing from a user’s profile.
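To make that concrete, here is a minimal sketch of what such a profile might look like in code. The feature names and numbers are invented for illustration and are not drawn from any actual music service:

```python
import numpy as np

# A hypothetical feature vector for one track: some features are binary
# (present or absent), others are scalar (a measurement or a 0-1 rating).
track = {
    "distortion_guitar": 1.0,   # binary: present
    "bagpipes": 0.0,            # binary: absent
    "distortion_volume": 0.8,   # scalar: relative volume of that guitar
    "tempo_bpm": 168.0,         # scalar: actual tempo in beats per minute
    "minor_key": 1.0,
}

# A user's taste profile over the same features. NaN marks preferences the
# service has no data on yet -- the missing data the text describes.
user_profile = {
    "distortion_guitar": 0.9,   # strong positive preference
    "bagpipes": np.nan,         # unknown
    "distortion_volume": 0.7,
    "tempo_bpm": np.nan,        # unknown
    "minor_key": 0.4,
}

missing = [name for name, value in user_profile.items() if np.isnan(value)]
print("No data yet for:", missing)   # the gaps collaborative filtering must fill
```

The more features the model tracks, the longer these vectors get, and the more entries in a new user's profile start out unknown.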
This leads to a paradox: to ensure a good user experience, especially when
trying to hook new users, the app needs to collect as much relevant data as
possible before choosing the song. On the other hand, to ensure a good user
experience, the app needs to serve up good songs with as little delay as
possible—no time for onboarding.
In order to provide a quality user experience, the algorithm needs a way to
make good predictions without a complete profile. That’s where collaborative
filtering comes in.
Collaborative filtering provides a way to fill in the gaps of a user’s profile by
comparing them with other users.12 The theory behind it is this: if user A and
user B have similar tastes for the features they both have data on, they are
likely to have similar tastes for the features where one of them is missing data.
In other words, if a friend and I both like distortion guitar and fast tempos and both dislike jazz, then my tastes regarding various classical music features will be used to make music recommendations for that friend, and their tastes regarding country music will be used to inform my recommendations. Our incomplete but
overlapping profiles will “collaborate” to “filter” each other’s musical
recommendations—hence the name.
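As a rough illustration of that idea, here is a sketch of user-based collaborative filtering. It is my own simplified stand-in, not the matrix-factorization approach described in the footnoted tutorial or any platform's production algorithm, and the users, features, and numbers are invented:

```python
import numpy as np

# Toy ratings matrix: rows are users, columns are musical features, and
# NaN means "no data yet."
#                 guitar tempo jazz classical country
ratings = np.array([
    [0.9, 0.8, 0.1, np.nan, 0.2],    # me
    [0.8, 0.9, 0.2, 0.7, np.nan],    # my friend
])

def similarity(u, v):
    """Cosine similarity computed over only the features both users have rated."""
    mask = ~np.isnan(u) & ~np.isnan(v)
    if not mask.any():
        return 0.0
    u, v = u[mask], v[mask]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def fill_missing(ratings):
    """Fill each user's unknown entries with a similarity-weighted average of
    other users' known ratings for that feature."""
    filled = ratings.copy()
    n_users, n_features = ratings.shape
    for i in range(n_users):
        for j in range(n_features):
            if np.isnan(ratings[i, j]):
                others = [k for k in range(n_users)
                          if k != i and not np.isnan(ratings[k, j])]
                weights = [similarity(ratings[i], ratings[k]) for k in others]
                if others and sum(weights) > 0:
                    filled[i, j] = np.average(
                        [ratings[k, j] for k in others], weights=weights)
    return filled

# My friend's classical rating stands in for my missing one, and my country
# rating stands in for theirs -- the profiles "collaborate" to "filter."
print(fill_missing(ratings))
```

Production systems work with millions of users and typically learn latent factors (matrix factorization or embeddings) rather than averaging pairwise, but the intuition is the same: overlap on known features licenses predictions about unknown ones.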
With millions (or, in Facebook’s case, billions) of users contributing data to the
same model, the algorithm can theoretically cluster all of those users into tens or hundreds of thousands of “collaborative” groups, combining each group’s member profiles into one super-profile. That super-profile can be used to filter and rank potential content for all of the hundreds or thousands of users in
the group, and each one of them will encounter a “personalized” experience—
one that is different from anyone else they know.
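One way to picture that clustering step is the sketch below. It is written under my own assumptions, with an off-the-shelf k-means standing in for whatever proprietary grouping method a given platform actually uses:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical user taste profiles (already gap-filled as above). In practice
# there would be millions of rows and thousands of feature columns.
profiles = np.array([
    [0.9, 0.8, 0.1, 0.7, 0.2],
    [0.8, 0.9, 0.2, 0.7, 0.1],
    [0.1, 0.3, 0.9, 0.8, 0.2],
    [0.2, 0.2, 0.8, 0.9, 0.3],
])

# Group similar users, then average each group into one "super-profile."
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
super_profiles = np.array([
    profiles[kmeans.labels_ == c].mean(axis=0)
    for c in range(kmeans.n_clusters)
])

print(kmeans.labels_)     # which cluster each user landed in
print(super_profiles)     # one averaged profile per cluster; content is then
                          # filtered and ranked against these, not against each
                          # user's individual (incomplete) profile
```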
The clusters can operate at various levels of detail. When I join Pandora and
select the preloaded New Wave station as my first listening experience, it
serves up songs based on the taste of other users with a New Wave station
in their library. But as I give a thumbs up to The Cure, Depeche Mode, and A Flock of Seagulls and a thumbs down to The Smiths and most songs by Duran Duran, it starts to align me with a smaller cluster of listeners who prefer “Space Age Love Song” to “Girls on Film.”

12. Albert Au Yueng, “Matrix Factorization: A Simple Tutorial and Implementation in Python,” quuxlabs, published September 16, 2010, www.quuxlabs.com/blog/2010/09/matrix-factorization-a-simple-tutorial-and-implementation-in-python/.
To summarize, models make better predictions when they have access to
more data—both more unique observations and more features for each
observation. However, the more features a model takes into account, the
more likely it is that each user’s profile will be missing critical features. So an
additional algorithmic model will cluster users together according to their
similarity of known features, so that a super-profile can be created which will
provide data to fill in the unknown features. The predictive model then uses
these newly complete profiles to generate content recommendations. This
leads to a “personalized” experience that, in many ways, amounts to a cluster-based experience. Each user has a similar experience to many other users in their cluster; they simply don’t interact with those other users, so the experience feels unique.
Bias Amplifier
Think back to the image searches we performed in the Introduction: doctor, nurse, professor, teacher, etc. As discussed there, the feedback loop between
the algorithm and the humans that use it takes already existing human biases
and amplifies them. With a bit more understanding of how collaborative
filtering works, we can now add some nuance to that feedback loop.
Figure 3-1 illustrates the feedback loop(s) by which human biases are amplified
and propagated through unchecked algorithmic content delivery. When a
user performs a search, the model takes their search terms and any metadata
around the search (location, timing, etc.) as inputs, along with data about the
user from their profile and activity history, and other information from the
platform’s database, like content features and the profiles and preferences of
other similar users. Based on this data, the model delivers results—filtered
and ranked content, according to predictions made about what the user is
most likely to engage with.13
13. Note that I didn’t say “most likely to be satisfied with.” Attention is the commodity, and engagement the currency, in this new economy. Taste is much harder to quantify, and thus to charge advertisers for.
[Figure 3-1 is a diagram whose labeled elements include: search (meta)data, user profile, search & activity, content database, media creation, model, results, media consumption, and our brains & social stereotypes.]

Figure 3-1. The feedback loop of human-algorithm interaction
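In code-sketch form, the model at the center of Figure 3-1 amounts to a scoring function over candidate content. The signals, weights, and names below are placeholders of my own, not any platform's documented ranking formula:

```python
import numpy as np

def rank_results(candidates, user_vector, engagement_rates):
    """Illustrative scoring: blend how well each candidate item matches the
    user's (gap-filled or cluster-based) profile with how much engagement
    that item already earns across the platform."""
    scores = []
    for item_vector, engagement in zip(candidates, engagement_rates):
        profile_match = float(np.dot(user_vector, item_vector))
        scores.append(0.7 * profile_match + 0.3 * engagement)
    return np.argsort(scores)[::-1]   # best-scoring content first

user_vector = np.array([0.9, 0.1, 0.4])          # what the model thinks I like
candidates = np.array([[0.8, 0.2, 0.3],          # feature vectors of content
                       [0.1, 0.9, 0.5],          # matching the search terms
                       [0.7, 0.1, 0.9]])
engagement_rates = np.array([0.5, 0.9, 0.2])     # how much others engage with each
print(rank_results(candidates, user_vector, engagement_rates))
```

Note that the quantity being optimized here is predicted engagement, which is exactly the point made in the footnote above.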
But that’s not the entire process. When we perform a search or we open up
our Facebook or Twitter app, we do something with that algorithmically
selected content. Consider the image searches from the Introduction—doctor, nurse, professor, teacher, etc.—and let’s assume I want to create a web page for a health-related event. I search for stock images of doctors and nurses to include on that page. When I search for an image of a doctor, the search
results will be filtered according to what the search engine knows (and guesses)
about me, what it knows (and guesses) about users assessed to have similar
tastes, what content it has in its database, and what general information it
knows about that content and general engagement with it. As we know from
the Introduction, the biases about what a doctor does/should look like that
are present in the world will influence the search results, and the results will
in turn influence our perception of the world and our biases, which will
influence further search results, etc.
But we can add some nuance to that understanding. First, because of the
processes of collaborative filtering, the biases I already experience and the
biases of people already similar to me are the ones that will most strongly
influence the output for my search. This is most starkly seen in Dylann Roof’s
alleged search for “black on white crime.” Any objective crime statistics that
might have offered at least a small check on his hateful extremism were masked
by the fact that searches for the specific phrase “black on white crime” from
users with similar internet usage patterns to Roof’s were likely to filter out
the more objective and moderate content from his results.
Bias amplification can be even stronger on social media platforms like Facebook
or Twitter. There the content of a user’s feed is already filtered by the people
they are friends with and the pages they “like.” Since we are already prone to
associate more with people who are like us in some way than with those who are not, that
already represents a significant potential filter bubble. When our past
engagement data and the results of the collaborative filtering process are also
taken into account, the content we see can be extremely narrow. Intervening
by following pages and befriending people who represent a wider range of
perspectives can only help so much, as it affects the first filter, but not the
collaborative engagement-based filter. This is why close friends or family
members who have many friends in common may still see radically different
content in their feeds. And since both the networks of friends/pages/groups
we have curated and the posts we “like” and otherwise engage with tend to
reflect our personal biases and limits of perspective, the content we encounter
on the platform will tend to reflect those biases and limited perspectives as
well.
That leads to a second point: if the content served up to me by algorithmic
recommendation is biased in ways that reflect how I already think about the
world, I am not only more likely to engage with that bias, I am more likely to
spread it. An increasing number of people are finding their news on social
media platforms.14 But if it’s easier to find information in a one-stop shop like Facebook or Twitter, just think of how much easier it is to share information
found on that platform. With just a tap or two, I can repropagate an article,
photo, or video I encounter—without necessarily even reading the article or
watching the entire video, if the previewed content gets me excited enough.
And this is true for algorithmic feeds like Twitter and Facebook in a way that
isn’t true for expertly curated content like that found in a print newspaper or
a college textbook.
This sharing optimization compounds the filter bubble effect. Because it is
easier to find information that reflects my existing biases and easier to share it,
my contributions to others’ social feeds will reflect my biases even more than
if I only shared content that I found elsewhere on the internet. And, of course,
the same is true for their contributions to my feed. This creates a feedback
loop of bias amplification: I see things in accordance with my bias, I share a
subset of that content that is chosen in accordance with that bias, and that
feeds into the biased content the people in my network consume, from which
they choose a subset in accordance with their bias to share with me, and so
on. Just like the image search results in the Introduction (but likely more
extreme), left unchecked this feedback loop will continue to amplify the biases
already present among users, and the process will accelerate the more people
find their news via social media feeds and the more targeted the algorithm becomes.
14. Kevin Curry, “More and more people get their news via social media. Is that good or bad?,” Monkey Cage, The Washington Post, published September 30, 2016, www.washingtonpost.com/news/monkey-cage/wp/2016/09/30/more-and-more-people-get-their-news-via-social-media-is-that-good-or-bad/.
And given the way that phenomena like clickbait can dominate our
attention, not only will the things that reflect our own bias propagate faster
in an algorithmically driven content stream, but so will content engineered to
manipulate our attention. Put together, clickbait that confirms our preexisting
biases should propagate at disproportionately high speeds. And that, in fact, is
what we’ve seen happen in critical times like the lead-up to the 2016 U.S.
presidential election.15 But, perhaps most importantly, the skewed media
consumption that results will feed into our personal and social stereotypes
about the world, influencing our behavior and relationships both online and in
person.
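The direction of that feedback loop can be shown with a toy simulation. The numbers and the sharing rule are assumptions of mine, chosen only to demonstrate the qualitative effect, not to model any real platform:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: each user slightly prefers in-group content ("A") over
# out-group content ("B"). A bias of 0.55 means a 55% chance of engaging
# with (and therefore sharing) an item of type A.
user_bias = rng.normal(0.55, 0.05, size=1000).clip(0, 1)

feed_share_A = 0.5   # the pool of content starts out balanced
for round_num in range(5):
    # Each user shares what they engage with; engagement with A-type content
    # is proportional to their bias, engagement with B to (1 - bias).
    shared_A = feed_share_A * user_bias
    shared_B = (1 - feed_share_A) * (1 - user_bias)
    # The next round's feed is built from what the network just shared.
    feed_share_A = shared_A.sum() / (shared_A.sum() + shared_B.sum())
    print(f"round {round_num + 1}: in-group share of the feed = {feed_share_A:.2f}")
```

Even a modest average lean compounds round over round, because each round's feed is assembled from the previous round's already skewed sharing.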
Sometimes bias amplification works one-way. In cases like gender, racial, and
other demographic representation we considered in the Introduction, the
dominant group has been dominant long enough that algorithms tend to
amplify the pervasive bias in that same, singular direction. But this is not
always the case. When it comes to politics, where (especially in the United States) we are divided into two roughly equal groups, the amplification of bias is not one-sided but two-sided or multisided. The result, then, is
polarization.
Polarization is easy enough to grasp. It simply means the increase in ideological
difference and/or animosity between two or more opposing groups.16 Digital
polarization is in large part a result of the bias-amplification feedback loop
applied to already like-minded groups. As biases get amplified within a group,
it becomes more and more of a “filter bubble” or “echo chamber,” where
content uncritically promotes in-group thinking and uncritically vilifies the
opposition. Adding fuel to the fire, we also know (as discussed in Chapter 2)
that engagement increases when content is emotionally evocative—both
positive and negative, and especially anger. This means that not only is more content shared that supports your view and discounts others’, but more content is shared and reshared that encourages anger toward those outside the group. This makes
it harder to listen to the other side even when more diverse content does
make it through the filter.
Letting Your Guard Down
There’s another major problem that, left unchecked, causes algorithmically
selected content to increase bias, polarization, and even the proliferation of
“fake news.” Social platforms are designed for optimal engagement and primed for
believability. Or, in the words of Renee DiResta, “Our political conversations are happening on an infrastructure built for viral advertising, and we are only beginning to adapt.”17

15. Craig Silverman, “This Analysis Shows How Viral Fake Election News Stories Outperformed Real News On Facebook,” BuzzFeed News, published November 16, 2016, www.buzzfeednews.com/article/craigsilverman/viral-fake-election-news-outperformed-real-news-on-facebook.

16. “Political Polarization in the American Public,” Pew Research Center, published June 12, 2014, www.people-press.org/2014/06/12/political-polarization-in-the-american-public/.
There are several facets to this. First, social media encourages a relaxed
posture toward information consumption and evaluation. By putting important
news and policy debates alongside cat GIFs, baby pictures, commercial
advertisements, and party invitations, social media puts us in a very different—
and less critical—posture than a book, newspaper, or even magazine. Many
users check social media when they are waiting in line at the store, riding the
bus or train to work, even lying in bed. The relaxed posture can be great for
social interactions, but that lowering of inhibitions, combined with the “attentional blink” that comes from shifting between cute cats and neo-Nazi counterprotests, can make it difficult to think critically about what we believe
and what we share.
Second, social media platforms are designed to promote engagement, even to
the point of addiction. The change from a star to a heart for a Twitter “favorite,” the expanded emotion-based reactions on Facebook: all of these measures were taken in order to increase user engagement. And they worked.18 Meanwhile,
former employees of Google, Twitter, and Facebook have gone public with ways
the platforms have been designed to promote addictive behavior,19 and an
increasing number of Silicon Valley tech employees have announced that they
severely limit—or even prohibit—screen time for their own children.20 Promoting
engagement, even addictive behavior, alongside a relaxed posture is mostly
harmless when it comes to baby pictures and cute cat videos, but it is not a
recipe for careful, critical thinking around the major issues of the day.
Adding fuel to this fire, a recent study suggests that people judge the veracity
of content on social media not by the source of the content but by the
credibility of the person who shared it.21 This means that even when we exercise

17. Renee DiResta, “Free Speech in the Age of Algorithmic Megaphones,” WIRED, published October 12, 2018, www.wired.com/story/facebook-domestic-disinformation-algorithmic-megaphones/.
18. Drew Olanoff, “Twitter Sees 6% Increase In ‘Like’ Activity After First Week Of Hearts,” TechCrunch, published November 10, 2015, https://techcrunch.com/2015/11/10/