by Amy Webb
Conway’s law is a blind spot for the Big Nine because there’s a certain amount of heritability when it comes to AI. For now, people are still making choices every step along the way for AI’s development. Their personal ideas and the ideology of their tribe are what’s being passed down through the AI ecosystem, from the codebases to the algorithms to the frameworks to the design of the hardware and networks. If you—or someone whose language, gender, race, religion, politics, and culture mirror your own—are not in the room where it happens, you can bet that whatever gets built won’t reflect who you are. This isn’t a phenomenon unique to the field of AI, because real life isn’t a meritocracy. It’s our connections and relationships, regardless of industry, that lead to funding, appointments, promotions, and the acceptance of bold new ideas.
I’ve seen the negative effects of Conway’s law firsthand on more than one occasion. In July 2016, I was invited to a dinner roundtable on the future of AI, ethics, and society—it was held at the New York Yankees Steakhouse in Midtown Manhattan. There were 23 of us, seated boardroom-style, and our agenda was to debate and discuss some of the most pressing social and economic impacts of AI facing humanity, with a particular focus on gender, race, and AI systems that were being built for health care. However, the very people about whom we were having the discussion got overlooked on the invite list. There were two people of color in the room and four women—two were from the organization hosting us. No one invited had a professional or academic background in ethics, philosophy, or behavioral economics. It wasn’t intentional, I was told by the organizers, and I believe them. It just didn’t occur to anyone that the committee had invited a mostly all-male, nearly all-white group of experts.
We were the usual suspects, and we either knew each other personally or by reputation. We were a group of prominent computer science and neuroscience researchers, senior policy advisors from the White House, and senior executives from the tech industry. Throughout the evening, the group used only female pronouns to talk generally about people—a lexical tic that's now in vogue, especially in the tech sector and among journalists who cover technology.
Now, we weren’t writing code or policy together that night. We weren’t testing an AI system or conceptualizing a new product. It was just a dinner. And yet in the months that followed, I noticed threads of our discussion popping up in academic papers, in policy briefings, and even in casual conversations I had with Big Nine researchers. Together, over our steaks and salads, our closed network of AI experts generated nuanced ideas about ethics and AI that propagated throughout the community—ideas that could not have been wholly representative of the very people they concerned. Lots of little paper cuts.
Holding meetings, publishing white papers, and sponsoring conference panels to discuss the technological, economic, and social challenges within AI won't move the needle without a grander vision and alignment on what our future ought to look like. We need to solve for Conway's law, and we need to act swiftly.
Our Personal Values Drive Decisions
In the absence of codified humanistic values within the Big Nine, personal experiences and ideals are driving decision-making. This is particularly dangerous when it comes to AI, because students, professors, researchers, employees, and managers are making millions of decisions every day, from seemingly insignificant (what database to use) to profound (who gets killed if an autonomous vehicle needs to crash).
Artificial intelligence might be inspired by our human brains, but humans and AI make decisions and choices differently. Princeton professor Daniel Kahneman and Hebrew University of Jerusalem professor Amos Tversky spent years studying the human mind and how we make decisions, ultimately discovering that we have two systems of thinking: one that uses logic to analyze problems, and one that is automatic, fast, and nearly imperceptible to us. Kahneman describes this dual system in his award-winning book Thinking, Fast and Slow. Difficult problems require your attention and, as a result, a lot of mental energy. That’s why most people can’t solve long arithmetic problems while walking, because even the act of walking requires that energy-hungry part of the brain. It’s the other system that’s in control most of the time. Our fast, intuitive mind makes thousands of decisions autonomously all day long, and while it’s more energy efficient, it’s riddled with cognitive biases that affect our emotions, beliefs, and opinions.
We make mistakes because of the fast side of our brain. We overeat, or drink to excess, or have unprotected sex. It’s that side of the brain that enables stereotyping. Without consciously realizing it, we pass judgment on other people based on remarkably little data. Or those people are invisible to us. The fast side makes us susceptible to what I call the paradox of the present: when we automatically assume our present circumstances will not or cannot ever change, even when faced with signals pointing to something new or different. We may think that we are in complete control of our decision-making, but a part of us is continually on autopilot.
Mathematicians say that it's impossible to make a "perfect decision" because systems are complex and because the future is always in flux, right down to a molecular level. It would be impossible to predict every single possible outcome, and with an unknowable number of variables, there is no way to build a model that could weigh all possible answers. Decades ago, when the frontiers of AI involved beating a human player at checkers, the decision variables were straightforward. Today, asking an AI to weigh in on a medical diagnosis or to predict the next financial market crash involves data and decisions that are orders of magnitude more complex. So instead, our systems are built for optimization. Implicit in optimization is unpredictability: the latitude to make choices that deviate from our own human thinking.
When AlphaGo Zero abandoned human strategy and invented its own, it wasn't deciding between preexisting alternatives; it was making a deliberate choice to try something completely different. It's the latter thinking pattern that is a goal for AI researchers, because that's what theoretically leads to great breakthroughs. So rather than being trained to make absolutely perfect decisions every time, AI systems are being trained to optimize for particular outcomes. But who—and what—are we optimizing for?
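To make that question concrete, here is a minimal, hypothetical sketch (the option names, attributes, and weights are invented; no production system is this simple) showing that the same optimizer returns different "best" answers depending on whose values are encoded in the objective:

```python
# Two candidate decisions, each scored on attributes someone chose to measure.
candidates = [
    {"name": "option_a", "speed": 0.9, "fairness": 0.3},
    {"name": "option_b", "speed": 0.5, "fairness": 0.9},
]

def optimize(weights):
    # The "objective" is just a weighted sum of whatever the designers
    # decided matters. Change the weights and the best answer changes.
    score = lambda c: sum(weights[k] * c[k] for k in weights)
    return max(candidates, key=score)["name"]

print(optimize({"speed": 1.0, "fairness": 0.1}))  # -> option_a
print(optimize({"speed": 0.2, "fairness": 1.0}))  # -> option_b
```

The machinery is identical in both calls; only the weights, set by whoever built the system, determine which outcome counts as optimal.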
To that end, how does the optimization process work in real time? That's actually not an easy question to answer. Machine- and deep-learning technologies are more cryptic than older hand-coded systems, and that's because these systems bring together thousands of simulated neurons, which are arranged into hundreds of complicated, connected layers. After the initial input is sent to neurons in the first layer, a calculation is performed and a new signal is generated. That signal gets passed on to the next layer of neurons, and the process continues until a goal is reached. All of these interconnected layers allow AI systems to recognize and understand data at many levels of abstraction. For example, an image recognition system might detect in the first layer that an image has particular colors and shapes, while in higher layers it can discern texture and shine. The topmost layer would determine that the food in a photograph is cilantro and not parsley.
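As a rough, hypothetical illustration of that layer-by-layer flow (the layer sizes, random weights, and the cilantro-versus-parsley labels are all invented for the sketch; real image models are vastly larger and are trained rather than random), a toy forward pass might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three made-up layers: each matrix turns the previous layer's signal
# into a new one. Think of them as "colors and shapes" -> "textures"
# -> "cilantro vs. parsley".
layer_weights = [
    rng.standard_normal((64, 128)),
    rng.standard_normal((128, 128)),
    rng.standard_normal((128, 2)),
]

def forward(pixels):
    signal = pixels
    for weights in layer_weights:
        # Each layer: a weighted sum of the incoming signal, then a
        # simple nonlinearity (ReLU), and the result is passed onward.
        signal = np.maximum(0, signal @ weights)
    return int(signal.argmax())  # 0 = cilantro, 1 = parsley (arbitrary labels)

prediction = forward(rng.standard_normal(64))  # a fake 64-value "image"
```

Every real system stacks far more of these layers, and it is precisely that depth, with millions of weighted connections feeding one another, that makes the resulting decisions hard to trace.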
Here’s an example of how optimizing becomes a problem when the Big Nine use our data to build real-world applications for commercial and government interests. Researchers at New York’s Icahn School of Medicine ran a deep-learning experiment to see if they could train a system to predict cancer. The school, based within Mount Sinai Hospital, had obtained access to the data for 700,000 patients, and the data set included hundreds of different variables. Called Deep Patient, the system used advanced techniques to spot new patterns in data that didn’t entirely make sense to the researchers but turned out to be very good at finding patients in the earliest stages of many diseases, including liver cancer. Somewhat mysteriously, it could also predict the warning signs of psychiatric disorders like schizophrenia. But even the researchers who built the system didn’t know how it was making decisions. The researchers built a powerful AI—one that had tangible commercial and public health benefits—and to this day they can’t see the rationale for how it makes its decisions.11 Deep Patient made clever predictions, but without any explanation, how comfortable would a medical team be in taking next steps, which could include stopping or changing medications, administering radiation or chemotherapy, or going in for surgery?
That inability to observe how AI is optimizing and making its decisions is what’s known as the “black box problem.” Right now, AI systems built by the Big Nine might offer open-source code, but they all function like proprietary black boxes. While the companies can describe the process in general terms, the systems don’t let anyone observe it as it actually unfolds. With all those simulated neurons and layers, exactly what happened and in which order can’t be easily reverse-engineered.
One team of Google researchers did try to develop a new technique to make AI more transparent. The project, called DeepDream, took an image recognition network created by MIT’s Computer Science and AI Lab and ran Google’s deep-learning algorithm in reverse, in order to observe how the system recognized certain things such as trees, snails, and pigs. Instead of being trained to recognize objects using the layer-by-layer approach—to learn that a rose is a rose, and a daffodil is a daffodil—the system was trained to warp images and generate objects that weren’t there. Those warped images were fed through the system again and again, and each time DeepDream discovered more strange images. In essence, Google asked the AI to daydream. Rather than being trained to spot existing objects, the system was trained to do something we’ve all done as kids: stare up at the clouds, look for patterns in abstraction, and imagine what we see. Except that DeepDream wasn’t constrained by human stress or emotion: what it saw was an acid-trippy hellscape of grotesque floating animals, colorful fractals, and buildings curved and bent into wild shapes.12
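In practice, "running the algorithm in reverse" is commonly implemented as gradient ascent on the image itself: rather than adjusting the network's weights, the pixels are nudged so that a chosen layer's activations grow stronger. Below is a minimal sketch of that general technique, assuming a stock pretrained torchvision network and a hypothetical clouds.jpg input; it is illustrative only, not Google's DeepDream code.

```python
import torch
from PIL import Image
from torchvision import models, transforms

# Any pretrained CNN will do; VGG16's convolutional stack is used here.
model = models.vgg16(pretrained=True).features.eval()

def dream_step(img, layer_index, step_size=0.01):
    """One gradient-ascent step on the image: amplify whatever patterns
    the chosen layer already responds to."""
    img = img.clone().detach().requires_grad_(True)
    activation = img
    for i, layer in enumerate(model):
        activation = layer(activation)
        if i == layer_index:
            break
    loss = activation.norm()   # how strongly the chosen layer fires
    loss.backward()            # gradient with respect to the pixels
    with torch.no_grad():
        img = img + step_size * img.grad / (img.grad.abs().mean() + 1e-8)
    return img.detach()

# Start from a photo and repeat: the network's learned motifs
# (eyes, fur, spirals) gradually surface in the image.
preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor()])
img = preprocess(Image.open("clouds.jpg")).unsqueeze(0)  # hypothetical input photo
for _ in range(50):
    img = dream_step(img, layer_index=20)
```

Repeating the step amplifies whatever motifs the network has learned to detect, which is how animals and buildings begin to emerge from an ordinary photograph.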
When the AI daydreamed, it invented entirely new things that made logical sense to the system but would have been unrecognizable to us, including hybrid animals, like a “Pig-Snail” and “Dog-Fish.”13 AI daydreaming isn’t necessarily a concern; however, it does highlight the vast differences between how humans derive meaning from real-world data and how our systems, left to their own devices, make sense of our data. The research team published its findings, which were celebrated by the AI community as a breakthrough in observable AI. Meanwhile, the images were so stunning and weird that they made the rounds throughout the internet. A few people used the DeepDream code to build tools allowing anyone to make their own trippy photos. Some enterprising graphic designers even used DeepDream to make strangely beautiful greeting cards and put them up for sale on Zazzle.com.
DeepDream offered a window into how certain algorithms process information; however, it can’t be applied across all AI systems. How newer AI systems work—and why they make certain decisions—is still a mystery. Many within the AI tribe will argue that there is no black box problem, even though to date these systems remain opaque; they argue instead that making the systems transparent would mean disclosing proprietary algorithms and processes. This makes sense, and we should not expect a public company to make its intellectual property and trade secrets freely available to anyone—especially given the aggressive position China has taken on AI.
However, in the absence of meaningful explanations, what proof do we have that bias hasn’t crept in? Without knowing the answer to that question, how would anyone possibly feel comfortable trusting AI?
We aren’t demanding transparency for AI. We marvel at machines that seem to mimic humans but don’t quite get it right. We laugh about them on late-night talk shows, as we are reminded of our ultimate superiority. Again, I ask you: What if these deviations from human thinking are the start of something new?
Here’s what we do know. Commercial AI applications are designed for optimization—not interrogation or transparency. DeepDream was built to address the black box problem—to help researchers understand how complicated AI systems are making their decisions. It should have served as an early warning that AI’s version of perception is nothing like our own. Yet we’re proceeding as though AI will always behave the way its creators intended.
The AI applications built by the Big Nine are now entering the mainstream, and they’re meant to be user-friendly, enabling us to work faster and more efficiently. End users—police departments, government agencies, small and medium businesses—just want a dashboard that spits out answers and a tool that automates repetitive cognitive or administrative tasks. We all just want computers that will solve our problems, and we want to do less work. We also want less culpability—if something goes wrong, we can simply blame the computer system. This is the optimization effect, where unintended outcomes are already affecting everyday people around the world. Again, this should raise a sobering question: How are humanity’s billions of nuanced differences in culture, politics, religion, sexuality, and morality being optimized? In the absence of codified humanistic values, what happens when AI is optimized for someone who isn’t anything like you?
When AI Behaves Badly
Latanya Sweeney is a Harvard professor and former chief technology officer at the US Federal Trade Commission. In 2013, when she searched her own name on Google, she found an ad automatically appearing with the wording: “Latanya Sweeney, Arrested? 1) Enter name and state 2) Access full background checks instantly. www.instantcheckmate.com.”14 The people who built that system, which used machine learning to match a user’s intent with targeted advertising, encoded bias right into it. The AI powering Google’s AdSense determined that “Latanya” was a Black-identifying name and that, because people with Black-identifying names more commonly appeared in police databases, there was a strong likelihood that the user might be searching for an arrest record. Curious about what she’d just seen, Sweeney undertook a series of rigorous studies to see if her experience was an anomaly or if there was evidence of structural racism within online advertising. Her hunch about the latter turned out to be correct.
No one at Google built this system to intentionally discriminate against Black people. Rather, it was built to achieve speed and scale. In the 1980s, a company would meet with an agency, whose human staff would develop ad content and broker space within a newspaper—a process that left room for exceptions and wrangling over price, and one that required a lot of people who all expected to get paid. We’ve eliminated the people and now assign that work to algorithms, which automate the back-and-forth and deliver better results than the people could on their own. That worked well for everyone except Sweeney.
With human involvement pared back, the AI system was trained using an initial set of instructions from programmers. The data set most likely included lots of tags, including gender and race. Google makes money when users click through ads—so there’s a commercial incentive to optimize the AI for clicks. Someone along the way probably taught the system to categorize names into different buckets, which resulted in later databases segregated into racially identifying names. Those specific databases, combined with individual user behavior, would optimize the click-through rate. To its great credit, Google fixed the problem right away, without hesitation or question.
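To see how that kind of optimization goes wrong, consider a deliberately simplified, hypothetical click-through optimizer (a sketch for illustration, not Google's actual ad-serving code): it chooses ad copy per name bucket purely from historical clicks, so whatever pattern exists in past user behavior is amplified with every new impression.

```python
from collections import defaultdict

# history[(name_bucket, ad_template)] -> [clicks, impressions]
history = defaultdict(lambda: [0, 0])

def record(name_bucket, ad_template, clicked):
    """Log one impression and whether the user clicked."""
    stats = history[(name_bucket, ad_template)]
    stats[0] += int(clicked)
    stats[1] += 1

def choose_ad(name_bucket, templates):
    """Serve the template with the best observed click-through rate for
    this bucket. If past users clicked the "Arrested?" template more
    often for one bucket of names, the optimizer keeps serving it for
    every name in that bucket, and the pattern compounds."""
    def ctr(template):
        clicks, shows = history[(name_bucket, template)]
        return clicks / shows if shows else 0.0
    return max(templates, key=ctr)
```

Nothing in that loop asks whether the pattern it amplifies is fair or even true; it only asks what got clicked.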
The optimization effect has proven to be a problem for companies and organizations that see AI as a good solution to common problems, like administrative shortages and work backlogs. That’s especially true in law enforcement and the courts, which use AI to automate some of their decisions, including sentencing.15 In 2014, two 18-year-old girls saw a scooter and a bike along the side of the road in their Fort Lauderdale suburb. Though both were sized for little kids, the girls hopped on and started careening down the road before deciding they were too small. Just as they were untangling themselves from the scooter and bike, a woman came running after them, yelling, “That’s my kid’s stuff!” A neighbor, watching the scene, called the police, who caught up with the girls and arrested them. The girls were later charged with burglary and petty theft. Together, the bike and scooter were worth about $80. The summer before, a 41-year-old serial criminal had been arrested in a nearby Home Depot for shoplifting $86 worth of tools, adding to his record of armed robbery, attempted armed robbery, and time served in prison.
Investigative news organization ProPublica published an exceptionally powerful series detailing what happened next. All three were booked into jail using an AI program that automatically gave each of them a score: the likelihood that they would commit a future crime. The girls, who were Black, were rated high risk. The 41-year-old convicted criminal with multiple arrests—who was white—got the lowest risk rating. The system got it backward. The girls apologized, went home, and were never again charged with new crimes. But the white man is currently serving an eight-year prison term for yet another crime—breaking into a warehouse and stealing thousands of dollars’ worth of electronics.16 ProPublica looked at the risk scores assigned to more than 7,000 people arrested in Florida to see whether this was an anomaly—and again, they found significant bias encoded within the algorithms, which were nearly twice as likely to incorrectly flag Black defendants as future criminals as they were white defendants, while more often mislabeling white defendants as low risk.
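The statistical core of that finding is a comparison of error rates across groups: among people who never went on to reoffend, how often was each group flagged as high risk? Here is a hypothetical sketch of that kind of check, with toy records invented for illustration; it is not ProPublica's data or code.

```python
def false_positive_rate(records):
    """Share of people who did NOT reoffend but were still flagged high risk."""
    did_not_reoffend = [r for r in records if not r["reoffended"]]
    if not did_not_reoffend:
        return 0.0
    flagged = sum(r["high_risk"] for r in did_not_reoffend)
    return flagged / len(did_not_reoffend)

# Toy records for illustration only.
records = [
    {"group": "Black", "high_risk": True,  "reoffended": False},
    {"group": "Black", "high_risk": False, "reoffended": True},
    {"group": "white", "high_risk": False, "reoffended": False},
    {"group": "white", "high_risk": True,  "reoffended": True},
]

for group in ("Black", "white"):
    subset = [r for r in records if r["group"] == group]
    print(group, false_positive_rate(subset))
```

A gap in these rates between groups, holding actual outcomes fixed, is exactly the kind of disparity ProPublica measured.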
The optimization effect sometimes causes brilliant AI tribes to make dumb decisions. Recall DeepMind, which built the AlphaGo and AlphaGo Zero systems and stunned the AI community as it dominated grandmaster Go matches. Before Google acquired the company, it sent Geoff Hinton (the University of Toronto professor who was on leave to work on deep learning at Google) and Jeff Dean, who was in charge of Google Brain, to London on a private jet to meet DeepMind’s supernetwork of top PhDs in AI. Impressed with the technology and DeepMind’s remarkable team, they recommended that Google make an acquisition. It was a big investment at the time: Google paid nearly $600 million for DeepMind, with $400 million guaranteed up front and the remaining $200 million to be paid over a five-year period.