You Look Like a Thing and I Love You

by Janelle Shane


  Warning sign number 2: The Problem Is Not What We Thought It Was

  The problem with designing an AI to screen candidates for us: we aren’t really asking the AI to identify the best candidates. We’re asking it to identify the candidates that most resemble the ones our human hiring managers liked in the past.

  That might be okay if the human hiring managers made great decisions. But most US companies have a diversity problem, particularly among managers and particularly in the way that hiring managers evaluate resumes and interview candidates. All else being equal, resumes with white-male-sounding names are more likely to get interviews than those with female- and/or minority-sounding names.5 Even hiring managers who are female and/or members of a minority themselves tend to unconsciously favor white male candidates.

  Plenty of bad and/or outright harmful AI programs are designed by people who thought they were designing an AI to solve a problem but were unknowingly training it to do something entirely different.

  Warning sign number 3: There Are Sneaky Shortcuts

  Remember the skin-cancer-detecting AI that was really a ruler detector? Identifying the minute differences between healthy cells and cancer cells is difficult, so the AI found it a lot easier to look for the presence of a ruler in the picture.

  If you give a job-candidate-screening AI biased data to learn from (which you almost certainly did, unless you did a lot of work to scrub bias from the data), then you also give it a convenient shortcut to improve its accuracy at predicting the “best” candidate: prefer white men. That’s a lot easier than analyzing the nuances of a candidate’s choice of wording. Or perhaps the AI will find and exploit another unfortunate shortcut—maybe we filmed our successful candidates using a single camera, and the AI learns to read the camera metadata and select only candidates who were filmed with that camera.

  AIs take sneaky shortcuts all the time—they just don’t know any better!

  Warning sign number 4: The AI Tried to Learn from Flawed Data

  There’s an old computer-science saying: garbage in, garbage out. If the AI’s goal is to imitate humans who make flawed decisions, perfect success would be to imitate those decisions exactly, flaws and all.

  Flawed data, whether it’s flawed examples to learn from or a flawed simulation with weird physics, will throw an AI for a loop or send it off in the wrong direction. Since in many cases our example data effectively is the problem we’re giving the AI to solve, it’s no wonder that bad data leads to a bad solution. In fact, warning signs 1 through 3 are most often evidence of problems with data.

  DOOM—OR DELIGHT

  The job-candidate-screening example is, unfortunately, not hypothetical. Multiple companies already offer AI-powered resume-screening or video-interview-screening services, and few offer information about what they’ve done to address bias or to account for disability or cultural differences or to find out what information their AIs use in the screening process. With careful work, it’s at least possible to build a job-candidate-screening AI that is measurably less biased than human hiring managers—but without published stats to prove it, we can be pretty sure that bias is still there.

  The difference between successful AI problem solving and failure usually has a lot to do with the suitability of the task for an AI solution. And there are plenty of tasks for which AI solutions are more efficient than human solutions. What are they, and what makes AI so good at them? Let’s take a look.

  CHAPTER 2

  AI is everywhere, but where is it exactly?

  THIS EXAMPLE IS REAL, I KID YOU NOT

  There’s a farm in Xichang, China, that’s unusual for a number of reasons. One, it’s the largest farm of its type in the world, its productivity unmatched. Each year, the farm produces six billion Periplaneta americana, more than twenty-eight thousand of them per square foot.1 To maximize productivity, the farm relies on algorithms that control the temperature, humidity, and food supply, and that even analyze the genetics and growth rate of Periplaneta americana.

  But the primary reason the farm is unusual is that Periplaneta americana is simply the Latin name for the American cockroach. Yes, the farm produces cockroaches, which are crushed into a potion that’s highly valuable in traditional Chinese medicine. “Slightly sweet,” reports its packaging. With “a slightly fishy smell.”

  Because it’s a valuable trade secret, details are scarce on what exactly the cockroach-maximizing algorithm is like. But the scenario sounds an awful lot like a famous thought experiment called the paper-clip maximizer, which supposes that a superintelligent AI has a singular task: producing paper clips. Given that single-minded goal, a superintelligent AI might decide to convert all the resources it could into the manufacture of paper clips—even converting the planet and all its occupants into paper clips. Fortunately—very fortunately, given that we’ve just been talking about an algorithm whose job it is to maximize the number of cockroaches in existence—the algorithms we have today are light-years away from being capable of running factories or farms by themselves, let alone converting the global economy into a cockroach producer. Very likely, the cockroach AI is making predictions about future production rates based on past data, then picking the environmental conditions it thinks will maximize cockroach production. It likely can suggest adjustments within a range that its human engineers set, but it probably relies on humans for taking data, filling orders, unloading supplies, and the all-important marketing of cockroach extract.
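
  To make that concrete, here is a rough sketch of what a predict-and-suggest loop like that might look like. This is my own illustration, not the farm’s secret algorithm; the prediction formula, the variable names, and the allowed ranges are all invented.

```python
# A minimal sketch of a predict-and-suggest loop (hypothetical; not the
# farm's actual algorithm). A real model would be learned from past data.
import itertools

def predict_production(temperature_c, humidity_pct, feed_kg_per_day):
    """Stand-in for a model trained on historical farm records."""
    # Invented formula for illustration only.
    return (1_000_000
            + 20_000 * (temperature_c - 20)
            + 5_000 * (humidity_pct - 50)
            + 8_000 * feed_kg_per_day)

# Human engineers decide the ranges the AI is allowed to suggest within.
temperature_options = range(25, 36)    # degrees C
humidity_options = range(60, 91, 5)    # percent
feed_options = range(100, 301, 50)     # kg of feed per day

best_settings, best_prediction = None, float("-inf")
for settings in itertools.product(temperature_options,
                                  humidity_options,
                                  feed_options):
    prediction = predict_production(*settings)
    if prediction > best_prediction:
        best_settings, best_prediction = settings, prediction

print("Suggested settings:", best_settings)
print("Predicted cockroaches per year:", int(best_prediction))
```

  Note that the algorithm only searches within the ranges its engineers allowed; everything outside those ranges, from unloading supplies to marketing, is still up to the humans.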

  Still, helping optimize a cockroach farm is something an AI is likely to be good at. There’s a lot of data to parse, but these algorithms are good at finding trends in huge datasets. It’s a job that is likely to be unpopular, but AIs don’t mind repetitive tasks or the skittering sound of millions of cockroach feet in the dark. Cockroaches reproduce quickly, so it doesn’t take long to see the effects of variable tweaking. And it’s a specific, narrow problem rather than one that’s complex and open-ended.

  Are there still potential problems with using AI to maximize cockroach production? Yes. Since AIs lack context about what they’re actually trying to accomplish and why, they often solve problems in unexpected ways. Suppose the cockroach AI found that by turning both the heat and water up to “max” in one particular room, it could significantly increase the number of cockroaches that room produced. It would have no way of knowing (or caring) that what it had actually done was short out the door that prevented the cockroaches from accessing the employee kitchen.

  Technically, shorting out the door was the AI being good at its job. Its job was to maximize cockroach production, not guard against their escape. To work with AI effectively, and to anticipate trouble before it happens, we need to understand what machine learning is best at.

  ACTUALLY, I WOULD BE FINE WITH A ROBOT TAKING THIS JOB

  Machine learning algorithms are useful even for jobs that a human could do better. Using an algorithm for a particular task saves the trouble and expense of having a human do it, especially when the task is high-volume and repetitive. This is true not just for machine learning algorithms, of course, but for automation in general. If a Roomba can save us from having to vacuum a room ourselves, we’ll put up with retrieving it again and again from under the sofa.

  One repetitive task that people are automating with AI is analyzing medical images. Lab technicians spend hours every day looking at blood samples under a microscope, counting platelets or white or red blood cells or examining tissue samples for abnormal cells. Each one of these tasks is simple, consistent, and self-contained, so in that way they’re good candidates for automation. But the stakes are higher when these algorithms leave the research lab and start working in hospitals, where the consequences of a mistake are much more serious. There are similar problems with self-driving cars—driving is mostly repetitive, and it would be nice to have a driver who never gets tired, but even a tiny glitch can have serious consequences at sixty miles per hour.

  Another high-volume task we’re happy to automate with AI, even if its performance isn’t quite at the human level: spam filtering. The onslaught of spam is a problem that can be nuanced and ever-changing, so it’s a tricky one for AI, but on the other hand, most of us are willing to put up with the occasional misfiltered message if it means our inboxes are mostly clear. Flagging malicious URLs, filtering social media posts, and identifying bots are high-volume tasks in which we mostly tolerate buggy performance.

  Hyperpersonalization is another area where AI is starting to show its usefulness. With personalized product recommendations, movie recommendations, and music playlists, companies use AI to tailor the experience to each consumer in a way that would be cost-prohibitive if a human were coming up with the requisite insights. So what if the AI is convinced that we need an endless number of hallway rugs or thinks we are a toddler because of that one time we bought a present for a baby shower? Its mistakes are mostly harmless (except for those occasions when they’re very, very unfortunate), and it could bring the company a sale.

  Commercial algorithms can now write up hyperlocal articles about election results, sports scores, and recent home sales. In each case, the algorithm can only produce a highly formulaic article, but people are interested enough in the content that it doesn’t seem to matter. One of these algorithms is called Heliograf, developed by the Washington Post to turn sports stats into news articles. As early as 2016, it was already producing hundreds of articles a year. Here’s an example of its reporting on a football game.2

  The Quince Orchard Cougars shut out the Einstein Titans, 47–0, on Friday.

  Quince Orchard opened the game with an eight-yard touchdown off a blocked punt return by Aaron Green. The Cougars added to their lead on Marquez Cooper’s three-yard touchdown run. The Cougars extended their lead on Aaron Derwin’s 18-yard touchdown run. The Cougars went even further ahead following Derwin’s 63-yard touchdown reception from quarterback Doc Bonner, bringing the score to 27–0.

  It’s not exciting stuff, but Heliograf does describe the game.* It knows how to populate an article based on a spreadsheet full of data and a few stock sports phrases. But an AI like Heliograf would utterly fail when faced with information that doesn’t fit neatly into the prescribed boxes. Did a horse run onto the field midgame? Was the locker room of the Einstein Titans overrun by cockroaches? Is there an opportunity for a clever pun? Heliograf only knows how to report its spreadsheet.
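
  As a rough illustration of that kind of template filling (this is not Heliograf’s actual code, and the data fields are invented), a bare-bones game-recap generator might look like this:

```python
# A bare-bones template-filling article generator (hypothetical; not
# Heliograf's actual code). The field names are invented for illustration.
game = {
    "winner": "Quince Orchard Cougars",
    "loser": "Einstein Titans",
    "winner_score": 47,
    "loser_score": 0,
    "day": "Friday",
}

def write_recap(game):
    # Pick a stock phrase based on the numbers in the spreadsheet row.
    if game["loser_score"] == 0:
        template = "The {winner} shut out the {loser}, {winner_score}–{loser_score}, on {day}."
    else:
        template = "The {winner} beat the {loser}, {winner_score}–{loser_score}, on {day}."
    return template.format(**game)

print(write_recap(game))
```

  Anything that isn’t a column in the spreadsheet, horse on the field included, simply never makes it into the article.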

  Nevertheless, AI-generated writing allows news outlets to produce the types of articles that were formerly cost-prohibitive. It requires a human’s touch to decide which articles to automate and to build the AI’s basic templates and stock phrases, but once a paper has set up one of these hyperspecialized algorithms, it can churn out as many news articles as there are spreadsheets to draw from. One Swedish news site, for example, built the Homeowners Bot, which was able to read tables of real estate data and write up each sale into an individual article, producing more than ten thousand articles in four months. This has turned out to be the most popular—and lucrative—type of article the news site publishes.3 And human reporters can spend their valuable time on creative investigative work instead. Increasingly, major news outlets use AI assistance to write their articles.4

  Science is another area where AI shows promise for automating repetitive tasks. Physicists, for example, have used AI to watch the light coming from distant stars,5 looking for telltale signs that the star might have a planet. Of course, the AI wasn’t as accurate as the physicists who trained it. Most of the stars it flagged as interesting were false alarms. But it was able to correctly eliminate more than 90 percent of the stars as uninteresting, which saved the physicists a lot of time.

  Astronomy is full of huge datasets, as it turns out. Over the course of its life, the Euclid telescope will collect tens of billions of galaxy images, out of which maybe two hundred thousand will show evidence of a phenomenon called gravitational lensing,6 which happens when a supermassive galaxy has gravity so strong that it actually bends the light from other, more distant galaxies. If astronomers can find the lenses, they can learn a lot about gravity on a huge intergalactic scale, where there are so many unsolved mysteries that a full 95 percent of the universe’s mass and energy is unaccounted for. When algorithms reviewed the images, they were faster than humans and sometimes outperformed them in accuracy. But when the telescope captured one superexciting “jackpot” lens, only the humans noticed it.

  Creative work can be automated as well, at least under the supervision of a human artist. Whereas before a photographer might spend hours tweaking a photograph, today’s AI-powered filters, like the built-in ones on Instagram and Facebook, do a decent job of adjusting contrast and lighting and even adding depth-of-focus effects to simulate an expensive lens. No need to digitally paint cat ears onto your friend—there’s an AI-powered filter built into your Instagram that will figure out where the ears should go, even as your friend moves their head. In big and small ways, AI gives artists and musicians access to time-saving tools that can expand their ability to do creative work on their own. On the flip side of this, of course, are tools like deepfakes, which allow people to swap one person’s head and/or body for another, even in video. On the one hand, greater access to this tool means that artists can readily insert Nicolas Cage or John Cho into various movie roles, goofing around or making a serious point about minority representation in Hollywood.7 On the other hand, the increasing ease of deepfakes is already giving harassers new ways to generate disturbing, highly targeted videos for dissemination online. And as technology improves and deepfake videos become increasingly convincing, many people and governments are worrying about the technique’s potential for creating fake but damaging videos—like realistic yet faked videos of a politician saying something inflammatory.

  In addition to saving humans time, AI automation can mean more consistent performance. After all, an individual human’s performance may vary throughout the day depending on things like how recently they’ve eaten or how much they’ve slept, and each person’s biases and moods might have a huge effect as well. Countless studies have shown that sexism, racial bias, ableism, and other problems affect things like whether resumes get shortlisted, whether employees get raises, and whether prisoners get parole. Algorithms avoid human inconsistencies—given a set of data, they’ll return pretty much an unvarying result, no matter if it’s morning, noon, or happy hour. But, unfortunately, consistent doesn’t mean unbiased. It’s very possible for an algorithm to be consistently unfair, especially if it learned, as many AIs do, by copying humans.

  So there are plenty of tasks it’s attractive to automate with AI. But what determines whether a given problem is one an AI can actually handle?

  THE NARROWER THE TASK, THE SMARTER THE AI

  The Turing test has been a famous benchmark for the intelligence level of a computer program ever since Alan Turing proposed it in the 1950s. A computer program passes the standard Turing test if it can chat with humans and convince approximately one-third of them that it’s a human being rather than a computer. Passing the Turing test is sometimes seen as a sign that an algorithm has achieved human-level intelligence and maybe even self-awareness. Lots of science fiction books and movies—Blade Runner, Ex Machina, Bicentennial Man, and many more—involve sophisticated artificial general intelligences that have proved their “personhood” by passing the Turing test.

  But the Turing test isn’t actually a good measure of algorithmic intelligence. For one thing, it’s easy to pass the Turing test if you can make the topic of conversation narrow enough. I chatted with the Whole Foods Market bot on Facebook to test this theory:

  Whole Foods: Hi Janelle! We’re here to help you find recipes as simple as their ingredients.

  Me: Do you have a recipe for guacamole?

  Whole Foods: [recipe screenshot]

  Me: Is it okay to make guacamole from green peas?

  Whole Foods: [recipe screenshot]

  Me: Give me a recipe that uses avocado. Anything but guacamole.

  Whole Foods: [recipe screenshot]

  At this point, the conversation is a bit ambiguous. The thing I’m chatting with hasn’t specifically said whether it’s a human or a bot and is competently handling questions that can be answered by posting a recipe. But when I stray from this narrow format, the bot quickly reveals itself.

  Me: How can I tell if an avocado is ripe?

  Whole Foods: [screenshot of the bot’s reply]

  Me: Do you have a favorite Star Wars character?

  Whole Foods: [screenshot of the bot’s reply]

  This is the strategy companies use when they want to use chatbots for customer service. Rather than identify the bots as such, they rely on human politeness to keep the conversation on topics in which the bots can hold their own. After all, if there’s a chance you might be talking with a human employee, it would be rude to test them with weird off-topic questions.
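
  To see just how narrow “narrow” can be, here is a toy sketch of a recipe-bot-style responder, my own illustration rather than the actual Whole Foods bot: as long as the keywords match, it looks competent, and the moment they don’t, it falls back on posting a recipe anyway.

```python
# A toy narrow-domain chatbot (hypothetical illustration, not the actual
# Whole Foods bot). It only knows how to match keywords to canned recipes.
RECIPES = {
    "guacamole": "Classic Guacamole: avocados, lime, onion, cilantro, salt.",
    "avocado": "Avocado Toast: avocado, bread, olive oil, chili flakes.",
    "pea": "Minted Pea Dip: green peas, mint, lemon, olive oil.",
}

def reply(message):
    text = message.lower()
    for keyword, recipe in RECIPES.items():
        if keyword in text:
            return recipe
    # Nothing matched: post a recipe anyway, which is what gives the bot away.
    return RECIPES["guacamole"]

print(reply("Do you have a recipe for guacamole?"))
print(reply("Do you have a favorite Star Wars character?"))
```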

  Even when customers stick to the prescribed topic, chatbots will struggle if the topic is too broad. Beginning in August 2015, Facebook tried to create an AI-powered chatbot called M that was meant to make hotel reservations, book theater tickets, recommend restaurants, and more.8 The idea was that the company would start out using humans to handle the most difficult requests, thereby generating lots of examples that the algorithm could learn from. Eventually, Facebook expected the algorithm to have enough data to handle most questions on its own. Unfortunately, given the freedom to ask M anything, customers took Facebook at its word. In an interview, the engineer who started the project recounted, “People try first to ask for the weather tomorrow; then they say ‘Is there an Italian restaurant available?’ Next they have a question about immigration, and after a while they ask M to organize their wedding.”9 A user even asked M to arrange for a parrot to visit his friend. M succeeded—by sending that request to be handled by a human. In fact, years after it introduced M, Facebook found that its algorithm still needed too much human help. It shut down the service in January 2018.10

 
