You Look Like a Thing and I Love You


by Janelle Shane


  Hybrid AI and pseudo-AI chatbots also have their own potential pitfalls. Every remote interaction becomes a form of the Turing test, and in the tightly limited, highly scripted environment of a customer service interaction, humans and AIs can be tough to tell apart. Humans may end up being treated badly by other humans who think they’re dealing with a bot. Employees have already complained about this, including one whose job it was to generate real-time transcripts of phone calls for deaf and hearing-impaired customers. When a human made a mistake, the caller would sometimes complain about “useless computers.”7

  Another problem is that people end up with the wrong idea of what AI is capable of. If something claims to be AI and then starts holding human-level conversations, identifying faces and objects at a human level of performance, or producing nearly flawless transcriptions, people may assume that AIs really can do these things on their own. The Chinese government is reportedly taking advantage of this8 with its nationwide surveillance system. Experts agree that there’s no facial recognition system that could accurately identify the thirty million people China has on its watch lists. In 2018 the New York Times reported that the government was still doing much of its facial recognition the old-fashioned way, using humans to look through sets of photos and make matches. What they tell the public, however, is that they’re using advanced AI. They’d like people to believe that a nationwide surveillance system is already capable of tracking their every move. And reportedly, people largely believe them. Jaywalking and crime rates are down in areas where the cameras have been publicized, and when told that the system had seen their crimes, some suspects have even confessed.

  BOT OR NOT?

  So given how many AIs are partially or even completely replaced by humans, how can we tell if we’re dealing with a real AI? In this book, we’ve already covered a lot of things that you’ll see AI doing—and things you won’t see it doing. But out in the world, you’ll encounter plenty of exaggerated claims about what AI can do, what it’s already doing, or what it’ll do soon. People trying to sell a product or sensationalize a story will come up with overblown headlines:

  • Facebook AI Invents Language That Humans Can’t Understand: System Shut Down Before It Evolves into Skynet9

  • Babysitter Screening App Predictim Uses AI to Sniff Out Bullies10

  • Here’s What Sophia, the First Robot Citizen, Thinks About Gender and Consciousness11

  • 30-Ton Electronic Brain at U. of P. Thinks Faster Than Einstein (1946)12

  In this book I’ve tried to make it clear what AI is actually capable of and what it’s unlikely to be able to do. Headlines like the ones above are giant red flags—and in this book I’ve given you many reasons why.

  Here are a few questions to ask when evaluating AI claims.

  1. How broad is the problem?

  As we’ve seen throughout this book, AIs do best at very narrow, tightly defined problems. Playing chess or go is narrow enough for AI. Identifying specific kinds of images—recognizing the presence of a human face or distinguishing healthy cells from a specific kind of disease—is also probably doable. Dealing with all the unpredictability of a city street or a human conversation is probably beyond its reach—if it tries, it may succeed much of the time, but there will be glitches.

  Of course, there are some problems that occupy gray areas. An AI may be able to sort medical images pretty well, but if you slip it a picture of a giraffe, it will probably be baffled. AI chatbots that pass as human usually use some gimmick—such as, in one specific case, pretending to be an eleven-year-old Ukrainian kid with limited English skills13—to explain away non sequiturs or their inability to handle most topics. Other AI chatbots have their “conversations” in controlled settings where the questions are known—and the answers human-written—ahead of time. If a problem seems like it required broad understanding or context to solve, a human was probably responsible.

  2. Where did the training data come from?

  Sometimes people show off “AI-written” stories that they have written themselves. You may remember a viral Twitter joke from 2018 about a bot that watched a thousand hours of Olive Garden commercials and generated a script for a new one. One giveaway that the joke was written by a human was that the description of what the AI learned from doesn’t match what it produced. If you give an AI a bunch of videos to learn from, it will output videos. It won’t be able to produce a script with stage directions—not unless there’s another AI, or a human, whose job it is to turn videos into scripts. Did the AI have a set of examples to copy or a fitness function to maximize? If not, then you’re probably not looking at the product of an AI.
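
  To make "a fitness function to maximize" a little more concrete, here is a minimal sketch in Python (a toy of my own, with a made-up target string, and nothing to do with the Olive Garden joke): a simple hill-climbing loop mutates a random string and keeps any change that the fitness function scores at least as well.

```python
# Toy "fitness function" example (invented for illustration): a hill climber
# mutates a random string and keeps mutations the scoring function rewards.
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz "
TARGET = "knock knock"  # arbitrary made-up target

def fitness(text):
    """Score a guess by how many characters match the target."""
    return sum(a == b for a, b in zip(text, TARGET))

guess = "".join(random.choice(ALPHABET) for _ in TARGET)
for _ in range(5000):
    i = random.randrange(len(TARGET))
    candidate = guess[:i] + random.choice(ALPHABET) + guess[i + 1:]
    if fitness(candidate) >= fitness(guess):
        guess = candidate  # keep changes the fitness function rewards

print(guess)  # drifts toward "knock knock", because that's all the score rewards
```

  Everything the loop produces is shaped by the examples or the scoring rule a human supplied; it has no way to output something its setup never pointed it toward.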

  3. Does the problem require a lot of memory?

  Remember from chapter 2 that AIs do best when they don’t have to remember very much at once. People are improving this all the time, but for now, a sign of an AI-generated response is a lack of memory. AI-written stories will meander, forgetting to resolve earlier plot points, sometimes even forgetting to finish sentences. AIs that play complex video games have a tough time with long-term strategy. AIs that hold conversations will forget information you gave them earlier unless they’re explicitly programmed to remember things like your name.

  An AI that’s making callbacks to earlier jokes, that sticks with a consistent cast of characters, and that keeps track of the objects in a room probably had a lot of human editing help, at least.
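
  As a rough illustration of why that is (a made-up toy, with a window far smaller than any real system's), many text-generating models only ever see a fixed-size window of the most recent conversation, so anything that scrolls out of that window might as well never have been said.

```python
# Toy illustration (numbers and names invented): a "chatbot" that can only
# see the most recent MAX_WORDS words of the conversation.
MAX_WORDS = 40

history = []

def remember(utterance):
    history.append(utterance)

def visible_context():
    """Return only the most recent words that still fit in the window."""
    words = " ".join(history).split()
    return " ".join(words[-MAX_WORDS:])

remember("User: Hi, my name is Priya and I live in Toronto.")
remember("Bot: Nice to meet you!")
remember("User: " + "Let me tell you about my day. " * 10)  # a long digression
remember("User: By the way, what's my name?")

# The name has scrolled out of the window, so the bot has nothing to answer
# with unless a programmer explicitly stored it somewhere else.
print("'Priya' still visible?", "Priya" in visible_context())
```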

  4. Is it just copying human biases?

  Even if people do genuinely use AI to solve a problem, it’s possible the AI is not nearly as capable as its programmers claim. For example, if a company claims to have developed a new AI that can comb through a job candidate’s social media and decide whether or not that person is trustworthy, we should immediately be raising red flags. A job like that would require human-level language skills, with the ability to handle memes, jokes, sarcasm, references to current events, cultural sensitivity, and more. In other words, it’s a task for a general AI. So if it’s returning ratings of each candidate, what is it basing its decisions on?

  The CEO of one such service, which in 2018 was offering social media screenings of potential babysitters, told Gizmodo, “We trained our product, our machine, our algorithm to make sure it was ethical and not biased.” As evidence of its AI’s lack of bias, the company’s CTO said, “We don’t look at skin color, we don’t look at ethnicity, those aren’t even algorithmic inputs. There’s no way for us to enter that into the algorithm itself.” But as we’ve seen, there are plenty of ways for a determined AI to pick up on trends that seem to help it figure out how humans rate each other—zip code and even photographs can be an indicator of race, and word choice can give it clues about things like gender and social class. As a possible indication of problems, when a Gizmodo reporter tested the babysitter-screening service, he found that his black friend was rated as “disrespectful” while his foul-mouthed white friend was rated more highly. When asked if the AI might have picked up on systemic bias in its training data, the CEO admitted that this was possible but noted that they added human review to catch errors like this. The question, then, is why the service rated those two friends the way it did. Human review doesn’t necessarily solve the problem of a biased algorithm, since the bias likely came from humans in the first place. And this particular AI doesn’t tell its customers how it came to its decisions, and it quite possibly doesn’t tell its programmers, either. This makes its decisions hard to appeal.14 Shortly after Gizmodo and others reported on their service, Facebook, Twitter, and Instagram restricted the company’s social media access, citing violations of terms of service, and the company halted their planned launch.15
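
  To see how easily a proxy can sneak in, here is a minimal sketch (entirely synthetic data and invented numbers, not the babysitter service's actual system): the model is never shown the protected attribute, yet it reproduces the bias in its human-made labels by leaning on a correlated feature like zip code.

```python
# Synthetic demonstration of proxy bias: "group" is never an input, but a
# correlated "zip_code" feature lets the model reproduce biased labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

group = rng.integers(0, 2, n)                    # protected attribute, never shown to the model
zip_code = group * 0.8 + rng.normal(0, 0.3, n)   # feature correlated with the group
word_choice = rng.normal(0, 1, n)                # unrelated feature

# Biased historical labels: "trustworthy" ratings that disfavor group 1
label = (rng.random(n) < 0.7 - 0.3 * group).astype(int)

X = np.column_stack([zip_code, word_choice])     # note: no 'group' column
model = LogisticRegression().fit(X, label)

pred = model.predict(X)
print("Rated trustworthy, group 0:", pred[group == 0].mean())
print("Rated trustworthy, group 1:", pred[group == 1].mean())
# The gap persists even though "group" was never an algorithmic input.
```

  Nothing here required entering skin color or ethnicity into the algorithm; the bias arrived with the labels and found its own way in.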

  There may be similar problems with AIs that screen job candidates, like the Amazon-resume-screening AI that learned to penalize female candidates. Companies that offer AI-powered candidate screening point to case studies of clients who have significantly increased the diversity of their hires after using AI.16 But without careful testing, it’s hard to know why. An AI-powered job screener could help increase diversity even if it recommended candidates entirely at random, if that’s already better than the racial and/or gender bias in typical company hiring. And what does a video-watching AI do about candidates with facial scarring or partial paralysis or whose facial expressions don’t match Western and/or neurotypical norms?

  As CNBC reported in 2018, people are already being advised to overemote for the AIs that screen videos of job candidates or to wear makeup that makes their faces easier to read.17 If emotion-screening AIs become more prevalent, scanning crowds for people whose microexpressions or body language trigger some warning, people could be compelled to perform for those, too.

  The problem with asking AI to judge the nuances of human language and human beings is that the job is just too hard. To make matters worse, the only rules that are simple and reliable enough for it to understand may be those—like prejudice and stereotyping—that it shouldn’t be using. It’s possible to build an AI system that improves on human prejudices, but it doesn’t happen without a lot of deliberate work, and bias can sneak in despite the best of intentions. When we use AI for jobs like this, we can’t trust its decisions, not without checking its work.

  CHAPTER 10

  A human-AI partnership

  INSTANT AI: JUST ADD HUMAN EXPERTISE

  If there’s one thing we’ve learned from this book, it’s that AI can’t do much without humans. Left to its own devices, at best it will flail ineffectually, and at worst it will solve the wrong problem entirely—which, as we’ve seen, can have devastating consequences. So it’s unlikely that AI-powered automation will be the end of human labor as we know it. A far more likely vision for the future, even one with the widespread use of advanced AI technology, is one in which AI and humans collaborate to solve problems and speed up repetitive tasks. In this chapter, I’ll take a look at what the future holds for AI and humans working together—and how they can partner in surprising ways.

  As we’ve seen throughout this book, humans need to make sure that an AI solves the right problems. This job involves anticipating the kinds of mistakes that machine learning tends to make and making sure to look for them—and even to avoid them in the first place. Choosing the right data can be a big part of that—we’ve seen that messy or flawed data can lead to problems. And of course an AI can’t go collect its own dataset. Not unless we design another AI whose job it is to find data.

  Building the AI in the first place is, of course, another job for humans. A blank mind that absorbs information like a sponge only exists in science fiction. For real AIs, a human has to choose the form to match the problem it’s supposed to solve. Are we building something that will recognize images? Something that will generate new scenes? Something that will predict numbers on a spreadsheet or words in a sentence? Each of those needs a specific type of AI. If the problem is complex, it may need many specialized algorithms working together for the best results. Again, a human has to choose the subalgorithms and set them up so they can learn together.
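
  There is nothing automatic about that choice. As a loose sketch (the model families below are deliberately simplified stand-ins; real image or text generation would need specialized neural networks), the mapping from problem to algorithm is something a human writes down and maintains:

```python
# A loose sketch of the human's job of matching a problem to a model family.
# The catalog is simplified for illustration; it is not a real AI toolkit.
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.naive_bayes import MultinomialNB

def choose_model(problem_type):
    """A human-maintained mapping from problem type to model family."""
    catalog = {
        "predict a number on a spreadsheet": LinearRegression(),
        "sort examples into known categories": LogisticRegression(),
        "classify short snippets of text": MultinomialNB(),
    }
    return catalog[problem_type]  # fails loudly if no human planned for this problem

print(type(choose_model("predict a number on a spreadsheet")).__name__)
```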

  A lot of human engineering goes into the dataset as well. The AI will get further if the human programmer can set things up so the AI has less to do. Remember the knock-knock jokes from chapter 1—the AI would have progressed a lot faster if it didn’t have to learn the entire joke formula of knocks and responses but could just focus on filling in the punchline. It would have done even better if we had started it off with a list of existing words and phrases to use when constructing puns. To cite another example, people who know that their AIs will need to keep track of 3-D information can help them out by building them with 3-D object representations in mind.1 Cleaning up a messy dataset to remove distracting or confusing data is also an important part of human dataset engineering. Remember the AI from chapter 4 that spent its time trying to format ISBN numbers rather than generating the recipes it was supposed to, and dutifully copied weird typos from its dataset?
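
  Much of that cleanup is unglamorous filtering. Here is a minimal sketch of the sort of thing a human does before the AI ever sees the data (the recipe lines and the typo list are invented for illustration):

```python
# Toy dataset cleanup: strip ISBN lines and fix known typos in a recipe
# corpus so the AI doesn't learn to imitate them.
import re

raw_recipes = [
    "Chocolate Chip Cookies",
    "ISBN 0-8478-1234-5",          # catalog junk the AI would happily imitate
    "2 cups flour",
    "1 cup sugarr",                # typo the AI would dutifully copy
    "Bake at 350 degrees for 12 minutes.",
]

ISBN_PATTERN = re.compile(r"\bISBN[\s:]*[\d-]+", re.IGNORECASE)
KNOWN_TYPOS = {"sugarr": "sugar"}

def clean(lines):
    cleaned = []
    for line in lines:
        if ISBN_PATTERN.search(line):
            continue                      # drop catalog metadata entirely
        for typo, fix in KNOWN_TYPOS.items():
            line = line.replace(typo, fix)
        cleaned.append(line)
    return cleaned

print(clean(raw_recipes))
```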

  In that sense, practical machine learning ends up being a bit of a hybrid between rules-based programming, in which a human tells a computer step-by-step how to solve a problem, and open-ended machine learning, in which an algorithm has to figure everything out. A human with very specialized knowledge about whatever the algorithm’s trying to solve can really help the program out. In fact, sometimes (perhaps even ideally) the programmer researches the problem and discovers that they now understand it so well that they no longer need to use machine learning at all.

  Of course, too much human supervision can also be counterproductive. Not only are humans slow, but we also sometimes just don’t know what the best approach to a problem is. In one instance, a group of researchers tried to improve the performance of an image recognition algorithm by incorporating more human help.2 Rather than just label a picture as depicting a dog, the researchers asked humans to click on the part of the image that actually contained the dog, then they programmed the AI to pay special attention to that part. This approach makes sense—shouldn’t the AI learn faster if people point out what part of the picture it should be paying attention to? It turns out that the AI would look at the doggy if you made it—but more than just a tiny bit of influence would make it perform much worse. Even more confoundingly, researchers don’t know exactly why. Maybe there’s something we don’t understand about what really helps an image recognition algorithm identify something. Maybe the people who clicked on the images don’t even understand how they recognize dogs and clicked on the parts of the images they thought were important (mostly eyes and muzzles) rather than the parts they actually used to identify it. When the researchers asked the AI which parts of the images it thought were important (by looking at which parts made its neurons activate), it was likely to highlight the edges of the dog or even the background of the photo.

  MAINTENANCE

  Another thing machine learning needs humans for is maintenance.

  After an AI has been trained on real-world data, the world might change. Machine learning researcher Hector Yee reports that around 2008 some colleagues told him there was no need to design a new AI to detect cars in an image—they already had an AI that worked great. But when Yee tried their AI on real-world data, it did terribly. It turned out that the AI had been trained on cars from the 1980s and didn’t know how to recognize modern cars.3

  I’ve seen similar quirks with Visual Chatbot, the giraffe-happy chatbot we met in chapter 4. It has a tendency to identify handheld objects (lightsabers, guns, swords) as Wii remotes. That might be a reasonable guess if it were still 2006, when Wii was in its heyday. More than a decade later, however, finding a person holding a Wii remote is becoming increasingly unlikely.

  All sorts of things could change and mess with an AI. As I mentioned in an earlier chapter, road closures or even hazards like wildfires might not deter an AI that sees only traffic from recommending what it thinks is an attractive route. Or a new kind of scooter could become popular, throwing off the hazard-detection algorithm of a self-driving car. A changing world adds to the challenge of designing an algorithm to understand it.
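
  One practical response is to keep checking. Here is a minimal sketch (toy data and thresholds, not any production system) of a maintenance check: score the deployed model on a fresh, human-labeled sample now and then, and flag it for retraining when its accuracy drifts too far below what it managed at training time.

```python
# Toy drift check: compare the model's accuracy on fresh labeled data to the
# accuracy it achieved when it was trained, and flag a big drop.

def accuracy(model, samples):
    """Fraction of labeled samples the model gets right."""
    correct = sum(1 for features, label in samples if model(features) == label)
    return correct / len(samples)

def needs_retraining(model, fresh_samples, baseline_accuracy, tolerance=0.05):
    """True if performance on recent data has dropped noticeably."""
    return accuracy(model, fresh_samples) < baseline_accuracy - tolerance

# Toy "car detector" that only knows boxy 1980s shapes
detector = lambda features: features.get("shape") == "boxy"

fresh_samples = [
    ({"shape": "boxy"}, True),
    ({"shape": "rounded"}, True),    # modern cars the old model misses
    ({"shape": "rounded"}, True),
    ({"shape": "tree"}, False),
]

if needs_retraining(detector, fresh_samples, baseline_accuracy=0.95):
    print("Accuracy has dropped on current data; time to retrain on newer examples.")
```

  The check itself is trivial; the hard, human part is collecting and labeling that fresh sample in the first place.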

  People also need to be able to adjust algorithms to fix newly discovered problems. Maybe there’s a rare but catastrophic bug that develops, like the one that affected Siri for a brief period of time, causing her to respond to users saying “Call me an ambulance” with “Okay, I’ll call you ‘an ambulance’ from now on.”4

  Another place where we need human oversight is in the matter of detecting and correcting bias. To combat the tendency of AI decision making to perpetuate bias, governments and other organizations are starting to require bias testing as a matter of course. As I mentioned in chapter 7, in January 2019, New York State issued a letter requiring life insurance companies to prove that their AI systems do not discriminate on the basis of race, religion, country of origin, or other protected classes. The state worried that making coverage decisions using “external lifestyle indicators”—anything from home address to educational level—would lead an AI to use this information to discriminate in illegal ways.5 In other words, they wanted to prevent mathwashing. We may see pushback against this kind of testing from companies that want their AIs to remain proprietary or harder to hack or that don’t want their AIs’ embarrassing shortcuts to be revealed. Remember Amazon’s sexist resume-screening AI? The company discovered the problem before using the AI in the real world and told us about it as a cautionary tale. How many other biased algorithms are out there right now, doing their best but doing it wrong?
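
  What does such a test look like? At its simplest, something like the sketch below (the decisions and group labels are invented, and this is only one rough rule of thumb, not the regulators' actual procedure): compare how often the model approves people from each group and flag a gap that is too large to ignore.

```python
# Toy bias audit: compare approval rates across two groups using the common
# "four-fifths" rule of thumb for flagging disparate impact.

def approval_rate(decisions, group_labels, group):
    rows = [d for d, g in zip(decisions, group_labels) if g == group]
    return sum(rows) / len(rows)

def passes_four_fifths(decisions, group_labels, group_a, group_b):
    """Flag if either group's approval rate is under 80% of the other's."""
    rate_a = approval_rate(decisions, group_labels, group_a)
    rate_b = approval_rate(decisions, group_labels, group_b)
    return min(rate_a, rate_b) / max(rate_a, rate_b) >= 0.8

decisions    = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]   # 1 = approved (made-up data)
group_labels = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]

if not passes_four_fifths(decisions, group_labels, "a", "b"):
    print("Approval rates differ enough to warrant a closer look.")
```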

  BEWARE OF AIS THAT LEARN ON THE JOB

  Not only are AIs not great at realizing when their brilliant solutions pose problems, but AIs and their environments can also interact in unfortunate ways. One example is the now infamous Microsoft Tay chatbot, a machine learning–based Twitter bot that was designed to learn from the users who tweeted at it. The bot was short-lived. “Unfortunately, within the first 24 hours of coming online,” Microsoft told the Washington Post, “we became aware of a coordinated effort by some users to abuse Tay’s commenting skills to have Tay respond in inappropriate ways. As a result, we have taken Tay offline and are making adjustments.”6 It had taken almost no time at all for users to teach Tay to spew hate speech and other abuse. Tay had no built-in sense of what kind of speech was offensive, a fact that vandals were happy to exploit. In fact, it’s notoriously difficult to flag offensive content without also falsely flagging discussion of the effects of offensive content. Without a good way to recognize offensive content automatically, machine learning algorithms will sometimes go out of their way to promote it, as we learned in chapter 5.

 
