And the neural net did ask for chopped flour on occasion, but it seems that it learned that from mistakes like this one in the original dataset:
   ⅔ cup chopped floured
   1 nuts
   Similar mistakes resulted in the neural net learning the following ingredients:
   1 (optional) sugar, grated
   1 salt and pepper
   1 noodles
   1 up
   TIME-WASTING DATA
   Sometimes problems with the dataset didn’t so much lead the neural net into making a mistake as waste its time. Take a look at this neural-net-generated recipe:
   Good Ponesed Dressing deserts
   —TOPPING—
   4 cup cold water or yeast meat
   ½ cup butter
   ¼ teaspoon cloves
   ½ cup vegetable oil
   1 cup grated white rice
   1 parsley sprigs
   Cook the onions in oil, flour, dates and salt together through both plates.
   Put the sauce to each prepared Broiler coated (2 10” side up) to lower the fat and add the cornstarch with a wooden toothpick hot so would be below, melt chicken. Garnish with coconut and shredded cheese.
   Source: IObass Cindypissong (in Whett Quesssie. Etracklitts 6) Dallas Viewnard, Brick-Nut Markets, Fat. submitted by Fluffiting/sizevory, 1906. ISBN 0-952716-0-3015
   NUBTET 10, 1972mcTbofd-in hands, Christmas charcoals Helb & Mochia Grunnignias: Stanter Becaused Off Matter, Dianonarddit Hht
   5.1.85 calories CaluAmis
   Source: Chocolate Pie Jan 584
   Yield: 2 servings
   In addition to generating the recipe’s title, category,* ingredients, and directions, the neural net spent half its time generating the footnotes—everything from the source to the nutrition information and even an ISBN number. Not only did this waste its time and brainpower (how long must it have taken to figure out how to format an ISBN?), it was also darn confusing to it. Why do some recipes have ISBNs and others don’t? Why do some give human names as sources and others give books or magazines? These occur in the training data basically at random, so the neural net has no hope of figuring out the underlying pattern.
   Mestow Southweet With Minks and Stuff In Water pork, bbq
   3 pkg of salmon balls
   1 sea salt & pepper
   120 mm tomatoes and skim milk
   2 cup light sour cream
   1 cup dry white wine
   1 salt
   1 pepper
   1 can 13-oz. eggs; separated
   Combine the sour cream into the sarchball to coat the meatly carefully then seed and let it serve (gently for another night) (the watermeagas of cinnamon bread, wrap them and put may be done sherry) in the center of a saucepan, stirring constantly until almost thoroughly smooth, about 4 minutes. Stir the water, the salt, lemon juice and mashed potato through liberally.
   Cook in the butter. Serve immediately. Thoroughly slice the fish on cup, the remaining 1 cup sliced peas to remove this from the grill for another minute part under and refrigerated. It doesn’t have broken makes a some-nictive other thickness. Per cookies to make strawberries
   from The Kitchen of Crocked, One. The Extice Chef’s Wermele to seasony, it’s Lakes OAK:
   **** The from Bon Meshing, 96 1994. MG (8Fs4.TE, From: Hoycoomow Koghran*.Lavie: 676 (WR/12-92-1966) entral. Dive them, Tiftigs: ==1
   Shared by: Dandy Fistary
   Yield: 10 servings
   In another experiment, I trained a neural net to generate new titles for BuzzFeed list articles. My first training round, however, didn’t go that well. Here’s a sampling of article titles it generated:
   11 Videos Unges Annoying Too Real Week
   29 choses qui aphole donnar desdade
   17 Things You Aren’t Perfectly And Beautiful
   11 choses qui en la persona de perdizar como
   11 en 2015 fotos que des zum Endu a ter de viven beementer aterre Buden
   15 GIFs
   14 Reasons Why Your Don’t Beauty School Things Your Time
   11 fotos qui prouitamente tu pasan sie de como amigos para
   18 Photos That Make Book Will Make You Should Bengulta Are In 2014
   17 Reasons We Astroas Admiticational Tryihnall In Nin Life
   Half the articles it was generating didn’t appear to be in English but rather in some strange hybrid of French, Spanish, German, and a few other languages. That prompted me to look back at the dataset. Sure enough, though it had an impressive ninety-two thousand article titles to learn from, half of those were in some language other than English. The neural net was spending half its time learning English and half its time trying to learn and separate several other languages at once. Once I removed the extra languages, its English results improved as well:
   17 Times The Most Butts
   43 quotes guaranteed to make you a mermaid immediately
   31 photos of ninja turtles’s hair costume
   18 secrets snowmen won’t tell you
   15 emo football fans share their ways
   27 christmas ornaments every college twentysomething knows
   12 serious creative ways to put chicken places in sydney
   25 unfortunate cookie performances from around the world
   21 pictures of food that will make you wince and say “oh i’m i sad?”
   10 Memories That Will Make You Healthy In 2015
   24 times australia was the absolute worst
   23 memes about being funny that are funny but also laugh at
   18 delicious bacon treats to make clowns amazingly happy
   29 things to do with tea for Halloween
   7 pies
   32 signs of the hairy dad
   Since machine learning algorithms don’t have context for the problems we’re trying to solve, they don’t know what’s important and what to ignore. The BuzzFeed-list-generating neural net didn’t know that multiple languages were a thing or that we meant for it to generate results only in English; as far as it could tell, all these patterns were equally important to learn. Zeroing in on extraneous information is very common in image-generating and image-recognizing algorithms, too.
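   The cleanup step itself doesn't have to be fancy. Here's a minimal sketch of the kind of language filtering described above, assuming the article titles sit one per line in a hypothetical text file and using an off-the-shelf language-identification library (langdetect here, though any would do):

   # Sketch: filter a dataset down to English-only entries before training.
   # Assumes one title per line in a hypothetical "titles.txt".
   from langdetect import detect

   with open("titles.txt", encoding="utf-8") as f:
       titles = [line.strip() for line in f if line.strip()]

   english_titles = []
   for title in titles:
       try:
           if detect(title) == "en":   # keep only titles the detector thinks are English
               english_titles.append(title)
       except Exception:               # very short or odd strings can't be classified
           pass

   with open("titles_english.txt", "w", encoding="utf-8") as f:
       f.write("\n".join(english_titles))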
   In 2018 a team from Nvidia trained a GAN to generate a variety of images, including those of cats.5 They found that some of the cats the GAN generated were accompanied by blocky textlike markings. Apparently, some of the training data included cat memes, and the algorithm had dutifully spent time trying to figure out how to generate meme text. In 2019 another team, using the same dataset, trained another AI—StyleGAN—that also tended to generate meme text with its cats. It also spent significant time learning how to generate pictures of a single unusual-looking but internet-famous cat named Grumpy Cat.6
   Other image-generating algorithms get similarly confused. In 2018, a team at Google trained an algorithm called BigGAN, which could do impressively well at generating a variety of images. It was particularly good at generating pictures of dogs (for which there were a lot of examples in the dataset) and landscapes (it was very good at textures). But the example pictures it saw sometimes confused it. Its images for “soccer ball” sometimes included a fleshy lump that was probably an attempt at a human foot, or even an entire human goalie, and its images for “microphone” were often humans with no actual microphone evident. The example pictures in its training data weren’t plain pictures of the thing it was trying to generate; they had people and backgrounds that the neural net tried to learn about as well. The problem was that, unlike a human, BigGAN had no way of distinguishing an object’s surroundings from the object itself—remember our landscape-sheep confusion from chapter 1? Just as StyleGAN struggled to handle all the different kinds of cat pictures, BigGAN was struggling with a dataset that unintentionally made its task too broad.
   If the dataset is messy, one of the main ways programmers can improve their machine learning results is to spend time cleaning it up. Programmers can even go further and use their knowledge of the dataset to help the algorithm. They might, for example, weed out the images of soccer balls that have other things in them—like goalies and landscapes and nets. In the case of image recognition algorithms, humans can also help by drawing boxes or outlines around the various items in the image, manually separating a given thing from the items with which it’s commonly associated.
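   When human-drawn boxes are available, one simple way to use them is to crop each training image down to just the labeled object. A minimal sketch, assuming the Pillow imaging library; the file paths, labels, and box coordinates are purely illustrative:

   # Sketch: crop labeled training images down to their annotated bounding
   # boxes so the model sees the object itself rather than its surroundings.
   # The annotation format here is invented for illustration.
   import os
   from PIL import Image

   annotations = [
       # (image path, label, (left, upper, right, lower) box in pixels)
       ("images/match_001.jpg", "soccer ball", (220, 340, 300, 420)),
       ("images/match_002.jpg", "soccer ball", (80, 500, 150, 570)),
   ]

   os.makedirs("cropped", exist_ok=True)
   for path, label, box in annotations:
       image = Image.open(path)
       cropped = image.crop(box)      # keep just the boxed region
       out_name = "cropped/" + label.replace(" ", "_") + "_" + os.path.basename(path)
       cropped.save(out_name)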
   But there are plenty of times where even clean data contains problems.
   IS THIS THE REAL LIFE?
   I mentioned earlier in this chapter that even if data is relatively clean and doesn’t have a lot of extra time-wasting stuff in it, it can still cause an AI to face-plant if it isn’t representative of the real world.
   Consider giraffes, for example.
   Among the community of AI researchers and enthusiasts, AI has a reputation for seeing giraffes everywhere. Given a random photo of an uninteresting bit of landscape—a pond, for example, or some trees—AI will tend to report the presence of giraffes. The effect is so common that internet security expert Melissa Elliott suggested the term giraffing for the phenomenon of AI overreporting relatively rare sights.7
   The reason for this has to do with the data the AI is trained on. Though giraffes are uncommon, people are much more likely to photograph a giraffe (“Hey, cool, a giraffe!”) than a random boring bit of landscape. The big free-to-use image datasets that so many AI researchers train their algorithms on tend to have images of lots of different animals, but few, if any, pictures of plain dirt or plain trees. An AI that studies this dataset will learn that giraffes are more common than empty fields and will adjust its predictions accordingly.
   I tested this with Visual Chatbot, and no matter what boring pictures I showed it, the bot was convinced it was on the best safari ever.
   A giraffed AI does an excellent job at matching the data it saw but a pretty bad job at matching the real world. All sorts of things, not just animals and dirt, are overrepresented or underrepresented in the datasets we train AI on. For example, people have pointed out that female scientists are vastly underrepresented on Wikipedia compared to male scientists with similar accomplishments. (Donna Strickland, the 2018 winner of the Nobel Prize in Physics, hadn’t been the subject of a Wikipedia article until after she won—just earlier that year, a draft Wikipedia article about her had been rejected because the editor thought she wasn’t famous enough.)8 An AI trained on Wikipedia articles might think there are very few notable female scientists.
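   One low-tech way to catch this kind of skew before training is to tally how often each label appears in the dataset and compare the counts against common sense. A minimal sketch, assuming the labels live one per line in a hypothetical file:

   # Sketch: tally label frequencies to spot classes that are wildly
   # over- or underrepresented compared to the real world.
   from collections import Counter

   with open("labels.txt", encoding="utf-8") as f:   # hypothetical file, one label per line
       counts = Counter(line.strip() for line in f if line.strip())

   total = sum(counts.values())
   for label, count in counts.most_common(20):
       print(f"{label:20s} {count:6d}  ({count / total:.1%} of the dataset)")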
   OTHER DATASET QUIRKS
   The quirks of an individual dataset show up in trained machine learning models in sometimes surprising ways. In 2018 some users of Google Translate noticed that when they asked it to translate repeated nonsense syllables from some languages into English, the resulting text was weirdly coherent—and weirdly biblical.9 Jon Christian of Motherboard investigated and found, for example, that
   “ag ag ag ag ag ag ag ag ag ag ag ag ag ag ag ag ag ag ag ag ag”
   translated from Somali to English as
   “As a result, the total number of the members of the tribe of the sons of Gershon was one hundred fifty thousand”
   while
   “ag ag ag ag ag ag ag ag ag ag”
   translated from Somali to English as
   “And its length was one hundred cubits at one end”
   Once Motherboard reached out to Google, the strange translations disappeared, but the question remained: why did this happen at all? The editors interviewed experts in machine translation who theorized that it was because Google Translate uses machine learning for its translations. In machine learning translation, the algorithm learns to translate words and phrases by looking at example phrases that humans have translated. It learns which phrases translate to which other phrases and in which context. This makes it generally very good at producing realistic translations, even of idioms and slang. Google’s translation algorithm was one of the first large-scale commercial applications of machine learning, capturing the world’s attention in 2010 when it made Google’s translation service better virtually overnight. As we know from chapter 2, a machine learning algorithm will do best when it has lots of examples to work from. The machine-translation experts theorized that Google Translate didn’t have very many examples of translated texts for some languages but that the Bible was likely one of the examples they did have in their dataset because it has been translated into so many languages. When the machine learning algorithm powering Google Translate wasn’t sure what the translation was, it may have defaulted to outputting bits of its training data—resulting in the weird religious fragments.
   When I checked in late 2018, the biblical bits were gone, but Google Translate was still doing strange things with repeated or nonsense syllables.
   For example, if I changed the spacing in an English sentence and then translated the resulting nonsense from Maori to English, here are some of the results I got:
   ih ave noi dea wha tthi ssen tenc eis sayi ng ->
   Your email address is one of the most important features in this forum
   ih ave noi dea wha tthi ssen tenc eis sayi ngat all ->
   This is one of the best ways you can buy one or more of these
   ih ave noi dea wha tthi ssen tenc eis sayi ngat all ple aseh elp ->
   In addition, you will be able to find out more about the queries
   This phenomenon is weird and fun, but there’s a serious side, too. Many proprietary neural networks are trained on customer information—some of which could be highly private and confidential. If trained neural network models can be interrogated in such a way that they reveal information from their training data, it poses a pretty huge security risk.
   In 2017, researchers from Google Brain showed that a standard machine learning language-translation algorithm could memorize short sequences of numbers—like credit card numbers or Social Security numbers—even if they appeared just four times in a dataset of one hundred thousand English-Vietnamese sentence pairs.10 Even without access to the AI’s training data or inner workings, the researchers found that the AI was more sure about a translation if it was an exact pair of sentences that it had seen during training. By tweaking the numbers in a test sentence like “My Social Security number is XXX-XX-XXXX,” they could figure out which Social Security numbers the AI had seen during training. They trained an RNN on a dataset of more than one hundred thousand emails containing sensitive employee information collected by the US government as part of their investigation into the Enron Corporation (yes, that Enron) and were able to extract multiple Social Security numbers and credit card numbers from the neural net’s predictions. It had memorized the information in such a way that it could be recovered by any user—even without access to the original dataset. This problem is known as unintentional memorization and can be prevented with appropriate security measures—or by keeping sensitive data out of a neural network’s training dataset in the first place.
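   The probing trick described above boils down to ranking guesses by how confident the model is about them: generate candidate sentences that differ only in the secret digits, score each one with the model, and see which candidate it finds suspiciously likely. Here's a toy sketch of that ranking step; sentence_log_likelihood is a placeholder standing in for whatever scoring function a real trained model exposes, not the researchers' actual code:

   # Toy sketch of the memorization probe described above: rank candidate
   # secrets by how confident the model is about the sentence containing them.
   # sentence_log_likelihood() is a stand-in for a real model's scoring API.
   import itertools
   import random

   def sentence_log_likelihood(sentence: str) -> float:
       """Placeholder: a real probe would query the trained model here."""
       return -random.uniform(10.0, 100.0)   # pretend log-likelihood

   template = "My social security number is {}-{}-{}"

   # Try a small, illustrative set of made-up candidate numbers rather than all of them.
   candidates = [
       template.format(f"{a:03d}", f"{b:02d}", f"{c:04d}")
       for a, b, c in itertools.product([123, 456], [45, 67], [6789, 1234])
   ]

   # The candidate the model scores as most likely is the best guess at a
   # memorized secret; an unusually large gap over the runners-up is the red flag.
   ranked = sorted(candidates, key=sentence_log_likelihood, reverse=True)
   for sentence in ranked[:3]:
       print(sentence)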
   MISSING DATA
   Here’s another way to sabotage an AI: don’t give it all the information it needs.
   Humans use a lot of information to make even the simplest choices. Say we’re choosing a name for our cat. We can think of lots of cats whose names we know and form a rough idea what a cat’s name should sound like. A neural network can do that—it can look at a long list of existing cat names and figure out the common letter combinations and even some of the most common words. But what it doesn’t know are the words that aren’t in the list of existing cat names. Humans know which words to avoid; AIs do not. As a result, a list of cat names generated by a recurrent neural network will contain entries like these:
   Hurler
   Hurker
   Jexley Pickle
   Sofa
   Trickles
   Clotter
   Moan
   Toot
   Pissy
   Retchion
   Scabbys
   Mr Tinkles
   Soundwise and lengthwise, they fit right in with the rest of the cat names. The AI did a good job with that part. But it accidentally picked some words that are really, really weird.
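   The letter-combination learning described here can be illustrated with something far simpler than an RNN: a character-level Markov chain that only tracks which letter tends to follow which. This sketch uses a tiny made-up name list as a stand-in for a real dataset; like the neural net, it has no idea what any of the words mean, which is exactly how a name like Sofa can slip through:

   # Sketch: a character-level Markov chain, a much simpler cousin of a
   # char-RNN. It learns which letter tends to follow each pair of letters
   # in existing names, then samples new names one letter at a time.
   import random
   from collections import defaultdict

   names = ["Whiskers", "Mittens", "Shadow", "Tinkles", "Pickles",
            "Smokey", "Patches", "Muffin", "Snickers"]   # stand-in dataset

   ORDER = 2   # how many previous characters the model looks at
   transitions = defaultdict(list)
   for name in names:
       padded = "^" * ORDER + name.lower() + "$"         # start/end markers
       for i in range(len(padded) - ORDER):
           transitions[padded[i:i + ORDER]].append(padded[i + ORDER])

   def generate_name() -> str:
       context, out = "^" * ORDER, ""
       while True:
           nxt = random.choice(transitions[context])
           if nxt == "$":
               return out.capitalize()
           out += nxt
           context = context[1:] + nxt

   print([generate_name() for _ in range(5)])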
   Sometimes weird is exactly what’s called for, and that’s where neural networks shine. Working at the level of letters and sounds rather than with meaning and cultural references, they can build combinations that probably would not have occurred to a human. Remember earlier in the chapter where I crowdsourced a list of Halloween costumes? Here are some of the costumes an RNN came up with when I asked it to imitate them:
   Bird Wizard
   Disco Monster
   The Grim Reaper Mime
   Spartan Gandalf
   Moth horse
   Starfleet Shark
   A masked box
   Panda Clam
   Shark Cow
   Zombie School Bus
   Snape Scarecrow
   Professor Panda
   Strawberry shark
   King of the Poop Bug
   Failed Steampunk Spider
   lady Garbage
   Ms. Frizzle’s Robot
   Celery Blue Frankenstein
   Dragon of Liberty
   A shark princess
   Cupcake pants
   Ghost of Pickle
   Vampire Hog Bride
   Statue of pizza
   Pumpkin picard
   Text-generating RNNs create non sequiturs because their world essentially is a non sequitur. If specific examples aren’t in its dataset, a neural net will have no idea why “Zombie School Bus” is unlikely but “Magic School Bus” is sensible or why “Ghost of Pickle” is a less likely choice than “Ghost of Christmas Past.” This comes in handy for Halloween, when part of the fun is being the only person at the party dressed as “Vampire Hog Bride.”
   With their limited, narrow knowledge of the world, AIs can struggle even when faced with the relatively mundane. Our “mundane” is still very broad, and it’s tough to build an AI that’s prepared for it all.
   
 