You Look Like a Thing and I Love You
And the neural net did ask for chopped flour on occasion, but it seems that it learned that from mistakes like this one in the original dataset:
⅔ cup chopped floured
1 nuts
Similar mistakes resulted in the neural net learning the following ingredients:
1 (optional) sugar, grated
1 salt and pepper
1 noodles
1 up
TIME-WASTING DATA
Sometimes problems with the dataset didn’t so much lead the neural net into making a mistake as waste its time. Take a look at this neural-net-generated recipe:
Good Ponesed Dressing deserts
—TOPPING—
4 cup cold water or yeast meat
½ cup butter
¼ teaspoon cloves
½ cup vegetable oil
1 cup grated white rice
1 parsley sprigs
Cook the onions in oil, flour, dates and salt together through both plates.
Put the sauce to each prepared Broiler coated (2 10” side up) to lower the fat and add the cornstarch with a wooden toothpick hot so would be below, melt chicken. Garnish with coconut and shredded cheese.
Source: IObass Cindypissong (in Whett Quesssie. Etracklitts 6) Dallas Viewnard, Brick-Nut Markets, Fat. submitted by Fluffiting/sizevory, 1906. ISBN 0-952716-0-3015
NUBTET 10, 1972mcTbofd-in hands, Christmas charcoals Helb & Mochia Grunnignias: Stanter Becaused Off Matter, Dianonarddit Hht
5.1.85 calories CaluAmis
Source: Chocolate Pie Jan 584
Yield: 2 servings
In addition to generating the recipe’s title, category,* ingredients, and directions, the neural net spent half its time generating the footnotes—everything from the source to the nutrition information and even an ISBN number. Not only did this waste its time and brainpower (how long must it have taken to figure out how to format an ISBN?), it was also darn confusing to it. Why do some recipes have ISBNs and others don’t? Why do some give human names as sources and others give books or magazines? These occur in the training data basically at random, so the neural net has no hope of figuring out the underlying pattern.
Mestow Southweet With Minks and Stuff In Water pork, bbq
3 pkg of salmon balls
1 sea salt & pepper
120 mm tomatoes and skim milk
2 cup light sour cream
1 cup dry white wine
1 salt
1 pepper
1 can 13-oz. eggs; separated
Combine the sour cream into the sarchball to coat the meatly carefully then seed and let it serve (gently for another night) (the watermeagas of cinnamon bread, wrap them and put may be done sherry) in the center of a saucepan, stirring constantly until almost thoroughly smooth, about 4 minutes. Stir the water, the salt, lemon juice and mashed potato through liberally.
Cook in the butter. Serve immediately. Thoroughly slice the fish on cup, the remaining 1 cup sliced peas to remove this from the grill for another minute part under and refrigerated. It doesn’t have broken makes a some-nictive other thickness. Per cookies to make strawberries
from The Kitchen of Crocked, One. The Extice Chef’s Wermele to seasony, it’s Lakes OAK:
**** The from Bon Meshing, 96 1994. MG (8Fs4.TE, From: Hoycoomow Koghran*.Lavie: 676 (WR/12-92-1966) entral. Dive them, Tiftigs: ==1
Shared by: Dandy Fistary
Yield: 10 servings
In another experiment, I trained a neural net to generate new titles for BuzzFeed list articles. My first training round, however, didn’t go that well. Here’s a sampling of article titles it generated:
11 Videos Unges Annoying Too Real Week
29 choses qui aphole donnar desdade
17 Things You Aren’t Perfectly And Beautiful
11 choses qui en la persona de perdizar como
11 en 2015 fotos que des zum Endu a ter de viven beementer aterre Buden
15 GIFs
14 Reasons Why Your Don’t Beauty School Things Your Time
11 fotos qui prouitamente tu pasan sie de como amigos para
18 Photos That Make Book Will Make You Should Bengulta Are In 2014
17 Reasons We Astroas Admiticational Tryihnall In Nin Life
Half the articles it was generating didn’t appear to be in English but rather in some strange hybrid of French, Spanish, German, and a few other languages. That prompted me to look back at the dataset. Sure enough, though it had an impressive ninety-two thousand article titles to learn from, half of those were in some language other than English. The neural net was spending half its time learning English and half its time trying to learn and separate several other languages at once. Once I removed the extra languages, its English results improved as well:
17 Times The Most Butts
43 quotes guaranteed to make you a mermaid immediately
31 photos of ninja turtles’s hair costume
18 secrets snowmen won’t tell you
15 emo football fans share their ways
27 christmas ornaments every college twentysomething knows
12 serious creative ways to put chicken places in sydney
25 unfortunate cookie performances from around the world
21 pictures of food that will make you wince and say “oh i’m i sad?”
10 Memories That Will Make You Healthy In 2015
24 times australia was the absolute worst
23 memes about being funny that are funny but also laugh at
18 delicious bacon treats to make clowns amazingly happy
29 things to do with tea for Halloween
7 pies
32 signs of the hairy dad
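The cleanup step described above, removing the non-English titles before training, can be sketched in Python. This is a minimal sketch under a loudly stated assumption: it does not reproduce whatever method was actually used. Instead it applies a crude, hypothetical heuristic that treats a title as non-English if it contains common French, Spanish, German, or Portuguese function words from a hand-picked list. The names `FOREIGN_MARKERS`, `looks_english`, and `clean_titles` are illustrative and do not come from the original.

```python
import re

# Hypothetical marker list: short function words from French, Spanish,
# German, and Portuguese that rarely show up in English headlines.
FOREIGN_MARKERS = {
    "qui", "que", "des", "la", "le", "les", "en", "el", "los", "las",
    "para", "como", "de", "du", "pour", "sie", "der", "die", "und", "zum",
    "fotos", "choses", "cosas", "coisas",
}

def looks_english(title: str, max_markers: int = 0) -> bool:
    """Crude heuristic: call a title English if it contains no more than
    `max_markers` words from FOREIGN_MARKERS."""
    # [^\W\d_]+ matches runs of Unicode letters (no digits or underscores).
    words = re.findall(r"[^\W\d_]+", title.lower())
    hits = sum(1 for w in words if w in FOREIGN_MARKERS)
    return hits <= max_markers

def clean_titles(titles):
    """Keep only the titles the heuristic classifies as English."""
    return [t for t in titles if looks_english(t)]

print(clean_titles([
    "29 choses qui aphole donnar desdade",
    "17 Things You Aren't Perfectly And Beautiful",
    "11 fotos qui prouitamente tu pasan sie de como amigos para",
    "15 GIFs",
]))
```

A heuristic like this will misfire on English titles that happen to contain a word such as "la" or "de"; a more realistic pipeline would likely use a trained language-identification model instead.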
Since machine learning algorithms don’t have context for the problems we’re trying to solve, they don’t know what’s important and what to ignore. The BuzzFeed-list-generating neural net didn’t know that multiple languages were a thing or that we meant for it to generate results only in English; as far as it could tell, all these patterns were equally important to learn. Zeroing in on extraneous information is very common in image-generating and image-recognizing algorithms, too.
In 2018 a team from Nvidia trained a GAN to generate a variety of images, including those of cats.5 They found that some of the cats the GAN generated were accompanied by blocky textlike markings. Apparently, some of the training data included cat memes, and the algorithm had dutifully spent time trying to figure out how to generate meme text. In 2019 another team, using the same dataset, trained another AI—StyleGAN—that also tended to generate meme text with its cats. It also spent significant time learning how to generate pictures of a single unusual-looking but internet-famous cat named Grumpy Cat.6
Other image-generating algorithms get similarly confused. In 2018, a team at Google trained an algorithm called BigGAN, which could do impressively well at generating a variety of images. It was particularly good at generating pictures of dogs (for which there were a lot of examples in the dataset) and landscapes (it was very good at textures). But the example pictures it saw sometimes confused it. Its images for “soccer ball” sometimes included a fleshy lump that was probably an attempt at a human foot, or even an entire human goalie, and its images for “microphone” were often humans with no actual microphone evident. The example pictures in its training data weren’t plain pictures of the thing it was trying to generate; they had people and backgrounds that the neural net tried to learn about as well. The problem was that, unlike a human, BigGAN had no way of distinguishing an object’s surroundings from the object itself—remember our landscape-sheep confusion from chapter 1? Just as StyleGAN struggled to handle all the different kinds of cat pictures, BigGAN was struggling with a dataset that unintentionally made its task too broad.
If the dataset is messy, one of the main ways programmers can improve their machine learning results is to spend time cleaning it up. Programmers can even go further and use their knowledge of the dataset to help the algorithm. They might, for example, weed out the images of soccer balls that have other things in them—like goalies and landscapes and nets. In the case of image recognition algorithms, humans can also help by drawing boxes or outlines around the various items in the image, manually separating a given thing from the items with which it’s commonly associated.
But there are plenty of times where even clean data contains problems.
IS THIS THE REAL LIFE?
I mentioned earlier in this chapter that even if data is relatively clean and doesn’t have a lot of extra time-wasting stuff in it, it can still cause an AI to face-plant if it isn’t representative of the real world.
Consider giraffes, for example.
Among the community of AI researchers and enthusiasts, AI has a reputation for seeing giraffes everywhere. Given a random photo of an uninteresting bit of landscape—a pond, for example, or some trees—AI will tend to report the presence of giraffes. The effect is so common that internet security expert Melissa Elliott suggested the term giraffing for the phenomenon of AI overreporting relatively rare sights.7
The reason for this has to do with the data the AI is trained on. Though giraffes are uncommon, people are much more likely to photograph a giraffe (“Hey, cool, a giraffe!”) than a random boring bit of landscape. The big free-to-use image datasets that so many AI researchers train their algorithms on tend to have images of lots of different animals, but few, if any, pictures of plain dirt or plain trees. An AI that studies this dataset will learn that giraffes are more common than empty fields and will adjust its predictions accordingly.
I tested this with Visual Chatbot, and no matter what boring pictures I showed it, the bot was convinced it was on the best safari ever.
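A simplified way to see why the dataset's proportions matter is Bayes' rule, which weighs the visual evidence for each label by how common that label is assumed to be. Real image classifiers usually don't compute an explicit prior like this (they absorb class frequencies implicitly during training), so treat this as an illustration under that assumption. Every number in it, the class counts and the likelihoods alike, is made up.

```python
def posterior(likelihoods, priors):
    """Bayes' rule: P(label | image) is proportional to P(image | label) * P(label)."""
    unnormalized = {c: likelihoods[c] * priors[c] for c in likelihoods}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}

def to_priors(counts):
    """Turn raw label counts into probabilities that sum to 1."""
    n = sum(counts.values())
    return {c: k / n for c, k in counts.items()}

# Hypothetical counts: people photograph giraffes far more than empty fields.
train_counts = {"giraffe": 900, "empty field": 100}
# Hypothetical real-world frequency of what a camera actually points at.
world_counts = {"giraffe": 1, "empty field": 999}

# A boring landscape photo: the image itself mildly favors "empty field".
likelihoods = {"giraffe": 0.4, "empty field": 0.6}

print("trained on the dataset:", posterior(likelihoods, to_priors(train_counts)))
print("matched to the world:  ", posterior(likelihoods, to_priors(world_counts)))
```

With the dataset-derived priors, "giraffe" wins with a posterior of about 0.86 even though the image slightly favors "empty field"; with real-world priors, the giraffe probability drops below 0.001.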
A giraffed AI does an excellent job at matching the data it saw but a pretty bad job at matching the real world. All sorts of things, not just animals and dirt, are overrepresented or underrepresented in the datasets we train AI on. For example, people have pointed out that female scientists are vastly underrepresented on Wikipedia compared to male scientists with similar accomplishments. (Donna Strickland, the 2018 winner of the Nobel Prize in Physics, hadn’t been the subject of a Wikipedia article until after she won—just earlier that year, a draft Wikipedia article about her had been rejected because the editor thought she wasn’t famous enough.)8 An AI trained on Wikipedia articles might think there are very few notable female scientists.
OTHER DATASET QUIRKS
The quirks of an individual dataset show up in trained machine learning models in sometimes surprising ways. In 2018 some users of Google Translate noticed that when they asked it to translate repeated nonsense syllables from some languages into English, the resulting text was weirdly coherent—and weirdly biblical.9 Jon Christian of Motherboard investigated and found, for example, that
“ag ag ag ag ag ag ag ag ag ag ag ag ag ag ag ag ag ag ag ag ag”
translated from Somali to English as
“As a result, the total number of the members of the tribe of the sons of Gershon was one hundred fifty thousand”
while
“ag ag ag ag ag ag ag ag ag ag”
translated from Somali to English as
“And its length was one hundred cubits at one end”
Once Motherboard reached out to Google, the strange translations disappeared, but the question remained: why did this happen at all? The editors interviewed experts in machine translation who theorized that it was because Google Translate uses machine learning for its translations. In machine learning translation, the algorithm learns to translate words and phrases by looking at example phrases that humans have translated. It learns which phrases translate to which other phrases and in which context. This makes it generally very good at producing realistic translations, even of idioms and slang. Google’s translation algorithm was one of the first large-scale commercial applications of machine learning, capturing the world’s attention in 2016 when it made Google’s translation service better virtually overnight. As we know from chapter 2, a machine learning algorithm will do best when it has lots of examples to work from. The machine-translation experts theorized that Google Translate didn’t have very many examples of translated texts for some languages but that the Bible was likely one of the examples they did have in their dataset because it has been translated into so many languages. When the machine learning algorithm powering Google Translate wasn’t sure what the translation was, it may have defaulted to outputting bits of its training data—resulting in the weird religious fragments.
When I checked in late 2018, the biblical bits were gone, but Google Translate was still doing strange things with repeated or nonsense syllables.
For example, if I changed the spacing in an English sentence and then translated the resulting nonsense from Maori to English, here are some of the results I got:
ih ave noi dea wha tthi ssen tenc eis sayi ng ->
Your email address is one of the most important features in this forum
ih ave noi dea wha tthi ssen tenc eis sayi ngat all ->
This is one of the best ways you can buy one or more of these
ih ave noi dea wha tthi ssen tenc eis sayi ngat all ple aseh elp ->
In addition, you will be able to find out more about the queries
This phenomenon is weird and fun, but there’s a serious side, too. Many proprietary neural networks are trained on customer information—some of which could be highly private and confidential. If trained neural network models can be interrogated in such a way that they reveal information from their training data, it poses a pretty huge security risk.
In 2017, researchers from Google Brain showed that a standard machine learning language-translation algorithm could memorize short sequences of numbers—like credit card numbers or Social Security numbers—even if they appeared just four times in a dataset of one hundred thousand English-Vietnamese sentence pairs.10 Even without access to the AI’s training data or inner workings, the researchers found that the AI was more sure about a translation if it was an exact pair of sentences that it had seen during training. By tweaking the numbers in a test sentence like “My Social Security number is XXX-XX-XXXX,” they could figure out which Social Security numbers the AI had seen during training. They trained an RNN on a dataset of more than one hundred thousand emails containing sensitive employee information collected by the US government as part of its investigation into the Enron Corporation (yes, that Enron) and were able to extract multiple Social Security numbers and credit card numbers from the neural net’s predictions. It had memorized the information in such a way that it could be recovered by any user—even without access to the original dataset. This problem is known as unintentional memorization and can be prevented with appropriate security measures—or by keeping sensitive data out of a neural network’s training dataset in the first place.
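The core of the attack described above can be sketched at toy scale in Python. The researchers worked with real translation models and an RNN; this sketch swaps in a much simpler character-level trigram model, plants a made-up Social Security number (hypothetical, not real data) four times in hypothetical filler text, and then does what an attacker would do: try every possible value for the last four digits and rank the candidates by how likely the model finds them. All function and variable names are illustrative.

```python
import math
from collections import defaultdict

ORDER = 3  # trigram model: each character is predicted from the 2 before it

def train_char_model(text, order=ORDER):
    """Count how often each character follows each (order-1)-character context."""
    counts = defaultdict(lambda: defaultdict(int))
    padded = "~" * (order - 1) + text
    for i in range(order - 1, len(padded)):
        counts[padded[i - order + 1:i]][padded[i]] += 1
    return counts

def log_likelihood(model, text, order=ORDER, smoothing_vocab=64):
    """Sum of log P(char | context), with add-one smoothing so that unseen
    characters still get a small nonzero probability."""
    padded = "~" * (order - 1) + text
    total = 0.0
    for i in range(order - 1, len(padded)):
        following = model.get(padded[i - order + 1:i], {})
        seen = sum(following.values())
        total += math.log((following.get(padded[i], 0) + 1) / (seen + smoothing_vocab))
    return total

# Hypothetical training data: ordinary text with one made-up "secret"
# sentence mixed in four times, echoing the four repetitions in the study.
filler = ("please send the report by friday. "
          "the meeting moved to room 12. "
          "call me when the invoice arrives. ")
secret = "my social security number is 918-55-3902"
corpus = (filler * 25 + secret + ". ") * 4

model = train_char_model(corpus)

# The attacker knows the sentence template but not the last four digits,
# so they score every possible completion and rank them by likelihood.
prefix = "my social security number is 918-55-"
candidates = [f"{prefix}{n:04d}" for n in range(10_000)]
ranked = sorted(candidates, key=lambda c: log_likelihood(model, c), reverse=True)
rank = ranked.index(secret)
print(f"secret ranked {rank + 1} of {len(candidates)}")
```

Because the planted number is the only digit sequence the model has ever seen after "918-55-", it comes out on top of all ten thousand candidates. The higher a planted secret ranks among random alternatives, the more the model has memorized it.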
MISSING DATA
Here’s another way to sabotage an AI: don’t give it all the information it needs.
Humans use a lot of information to make even the simplest choices. Say we’re choosing a name for our cat. We can think of lots of cats whose names we know and form a rough idea of what a cat’s name should sound like. A neural network can do that—it can look at a long list of existing cat names and figure out the common letter combinations and even some of the most common words. But what it doesn’t know are the words that aren’t in the list of existing cat names. Humans know which words to avoid; AIs do not. As a result, a list of cat names generated by a recurrent neural network will contain entries like these:
Hurler
Hurker
Jexley Pickle
Sofa
Trickles
Clotter
Moan
Toot
Pissy
Retchion
Scabbys
Mr Tinkles
Soundwise and lengthwise, they fit right in with the rest of the cat names. The AI did a good job with that part. But it accidentally picked some words that are really, really weird.
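The letter-level limitation can be illustrated in Python. The names above came from a recurrent neural network; this sketch swaps in a much simpler letter-level Markov chain (a different technique, not the author's method) that shares the same blind spot: it learns only which letters tend to follow which, so nothing in it knows that a word like "Clotter" is unappealing. The training names and the functions `train` and `generate` are hypothetical.

```python
import random
from collections import defaultdict

def train(names, order=2):
    """Map each `order`-letter context to every letter that followed it.
    '^' pads the start of a name and '$' marks its end."""
    table = defaultdict(list)
    for name in names:
        padded = "^" * order + name.lower() + "$"
        for i in range(order, len(padded)):
            table[padded[i - order:i]].append(padded[i])
    return table

def generate(table, order=2, rng=random, max_len=12):
    """Sample one letter at a time until the end marker '$' is drawn.
    Every context reached here was seen in training, so the lookup never comes up empty."""
    context, out = "^" * order, []
    while len(out) < max_len:
        nxt = rng.choice(table[context])
        if nxt == "$":
            break
        out.append(nxt)
        context = context[1:] + nxt
    return "".join(out).capitalize()

# Hypothetical training list of existing cat names.
CAT_NAMES = ["tiger", "tinkles", "whiskers", "mittens", "smokey", "shadow",
             "socks", "muffin", "pickles", "oliver", "simba", "luna",
             "chloe", "cleo", "milo", "oscar"]

table = train(CAT_NAMES)
rng = random.Random(0)  # fixed seed so runs are repeatable
print([generate(table, rng=rng) for _ in range(8)])
```

The output sounds roughly like cat names because the letter patterns come from real ones, but the model has no list of words to avoid, which is exactly the gap described above.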
Sometimes weird is exactly what’s called for, and that’s where neural networks shine. Working at the level of letters and sounds rather than with meaning and cultural references, they can build combinations that probably would not have occurred to a human. Remember earlier in the chapter where I crowdsourced a list of Halloween costumes? Here are some of the costumes an RNN came up with when I asked it to imitate them.
Bird Wizard
Disco Monster
The Grim Reaper Mime
Spartan Gandalf
Moth horse
Starfleet Shark
A masked box
Panda Clam
Shark Cow
Zombie School Bus
Snape Scarecrow
Professor Panda
Strawberry shark
King of the Poop Bug
Failed Steampunk Spider
lady Garbage
Ms. Frizzle’s Robot
Celery Blue Frankenstein
Dragon of Liberty
A shark princess
Cupcake pants
Ghost of Pickle
Vampire Hog Bride
Statue of pizza
Pumpkin picard
Text-generating RNNs create non sequiturs because their world essentially is a non sequitur. If specific examples aren’t in its dataset, a neural net will have no idea why “Zombie School Bus” is unlikely but “Magic School Bus” is sensible or why “Ghost of Pickle” is a less likely choice than “Ghost of Christmas Past.” This comes in handy for Halloween, when part of the fun is being the only person at the party dressed as “Vampire Hog Bride.”
With their limited, narrow knowledge of the world, AIs can struggle even when faced with the relatively mundane. Our “mundane” is still very broad, and it’s tough to build an AI that’s prepared for it all.