Entropy and the Turing Test
We return to the Shannon Game one last time. Scientists all the way back to Claude Shannon have regarded creating an optimal playing strategy for this game as equivalent to creating an optimal compression method for English. These two challenges are so related that they amount to one and the same thing.
But only now are researchers[29] arguing one step further: that creating an optimal compressor for English is equivalent to another major challenge in the AI world: passing the Turing test.
If a computer could play this game optimally, they say, if a computer could compress English optimally, it’d know enough about the language that it would know the language. We’d have to consider it intelligent—in the human sense of the word.
So a computer, to be humanly intelligent, doesn’t even need—as in the traditional Turing test—to respond to your sentences: it need only complete them.
Every time you whip out your mobile and start plowing your thumbs into it—“hey dude 7 sounds good see you there”—you’re conducting your very own Turing test; you’re seeing if computers have finally caught us or not. Remember that every frustration, every “Why does it keep telling people I’m feeling I’ll today!?” and “Why in the heck does it keep signing off with Love, Asian!?” is, for better or worse, a verdict—and the verdict is not yet, not just yet. The line itself is still no match for you. And it’s still no match for the person at the other end.
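The claimed equivalence between playing the Shannon Game well and compressing well can be made concrete with a toy model. The sketch below is only an illustration under stated assumptions, not anything from Shannon or the researchers cited: it assumes a 27-symbol alphabet (26 letters plus the space), a hypothetical "guesser" that looks only at the previous letter, and made-up training text. Each letter is charged -log2(p) bits, where p is the probability the guesser gave it, which is the cost an ideal compressor driven by that guesser would pay.

```python
import math
from collections import Counter, defaultdict

# Assumed 27-symbol alphabet: 26 lowercase letters plus the space.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def train_bigram(text):
    """Count, for each character, how often each next character follows it."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts

def bits_to_encode(text, counts):
    """Ideal code length of text (after its first character), in bits.

    Each character costs -log2(p), where p is the add-one-smoothed
    probability the model assigns to it given the previous character.
    """
    total_bits = 0.0
    for prev, nxt in zip(text, text[1:]):
        seen = counts[prev]
        p = (seen[nxt] + 1) / (sum(seen.values()) + len(ALPHABET))
        total_bits -= math.log2(p)
    return total_bits

# Made-up training text and message, purely for illustration.
training_text = "the cat sat on the mat and the cat ate the rat " * 10
message = "the cat sat on the mat"

model = train_bigram(training_text)
model_bits = bits_to_encode(message, model)

# Baseline: a guesser with no knowledge treats all 27 symbols as equally likely.
uniform_bits = (len(message) - 1) * math.log2(len(ALPHABET))

print(f"guessing model: {model_bits:.1f} bits")
print(f"no model:       {uniform_bits:.1f} bits")
```

The better the guesser predicts the next letter, the fewer bits the message costs, which is the sense in which a stronger player of the game is a stronger compressor.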
1. Claude Shannon: “Joyce … is alleged to achieve a compression of semantic content.”
2. Length here refers to binary bits, not words of English, but the distinction isn’t hugely important in this case.
3. This is why, for instance, starting a game of Guess Who?, as I routinely did in the late ’80s, by asking about the person’s gender is a poor strategy: the game only had five women characters to nineteen men, so the question wasn’t as incisive as one that would create a twelve-twelve split.
4. The problem was how to get an accurate gauge of its volume without melting it down. Thinking about this as he stepped into a public bath, all of a sudden he registered: the water level rose as he got in! You can measure the volume of an irregular object by the amount of water it displaces! Allegedly, he was so excited about this insight that he immediately leaped out of the bath and ran home to work out the experiment, naked and dripping bathwater through the streets, shouting for joy. The word he was shouting was the Greek for “I’ve got it!” and has since become our synonym for scientific discovery: Eureka.
5. As a result, highly compressed files are much more fragile, in the sense that if any of the bits are corrupted, the context won’t help fill them in, because those contextual clues have already been capitalized on and compressed away. This is one of the useful qualities of redundancy.
6. Not to be confused with thermodynamic entropy, the measure of “disorder” in a physical system. The two are in fact related, but in complicated and mathematically strenuous ways that are outside of our scope here but well worth reading about for those curious.
7. Play the game yourself at math.ucsd.edu/~crypto/java/ENTROPY/. It’s fun; plus, moving that slowly and being forced to speculate at every single darn step of the way, you’ll never think about language and time the same way again. Some elementary schools use a variation of the Shannon Game to teach spelling; I’d have my undergraduate poetry workshop students play the Shannon Game to strengthen their syntactical chops. In poems, where the economy of language is often pushed to its breaking point, having a feel for what chains of words will be predictable to a reader is a useful compass.
8. Fascinatingly, this suggests that blander, more generic, lower-vocabulary, or more repetitive books are harder to search, and harder to edit.
9. For this reason much swear-bleeping censorship on television makes no sense to me, because if the removed words are cloze-test obvious, then to what extent have you removed them?
10. Heck, even pickup artists don’t like it. As Mystery puts it, “The location where you first encounter a woman is not necessarily favorable … The music may be too loud for lengthy comfort-building dialogue.”
11. That the distinctions between the words in “And in an …” are steamrollered by native speakers, especially when talking excitedly—“Nininin …”—is lossy compression. We can afford to render all three words alike because the rules of grammar and syntax prevent other “decompressions,” like “and and an,” or “an in in,” from seeming plausible.
12. I can’t help noticing that the fifteen-character “Shannon entropy” and five-character “mouth” have both been assigned a two-character substitute—in fact, the same two-character symbol, the pronoun “it”—in this sentence: more compression for you.
13. Any utterance or description or conversation, of course, leaves countless things out. Thus the implication of anything said is that it is, in fact, non-obvious. Thus the word “obviously” (or the phrase “of course”) is always at least slightly disingenuous—because anything said must be at least somewhat surprising and/or informative in order to be said. (Everything said has a presumption of ignorance behind it. This is why stating the obvious is not only inefficient, but frequently offensive. Yet the opposite, too much left unsaid—as Shannon shows us in the value of redundancy, and as expressions like “when you assume you make an ass out of u and me” indicate—has its own risks.)
14. “A girl on the stairs listens to her father / Beat up her mother,” it begins, and ends with what might be a reference to either the mother or the girl: “Someone putting their tongue where their tooth had been.”
15. David Bellos, director of the Program in Translation and Intercultural Communication at Princeton, speculates that firmly “generic” books may be easier for computers to translate: “If you were to take a decidedly jaundiced view of some genre of contemporary foreign fiction (say, French novels of adultery and inheritance), you could surmise that since such works have nothing new to say and employ only repeated formulas, then after a sufficient number of translated novels of that kind and their originals had been scanned and put up on the web, Google Translate should be able to do a pretty good simulation of translating other regurgitations of the same ilk … For works that are truly original—and therefore worth translating—statistical machine translation hasn’t got a hope.”
16. See, e.g., Columbia University clinical psychologist George Bonanno’s “Loss, Trauma, and Human Resilience: Have We Underestimated the Human Capacity to Thrive After Extremely Aversive Events?”
17. As a confederate, it was often the moments when I (felt I) knew what the judge was typing that I jumped the Q&A gun. This suggests a way in which Shannon Game entropy and the (much less well understood) science of barge-in may be related: a link between the questions of how to finish another’s sentences, and when.
18. “You know, if people spoke completely compressed text, no one would actually be able to learn English,” notes Brown University professor of computer science and cognitive science Eugene Charniak. Likewise, adults would find it much harder to distinguish gibberish at a glance, because every string of letters or sounds would have at least some meaning. “Colorless green ideas sleep furiously” is, famously, nonsensical, but requires a second’s thought to identify it as such, whereas “Meck pren plaphth” is gibberish at a glance. A language that was compressed for maximum terseness and economy wouldn’t have this distinction.
Another casualty of an optimally compressed language (if anyone could learn it in the first place) would be crossword puzzles. As Claude Shannon noted, if our language was better compressed (that is, if words were shorter, with almost all short strings of letters, like “meck” and “pren” and all sorts of others, being valid words), then it would be much harder to complete crossword puzzles, because wrong answers wouldn’t produce sections where no words seemed to fit, signaling the error. Intriguingly, with a less well-compressed language, with more non-word letter strings and longer words on average, crossword puzzles would be nearly impossible to compose, because you couldn’t find enough valid words whose spellings crisscrossed in the right way. The entropy of English is just about perfect for crossword puzzles.
19. This sentence read best when I made my examples all nouns, but lest you think that this process happens only to (somewhat uncommon) nouns, and not to everyday adjectives and adverbs, I hope you don’t think so anymore. Anything and everything can do it.
20. The American Heritage Book of English Usage, §8. (That § symbol, being outside both alphabet and punctuation, is probably the entropy value of half a sentence. I get a satisfying feeling from using it. Ditto for other arcane and wonderfully dubbed marks like the pipe, voided lozenge, pilcrow, asterism, and double dagger.)
21. The most recent statistics I’ve seen put global cell phone subscriptions at 4.6 billion, in a global population of 6.8 billion.
22. Dave Matthews Band’s “You and Me” is, to my knowledge, the first major radio single to have its lyrics written on an iPhone—suggesting that text prediction may increasingly affect not only interpersonal communication but the production of art.
23. Including underneath the word “ain’t,” despite its having been in steady use since the eighteenth century. It returns 83,800,000 results on Google and was said in the 2008 vice presidential debate.
24. In October 2008, an online petition over twenty thousand strong helped persuade Apple to allow users, once they download the new version of the iPhone firmware, to disable auto-correction if they wanted.
25. Some artists are actually using compression artifacts and compression glitches to create a deliberate visual aesthetic, called “datamoshing.” From art-world short films like Takeshi Murata’s “Monster Movie” to mainstream music videos like the Nabil Elderkin-directed video for Kanye West’s “Welcome to Heartbreak,” we’re seeing a fascinating burst of experiments with what might be called “delta compression mischief.” For instance, what happens when you apply a series of diffs to the wrong I-frame, and the wall of a subway station starts to furrow and open uncannily, as though it were Kanye West’s mouth?
26. E.g., Timothy Ferriss: “My learning curve is insanely steep right now. As soon as that plateaus, I’ll disappear to Croatia for a few months or do something else.” Not all of us can disappear to Croatia at whim, but the Shannon Game suggests, perhaps, that simply asking the right questions might work.
27. This sentence itself being one, and perhaps the only, exception.
28. (log₂ 27 ≈ 4.75)
29. Florida Tech’s Matt Mahoney for one, and Brown’s Eugene Charniak for another.
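The arithmetic behind notes 3 and 28 can be checked in a few lines. This is a minimal sketch, assuming that the 27 in note 28 stands for 26 letters plus the space and that every Guess Who? character is equally likely to be the hidden one; the function name question_bits is hypothetical.

```python
import math

def question_bits(yes_count, no_count):
    """Expected information, in bits, from a yes/no question that splits
    equally likely candidates into groups of the given sizes."""
    total = yes_count + no_count
    bits = 0.0
    for group in (yes_count, no_count):
        if group:
            p = group / total
            bits -= p * math.log2(p)
    return bits

# Note 3: a twelve-twelve split versus the five-to-nineteen gender split.
even_split = question_bits(12, 12)    # a perfectly even split yields one full bit
gender_split = question_bits(5, 19)   # a lopsided split yields well under one bit

# Note 28: bits per symbol if all 27 symbols were equally likely.
uniform_letter_bits = math.log2(27)

print(f"12/12 question: {even_split:.2f} bits")
print(f"5/19 question:  {gender_split:.2f} bits")
print(f"log2(27):       {uniform_letter_bits:.2f} bits")
```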
11. Conclusion:
The Most Human Human
The Most Human Computer award in 2009 goes to David Levy—the same David Levy whose politically obsessed “Catherine” took the prize in 1997. Levy’s an intriguing guy: he was one of the big early figures in the computer chess scene of the 1980s, and was one of the organizers of the Marion Tinsley–Chinook checkers matches that preceded the Kasparov–Deep Blue showdown in the ’90s. He’s also the author of the recent nonfiction book Love and Sex with Robots, to give you an idea of the other sorts of things that are on his mind when he’s not competing for the Loebner Prize.
Levy stands up, to applause, accepts the award from Philip Jackson and Hugh Loebner, and makes a short speech about the importance of AI to a bright future, and the importance of the Loebner Prize to AI. I know what’s next on the agenda, and my stomach knots despite itself in the second of interstitial silence before Philip takes back the microphone. I’m certain that Doug’s gotten it; he and the Canadian judge were talking NHL from the third sentence in their conversation.
Ridiculous Canadians and their ice hockey, I’m thinking. Then I’m thinking how ridiculous it is that I’m even allowing myself to get this worked up about some silly award—granted, I flew all the way out here to compete for it. Then I’m thinking how ridiculous it is to fly five thousand miles just to have an hour’s worth of instant messaging conversation. Then I’m thinking how maybe it’ll be great to be the runner-up; I can obsessively scrutinize the transcripts in the book if I want and seem like an underdog, not a gloater. I can figure out what went wrong. I can come back next year, in Los Angeles, with the home-field cultural advantage, and finally show—
“And the results here show also the identification of the human that the judges rated ‘most human,’ ” Philip announces, “which as you can see was ‘Confederate 1,’ which was Brian Christian.”
And he hands me the Most Human Human award.
Rivals; Purgatory
I didn’t know what to feel about it, exactly. It seemed strange to treat it as meaningless or trivial: I had, after all, prepared quite seriously, and that preparation had, I thought, paid off. And I found myself surprisingly invested in the outcome—how I did individually, yes, but also how the four of us did together. Clearly there was something to it all.
On the other hand, I felt equal discomfort regarding my new prize as significant—a true measure of me as a person—a thought that brought with it feelings of both pride (“Why, I am an excellent specimen, and it’s kind of you to say so!”) and guilt: if I do treat this award as “meaning something,” how do I act around these three people, my only friends for the next few days of the conference, people judged to be less human than myself? What kind of dynamic would that create? (Answer: mostly they just teased me.)
Ultimately, I let that particular question drop: Doug, Dave, and Olga were my comrades far more than they were my foes, and together we’d avenged the mistakes of 2008 in dramatic fashion. 2008’s confederates had given up a total of five votes to the computers, and almost allowed one to hit Turing’s 30 percent mark, making history. But between us, we hadn’t permitted a single vote to go the machines’ way. 2008 was a nail-biter; 2009 was a rout.
At first this felt disappointing, anticlimactic. There were any number of explanations: there were fewer rounds in ’09, so there were simply fewer opportunities for deceptions. The strongest program from ’08 was Elbot, the handiwork of a company called Artificial Solutions, one of many new businesses leveraging chatbot technology to “allow our clients to offer better customer service at lower cost.” After Elbot’s victory at the Loebner Prize competition and the publicity that followed, the company decided to prioritize the Elbot software’s more commercial applications, and so it wouldn’t be coming to the ’09 contest as returning champion. In some ways it would have been more dramatic to have a closer fight.
In another sense, though, the results were quite dramatic indeed. We think of science as an unhaltable, indefatigable advance: the idea that the Macs and PCs for sale next year would be slower, clunkier, heavier, and more expensive than this year’s models is laughable. Even in fields where computers were being matched up to a human standard, such as chess, their advance seemed utterly linear—inevitable, even. Maybe that’s because humans were already about as good at these things as they ever were and will ever be. Whereas in conversation it seems we are so complacent so much of the time, so smug, and with so much room for improvement—
In an article about the Turing test, Loebner Prize co-founder Robert Epstein writes, “One thing is certain: whereas the confederates in the competition will never get any smarter, the computers will.” I agree with the latter, and couldn’t disagree more strongly with the former.
Garry Kasparov says, “Athletes often talk about finding motivation in the desire to meet their own challenges and play their own best game, without worrying about their opponents. Though there is some truth to this, I find it a little disingenuous. While everyone has a unique way to get motivated and stay that way, all athletes thrive on competition, and that means beating someone else, not just setting a personal best … We all work harder, run faster, when we know someone is right on our heels … I too would have been unable to reach my potential without a nemesis like Karpov breathing down my neck and pushing me every step of the way.”
Some people imagine the future of computing as a kind of heaven. Rallying behind an idea called the “Singularity,” people like Ray Kurzweil (in The Singularity Is Near) and his cohort of believers envision a moment when we make machines smarter than ourselves, who make machines smarter than themselves, and so on, and the whole thing accelerates exponentially toward a massive ultra-intelligence that we can barely fathom. This time will become, in their view, a kind of techno-rapture, where humans can upload their consciousnesses onto the Internet and get assumed, if not bodily, then at least mentally, into an eternal, imperishable afterlife in the world of electricity.
Others imagine the future of computing as a kind of hell. Machines black out the sun, level our cities, seal us in hyperbaric chambers, and siphon our body heat forever.
Somehow, even during my Sunday school days, hell always seemed a little bit unbelievable to me, over the top, and heaven, strangely boring. And both far too static. Reincarnation seemed preferable to either. To me the real, in-flux, changeable and changing world seemed far more interesting, not to mention fun. I’m no futurist, but I suppose, if anything, I prefer to think of the long-term future of AI as neither heaven nor hell but a kind of purgatory: the place where the flawed, good-hearted go to be purified—and tested—and to come out better on the other side.
If Defeat
As for the final verdict on the Turing test itself, in 2010, 2011, and thereafter—