The Most Human Human

by Brian Christian

Holds are useful for the judges to manipulate. Limiting the number of holds can stall the conversation, which is potentially interesting: humans, on the side of truth, would have more of an incentive to reanimate it than the computers. On the other hand, computers are frequently oblivious to conversational momentum anyway, and tend to be eager to topic shift on a dime; it’s likely a many-holds approach from the judge might be best. One idea that a judge might employ is to put something odd into the sentence—for instance, if asked how long they traveled to the test: “Oh, just two hours in the ol’ Model T, not too far.” A grammatical parser might strip the sentence down to “two hours—not far,” but a human will be so intrigued by the idea of a guy driving a hundred-year-old car that the ordinary follow-ups about traffic, commuting, etc., will be instantly discarded.

As for myself on the confederate side, in that odd and almost oxymoronic situation of high-pressure chitchat, I would have planted holds all over the first few remarks (“no holds barred”?), because there just isn’t any time for slow starts. A judge might find it useful to stall a confederate, but surely not the reverse.

A simple, flat, factual answer (what Lowndes calls a “frozen” or “naked” answer) offers essentially a single hold, asking for more information about that answer. (Or one and a half holds, if you count awkwardly volunteering one’s own answer to the same question: “Cool, well my favorite movie is …”) The only thing worse—which many bots and some confederates nonetheless do—is not answering at all. Demurrals, evasions, and dodges in a Turing test can be fatal: it’s harder to prove that you understand a question when you’re dodging it.

It surprised me to see some of the other confederates being coy with their judges. Asked what kind of engineer he is, Dave, to my left, answers, “A good one. :)” and Doug, to my right, responds to a question about what brings him to Brighton with “if I tell you, you’ll know immediately that I’m human ;-).” For my money, wit is very successful, but coyness is a double-edged sword. You show a sense of humor, but you jam the cogs of the conversation. Probably the most dangerous thing a confederate can do in a Turing test is stall. It’s suspect—as the guilty party would tend to be the one running out the clock—and it squanders your most precious resource: time.

The problem with the two jokes above is that they are not contextually tied in to anything that came before in the conversation, or anything about the judges and confederates themselves. You could theoretically use “if I tell you, you’ll know immediately that I’m human” as a wild-card, catch-all, panic-button-type answer in a bot (similar to ELIZA’s “Can you say more about that?”), applicable for virtually any question in a conversation. And likewise, it’s easy to imagine a bot replying “A good one :)” by template-matching a question asking what kind or type of x something is. Decontextual, non-context-sensitive, or non-site-specific remarks are, in the case of the Turing test, dangerous.
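To make the template-matching point concrete, here is a minimal sketch in Python of how such a bot might work. It is an illustration only, not any actual Loebner Prize entrant's code: the pattern, the function name `canned_reply`, and the choice of fallback line are all assumptions made for the example.

```python
import re

# Hypothetical catch-all line, returned when no template matches
# (in the spirit of ELIZA's "Can you say more about that?").
FALLBACK = "Can you say more about that?"

# Hypothetical template: any question of the form "what kind/type/sort of X ...".
KIND_OF = re.compile(r"\bwhat\s+(?:kind|type|sort)\s+of\s+\w+", re.IGNORECASE)


def canned_reply(question: str) -> str:
    """Pick a reply from the surface form of the question alone.

    Nothing about the earlier conversation, the judge, or the speaker
    is consulted, which is exactly what makes the reply decontextual.
    """
    if KIND_OF.search(question):
        return "A good one. :)"
    return FALLBACK


if __name__ == "__main__":
    for q in (
        "What kind of engineer are you?",
        "What type of music do you like?",
        "What brings you to Brighton?",
    ):
        print(q, "->", canned_reply(q))
```

The same quip comes back for engineers and for music alike, and everything else falls through to the panic-button line, which is why a remark that could have been produced this way does so little to prove a human is behind it.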

Answer Only the Question Asked

Many last names in America are “occupational”—they reflect the professions of our ancestors. “Fletchers” made arrows, “Coopers” made barrels, “Sawyers” cut wood, and so on. Sometimes the alignment of one’s last name and one’s career is purely coincidental—see, for instance, poker champion Chris Moneymaker,[6] world record-holding sprinter Usain Bolt, and the British neurology duo, who sometimes published together, of Russell Brain and Henry Head. Such serendipitous surnames are called “aptronyms,” a favorite word of mine.

Such were the sorts of thoughts in my head when I called attorney Melissa Prober. Prober’s worked on a number of high-profile cases, including being part of the team that defended President Clinton during the investigation leading to his impeachment hearings and subsequent Senate acquittal. The classic advice given to all deponents, Prober explained to me, is to answer just the question being asked, and only the question being asked.

Her colleague (who has since become the executive assistant U.S. attorney for the district of New Jersey) Mike Martinez concurred. “If you volunteer too much—First, it’s just not the way the system’s supposed to be anyway. The way it’s supposed to be is Lawyer A makes a question and Lawyer B decides whether that’s a fair question. If the person answers beyond that, then he’s doing so unprotected.”

It’s interesting—many Loebner Prize judges approach the Turing test as a kind of interrogation or deposition or cross-examination; strangely, there are also a number of confederates who seem to approach it with that role in mind. One of the conversations in 2008 seems never to manage to get out of that stiff question-and-response mode:

JUDGE: Do you have a long drive?

REMOTE: fairly long

JUDGE: so do I :( ah well, do you think you could have used public transport?

REMOTE: i could have

JUDGE: and why not?

REMOTE: i chose not to

JUDGE: that’s fair. Do you think we have too many cars on the road or not enough today?

REMOTE: its not for me to say

Yawn! Meanwhile the computer in the other terminal is playful from the get-go:

JUDGE: HI

REMOTE: Amen to that.

JUDGE: quite the evangelist

REMOTE: Our Father, who art in cyberspace, give us today our daily bandwidth.

JUDGE: evangelist / nerd lol. So how are things with you today?

And has practically sealed up the judge’s confidence from sentence two. Note that the confederate’s stiff answers prompt more grilling and forced conversation—what’s your opinion on such-and-such political topic? But with the computer, misled into assuming it’s the real person by its opening wisecracks, the judge is utterly casual: How are things? This makes things easier for the computer and harder for the confederate.

On the Record

The humans in a Turing test are strangers, limited to a medium that is slow and has no vocal tonality, and without much time—and also stacked against them is the fact that the Turing test is on the record.

In 1995 one of the judges, convinced—correctly, it turns out—that he was talking to a female confederate, asked her out, to which she gave the mu-like non-answer “Hm. This conversation is public isn’t it?” And in 2008 two humans got awkward and self-conscious:

JUDGE: Did you realise everyone can see what’s being typed on this machine on a big screen behind me?

REMOTE: uhh.. no.

REMOTE: so you have a projector hooked up to your terminal then?

JUDGE: Yeah, it’s quite freaky. So watch what you say!!

That guardedness makes the bots’—I can’t believe I was going to say “lives” here—easier.

As author and interviewer David Sheff—who wrote, among numerous books and articles, the last major interview with John Lennon and Yoko Ono, for Playboy in 1980—explains to me, “The goal has always been to transform the conversation from a one that is, you know, perceived by the subject as an interview to one that becomes a dialogue between two people. The best stuff always came when it was almost as though the microphone disappeared.” In the conversation we saw earlier, with the judge saying “Do you think we have too many cars on the road” in one window and “So how are things with you today?” in the other, this difference in timbre can make a big difference.

The paradigm of guardedness in our culture is the politician. Just the other day some of my friends were talking about a mutual acquaintance who has started obsessively scrubbing and guarding his Facebook profile. “What, is he running for office or something?” they half joked. That’s the kind of character-sterility that our society both demands and laments in politics. No holds.

Professional interviewers across the board say that guardedness is the worst thing they can run into, and they are all, as far as I can tell, completely unanimous in saying that politicians are the worst interview subjects imaginable. “With every response they’re [politicians] trying to imagine all the pitfalls and all the ways it could come back to bite them,” Sheff says. “The most interesting people to interview are the people who want to do exactly what you want to do in this test—which is to show that they’re a unique individual.” That tends not to be on politicians’ agendas—they treat conversation as a minimax game, partially because their worst words and biggest gaffes and failures so often ring out the loudest: in the press, and sometimes also in history. Whereas artists, for example, will generally be remembered for their best, while their lesser works and miscues are gracefully forgotten. They can be non-zero-sum.

Prolixity

The more words spoken the better the chance of distinguishing lies from truthfulness.

–PAUL EKMAN

Add to all the above the fact that the Turing test is, at the end of the day, a race against the clock. A five-second Turing test would be an obvious win for the machines: the judges, barely able even to say “hello,” simply wouldn’t be able to get enough data from their respondents to make any kind of judgment. A five-hour one would be an obvious win for the humans. The time limit at the Loebner Prize contest has fluctuated since its inception, but in recent years has settled on Turing’s original prescription of five minutes: around the point where conversation starts to get interesting.

Part of what I needed to do was simply to make as much engagement happen in those minutes as I physically and mentally could. Against the terseness of the deponent I offered the prolixity and logorrhea of the author. In other words, I talked a lot. I only stopped typing when to keep going would have seemed blatantly impolite or blatantly suspicious. The rest of the time, my fingers were moving.

If you look at Dave’s transcripts, he warms up later on, but starts off like he’s on the receiving end of a deposition, answering in a kind of minimal staccato:

JUDGE: Are you from Brighton?

REMOTE: No, from the US

JUDGE: What are you doing in Brighton?

REMOTE: On business

JUDGE: How did you get involved with the competition?

REMOTE: I answered an e-mail.

Like a good deponent, he lets the questioner do all the work[7]—whereas I went out of my way to violate that maxim of “A bore is a man who, being asked ‘How are you?’ starts telling you how he is.” (And I might add: “And doesn’t stop until you cut him off.”)

JUDGE: Hi, how’s things?

REMOTE: hey there

REMOTE: things are good

REMOTE: a lot of waiting, but …

REMOTE: good to be back now and going along

REMOTE: how are you?

When I saw how stiff Dave was being, I confess I felt a certain confidence—I, in my role as the world’s worst deponent, was perhaps in fairly good shape as far as the Most Human Human award was concerned.

This confidence lasted approximately sixty seconds, or enough time to glance to my other side and see what Doug and his judge had been saying.

Fluency

Success in distinguishing when a person is lying and when a person is telling the truth is highest when … the interviewer and interviewee come from the same cultural background and speak the same language.

–PAUL EKMAN

In 2008, London Times reporter Will Pavia misjudged a human as a computer (and thus voted the computer in the other window a human) when a confederate responded “Sorry don’t know her” to a question about Sarah Palin—to which he incredulously replied, “How can you possibly not know her? What have you been doing for the last two months?” Another judge that year opened his conversations with a question about the “Turner Prize shortlist,” the annual award to a contemporary British visual artist, with similarly hit-or-miss results: Most Human Computer winner Elbot didn’t seem to engage the question—

JUDGE: What do you think of this year’s Turner Prize shortlist?

REMOTE: Difficult question. I will have to work on that and get back to you tomorrow.

—but neither, really, did the confederate in that round:

JUDGE: What do you think of this year’s Turner Prize shortlist?

REMOTE: good I think. Better than the years before i herad

JUDGE: Which was your favorite?

REMOTE: Not really sure

Runner-up for 2008’s Most Human Computer was the chatbot “Eugene Goostman,” which pretended to be an immigrant, a non-native speaker of English with an occasionally shaky command of the language:

REMOTE: I am from Ukraine, from the city called Odessa. You might have heard about it.

JUDGE: cool

REMOTE: Agree :-) Maybe, let’s talk about something else? What would you like to discuss?

JUDGE: hmm, have you heard of a game called Second Life?

REMOTE: No, I’ve never heard such a trash! Could you tell me what are you? I mean your profession.

Is this cheating, or merely clever? Certainly it’s true that if language is the judge’s sole means of determining which of his correspondents is which, then any limitations in language use become limitations in the judge’s overall ability to conduct the test. There’s a joke that goes around in AI circles about a program that models catatonic patients, and—by saying nothing—perfectly imitates them in the Turing test. What the joke illustrates, though, is that seemingly the less fluency between the parties, the less successful the Turing test will be.

What, exactly, does “fluency” mean, though? Certainly, to put a human who only speaks Russian in a Turing test with all English speakers would be against the spirit of the test. What about dialects, though? What exactly counts as a “language”? Is a Turing test peopled by English speakers from around the globe easier on the computers than one peopled by English speakers raised in the same country? Ought we to consider, beyond national differences, demographic ones? And where—as I imagine faltering against a British judge’s cricket slang—do we draw the line between language and culture?

It all gets a bit murky, and because in the Turing test all routes to and from intelligence pass through language, these become critical questions.

All of a sudden I recalled a comment that Dave Ackley had made to me on the phone, seemingly offhand. “I really have no idea how I would do as a confederate,” he said. “It’s a little bit of a crapshoot whether the judges are your kind of people.” He’s right: if language is the medium with which we confederates must prove ourselves to the judges, then there are any number of things that can aid or impair it, from shared interests or reference points, to generational gaps, to nuances of allusion and slang.

Among the four confederates, Dave and I are Americans, Doug is Canadian, and Olga is a Russian-born South African. Among the four judges, two are English, one is an American expatriate to England, and one is a Canadian. I had read logs of Loebner Prizes past and had seen the problems that arise when cultural mismatch or cultural disfluency rears its head.

I wondered: Would any such cultural issues come to bear in 2009? All my preparations, my investigations, all the good advice I’d gotten from lawyers, linguists, researchers, and interviewers, wilted compared to actually having something in common and hitting it off with someone. To “speaking the same language,” however literally or figuratively. Would that play in this year?

I didn’t have to wait long for my answer; any uncertainty I’d had on that score, not to mention the optimism I’d begun to feel about my own chances, faded fast when I glanced at Doug’s terminal:

JUDGE: Hey Bro, I’m from TO.

REMOTE: cool

REMOTE: leafs suck

REMOTE: ;-)

JUDGE: I am jist back froma sabbatical in the CS Dept. at U or T.

REMOTE: nice!

JUDGE: I remember when they were a great team.

JUDGE: That carbon date me, eh?

REMOTE: well, the habs were a great team once, too …

REMOTE: *sigh*

JUDGE: YEH, THEY SUCK TOO.

REMOTE: (I’m from Montreal, if you didn’t guess)

Doug and his judge had just discovered that they were both from Canada. And they started to let rip with abbreviations and nicknames and slang and local references. And they started to talk about hockey.

I was in trouble.

1. Generally speaking, software has three ways of going awry: crashing while the code is being compiled into a program (“compile-time”), crashing when the program is being run by a user (“run-time”), or running smoothly but producing weird behavior. This is roughly analogous to sentences that are ungrammatical, un-meaningful, and false—to which we could reply “Huh!?,” “Mu,” and “Nope,” respectively.

2. That Wikipedia contains relatively detailed instructions on how to parry such questions is indicative of how difficult they are to deal with.

3. Also, there’s no point in trying to mask your interest—whether it be sexual, social, academic, professional, or other—in the person anyway, because the very fact that you’re talking to them signals it: they’re not stupid.

4. A common complaint among “pickup artists,” I learned, is that they get tons of phone numbers but no one calling back—a telltale sign of a maximin approach.

5. Graph theory talks about the “branching factor” or the “degree” of a vertex, meaning the number of nodes in the graph to which a given node connects. The conversational analogue is how many distinct continuations or segues there are from the present remark or topic; for my money, the sweet spot is around two or three.

6. Apparently his German ancestors, surname Nurmacher, were in fact “moneyers,” or coin smiths, by trade.

7. Prober recalled asking one deponent if he could state his name for the record. His answer: “Yes.”

9. Not Staying Intact

Each is trying not to give himself or herself away, each is preserving fundamental loneliness, each remains intact and therefore unfructified. In such experiences there is no fundamental value.

–BERTRAND RUSSELL
