The Numerati


by Stephen Baker


  Still, the casino has concrete evidence that she has broken the law. A discussion ensues. Is she a pro? Is she drunk? Is it possible she doesn’t know the rules? They decide she’s no pro. It’s only $5, after all. They send a supervisor on the floor to talk to her as she and her friends are leaving the table. We watch. She’s surprised, confused, and then grave. Then the supervisor says something that puts her at ease. She relaxes, smiles, jokes, and then goes her tipsy way. The authorities have let her know that she, along with all the other gamblers, lays her bets under a legion of watchful eyes. But they’ve opted this time not to dampen the partying mood. In fact, as she walks away, they still don’t know her name. It doesn’t matter.

  The same goes for the other gamers milling about below. Whether they’re sidled up to the bar downing Cuba Libres or grimly feeding quarters from a paper cup into a slot machine, they’re free. No finger wagging here. If they pay with cash and look older than 21, no one asks their names. They’re anonymous. And yet they’re at play under the gaze of pit captains and security forces on the floor and their teammates up here in the surveillance room.

  What details, I ask the boss in the crow’s nest, does the casino need to collect in order to pick out the handful of grifters and thieves? What data separates them from the rest of us? He says it boils down to three questions: Are they on the casino’s list of known crooks, cheaters, and card counters? Does their behavior in the casino signal malicious designs? And are they winning lots of money? If you think about it, these three questions are the underpinnings of most police and intelligence work: Does the person have a record? Is he acting suspiciously, perhaps in cahoots with others? And is he in or near the spot when significant events take place, whether they’re bus bombings in London or a fabulous pinch-me-and-tell-me-I’m-not-dreaming run on a craps table in Vegas?

  It’s the folks here in the crow’s nest and on the floor who scrutinize the behavior of the gamblers. The signals they’re looking for are far too sophisticated and nuanced for machines or data mining to pick up. Some people, for example, aren’t smiling or drinking. “They look like they’re working,” he says. Some of them make gestures that could be signals. They rake their hands through their hair insistently, or they make a tipping gesture with a drink. Some put tiny card-counting computers into their shoes and make small movements as they hit buttons with their toes. These gestures fit into patterns surveillance teams are taught. Picking them up requires observation and human intelligence.

  The far easier signals come from numbers. They should be just as predictable, on average, as the trains pulling into a Zurich station. In the short run, they fluctuate, giving hope to long-shot gamblers. But with time, each tool hits its standard rate of return—each one of them favoring the casino. When casinos see deviations from the expected numbers, they go take a look.
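  One way to picture "deviations from the expected numbers" is as a simple statistical check. The sketch below is an illustration only, not the casino's actual method: it assumes an even-money bet on American roulette (18 winning pockets out of 38), and the function name and the three-standard-deviation threshold are hypothetical.

```python
import math


def deviation_z_score(num_bets, player_net_units, win_prob=18 / 38):
    """Standard deviations between a player's net result and the expected result.

    Assumes every bet is one unit at even money (win +1, lose -1).
    """
    mean_per_bet = 2 * win_prob - 1      # negative: this is the house edge
    var_per_bet = 1 - mean_per_bet ** 2  # variance of a +1/-1 outcome
    expected_net = num_bets * mean_per_bet
    std_net = math.sqrt(num_bets * var_per_bet)
    return (player_net_units - expected_net) / std_net


THRESHOLD = 3.0  # hypothetical cutoff for "go take a look"

# A player who is up 10 units after 100 bets.
short_run = deviation_z_score(100, 10)
# A player who is up 300 units after 10,000 bets.
long_run = deviation_z_score(10_000, 300)

print(f"short run: {short_run:.2f}, flagged: {short_run > THRESHOLD}")
print(f"long run: {long_run:.2f}, flagged: {long_run > THRESHOLD}")
```

  Under these assumptions, a lucky streak over 100 bets stays within ordinary fluctuation, while winnings sustained over 10,000 bets fall far outside it.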

  The third category comes from data. That’s where things have changed—and where people like Jeff Jonas make a difference. In the early years, Las Vegas relied on plugged-in people to collect data. Experts combed through hotel, credit, and personnel files, looking for folks who appeared as “subjects of interest.” They zeroed in, of course, on known crooks and cheaters who should be shown the door—or locked up. But the data hounds were also on the lookout for high rollers who should be upgraded to luxury suites or comped a magnum of champagne. When it came to spotting something out of the ordinary, whether promising or suspicious, no machine could rival a smart and experienced human. Think of Humphrey Bogart’s character Rick in the movie Casablanca. He’d eyeball the floor and scan the ledger names. He knew the sordid stories. He kept up with the evolving tangle of friendships and alliances. He had the place scoped out. But these jumbo casinos have grown far too big for the human approach. Some of them have more than 3,000 rooms. They entertain 100,000 visitors in a single day—considerably more than Rick’s gin joint in North Africa. They need powerful machines to sort through the data. And the approach the casinos use, Jonas says, could help sharpen the focus in the battle against terrorism.

  Of course, we can’t all punch the pillow and roll over, confident that the most dangerous terrorists will pop up on government watch lists, obediently publish their names and addresses in phone books, and reserve plane tickets and hotel rooms for their suicidal cohorts. That would be too much to wish for. We don’t have all the suspects on record, not by a long shot. The gumshoes are short on leads. Data miners, meanwhile, often struggle to find meaningful signals. Does this mean that the government calls off the electronic hunt for terrorists and the Numerati retreat to safer and surer jobs in advertising and grocery stores, where their statistical methods work fine? Not on your life. The need to close the intelligence gap is urgent. Our safety is at stake. So we reach for something, even if it’s faulty, to protect us. In cultures that are tough to penetrate and understand, data mining at least offers the possibility of finding something. In essence, we compensate for our shortcomings in languages and on-the-ground intelligence with a heavy dose of unproven technology.

  This is fueling a researchers’ gold rush, a period of wide and wild experimentation. The Numerati are reaching into any discipline they can find, whether it’s economics, physics, biology, or sociology, to unearth formulas that can be tweaked to predict the behavior of terrorists. They’re not only in the business of mining data; they’re also mining theories, many of them hatched long before computers were around to crunch the numbers. The idea is that certain patterns, both in human behavior and in nature, pop up in different realms. Maybe some of them will help expose sleeper cells or bomb factories. Researchers have centuries of data, for example, on the diffusion of plagues and epidemics. They can tell you, mathematically, the chances that the seeds from my dandelion-infested yard will float onto my neighbor’s pristine, manicured lawn across the street. Do the hateful ideas of terrorists spread in similar patterns? Do terror cells metastasize like cancer? Do they mutate and evolve like certain viruses? No? How about if you change a variable or two? Social scientists study the evolution of networks, from those on MySpace to cell phone users in Singapore. Who are the hubs in these networks? How do they rise to this status? Do their spheres of influence shift with time? Again, what researchers learn here can be boiled down to the mathematics of human communication and organization across networks. Does Al Qaeda follow similar patterns?

  JUST AS OUR experience on the Internet is moving beyond the written word, so is the data pouring into NSA computers. Much of it arrives as spoken voices, images, and video. It might be a face in a crowd in Baghdad, or perhaps a raspy voice giving orders in Farsi from a Skype account somewhere in the Horn of Africa. To mine this outpouring of data, machines must make sense of the words we speak in scores of languages. They must learn to pick out one or two faces from six billion others. To extend their nets into sounds and images, the counterterrorists need new technologies. Researchers around the world, many of them scooping up rich government grants, are busy assembling them.

  This technology has been replayed in the movies so long that it seems familiar. A machine automatically sorts through photos of people at a café in Tripoli or Karachi, or perhaps crowd shots at the Olympics. Then it matches the faces in the photos with a dossier of known and suspected terrorists. That’s the goal. As these systems take shape, our faces will find their way into enormous databases. Then computers run by governments and corporations will be able to map the movements of humanity. For most of us, in truth, this won’t make much difference. Our faces will show up along the same trails drawn by our airplane tickets, credit card bills, and—above all—our cell phones. Yet these facial images could prove vital for police. They could capture data on people who are struggling mightily to stay off the grid. A photo reader might find, for example, that the same green-eyed man with a bump in his nose and a scar on his lip has traveled at least three times this year between Newark, the rough Parisian suburb of Saint-Denis, and Cairo. Does that face pop up on other databases?

  A global snooping network is already emerging. Britain has been an early leader in installing security cameras, with 200,000 operating in London alone. The image of the average Briton, police say, is captured by as many as 300 cameras per day. American cities, including Chicago and New York, are rushing to follow suit. And late in 2007, according to the New York Times, the Chinese government announced plans not only to monitor the streets of the southern city of Shenzhen with 20,000 police cameras but also to give police there access to the feeds from another 180,000 video cameras run by the government and private companies.

  All of us, from bombers to subway passengers, will be playing ever bigger roles in these surveillance films. But on this global stage—unlike the cozy casinos in Las Vegas—there aren’t nearly enough human workers to monitor all the action. And the machinery to sift through all this video isn’t yet up to the job. At this point, an automated system can compare mug shots of suspects with thousands of photos on file, and suggest a handful of them that have a similar facial profile—before handing over the job to humans. Despite what Hollywood would have you believe, identifying faces in the real world is still very much a work in progress. Faces duck in and out of shadows. They turn from full face to profile. They tighten as we laugh and bulge as we eat. With age, they sprout beards, lose teeth, gain heft, grow new lines and furrows. Pinpointing the same face through all those changes is an immensely complicated task for a machine. But the computers are getting closer. The U.S. National Institute of Standards and Technology held a Grand Challenge for face-recognition systems in 2006. Researchers had to develop 3-D models of faces so that they could be recognized from a variety of angles. In the four years since the previous competition, results improved by a factor of ten.

  Scientists are also delving more deeply into the noises we make. They’re analyzing not just the words we utter but even the timbre of our voices. Researchers at BBN near Boston, for example, have government contracts to study the effects of emotion on our voices. “When you’re under stress, you’ll produce sounds differently,” says Herb Gish, the chief scientist at the company. “Are these different from when you’re angry, or sad?” Naturally, they tackle this challenge, like so many others, by breaking down the voice into bits of data. They study the patterns, almost as if they were strands of DNA, and correlate them mathematically to the emotions they express. At some point, Gish says, researchers will have tools to gauge the likelihood that the voice echoing through the phone line or over the Internet comes from a person who is sad, angry, or tense. This means more work for data miners. They’ll have to write algorithms to burrow through vast audio files, searching not just for key words or network patterns but certain moods. The level of complexity shoots ever upward.

  As government sleuths dig deeper into networks and data, they find themselves wrestling with the same challenges as the Numerati elsewhere, at Google, at Umbria, and at Microsoft. Spies and advertisers are working on the same math. This leads to a battle for precious brainpower. A generation ago, the NSA could lay claim, in its quiet way, to many of the brightest mathematicians and computer scientists in the country. But now they have to compete with Internet giants worth hundreds of billions of dollars. There’s a global race for star talent. When hotshots like Yahoo’s Raghavan, a former star at IBM Research, step into the Internet job market, bidding battles erupt. In Raghavan’s case, it was between Microsoft and Yahoo. Both companies struggle to keep up with Google, which has been minting millionaires all over the world. How can the NSA compete? What’s more, the Internet companies are free to open research divisions in India, China, Japan, and Europe—which produce more mathematicians and scientists than the United States does. And as we’ve seen in nearly every chapter of this book, these companies hire plenty of gifted foreigners in the United States. The NSA, by contrast, is limited to U.S. citizens—a severe constraint. Schatz says the agency can still land great ones, people who are drawn to a settled suburban life, national service, and a chance to grapple with outsized challenges.

  Still, when it comes to tracking down the likes of Al Qaeda through data, the government can hardly do the work alone. So they farm much of it out. They take vast files of so-called terrorist data. They declassify the files by changing names and other features. Then they distribute these sets to university and corporate researchers. This opens the job to thousands who find themselves, as I do, outside the fenced compound of the NSA.

  HAVE YOU EVER met someone and thought, “Isn’t it strange that I didn’t meet this person before?” It’s usually someone whose path crosses with yours. Maybe you live in the same neighborhood. Maybe you ride the same train every morning, or you have an ex or two in common. Perhaps you are the only two in your sleepy town who have pierced tongues and green hair.

  Now imagine trying to predict the next friend you’ll meet. Which of your circles will this person emerge from? Which facts about yourself, and about others, are most likely to lead to the connection? Researchers at Carnegie Mellon University are searching for answers as they wade through gobs of unclassified surveillance data from the Department of Homeland Security.

  Let’s say that three suspected bombers were spotted a week ago in Nairobi. There’s no sign of them now. But chances are, they were making plans with comrades, perhaps members of sleeper cells. Who are these hidden allies? Often, says Artur Dubrawski, one of the CMU researchers, the government has data on lots of people. It knows that a few of them are suspected terrorists, but everyone else is just a name. So how do investigators know where to look? Which of those names are most likely to be associated with those three who passed through Nairobi?

  It’s easier to imagine the math involved here by picturing your own life and your own friends. Say you’re having four people to dinner, and you want to find a fifth guest. This would ideally be someone who would fit neatly into the group, either through shared friendships or values. This is what the CMU team would call “a next friend.” You and the four people in this tiny example are going to be the training set. You’re going to add up the various links you share with those people and then use them to calculate the most likely fifth guest. So go at it. What features do you share with these people? Let’s say two of them are lawyers, like you. Three are women. One is a friend of your sister’s. One is a former lover. One of the two lawyers went to summer camp with you back in the 1980s. He lives upstairs. Two of them speak excellent French, a language you love, and a third cooks good French food. Some of the features may seem outlandish or beside the point. Maybe you know for a fact that three of these people snore, or that two of them used to date diplomats. Include everything. The irrelevant stuff—the noise—will be flushed out later.

  Now picture yourself and your four guests as five dots on a graph. In the world of social networks, these are called nodes. (And most training sets have hundreds or thousands of them.) Each link shared among the five people is a line connecting them, a so-called edge. In the computer world, these graphs exist in limitless dimensions, like Umbria’s blog universe. You don’t have to worry about these thousands of edges running into each other and making a mess, the way they might in a grade-school science project. There’s space for all of them. The next step is to calculate the importance of each edge. This involves statistics. Which links most distinguish your friends from everyone else in the world? If you’re inviting both men and women to your dinner party, the gender links are meaningless predictors. In that respect, your party reflects the world. The Next Friend approach zeros in on the links that set your party apart. The lawyer links, for example, the French, the summer camp connection, those are probably much more predictive. So they’re given a higher score, or coefficient. Those lines on the graph are thicker. Now it’s time for CMU’s program to put all the numbers together and create a composite profile of your most likely “next friend.” Then it will go through a database of your friends and give them each a score. The highest-ranking person on the list is its choice as the most likely to enjoy dinner next Saturday with you. How did that person get such a sky-high score? In this case, it might be a taste for French food, some experience at summer camp, or perhaps a litigious nature. In the more important national security case, the next friend of the three terrorists seen in Nairobi might turn out to have been in Afghanistan during the same period as two of the others. Or maybe they call the same phone numbers or have a brother in the same jail.
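  The dinner-party scoring can be sketched in a few lines of code. This is a toy illustration, not CMU's actual program: the names, the features, and the weighting formula (a smoothed log ratio of how common a feature is among the guests versus in the whole database) are all assumptions made for the example.

```python
import math


def feature_weights(group, database, smoothing=1.0):
    """Weight each feature by how much it sets the group apart from the database.

    Assumed formula: log of (smoothed share in group / smoothed share in database).
    Features about as common in both get a weight near zero.
    """
    weights = {}
    for feature in set().union(*group.values()):
        in_group = sum(feature in feats for feats in group.values())
        in_db = sum(feature in feats for feats in database.values())
        group_share = (in_group + smoothing) / (len(group) + 2 * smoothing)
        db_share = (in_db + smoothing) / (len(database) + 2 * smoothing)
        weights[feature] = math.log(group_share / db_share)
    return weights


def rank_next_friends(group, database):
    """Score everyone outside the group by summing the weights of their features."""
    weights = feature_weights(group, database)
    scores = {
        name: sum(weights.get(feature, 0.0) for feature in feats)
        for name, feats in database.items()
        if name not in group
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical training set: the four dinner guests and their features.
guests = {
    "Ana": {"lawyer", "woman", "speaks_french"},
    "Ben": {"lawyer", "summer_camp", "speaks_french"},
    "Cleo": {"woman", "cooks_french"},
    "Dee": {"woman", "snores"},
}
# Hypothetical database: the guests plus everyone else on file.
database = {
    **guests,
    "Eve": {"woman", "snores"},
    "Finn": {"lawyer", "speaks_french", "summer_camp"},
    "Gus": {"snores"},
    "Hana": {"woman"},
    "Ivo": {"cooks_french", "snores"},
    "Jo": {"woman"},
}

weights = feature_weights(guests, database)
ranking = rank_next_friends(guests, database)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

  In this made-up data, the lawyer and French links earn larger weights than the gender link, and snoring, which is more common in the database than at the table, is pushed below zero. The candidate who shares the distinguishing links ends up at the top of the list.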

  That’s assuming that the declassified database in CMU’s computer is bursting with the same kind of rich details that you came up with to describe your dinner guests. This brings us to a central problem in the electronic hunt for terrorists: iffy data and incomplete files. It shouldn’t come as any surprise that we know our friends far better than we know our enemies.

  Intelligence services are often flummoxed by even the most basic piece of data in a person’s file: his or her name. This is one crucial area where our cultural diversity defies the sorting and counting magic of the computer. Jack Hermansen knows this all too well. He’s been working on the electronic recognition of names since 1984, when he got his doctorate in computational linguistics from Georgetown University. The U.S. State Department called on him back then to help figure out which names belonged to which people. It seemed like a simple enough task. Figure out the variations, from culture to culture, on the spelling of a name like Sean, Mohammed, or Chang, and stick them into a computer. “They wanted some linguistic fairy dust sprinkled on their problem,” he says. But Hermansen knew that the global interpretation of names was endlessly complex. That “Haj” in an Arab name? That just means he’s made the pilgrimage to Mecca, but it will show up as a last name in some databases. A Chang will appear in French as Tchang, or maybe Tchung, and the Germans and Russians will have different takes on it. The Chinese alone have 11 different spellings for Osama bin Laden.

 
