Netflixed


by Gina Keating


  With a feeling of dubious curiosity I traveled to Netflix’s new Los Gatos headquarters in January 2007, along with journalists from other major outlets and HackingNetflix’s Kaltschnee, for a demonstration of the long-awaited streaming feature. Steve Swasey had been warning me and the others—as Hastings had been warning investors—not to expect much in the way of title selection. I couldn’t help thinking of the myriad download services I had seen come and go over the previous three years, mainly because there was nothing on them that anyone wanted to watch. They seemed little more than an inconvenient form of pay per view.

  Hastings and Swasey, who had by now been promoted to vice president of corporate communications, gave me a quick tour of the airy, nouveau Mediterranean open-plan building. We stopped by a state-of-the-art espresso bar in the kitchen-dining area so that Hastings could make me a cappuccino—a ritual I later learned he performed for all the reporters he met with individually. Then we sat down in the main conference room, so that he could show me the streaming feature.

  The conference room was cavernous and sleek, with a big skylight that let the winter sunlight pour in. When I remarked how much the company had grown from the dingy digs on University Avenue where I had first met him years earlier, Hastings looked around proudly and smiled, saying he could hardly believe it himself.

  The “instant streaming” feature, which he demonstrated on a laptop with the eagerness of a kid with a new toy, was, like most Netflix software, a work of art that fit seamlessly into the Web site’s suite of features. It took one mouse click to load it and about twenty seconds to begin playing a movie at DVD-quality resolution. The onscreen controls worked without a hitch—better than my DVD player, I thought.

  It was puzzling, though, that they had launched with so thin a title selection—just one thousand movies—considering that Hastings had torpedoed his earlier download effort for the very same reason. I wondered how much Total Access was cutting into Netflix’s growth and whether that had influenced the decision to go public with the feature so quickly.

  Analysts generally liked the timing of the new offering as an answer to Total Access, but tech writers whined that tethering viewers to the Internet to watch a movie made Instant Viewing of limited value. And they brought up a valid point: How many subscribers would want to watch something as long as a movie or TV show on a small screen?

  It seemed counterintuitive, but as with all his bold moves, Hastings had glimpsed the future and was steering his company toward it. Meanwhile, Netflix’s market researchers had found the holy grail of customer feedback in the streaming service—real-time input about what customers thought about the movies they watched, based on how they behaved as they watched them. The system watched viewers as they screened films, noting the scenes where they stopped and rewound, how long it took them to abandon a film they didn’t like, where they paused, what scenes they skipped. The resulting analysis of human behavior had the potential to be richer and more personal than any focus group could be.

  Netflix no longer needed to connect with subscribers through its movie ratings system to know what they liked, but Ross knew that if the company could not make Total Access disappear like New Coke, they would have to call on consumers’ sentimental attachment—and hope it was enough.

  CHAPTER ELEVEN

  THE INCREDIBLES

  (2006–2009)

  FROM NETFLIX’S BEGINNINGS IT HAD been paramount to make every movie seem enticing. That wisdom, handed down by Randolph from his direct mail bible, had been critical to Netflix’s survival when the DVD universe was new and the title selection slim, and tending toward the old and obscure.

  Helping subscribers find movies they loved—not just liked—ensured that they kept returning to the catalog to find some hidden gem, paid their fees each month, and told others about the service. At its most engaging, the Cinematch algorithm acted as a guide leading subscribers down fascinating and unexpected paths through the huge catalog.

  Nearly 70 percent of the titles that ended up in subscribers’ queues resulted from a Cinematch recommendation. The recommendation engine was so compelling that Netflix used it to predict and control its inventory needs—it helped smooth out a steep demand for new releases and directed subscribers toward older films with better rental economics. The fact that the voyage of discovery captivated subscribers was gravy in the early years, but in the throes of Netflix’s war with Blockbuster, it had the potential to be a game changer.

  At first Cinematch sorted and presented lists of movie titles that users were likely to rate highly, based on how they had rated other films in the past, along with themed lists created by Netflix’s content editors. The more movies subscribers rated, the more accurate the system became. As the Web site functionality grew more sophisticated, Cinematch would present only titles that a subscriber was likely to enjoy—meaning that every subscriber saw a different Web site each time he or she signed on. Along with software created by Amazon, Cinematch represented the world’s best collaborative filtering system.

  Over the years, Hastings had augmented his software engineers with mathematicians, to improve the algorithm, and had tinkered with it extensively himself. The idea of boiling down human behavior and tastes to a set of equations fascinated him: Was it really possible to capture so much chaos within the confines of numbers?

  He later would describe how his obsession with the matching algorithm took over his free time—how he had once spent Christmas closeted with his laptop at his Park City ski chalet working on Cinematch while his wife, Patty, complained that he was ignoring their children and frittering away their vacation.

  By 2006, Hastings and his team had wrung all the advances they could out of their approach. Taking in outsiders seemed pointless—he had hired the best he could find. Just as his great-grandfather had set up his Tuxedo Park laboratory to attract the world’s finest scientific minds to the greatest physics mysteries of his day, Hastings decided to hold a $1 million science contest to push for breakthroughs in the algorithms that powered Cinematch. Alfred Loomis had enticed world-famous scientists to his physics lab by dangling cutting-edge equipment, luxurious accommodations, and generous stipends. Hastings would attract machine-learning scientists to his contest by offering a real-world data set larger than any that community had ever seen.

  Scientists at the Loomis lab raced to make breakthroughs in radar and nuclear fission that would change the course of World War II; Hastings hoped the results of the Netflix Prize would come fast enough to put an end to the war with Blockbuster. He favored a contest along the lines of the 20,000-pound Longitude Prize, established in 1714 by the British government to reward a practical method for measuring longitude at sea, or the $10 million Ansari X Prize awarded in 2004 to the developers of the first reusable civilian spacecraft.

  The $1 million cash prize would go to the first team to improve Cinematch’s predictive powers by 10 percent, with $50,000 Progress Prizes awarded to the leaders at each anniversary of the contest’s start date. The contest would be open to anyone of any educational level and background, from any country allowed to do business with the United States. Netflix would provide a database of one hundred million subscriber movie ratings (stripped of personal identifying information), so that contestants could test their equations with real data. Netflix would keep a running tally of the teams’ progress on a public leaderboard, and the winner would own the algorithm—but had to grant the company a license to use it.

  The improvement of 10 percent was equivalent to consistently predicting a subscriber’s movie ratings to within one half to three-quarters star on Netflix’s five-star system. The task of implementing the contest fell to James Bennett, vice president for the recommendation system, and Stan Lanning, a former Pure Atria engineer who, along with Hastings, had refined Cinematch and presided over the movie ratings system.
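The contest’s accuracy yardstick was root-mean-square error (RMSE): the smaller the typical gap between predicted and actual star ratings, the better the algorithm. The sketch below shows how such an improvement is scored; the ratings and predictions in it are invented for illustration, not Netflix data.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual star ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

actual = [4.0, 3.0, 5.0, 2.0, 4.0]       # what subscribers actually rated
cinematch = [3.2, 3.9, 4.1, 2.9, 3.3]    # baseline predictions (illustrative)
challenger = [3.6, 3.4, 4.6, 2.4, 3.8]   # a competing algorithm's predictions

baseline_err = rmse(cinematch, actual)
challenger_err = rmse(challenger, actual)
improvement = (baseline_err - challenger_err) / baseline_err * 100
print(f"baseline RMSE {baseline_err:.3f}, challenger {challenger_err:.3f}, "
      f"improvement {improvement:.1f}%")
```

On these toy numbers the challenger scores a large improvement; against Cinematch’s real baseline, a 10 percent RMSE reduction was the winning threshold.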

  Lanning, a genial man with a bald head and a long, gray beard, shared a dark cave of an office with a bank of computer monitors and a life-size plastic skeleton riding a pogo stick in one corner.

  Steve Swasey and Ken Ross planted a story about the Netflix Prize in the New York Times and were surprised when a page one story appeared on the contest’s launch day—October 2, 2006. The U.S. and international press ran the story widely, and before the day was up, more than five thousand teams and individuals had registered for the contest. For Swasey, whose assessment of his day was inextricably bound up in the tenor of Netflix’s press coverage, media reaction to the announcement was as thrilling as watching election results roll in and knowing that his candidate was winning by a landslide. Swasey later compared the prize to a combination of the Preakness, the World Cup, and the Super Bowl for geeks.

  More than forty thousand teams from 186 countries registered for the $1 million contest in the ensuing three years, attracted by the largest data set ever released and the open and generally friendly nature of the competition. As they began posting their results on the live leaderboard maintained by Netflix, and talking about their progress in the discussion groups, the scientists, mathematicians, and interested amateurs slowly built the world’s most accurate recommender engine from scratch.

  Among them was a team of statisticians looking for new ways to predict human behavior.

  AT&T Shannon Laboratory lies in a gentle fold of green fields bordered by large lush trees in Florham Park, New Jersey, about a ninety-minute train ride from Manhattan. The complex is square and geometric, and has a clean, unpretentious, and uncluttered lobby from which anonymous hallways radiate into the distance. One wall features a gallery of photographs of AT&T scientists, famous in their insular world, and artifacts such as early telephones and antique electronic equipment stand in as decor.

  Each floor has a cozy lounge furnished in Arts and Crafts–style couches and chairs set up around old-fashioned blackboards that are used for brainstorming. The network of hallways gives onto relatively spacious offices, each with a huge whiteboard on the hallway side and a wall of windows overlooking a neatly kept green on the other. The furnishings are utilitarian, and many offices, including that of researcher Robert Bell, have piles of papers stacked neatly along one wall, waist high.

  Bell, a shy California native who came to AT&T Labs in 1998, heard about the Netflix Prize in an e-mail that AT&T’s executive director of research, Chris Volinsky, sent around to about twenty researchers in Florham Park a day or two after Netflix announced the contest. Volinsky led AT&T’s data-mining group, which had worked for more than a decade on large-scale predictions of how customers were likely to behave: which customers were likely to buy an iPhone; which were likely to set up fraudulent accounts; what were the evolving risks associated with the U.S. customer base?

  Data mining is the process of finding predictive or meaningful patterns in huge sets of data: the instant sorting and sifting through billions of Web sites that produces the ranked results of a Google search; the detection of abnormalities among normal cells in a computer-aided medical scan; or suspicion over the comings and goings of a group of visa holders that could indicate a potential threat against the United States.

  Scientists mastering data mining have to write algorithms that examine a data set for important patterns but also discard associations that may seem compelling but lead nowhere.

  Volinsky was a gregarious man whose childhood passion for baseball statistics evolved into a career as a data-mining expert; he loved contests not only to showcase what AT&T Labs could do but for the excitement of competing against the world’s best minds in their emerging field. Volinsky also loved movies, and both he and Bell, who also found his vocation in baseball stats, were excited about the chance to experiment with Netflix’s huge trove of real-world data—a set of customer ratings that was a hundred times larger than any they had ever seen.

  Bell had entered and won contests before the Netflix Prize, but the $1 million and the open-door nature of the competition—anybody with a PC and an Internet connection could enter—gave the contest a special allure. It quickly became a leading topic of conversation in the research and academic communities that Bell traveled in, and he relished the chance to see how he stacked up against his peers.

  About fifteen people showed up for a brainstorming session Volinsky organized shortly after the Netflix Prize was announced, but active members dwindled after a couple of weeks to just three—Bell, Volinsky, and their younger Israeli colleague, Yehuda Koren.

  At first they watched as the Netflix-sponsored leaderboard lit up with a couple of hundred solutions—at least two of which bettered Cinematch within a week. A month later there were several thousand teams, the best of which had wrung a 4 percent improvement over Cinematch using all-original solutions. The chase after the $1 million prize drew not just the elite of data mining but also researchers from the machine-learning and mathematics communities, as well as brilliant amateurs from software development and even psychology.

  Each team was limited to one submission per day, but a lively conversation was taking place all day and all night, as participants from all over the world signed on to the discussion board maintained by Netflix.

  For Koren, this informal conclave of brilliant minds homing in on the same problem was captivating. He spent hours at home and at work tinkering with their equations and trying to stay ahead of the surging progress on the leaderboard. Each adjustment of the equation could take a week or more of time stolen from regular work tasks—a day to write the proposed solution, several hours to run the enormous data set through powerful computers, more time to analyze the outcome and make adjustments, and another set of hours to run the data again. Each man found himself thinking of the contest at odd hours, perhaps waking in the night with an idea for an incremental improvement.

  They were ready to post their own entry on the leaderboard by the contest’s fourth month, as team BellKor. After Netflix used a confidential set of test data to verify their results, team BellKor entered the contest in the twentieth slot. From then on, Koren was obsessed, pushing Volinsky and Bell to try to drive their way up the leaderboard. Let’s see if we can get into the top ten, he’d say. Then, the top five, and the top three.

  In April 2007, they landed briefly in the top slot, only to be knocked out a few days later. For weeks they flirted with the lead against Dinosaur Planet, from Princeton, and Gravity, a team of four Hungarian researchers. BellKor again took the lead at the eight-month mark, and this time they held it. They collected the first fifty-thousand-dollar Progress Prize for an 8.4 percent improvement to Cinematch. The big prize seemed well within their grasp as they entered the contest’s second year.

  • • •

  WHEN NETFLIX’S FOUNDING software engineers, including Hastings, contemplated building a recommendation engine in 1999, their first approach was rudimentary and involved linking movies through common attributes: genre, actors, director, setting, happy or sad ending. As the film library grew, that method proved cumbersome and inaccurate, because no matter how many attributes they assigned each film, they could not capture why Pretty Woman was so different from, say, American Gigolo. Both were movies about prostitution set in a major U.S. city and starring Richard Gere, but they were unlikely to appeal to the same audiences.

  Early recommendation engines were unpredictable: In one famous gaffe, Walmart had to issue an apology and disable its engine after its Web site presented the film Planet of the Apes to shoppers looking for films related to Black History Month.

  Netflix’s software engineers next turned to a “nearest neighbor” algorithm, one with a focus on grouping customers together according to their tastes in movies, rather than associating the films with each other.
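In rough outline, a user-based nearest-neighbor recommender scores each pair of subscribers by how similarly they have rated the movies both have seen, then predicts a rating for an unseen title from the ratings of the most similar subscribers. The sketch below is a toy illustration of the idea, not Netflix’s code; the names and ratings are invented.

```python
from math import sqrt

# Toy ratings: subscriber -> {movie: stars} (invented data, not Netflix's)
ratings = {
    "ann": {"Alien": 5, "Clue": 1},
    "bob": {"Alien": 5, "Heat": 5, "Clue": 2},
    "cal": {"Alien": 1, "Heat": 2, "Clue": 5},
}

def similarity(a, b):
    """Similarity of two subscribers over the movies both have rated."""
    shared = set(ratings[a]) & set(ratings[b])
    if not shared:
        return 0.0
    gap = sqrt(sum((ratings[a][m] - ratings[b][m]) ** 2 for m in shared))
    return 1.0 / (1.0 + gap)  # closer ratings -> higher similarity

def predict(user, movie):
    """Predict a rating as a similarity-weighted average of neighbors' ratings."""
    votes = [(similarity(user, other), ratings[other][movie])
             for other in ratings if other != user and movie in ratings[other]]
    total = sum(sim for sim, _ in votes)
    return sum(sim * stars for sim, stars in votes) / total if total else None
```

Here predict("ann", "Heat") leans toward bob’s five-star rating, because ann and bob agree closely on the movies they have both rated, while ann and cal disagree.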

  By the time the Netflix Prize was announced, subscribers had made one billion ratings of sixty thousand movies and television shows—a rich data set but one whose subtleties were not being plumbed by Cinematch.

  BellKor and the other teams wrote their recommendation algorithms from scratch, experienced in a matter of months the learning curve that had taken Netflix years to traverse, and then transcended it. The algorithms they created found eddies and whorls in the huge data set that were completely unfamiliar to Volinsky, Bell, and Koren. The algorithms analyzed the patterns created by the subscriber ratings and assigned their own descriptors to films, descriptors richer and more subtle than labels like director, actor, and genre but with no real meaning to the human mind.

  For example, Bell noticed that the algorithm “learned” that subscribers who liked Woody Allen movies often cared only for particular types of film that Allen had made—perhaps made during a particular era of his career or in a peculiar setting—and did not recommend the director’s other works.
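In the research literature these learned descriptors are called latent factors: each subscriber and each movie is reduced to a short vector of numbers, trained so that their dot product reproduces the known ratings, and the individual factors need not correspond to anything a human would name. A minimal latent-factor sketch, trained by stochastic gradient descent on invented toy data (the data, dimensions, and constants are all illustrative):

```python
import random

random.seed(0)

# (user, movie, rating) triples: a toy stand-in for the Netflix ratings set
data = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1), (2, 1, 5), (2, 2, 2)]
n_users, n_movies, k = 3, 3, 2  # k latent factors per subscriber and per movie

# Start every factor vector near zero, with random signs to break symmetry
U = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
M = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_movies)]

def predict(u, m):
    """Predicted rating is the dot product of user and movie factor vectors."""
    return sum(U[u][f] * M[m][f] for f in range(k))

lr, reg = 0.05, 0.02  # learning rate and regularization strength
for _ in range(500):  # stochastic gradient descent over the known ratings
    for u, m, r in data:
        err = r - predict(u, m)
        for f in range(k):
            uf, mf = U[u][f], M[m][f]
            U[u][f] += lr * (err * mf - reg * uf)
            M[m][f] += lr * (err * uf - reg * mf)
```

After training, predict(1, 1) estimates how user 1 would rate the one movie in the toy set he has not rated; the learned columns of U and M play the role of the unnamed descriptors the passage describes.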

  Progress came more slowly in the contest’s second year, especially after BellKor divulged its methods in a paper required by Netflix Prize rules, and the team watched others come close to overtaking them using those very methods. They became stuck at an 8.6 percent improvement over Cinematch.

  Toward the middle of the contest’s second year, Koren took a job with Yahoo! Research in Israel, and unsure of what his future contribution would be, pushed hard to try to solve the puzzle before he departed. Their momentum slowed to a half percent here and a tenth of a point there, so Bell and Volinsky turned to the leaderboard for fresh blood to propel them out of their doldrums.

  A new team, called Big Chaos—two young Austrian mathematicians who had built on BellKor’s first-year foundations and were soaring through the rankings—caught Bell and Volinsky’s attention. In a sort of scientific blind date, to see whether their approaches to the problem and their personalities would dovetail, Bell e-mailed the team—Andreas Toscher and Michael Jahrer of Commendo Research—to explore the possibility of a hookup. The BellKor team felt assured by a series of e-mails that Toscher and Jahrer would hold nothing back, and they agreed over a transatlantic phone call to combine forces, as BellKor in Big Chaos.

 
