More Money Than God_Hedge Funds and the Making of a New Elite
Deciding when to buy the stock-market index—or a currency or oil future—was only part of the challenge. The next question was how much to bet on each position. Short-term trading systems like the one that powered Jim Simons’s Medallion Fund also confronted this problem, but in a different way: Because they were operating on short time frames, they risked moving the price against them if they traded suddenly and hard, so they calculated how much they could bet without destroying their own profits. But Wadhwani was creating a system to trade liquid markets over a longer horizon; he had time to build a position to whatever size he liked without moving the price adversely. The limiting factor was the risk he was prepared to take. If he bet too little, he would leave money on the table; if he bet too much, he would risk insolvency. To all great human traders, knowing when to go for the jugular and when to be patient is a large part of the skill; spotting the best opportunities and betting big could make a greater contribution to the bottom line than increasing the share of bets that you were right on. Wadhwani’s models tackled this problem by assigning particular trades a “z score”: The greater the confidence of winning, and the higher the likely payoff from a win, the more the system would bet on a position.
As Wadhwani progressed with his modeling, he could see that it was not just the sizing of trades that needed to be determined flexibly. The type of trades needed to change in different environments. In moments of turbulence, animal spirits mattered more than in calm times, so the computer system needed to weight measures of sentiment from the options market more heavily.17 Equally, when the economy entered a recession, each negative data point was likely to have a larger impact on financial markets than the previous one. A bad employment number might hurt stocks slightly in good times but a lot more in a downturn. Most fundamentally, a serious effort at computerized investing needed to be based on more than one program. In stable times, LTCM-style arbitrage could pay off well: You needed a program that bet on price anomalies disappearing. In unstable times, arbitrage was dangerous: You needed a trend-following program. The ideal, Wadhwani realized, was to devise ways of shifting between these strategies automatically.
Wadhwani left Tudor for the Bank of England in 1999, before he had had time to build out his vision. A first version of his program, a trend-following system called Techno-Fundamentals, ran money successfully from the end of 1997, but the job of creating a system that would shift according to the market environment remained uncompleted. Wadhwani returned to the task when he set up his own hedge fund, Wadhwani Asset Management, in London in 2002, and meanwhile, Tudor’s program trading continued to develop. By 2008 Paul Jones’s firm had more than fifty people working on its computerized trading, and their algorithms were driving more than $3 billion of Tudor’s $17 billion capital.18 The rock-and-roller with the Bruce Willis sneakers had accomplished quite a transformation.
And yet, just as with D. E. Shaw, there were limits to this achievement. The systems that Tudor created were not as original as those developed by James Simons’s team at Renaissance Technologies. The fact that Tudor’s system was built by an economist from Goldman Sachs and based partly on the instincts of a trader from Goldman Sachs was revealing: No matter how brilliant Wadhwani and Heffernan might be, they came from the heart of the financial establishment, and other parts of that establishment were likely to hatch strategies that were at least somewhat similar. A handful of prized experts moved among a handful of firms. Each move reduced the odds that any single firm would build a unique system.
Meanwhile, Simons plowed his own road. He hired established scientists and mathematicians, not the young quants that Shaw favored and certainly not Wall Street veterans.19 He limited cross-fertilization with rivals by locating his operation in Long Island, away from the hedge-fund heartlands of New York, Greenwich, and London. He had no use for ideas that came from academic finance: For a while the faculty in East Setauket plowed through the academic finance journals and met weekly to discuss the latest articles, but then it abandoned this as fruitless. The Renaissance researchers built systems that were in a class of their own. “I can only look at them and realize that you have the gods of the business and then you have mere mortals like me,” Wadhwani said, echoing the view of the entire industry.20
IN 1993 SIMONS MADE TWO IMPORTANT ADDITIONS TO HIS brain trust: Peter Brown and Robert Mercer. They came from IBM’s research center, and they drove much of the success of Medallion over the next years, eventually taking the reins when Simons opted for retirement. The two men complemented each other well. Brown was a magnesium flare of energy: He slept five hours per night, riffed passionately on every topic of the day, and for a while got around the office on a unicycle. Mercer was the calm half of the duo: He was an icy cold poker player; he never recalled having a nightmare; his IBM boss jokingly called him an automaton. Before arriving at Renaissance, Brown and Mercer had worked a little on cryptography, but their real achievement lay elsewhere. They had upended a related field—that of computerized translation.
Until Brown and Mercer decided to take on translation, the subject was dominated by programmers who actually spoke some foreign languages. The approach was to understand the language from the inside, to know its grammar and its syntax, and to teach the computer that “la fille” means “the girl” and “les filles” is the plural form, much as you might teach a middle schooler. But Brown and Mercer had a different method. They did not speak French, and they were not about to wade into its syntax or grammar. Instead, they got hold of Canada’s parliamentary records, which contain thousands of pages of paired passages in French and English. Then they fed the material into an IBM workstation and told it to figure out the correlations.
Unlike the work that Brown and Mercer later did at Renaissance, their experiment at IBM was written up and published.21 It began with some scrubbing of data: Just as financial-market price histories must be checked for “bad ticks”—places where a sale is reported at $16 instead of $61—so the Canadian Hansard contained misprinted words that might confuse a translation program. Next, the computer began to search the data for patterns. For all it knew at the outset, a given English word was equally likely to be translatable into any of the fifty-eight thousand French words in the sample, but once the computer had checked through the twinned passages, it found that most English words appeared alongside only some of them: Immediately, nearly 99 percent of the uncertainty was eliminated. Then the computer proceeded with a series of more subtle tests; for example, it assumed that an English word was most likely to correspond to a French word that came in the same position in the sentence. By now some word pairs were starting to appear: Couplings such as lait/milk and pourquoi/why shouted from the data. But other correlations spoke in a softer voice. To hear them clearly, you had to comb the data multiple times, using slightly different algorithms at each turn. “Only in this way can one hope to hear the quiet call of marqué d’un asterisque/starred or the whisper of qui s’est fait bousculer/embattled,” Brown and Mercer reported.
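The repeated combing described above can be made concrete with a small sketch. It is not the IBM team's program: the three-sentence corpus is invented, the function names are hypothetical, and the re-estimation scheme is a simplified one of the kind the translation literature usually calls IBM Model 1, chosen here as an assumption. It starts, as the text says, from the belief that every French word is an equally likely translation of every English word, then passes over the twinned sentences again and again, each time handing out credit for a French word in proportion to the current estimates. The sentence-position heuristic mentioned above is left out.

```python
from collections import defaultdict

# Toy parallel corpus (invented for illustration): (French words, English words).
PAIRS = [
    (["le", "lait"], ["the", "milk"]),
    (["le", "chat"], ["the", "cat"]),
    (["pourquoi", "le", "lait"], ["why", "the", "milk"]),
]


def train(pairs, passes=10):
    """Estimate t[(f, e)], the chance that English word e is rendered as French word f."""
    french_vocab = {f for fr, _ in pairs for f in fr}
    start = 1.0 / len(french_vocab)
    # At the outset every French word is an equally likely translation.
    t = defaultdict(lambda: start)
    for _ in range(passes):
        count = defaultdict(float)  # fractional co-occurrence credit per word pair
        total = defaultdict(float)  # credit handed to each English word overall
        for fr, en in pairs:
            for f in fr:
                # Split one unit of credit for f among the English words in the
                # twinned sentence, in proportion to the current estimates.
                norm = sum(t[(f, e)] for e in en)
                for e in en:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # Re-estimate the table from the accumulated credit, then comb again.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_translation(t, english_word):
    """Return the French word with the highest estimated probability."""
    scores = {f: p for (f, e), p in t.items() if e == english_word}
    return max(scores, key=scores.get)


table = train(PAIRS)
for word in ["milk", "why", "the", "cat"]:
    print(word, "->", best_translation(table, word))
```

On this toy corpus the loud couplings emerge quickly: "le" turns up in every sentence, so it is soon explained by "the", which frees "lait" to pair with "milk" and "pourquoi" with "why". No rule of French grammar appears anywhere in the code.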
To the code crackers at the Institute for Defense Analyses, this method would not have seemed surprising.22 Indeed, Brown and Mercer used a tool called the “expectations maximization algorithm,” and they cited its inventor, Leonard Baum—this was the same Leonard Baum who had worked for IDA and then later for Simons.23 But although the idea of “statistical machine translation” seemed natural to the code breakers, it was greeted with outrage by traditional translation programmers. A reviewer of the Brown-Mercer paper scolded that “the crude force of computers is not science,” and when the paper was presented at a meeting of translation experts, a listener recalled, “We were all flabbergasted…. People were shaking their heads and spurting grunts of disbelief or even of hostility.” “Where’s the linguistic intuition?” the audience wanted to know—to which the answer seemed to be, “Yes that’s the point; there isn’t any.” Fred Jelinek, the IBM manager who oversaw Brown and Mercer, poured salt into the wounds. “Every time I fire a linguist, my system’s performance improves,” he told the naysayers.24
By the time Brown and Mercer joined Renaissance in 1993, the skeptics were capitulating. Once the IBM team’s program had figured out the sample passages from the Canadian Hansard, it could translate other material too: If you presented it with an article in a French newspaper, it would zip through its database of parliamentary speeches, matching the article’s phrases with the decoded material. The results outclassed competing translation systems by a wide margin, and within a few years the advent of statistical machine translation was celebrated among computer scientists as something of an intellectual revolution.25 Canadian political rhetoric had proved more useful than suspected hitherto. And Brown and Mercer had reminded the world of a lesson about artificial intelligence.
The lesson concerned the difference between human beings and computers. The early translation programs had tried to teach computers vocabulary and grammar because that’s how people learn things. But computers are better suited to a different approach: They can learn to translate between English and French without paying much attention to the rules of either language. Computers don’t need to understand verb conjugations or adjectival inflections before they approach a pile of political speeches; they prefer to get the speeches first, then penetrate their code by combing through them algorithmically. Likewise, computers have no trouble committing millions of sentences to memory; they can learn languages in chunks, without the crutch of grammatical rules that human students use to prompt their memories. For example, a computer can remember the English translations for phrases such as “la fille est intelligente, les filles sont intelligentes,” and a dozen other variations besides; it does not necessarily need to understand that “fille” is the singular form of “filles,” that “est” and “sont” are different forms of the verb “être,” and so on.26 Contrary to the harrumphing of the IBM team’s critics, the crude force of a computer’s memory can actually substitute for human notions of intelligence and science. And computers are likely to work best when they don’t attempt to reach results in the way that humans would.
What clues might this hold about Medallion’s performance? Quite possibly, none: Again, the reasons for the fund’s spectacular success are secret. But it’s clear that the way Brown and Mercer approached programming was fundamentally different from the way other hedge-fund programmers thought about it. At Tudor, for example, Sushil Wadhwani trained a machine to approach markets in a manner that made sense for human traders. By contrast, Brown and Mercer trained themselves to approach problems in a manner that made sense for a computer. At D. E. Shaw, the approach was frequently to start with theories about the market and to test them against the data. By contrast, Brown and Mercer fed the data into the computer first and let it come up with the answers. D. E. Shaw’s approach recalls the programmers who taught computers French grammar. The Brown-Mercer approach resembles that of code crackers, who don’t have the option of starting with a grammar book. Presented with apparently random data and no further clues, they sift it repeatedly for patterns, exploiting the power of computers to hunt for ghosts that to the human eye would be invisible.
Renaissance’s quantitative rivals have reason to avoid ghost hunting. The computer may find fake ghosts—patterns that exist for no reason beyond chance, and that consequently have no predictive value. Eric Wepsic, who runs statistical arbitrage at D. E. Shaw, gives the example of the Super Bowl: It used to be said that if a team from the original National Football League won, the market would head upward. As a matter of statistics, this relationship might hold; but as a matter of common sense, it is a meaningless coincidence. Because of the threat from coincidental correlations masquerading as predictive signals, Wepsic suggests that it is often dangerous to trade on statistical evidence unless it can be intuitively explained. In the 1990s, for example, D. E. Shaw’s systems began to detect curious correlations between previously unrelated stocks—cable companies, media companies, and consumer electronics firms all seemed to be responding to a strange new force field. On the basis of this evidence alone, Shaw’s team would have been inclined to dismiss the correlations as a statistical fluke. But once the firm realized that the correlations made intuitive sense—they reflected the technology euphoria that had pushed into all these industries—they seemed more likely to be tradable.27 Moreover, signals based on intuition have a further advantage: If you understand why they work, you probably understand why they might cease to work, so you are less likely to keep trading them beyond their point of usefulness. In short, Wepsic is saying that pure pattern recognition is a small part of what Shaw does, even if the firm does some of it.
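The danger of fake ghosts is easy to reproduce. The sketch below is an illustration under invented assumptions, not a description of anything D. E. Shaw or Renaissance runs: the function names, the count of a thousand candidate signals, the twenty-observation history, and the random seed are all hypothetical. It generates market returns and candidate signals that are pure noise, picks the signal that correlates best with the short history, and then checks how that same signal fares on later data.

```python
import random


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def hunt_for_ghosts(n_signals=1000, n_history=20, n_future=500, seed=0):
    """Pick the noise signal that best 'predicts' a short history of noise returns.

    Returns (in-sample correlation, correlation on the later data) for the winner.
    """
    rng = random.Random(seed)
    length = n_history + n_future
    returns = [rng.gauss(0, 1) for _ in range(length)]
    best = None
    for _ in range(n_signals):
        # Every candidate is independent noise, so any fit is coincidence.
        signal = [rng.gauss(0, 1) for _ in range(length)]
        in_sample = correlation(signal[:n_history], returns[:n_history])
        if best is None or abs(in_sample) > abs(best[0]):
            out_of_sample = correlation(signal[n_history:], returns[n_history:])
            best = (in_sample, out_of_sample)
    return best


in_sample, out_of_sample = hunt_for_ghosts()
print(f"best in-sample correlation: {in_sample:+.2f}")
print(f"same signal afterwards:    {out_of_sample:+.2f}")
```

With enough candidates, one of them is all but certain to line up with twenty past observations by luck alone, and the apparent relationship melts away on the later data. That is the Super Bowl indicator in miniature, and it is why a signal found by search alone calls for far more evidence than one that can be explained.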
Again, this presents a contrast with Renaissance. Whereas D. E. Shaw grew out of statistical arbitrage in equities, with strong roots in fundamental intuitions about stocks, Renaissance grew out of technical trading in commodities, a tradition that treats price data as paramount.28 Whereas D. E. Shaw hired quants of all varieties, usually recruiting them in their twenties, the crucial early years at Renaissance were largely shaped by established cryptographers and translation programmers—experts who specialized in distinguishing fake ghosts from real ones. Robert Mercer echoes some of Wepsic’s wariness about false correlations: “If somebody came with a theory about how the phases of Venus influence markets, we would want a lot of evidence.” But he adds that “some signals that make no intuitive sense do indeed work.” Indeed, it is the nonintuitive signals that often prove the most lucrative for Renaissance. “The signals that we have been trading without interruption for fifteen years make no sense,” Mercer explains. “Otherwise someone else would have found them.”29
BY THE LATE 2000S THE RENAISSANCE RESEARCH EFFORT had long since outgrown the rented premises in the Long Island High Technology Incubator building. Simons had moved the faculty to a campus with a gym and lighted tennis courts, a pond with bulbous goldfish, and a big skylight in the entrance hall that splashed sun onto a slate staircase. The place felt like an upmarket science facility—comfortable, low-key, eerily clean—and on the door of one office along an antiseptic corridor, somebody had stuck an article with the title “Why Most Published Research Findings Are False.” The windowless rooms that housed racks of computer servers were guarded with elaborate key systems, but the facility’s most striking feature was its openness. Whereas other quantitative hedge funds enforced fierce internal Chinese walls, doling out information to employees on a need-to-know basis in an effort to protect secrets, the atmosphere at Renaissance was altogether different. The scientists roamed the corridors freely, constrained only by the danger that Peter Brown would crash into them on his unicycle. Mirrors had been positioned at critical corners so you could see if Brown was coming.
Simons believed passionately in this open atmosphere. Like the Institute for Defense Analyses, his operation was closed to outsiders in order to protect secrets, yet open on the inside so as to promote teamwork. On Tuesday mornings on the Renaissance campus, the entire faculty of ninety or so PhDs would gather for what they called the Big Meeting. Every refinement to Medallion’s trading program began with a presentation at one of these sessions: A researcher would explain his idea, complete with simulations showing how it would blend in with the other signals already in the system; then he would answer questions. A colleague might ask how the proposed signal would have fared during the LTCM crisis; another might wonder how it would have performed during a period of low volatility. In the days after the Big Meeting, the scientists were free to wander into the proponent’s room and ask follow-up questions. At the end of this peer-review period, a Small Meeting would take place: This time only those scientists who still had questions would show up, and Brown and Mercer would decide whether to give the green light. Then there was one final check. Henry Laufer, the veteran ghost hunter from the 1980s, retained the title of chief scientist and the right of veto.
Simons had devised a compensation system to reinforce this culture of teamwork. The researchers’ pay was linked to the profits of the firm, not to the narrower results of some subunit. Collaboration was written into the firm’s technology infrastructure as well. At IBM, Brown and Mercer had created a system on which multiple programmers could work simultaneously, and they repeated this trick at Renaissance; a researcher could even adapt the in-house programming language in order to express a new idea—in computing, as in everyday speech, neologisms can be useful.30 Into this collaborative architecture the faculty fed the reams of data that modern society generates. The more finance went global, the more statistics from foreign markets were fed into the system. The more business went digital, the more new data became available—e-commerce sales, Web-surfing habits, and so on. The computerization of finance created a vast information windfall. In the old days, it had been possible to track a stock price trade by trade. Now it was possible to see each bid and offer for each stock—including those that never got consummated. The more the possibilities expanded, the more they exceeded the reach of a few minds. But the collaborative faculty at Renaissance could manage this complexity and thrive on it.
The firm’s culture of teamwork involved a risk, however. It presumed that no member of the team would leave with the trading secrets and set up a rival. Like “The Firm” in John Grisham’s novel, Renaissance thought carefully about the matter of employee loyalty. It instructed job applicants that, if they joined Renaissance, they could never work elsewhere in the financial industry; it generally did not hire from Wall Street partly because anyone who left one team of traders might later choose to leave a second one. To enforce the noncompete and nondisclosure agreements that researchers were made to sign, they were required to invest a fifth of their pay in the Medallion Fund, and the money was locked up as a sort of bail payment for four years after they departed. And of course it helped that the firm was based in the quaint town of East Setauket, miles from its competitors. Once a researcher installed his kids in local schools, he didn’t want to go anywhere.