Shaw’s secretiveness regarding his firm’s trading strategies is legendary. Employees sign nondisclosure agreements, and even within the firm, knowledge about the trading methodology is on a need-to-know basis. Thus, in my interview, I knew better than to even attempt to ask Shaw explicit questions about his company’s trading approach. Still, I tried what I thought were some less sensitive questions:
What strategies were once used by the firm but have been discarded because they no longer work?
What fields of math would one have to know to develop the same strategies his firm uses?
What market anomalies that once provided trading opportunities have so obviously ceased to exist that all his competitors would be aware of the fact?
Even these circumspect questions were met with a polite refusal to answer. Although he did not use these exact words, the gist of Shaw’s responses to these various queries could be succinctly stated as: “I prefer not to answer on the grounds that it might provide some remote hint that my competitors could find useful.”
Shaw’s flagship trading program has been consistently profitable since it was launched in 1989. During its eleven-year life span, the program has generated a 22 percent average annual compounded return net of all fees while keeping risks under tight control. During this entire period, the program’s worst decline from an equity peak to a month-end low was a relatively moderate 11 percent—and even this loss was fully recovered in just over four months.
How has D. E. Shaw managed to extract consistent profits from the market for over a decade, in both bullish and bearish periods? Clearly, Shaw is not talking—or at least not about the specifics of his company’s trading strategies. Nevertheless, based on what Shaw does acknowledge and by reading between the lines, it may be possible to sketch a very rough description of his company’s trading methodology. The following explanation, which admittedly incorporates a good deal of guesswork, is intended to give the reader a flavor of Shaw’s trading approach.
We begin our overview with classic arbitrage. Although Shaw doesn’t use classic arbitrage, it provides a conceptual starting point. Classic arbitrage refers to simultaneously buying and selling the same security (or commodity) at different prices, thereby locking in a risk-free profit. An example of classic arbitrage would be buying gold in New York at $290 an ounce and simultaneously selling the same quantity in London at $291. In our age of computerization and near-instantaneous communication, classic arbitrage opportunities are virtually nonexistent.
Statistical arbitrage expands the classic arbitrage concept of simultaneously buying and selling identical financial instruments for a locked-in profit to encompass buying and selling closely related financial instruments for a probable profit. In statistical arbitrage, each individual trade is no longer a sure thing, but the odds imply an edge. The trader engaged in statistical arbitrage will lose on a significant percentage of trades but will be profitable over the long run, assuming trade probabilities and transaction costs have been accurately estimated. An appropriate analogy would be roulette (viewed from the casino’s perspective): The casino’s odds of winning on any particular spin of the wheel are only modestly better than fifty-fifty, but its edge and the laws of probability will assure that it wins over the long run.
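To make the arithmetic of such an edge concrete, the short Python sketch below simulates a long series of independent even-money bets with a slightly better than fifty-fifty chance of winning, net of an assumed transaction cost. The win probability, payoff, and cost figures are purely illustrative assumptions with no connection to any actual trading strategy; the point is only that a thin statistical edge, applied over enough trades, tends to produce dependable aggregate profits even though nearly half of the individual bets lose.

```python
# Illustrative only: a Monte Carlo sketch of how an assumed 52% win rate on
# even-money outcomes, net of a small transaction cost, compounds into
# reliable long-run profits despite many losing trades.
import random

def simulate(num_trades=10_000, win_prob=0.52, payoff=1.0, cost=0.02, seed=1):
    """Return (total_pnl, fraction_of_losing_trades) for one simulated run."""
    random.seed(seed)
    pnl = 0.0
    losses = 0
    for _ in range(num_trades):
        if random.random() < win_prob:
            pnl += payoff - cost          # winning trade, net of transaction cost
        else:
            pnl -= payoff + cost          # losing trade plus transaction cost
            losses += 1
    return pnl, losses / num_trades

if __name__ == "__main__":
    total, loss_frac = simulate()
    print(f"net P&L: {total:+.1f} units, losing trades: {loss_frac:.1%}")
```

With these assumed parameters, roughly 48 percent of the trades lose, yet the run typically finishes solidly in the black—the casino’s position in the roulette analogy.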
There are many different types of statistical arbitrage. We will focus on one example: pairs trading. In addition to providing an easy-to-grasp illustration, pairs trading has the advantage of reportedly being one of the prime strategies used by the Morgan Stanley trading group, for which Shaw worked before he left to form his own firm.
Pairs trading involves a two-step process. First, past data are used to define pairs of stocks that tend to move together. Second, each of these pairs is monitored for performance divergences. Whenever there is a statistically meaningful performance divergence between two stocks in a defined pair, the stronger of the pair is sold and the weaker is bought. The basic assumption is that the performance of these closely related stocks will tend to converge. Insofar as this theory is correct, a pairs trading approach will provide an edge and profitability over the long run, even though there is a substantial chance that any individual trade will lose money.
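As an illustration of the mechanics only, a minimal version of this two-step process might look like the following Python fragment. The lookback window, the two-standard-deviation entry threshold, and the use of a log-price spread are assumptions chosen for clarity; they are not drawn from Shaw’s firm, Morgan Stanley, or any published strategy.

```python
# A minimal sketch of the two-step pairs-trading logic described above.
# The lookback window and entry threshold are illustrative assumptions.
import numpy as np

def find_pair_signal(prices_a, prices_b, lookback=60, entry_z=2.0):
    """Step 1: measure how the two stocks have moved together historically.
    Step 2: flag a trade when their relative performance diverges.

    prices_a, prices_b: 1-D arrays of daily closing prices, same length.
    Returns 'short_A_long_B', 'long_A_short_B', or None.
    """
    # Work with log prices so the spread reflects relative performance.
    spread = np.log(prices_a) - np.log(prices_b)

    history = spread[-lookback:]              # recent co-movement history
    mean, std = history.mean(), history.std()
    if std == 0:
        return None

    z = (spread[-1] - mean) / std             # how stretched is today's divergence?
    if z > entry_z:
        return "short_A_long_B"               # A has outrun B: sell A, buy B
    if z < -entry_z:
        return "long_A_short_B"               # B has outrun A: buy A, sell B
    return None                               # no statistically meaningful divergence
```

The z-score simply formalizes the phrase “statistically meaningful performance divergence”: the further the current spread sits from its recent average, measured in recent standard deviations, the stronger the case for selling the outperformer and buying the laggard.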
An excellent description of pairs trading and the testing of a specific strategy was contained in a 1999 research paper written by a group of Yale School of Management professors.* Using data for 1963–97, they found that the specific pairs trading strategy they tested yielded statistically significant profits with relatively low volatility. In fact, for the survey period as a whole, the pairs trading strategy had a higher return and much lower risk (volatility) than the S&P 500. The pairs trading strategy, however, showed signs of major deterioration in more recent years, with near-zero returns during the last four years of the survey period (1994–97). A reasonable hypothesis is that the increased use of pairs-based strategies by various trading firms (possibly including Shaw’s) drove down the profit opportunity of this tactic until it was virtually eliminated.
What does Shaw’s trading approach have to do with pairs trading? Similar to pairs trading, Shaw’s strategies are probably also based on a framework of identifying securities that are underpriced relative to other securities. However, that is where the similarity ends. The elements of complexity that differentiate Shaw’s trading methodology from a simple statistical arbitrage strategy, such as pairs trading, probably include some, and possibly all, of the following:
Trading signals are based on over twenty different predictive techniques, rather than a single method.
Each of these methodologies is probably far more sophisticated than pairs trading. Even if performance divergence between correlated securities is the core of one of these strategies, as it is for pairs trading, the mathematical structure would more likely be one that simultaneously analyzes the interrelationship of large numbers of securities, rather than one that analyzes two stocks at a time.
Strategies incorporate global equity markets, not just U.S. stocks.
Strategies incorporate equity related instruments—warrants, options, and convertible bonds—in addition to stocks.
In order to balance the portfolio so that it is relatively unaffected by the trend of the general market, position sizes are probably adjusted to account for factors such as the varying volatility of different securities and the correlations among stocks in the portfolio (a rough sketch of this idea follows this list).
The portfolio is balanced not only to remove the influence of price moves in the broad stock market, but also to mitigate the influence of currency price swings and interest rate moves.
Entry and exit strategies are employed to minimize transaction costs.
All of these strategies and models are monitored simultaneously in real time. A change in any single element can impact any or all of the other elements. As but one example, a signal by one predictive technique to buy a set of securities and sell another set of securities requires the entire portfolio to be rebalanced.
The trading model is dynamic—that is, it changes over time to adjust for changing market conditions, which dictate dropping or revising some predictive techniques and introducing new ones.
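The volatility-based balancing mentioned in the list above can be made a bit more concrete with a deliberately simplified sketch. The version below scales each position inversely to its volatility and then cancels the residual market exposure with an index hedge; the positions, volatilities, and betas are hypothetical, and a real system would also account for the cross-correlations among the stocks, which this fragment ignores.

```python
# A rough sketch, under simplifying assumptions, of the kind of balancing
# described above: scale each position inversely to its volatility, then
# offset the residual market exposure with an index hedge. The signals,
# volatilities, and betas below are hypothetical.
import numpy as np

def balance_book(signals, vols, betas, gross_target=1.0):
    """signals: +1 (buy) or -1 (sell) per security
    vols:    annualized volatility per security
    betas:   sensitivity of each security to the broad market
    Returns (per-security weights, index hedge) with absolute weights summing
    to gross_target and approximately zero net market beta."""
    signals = np.asarray(signals, dtype=float)
    vols = np.asarray(vols, dtype=float)
    betas = np.asarray(betas, dtype=float)

    raw = signals / vols                      # roughly equal risk, not equal dollars
    weights = gross_target * raw / np.abs(raw).sum()

    market_exposure = weights @ betas         # net beta left over after netting longs/shorts
    index_hedge = -market_exposure            # offsetting index position to cancel it
    return weights, index_hedge

# Hypothetical example: three buys and three sells of varying volatility.
w, hedge = balance_book(
    signals=[+1, +1, +1, -1, -1, -1],
    vols=[0.20, 0.35, 0.25, 0.30, 0.22, 0.40],
    betas=[1.1, 1.4, 0.9, 1.0, 0.8, 1.3],
)
print("weights:", np.round(w, 3), "index hedge:", round(hedge, 3))
```

The output is a set of position weights whose combined sensitivity to a broad-market move is approximately zero, which is the practical meaning of a portfolio that is “relatively unaffected by the trend of the general market.”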
I have no idea—and for that matter will never know—how close the foregoing description is to reality. I think, however, that it is probably valid as far as providing a sense of the type of trading done at D. E. Shaw.
Shaw’s entrepreneurial bent emerged at an early age. When he was twelve, he raised a hundred dollars from his friends to make a horror movie. Since he grew up in the L.A. area, he was able to get other kids’ parents to provide free help with tasks such as special effects and editing. The idea was to show the movie to other kids in the neighborhood for a 50-cent admission charge. But the plan went awry when the processing lab lost one of the rolls of film. When he was in high school, he formed a company that manufactured and sold psychedelic ties. He bought three sewing machines and hired high school students to manufacture the ties. The venture failed because he hadn’t given much thought to distribution, and going from store to store proved to be an inefficient way to market the ties.
His first serious business venture, however, was a success. While he was at graduate school at Stanford, he took two years off to start a computer company that developed compilers [computer code that translates programs written in user languages into machine language instructions]. Although this venture was very profitable, Shaw’s graduate school adviser convinced him that it was not realistic for him to earn his Ph.D. part-time while running a company. Shaw sold the company and completed his Ph.D. work at Stanford. He never considered the alternative of staying with his entrepreneurial success and abandoning his immediate goal of getting a Ph.D. “Finishing graduate school was extremely important to me at the time,” he says. “To be taken seriously in the computer research community, you pretty much had to be a faculty member at a top university or a Ph.D.-level scientist at a leading research lab.”
Shaw’s doctoral dissertation, “Knowledge Based Retrieval on a Relational Database Machine,” provided the theoretical basis for building massively parallel computers. One of the pivotal theorems in Shaw’s dissertation proved that, for an important class of problems, the theoretical advantage of a multiple processor computer over a single processor computer would increase in proportion to the magnitude of the problem. The implications of this theorem for computer architecture were momentous: It demonstrated the inevitability of parallel processor design vis-à-vis single processor design as the approach for achieving major advances in supercomputer technology.
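The dissertation itself is not quoted here, so the following is only a toy instance of the kind of scaling the theorem describes, not the theorem itself. Suppose a retrieval problem over n records takes time proportional to n on a single processor, while a machine with one small processor per record can answer in time proportional to log n (the time needed to combine partial results through a tree of processors). Then the theoretical speedup grows almost in proportion to the problem size:

```latex
% Toy illustration only; not the dissertation's actual theorem.
T_{\mathrm{single}}(n) = c_1\, n, \qquad
T_{\mathrm{parallel}}(n) = c_2 \log n
\quad\Longrightarrow\quad
\mathrm{Speedup}(n) \;=\; \frac{T_{\mathrm{single}}(n)}{T_{\mathrm{parallel}}(n)}
\;=\; \frac{c_1}{c_2}\cdot\frac{n}{\log n}
```

The larger the database, the greater the theoretical advantage of the massively parallel design, which is the qualitative content of the scaling claim described above.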
Shaw has had enough accomplishments to fulfill at least a half dozen extraordinarily successful careers. In addition to the core trading business, Shaw’s firm has also incubated and spun off a number of other companies. Perhaps the best-known of these is Juno Online Services, the world’s second-largest provider of dial-up Internet services (after America Online). Juno was launched as a public company in May 1999 and is traded on Nasdaq (symbol: JWEB). D. E. Shaw also developed DESoFT, a financial technology company, which was sold to Merrill Lynch, an acquisition that was pivotal to the brokerage firm’s rollout of an on-line trading service. FarSight, an on-line brokerage firm, and D. E. Shaw Financial Products, a market-making operation, were other businesses developed at D. E. Shaw and subsequently sold.
In addition to spawning a slew of successful companies, D. E. Shaw also has provided venture capital funding to Schrödinger Inc. (for which Shaw is the chairman of the board of directors) and Molecular Simulations Inc., two firms that are leaders in the development of computational chemistry software. These investments reflect Shaw’s strong belief that the design of new drugs, as well as new materials, will move increasingly from the laboratory to the computer. Shaw predicts that developments in computer hardware and software will make possible a dramatic acceleration in the timetable for developing new drugs, and he wants to play a role in turning this vision into reality.
By this time, you may be wondering how this man finds time to sleep. Well, the paradox deepens, because in addition to all these ventures, Shaw has somehow found time to pursue his political interests by serving on President Clinton’s Committee of Advisors on Science and Technology and chairing the Panel on Educational Technology.
The reception area at D. E. Shaw—a sparsely furnished, thirty-one-foot cubic space, with diverse rectangular shapes cut out of the walls and backlit by tinted sunlight reflected off of hidden color surfaces—looks very much like a giant exhibit at a modern art museum. This bold, spartan, and futuristic architectural design is, no doubt, intended to project the firm’s technological identity.
The interview was conducted in David Shaw’s office, a spacious, high-ceilinged room with two adjacent walls of windows opening to an expansive view of midtown Manhattan to the south and west. Shaw must be fond of cacti: they lined the windowsills, and a tree-size specimen stood in the corner of the room. A large, irregular-polygon-shaped, brushed-aluminum table, which served as a desk on one end and a conference area on the other, dominated the center of the room. We sat directly across from each other at the conference end.
* * *
You began your career designing supercomputers. Can you tell me about that experience?
From the time I was in college, I was fascinated by the question of what human thought was—what made it different from a computer. When I was a graduate student at Stanford, I started thinking about whether you could design a machine that was more like the brain, which has huge numbers of very slow processors—the neurons—working in parallel instead of a single very fast processor.
Were there any other people working to develop parallel supercomputers at that time?
Although there were already a substantial number of outstanding researchers working on parallel computation before I got started, most of them were looking at ways to connect, say, eight or sixteen processors. I was intrigued with the idea of how you could build a parallel computer with millions of processors, each next to a small chunk of memory. There was a trade-off, however. Although there were a lot more processors, they had to be much smaller and cheaper. Still, for certain types of problems, theoretically, you could get speeds that were a thousand times faster than the fastest supercomputer. To be fair, there were a few other researchers who were interested in these sorts of “fine-grained” parallel machines at the time—for example, certain scientists working in the field of computer vision—but it was definitely not the dominant theme within the field.
You said that you were trying to design a computer that worked more like the brain. Could you elaborate?
At the time, one of the main constraints on computer speed was a limitation often referred to as the “von Neumann bottleneck.” The traditional von Neumann machine, named after John von Neumann, has a single central processing unit (CPU) connected to a single memory unit. Originally, the two were well matched in speed and size. Over time, however, as processors became faster and memories got larger, the connection between the two—the time it takes for the CPU to get things out of memory, perform the computations, and place the results back into memory—became more and more of a bottleneck.
This type of bottleneck does not exist in the brain because memory storage goes on in millions of different units that are connected to each other through an enormous number of synapses. Although we understand it imperfectly, we do know that whatever computation is going on occurs in close proximity to the memory. In essence, the thinking and the remembering seem to be much more extensively intermingled than is the case in a traditional von Neumann machine. The basic idea that drove my research was that if you could build a computer that had a separate processor for each tiny chunk of memory, you might be able to get around the von Neumann bottleneck.
I assume that the necessary technology did not yet exist at that time.
It was just beginning to exist. I completed my Ph.D. in 1980. By the time I joined the faculty at Columbia University, it was possible to put multiple processors, but very small and simple ones, on a single chip. Our research project was the first one to build a chip containing a number of real, multibit computers. At the time, we were able to place eight 8-bit processors on a single chip. Nowadays, you could probably put 512 or 1,024 similar processors on a chip.
Cray was already building supercomputers at the time. How did your work differ from his?
Seymour Cray was probably the greatest single-processor supercomputer designer who ever lived. He was famous for pushing the technological envelope. With each new machine he built, he would use new types of semiconductors, cooling apparatus, and wiring schemes that had never been used before in an actual computer. He was also a first-rate computer architect, but a substantial part of his edge came from a combination of extraordinary engineering skills and sheer technological audacity. He had a lot more expertise in high-speed technology, whereas my own focus was more on the architecture—designing a fundamentally different type of computer.
You mentioned earlier that your involvement in computer design had its origins in your fascination with human thought. Do you believe it’s theoretically possible for computers to eventually think?
From a theoretical perspective, I see no intrinsic reason why they couldn’t.
So HAL in 2001 is not pure science fiction.
It’s hard to know for sure, but I personally see no compelling reason to believe that this couldn’t happen at some point. But even if it does prove feasible to build truly intelligent machines, I strongly suspect that this won’t happen for a very long time.
But you believe it’s theoretically possible in the sense that a computer could have a sense of self?
It’s not entirely clear to me what it would mean for a computer to have a sense of self, or for that matter, exactly what we mean when we say that about a human being. But I don’t see any intrinsic reason why cognition should be possible only in hydrocarbon-based systems like ourselves. There’s certainly a lot we don’t understand about how humans think, but at some level, we can be viewed as a very interesting collection of highly organized, interacting molecules. I haven’t yet seen any compelling evidence to suggest that the product of human evolution represents the only possible way these molecules can be organized in order to produce a phenomenon like thought.
Did you ever get to the point of applying your theoretical concepts to building an actual working model of a supercomputer?