Picture your favorite supermarket. You go to the store with your shopping list in hand, a list similar to that in Figure 5.2. As you push your cart, you stop and grab items off the shelves that correspond to your list. When you have gotten everything on the list, you go to the front of the store, pay for the merchandise, and leave.
In this analogy, your shopping list corresponds to the software you want to run, and the speed at which you move through the store is your clock rate. The “program” terminates when you leave the store. The other correspondences depend on the type of microarchitecture.
Consider the venerable Intel 486 processor, an in-order, single-pipeline scalar microarchitecture with a cache but no branch prediction. The 486 shopper gets a cart, but no map of the store, and has to pick items off the shelf in the order they appear on the list (no looking ahead allowed). Lacking a branch predictor, the 486 shopper starts at the beginning of aisle 1 and linearly searches the entire store for item 1 on his list. When he finds item 1, he returns to the beginning of aisle 1 and repeats the process for item 2, and eventually for the remaining items on the list. Because the 486 had very slow clock rates, relative to the chips that came later, our 486 shopper not only traverses the aisles many times, but he does so quite slowly. He has no branch prediction table, so he learns nothing as he goes; he must start over in order to find each new item on the list. But he eventually does manage to find every item on his list and successfully exits the store.
The shopper who corresponds to the Pentium processor (also an in-order architecture) must search for items in 1, 2, 3, … , n order as well. However, because the Pentium is a two-pipe superscalar design, the Pentium shopper gets two shopping carts. Whenever the items on the shopping list happen to be near each other on the store shelves, the Pentium shopper can grab them both, one in each hand, and toss them into their respective carts. The Pentium also has a branch predictor, so after the shopper has traversed the store a time or two, he has constructed a kind of map and does not have to return to aisle 1 and linearly search the entire store for each new item. Finally, since the Pentium has a faster clock rate, he walks the aisles faster, and gets the job done much faster than the 486 shopper.
The shopper who corresponds to the P6 microarchitecture has a unique shopping experience. P6 is an out-of-order, three-way, superscalar, superpipelined machine with branch prediction and speculative execution. So the P6 shopper gets three independent shopping carts with three helpers to push them. Each helper gets a copy of the shopping list, but this time can grab items in any order that is convenient (because the architecture is out-of-order). The helpers know where everything is because of P6’s advanced branch prediction, and they all wear special running shoes because the P6 clock rate is much higher than the Pentium’s. But they can also grab items off the shelf and put them into their cart before they have even read the whole shopping list; that is the speculative execution. The shopper sends the helpers off to load their carts while she stands at the front of the store. When the helpers have finished their lists, she checks their carts before paying, in the likely case that they got creative and brought something she did not want.
Figure 5.2. The uncanny correspondence between shopping lists and computer programs.
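For readers who would rather see the analogy in code, here is a minimal sketch of the three shoppers as a toy issue model. It is an illustration under loud assumptions, not a description of how any of these chips actually schedule work: every item takes one "trip," the dependency lists are invented, and clock rate, branch prediction, speculation, and pipeline depth are all ignored. The names `cycles_to_finish` and `SHOPPING_LIST` are hypothetical. The only things modeled are the number of carts (issue width) and whether items must be taken in list order.

```python
def cycles_to_finish(deps, width, in_order):
    """Count trips needed to grab every item on the list.

    deps[i] lists the earlier items that must already be in a cart
    before item i can be grabbed (a stand-in for data dependencies).
    width is the number of carts; in_order forbids skipping ahead.
    Toy model only: every item takes exactly one trip.
    """
    n = len(deps)
    grabbed_on = [None] * n  # trip number on which each item was grabbed
    remaining = n
    trip = 0
    while remaining > 0:
        trip += 1
        free_carts = width
        for i in range(n):
            if free_carts == 0:
                break
            if grabbed_on[i] is not None:
                continue
            # An item is ready only if its prerequisites were grabbed
            # on an earlier trip, not on this one.
            ready = all(
                grabbed_on[d] is not None and grabbed_on[d] < trip
                for d in deps[i]
            )
            if ready:
                grabbed_on[i] = trip
                free_carts -= 1
                remaining -= 1
            elif in_order:
                break  # no looking ahead on the list
    return trip


# Hypothetical six-item list: three independent pairs, where the second
# item of each pair needs the first (for example, a lid for a jar).
SHOPPING_LIST = [[], [0], [], [2], [], [4]]

print("486-style (1 cart, in order):     ", cycles_to_finish(SHOPPING_LIST, 1, True))
print("Pentium-style (2 carts, in order):", cycles_to_finish(SHOPPING_LIST, 2, True))
print("P6-style (3 carts, any order):    ", cycles_to_finish(SHOPPING_LIST, 3, False))
```

On this particular made-up list the one-cart shopper needs six trips, the two-cart in-order shopper needs four, and the three-cart out-of-order shopper needs two. A different list would give different ratios, which is why the sketch should be read as an illustration of the idea rather than as a performance claim.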
When I first presented the “shopping cart analogy” to our executives, I was a bit worried that they might feel it was oversimplified or condescending. But they loved it. When they heard about this analogy, our marketing department also loved it, and one of them produced a graphical simulation of the whole thing. Eventually, P6 marketing put this simulation on Intel’s Web site, where it stayed until a Pentium marketing person saw it and realized that, although it made P6 look great, it did so at the expense of Pentium, which was still being sold.
Our marketing folks also used this simulation to explain P6 to outside groups, including a TV station’s camera crew, who got completely carried away and proceeded to go to a local supermarket where they staged the whole thing. I do not think their version really got the point across, though; you had to see the items being found and crossed off the shopping list, and remember the equivalent rates for the other two microarchitectures, to reach the right conclusions. The sheer spectacle of TV reporters running around throwing random store items into their carts was definitely entertaining, but even I had trouble seeing any relationship to what is in a microprocessor.
Interestingly, I also showed the computer-based shopping cart simulation to groups of grade school children, and they enjoyed it to the same extent and for much the same reasons as our executives had. I leave it to the reader to glean any significance from that.
MANAGING TO THE NEXT PROCESSOR
Part of the production phase necessarily involves looking beyond the current project. Engineers love novelty, so they will not normally do the same thing over and over unless they are compelled to do so. The exception is the basis for much marketing folklore: engineers who will not stop polishing long enough to get the product to market. Typically, however, most of us look forward to skipping the mundane issues of speed paths, performance divots, and functional errata, along with the daily management browbeating to meet impossible schedules. The temptation can be overwhelming to chuck it all and jump into the marvelous new project that is welcoming fresh ideas.
The draw is intensified by the knowledge that a new design project is often marked by a kind of land rush. In 1889, the U.S. government sponsored a race in what is now Oklahoma. To settle the land, the government promised that homesteaders would own up to 160 acres after they had settled on and cultivated the land for at least five years. A new design is parceled out in a similar way: The functional blocks are up for grabs and get locked down fairly quickly. The way in which various engineers “settle” a design’s various parts has a profound effect on the eventual design and on job satisfaction over the next several years.
An experienced engineer, the kind you want working on the preproduction team, is likely to be anxiously watching the rest of the team driving their mules at a furious pace across Oklahoma’s landscape, while trying without much enthusiasm to resolve the tricky issues of finishing the “old” chip. Ultimately, these faithful few must trust project leadership to reserve some homesteads for them. Those leaders would be wise to honor that trust.
THE WINDOWS NT SAGA
There was a time when you could design a computer without using a computer to do it, but that time was well before the P6 era. For the P6, the question was which computers to use. With only a few senior engineers on the project, it was relatively simple to try a few engineering workstations, and then ask management to standardize on the one we liked best. This turned out to be IBM’s RS6000 line, running their AIX version of Unix.
All computing platforms have their unique set of quirks and vagaries. We dealt with the RS6K’s by dedicating a set of very talented software engineers to serve as the design team’s first line of defense. These folks were invaluable at figuring out when an unexpected problem was a designer’s own pilot error, the fault of the tools being used, a design flaw in the workstation or its operating system, or some combination of these possibilities.
As we were entering the home stretch of P6 development and beginning to plan the follow-on Willamette project, the question of what engineering workstation to use became an issue.
I argued that engineers ought to use their own designs whenever possible because they can then devote themselves to their creations in a way that abstract professionalism does not quite reach. An engineer who worked on the controls for the jet engines on the Boeing 747 told me that he and a colleague went along on that engine’s first flight. A military pilot once told me that a parachutist commonly packs his own chute, and when he can’t, the person who did pack it has to jump, too, using a randomly chosen chute. Both these practices tend to focus the practitioner’s full attention on the task and to expose that person to any flaws in its execution. Errors can still be made under such circumstances but, surely, the odds of careless mistakes being made will be minimized.
At the P6 project’s outset, no Intel microprocessor-based workstation was suitable for use by an industrial design team. The P6 itself turned out to be a formidable performer for engineering design, however, and I wanted the Willamette design team to “eat their own dog food.”
Unfortunately, Intel was a newcomer to the workstation market in the mid-1990s and although the traditional desktop PC vendors were offering very reasonable workstation designs, we saw no operating system as the obvious right choice. Our design team had lived in a Unix environment for so long and so successfully that they believed the only thing worth arguing about was the Unix vendor.
But Intel upper management had other concerns. We were told on several occasions that the entire design tool chain, both Intel-developed and Intel-licensed, would migrate to Microsoft’s Windows NT as soon as possible. Moreover, management was worried that no credible Unix vendor would listen to Intel’s complaints or entreaties when we needed special features or fast bug fixes. Intel managers also profoundly distrusted the emerging Internet model in which seconds after you posted a question about a bug, 20 responses would appear, 18 of which were blatantly wrong, but two of which were potentially quite useful.
So we Willamette project managers were given a choice. Either we could go with Unix, but from a standard vendor with its own self-supported Unix, or we could go with any reasonable x86/P6 hardware platform vendor, but run Windows NT. The only standard vendors with their own Unix at that time were IBM, Hewlett-Packard, Sun, and Silicon Graphics, and their machines were much more expensive than standard desktops.
With perfect hindsight, I wish we had chosen a Unix-based solution, any Unix-based solution. Instead, we chose to switch to Windows NT, the start of a struggle that grew worse with time.
It was not that Windows NT was substantially less stable than the competing Unix alternative. It may or may not have been, but the real problem was primarily that the entire design team had spent many years creating and maintaining software tools under Unix and no time at all doing the equivalent under Windows. In a misguided and ultimately counterproductive effort to make Unix folks feel at home in Windows, our tools group created a range of tools, shells, and alias lists so that we could migrate the hundreds of design tools from Unix to Windows. What we did not anticipate was that, for many reasons, it is better to rewrite tools from scratch when migrating them to Windows from Unix. Instead, we ended up wielding these seductive tools to get a quickie port on Windows, only to discover that they were of the same flaky consistency as grandmother’s best dinner rolls.
This list conveys a flavor of what these quickie ports from hell collectively evinced:
1. A bug sighting is reported to the software tools team. A software wizard is dispatched to reproduce the problem, which we will call Problem A.
2. The software expert wrestles with the bug report and, after playing with the various reported facts, believes she has successfully reproduced Problem A. Remember that she has had access to the operating-system source code all her career, but with Windows NT she does not, so she can only guess at any bugs or limitations it might be causing.
3. With her understanding of the bug, she proposes a fix, usually a change to the quickie-port scripts and tools. She tests the fixed script for a few days and becomes convinced that she has solved Problem A. Her fix is released to the general tool-using population.
4. A new problem, which we will call Problem B, is reported. Because it is seemingly unrelated to Problem A, Problem B goes to a different tools expert, who begins the same process outlined in (2) and (3), but, unbeknownst to him, Problem B is indeed a direct result of Problem A’s “fix.” Worse, Problem A was not really fixed; lack of access to the operating system source code caused the first expert to guess slightly wrong. She was right enough to make the symptoms of Problem A go away temporarily, but wrong enough that only minor changes to tool use or the dataset make Problem A come back. Meanwhile, the change to fix Problem A has now broken something that used to work.
Figure 5.3 gives you the graphical equivalent of this destructive cycle.
After a few more of these “coincidentally” timed bugs, the software tools folks began to suspect that something more deliberate was causing them, and they laboriously puzzled out more of what the real operating-system/tools interactions must be on the basis of observed behavior. They began to propose more refined fixes, and the tools gradually got less flaky.
Many people believe that if you throw a frog into a pot of boiling water, it will jump right back out, but if you put it into room temperature water and gradually heat it to boiling, it will stay in it until it is too late.3 That is how our forced migration away from Unix felt. Because the tools environment seemed to get gradually more reliable, we stuck to the plan, hoping that when the software-tools folks found and fixed the few really big bugs, our design tool chain would once again exhibit the overall level of reliability to which we had become accustomed. But it never did. Instead, the software tools folks just kept patching and guessing and trying to infer what was wrong. And as the development schedule marched on, they had less and less time to backtrack and replace major sections of the tool chain with code specifically written for the Windows NT environment.
Some of them also began exhibiting worrisome behavior. Whenever they saw me, they would point in their best Ghost of Christmas Future imitation, and in a hollow, eerie voice wail, “You did this to me! I hate you! I quit!” And I would gingerly talk them down from their Windows-induced psychosis. But upper management held firm: Windows NT was the future, and we simply had to adjust.
Later in the project, as the RTL matured, the validation team began using massive cycles for their validation exercises, and they found that the overall tools environment had matured to an asymptote that was an order of magnitude away from what they could live with. In classic Intel tradition, they did not ask permission, but simply brought up Linux on several hundred validation servers and re-ported whatever tools were needed. Within a few days, they were once again merrily running their tests and enjoying the kind of computing system stability we had not seen in several years.
And in the ultimate irony, the same management that made us walk the plank in this manner eventually bestowed the company’s highest technical achievement award on the validation and computing support personnel who had reverse-migrated us back to Linux. Another quintessentially Dilbertian moment in the annals of technology.
PRODUCT ROLLOUT
Like most large companies, Intel carefully stage-manages product rollouts. Senior marketing executives collect information from the technical people who created the product and combine that information with their own imaginations to come up with the glitzy extravaganzas you see at rollout affairs. Rollouts also require a certain awareness of what you should and should not say during interviews, which takes more training than you might think. (Or maybe just more than I had.)
Figure 5.3. Lack of access to source code forces guessing and more bugs.
On Stage with Andy Grove
Marketing rollouts of new high-tech products usually include selected product users and early adopters, whose job it is to say, “Without Intel’s latest and greatest processor, my life would be devoid of meaning,” and “Now that the Pentium Pro exists, my applications leap tall buildings in a single bound.” I am not criticizing them, of course; I went to Bill Daniels’ lectures (the human behavior expert cited in the section, “Awards, Rewards, and Recognition” in Chapter 4), and I am no different from anyone else who likes praise, regardless of the circumstances.
As Randy Steck and I were soaking up the adulation, even the less-than-spontaneous variety, Andy Grove invited us onto the stage. The applause that followed seemed genuine, and after nodding our thanks, we moved out of the spotlight and waited for the final fireworks.
They came a little earlier than I expected. During the audience question-and-answer session immediately after our appearance, Andy fielded the first few questions, and then someone remarked, “Rumor has it that the Pentium Pro’s 16-bit performance is hardly better than Pentium’s. What caused a design gaffe like that?” Andy had no idea if the questioner’s premise was correct, and if it was, what the best answer might be. So he looked back at us.
My feet moved almost of their own volition, and I found myself standing in front of a microphone, squinting into very bright lights, with Andy Grove looking at me expectantly. I took a deep breath and began what I hoped was a spirited defense of our design choices in the matter of 16-bit performance. I do not remember if I inflicted the entire history of this choice on the questioner, but I do remember that my aim was to drive a stake through this question’s heart. When I got back to Oregon, many members of the team thanked me for my performance. I asked them if my answer made any sense, and they gallantly said, “It doesn’t matter. It was the conviction with which you spoke that counted.” It was a great experience, even if it made me feel at the time like a waterboy who had accidentally been sent up to bat against a major league pitcher.
What still irks me about that question is that it shows how little people understand design decisions. If events had transpired in a way that made the early P6’s 16-bit performance seriously deficient, then someone could legitimately criticize our collective judgment on this issue. But to insinuate that this choice was an unconscious, default decision, a simple oversight, is to cast aspersions on our competence as designers. Judgments that turn out wrong are crucially different from judgments that no one ever made, and that difference is what distinguishes great design teams from mediocre ones. Great teams can make wrong calls, but they make all their calls based on the best information available at the time. It is this deliberate, informed decision making that stacks the odds in their favor.
The Pentium Chronicles: The People, Passion, and Politics Behind Intel's Landmark Chips (Practitioners) Page 24