
The Pentium Chronicles: The People, Passion, and Politics Behind Intel's Landmark Chips (Practitioners)


by Robert P. Colwell


  Another reason for the metric’s substantial subjectivity had to do with managing your manager. I believe there is a generally well-placed but occasionally extremely dangerous penchant within Intel to insist on quantifiable data only. It is simply corporate culture that if someone asks, “Is this chip’s performance projection on track?” the preferred answer has the form, “A best-fit linear extrapolation of the last 6 weeks of data indicates a probability of 0.74 with a standard deviation of 0.61,” not, “The indicators say we’re marginally on track, but my instincts and experience say there’s a problem here.”
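  For what it is worth, the arithmetic behind that "preferred" kind of answer is nothing exotic: a best-fit linear extrapolation of weekly data is just a least-squares line. A minimal sketch, with invented weekly scores (nothing here comes from the book):

```python
# Minimal sketch: least-squares line through six weeks of hypothetical
# health-score data, extrapolated one week ahead. The numbers are invented
# for illustration.
weeks  = [1, 2, 3, 4, 5, 6]
scores = [62, 65, 69, 70, 74, 77]   # hypothetical weekly metric values

n      = len(weeks)
mean_x = sum(weeks) / n
mean_y = sum(scores) / n
slope  = sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, scores)) \
         / sum((x - mean_x) ** 2 for x in weeks)
intercept = mean_y - slope * mean_x

next_week = 7
print(f"projected score for week {next_week}: {intercept + slope * next_week:.1f}")
```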

  Management wanted to know when the chip would be ready to tape out, and they did not want to hear that the chip would tape out when the design team, the project leaders, and the validation team all agreed it was ready. They wanted a mechanical way of checking that all pieces were trending toward success. Then they could confidently reserve the right to “shoot the engineers” (see the section “Of Immediate Concern” in Chapter 5) and unilaterally declare tapeout.

  Health and the Tapeout Target

  The problem is that judging the health of a chip design database is not so easy. You can pick a metric at the project’s beginning and then measure against it every week, but the metric itself is subject to revision as the weeks roll on, and it is not easy to go back, add something to the metric, and extract the necessary data from the old records. Basically, whatever metric you pick at the beginning is what you are stuck with throughout. You can revise the weightings, but even that is problematical.
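  The book does not spell out which indicators or weights went into such a score, but the shape of the metric is simple: a fixed set of indicators, each scaled to a common range, combined by weights that can in principle be revised. A minimal sketch, with invented indicator names and weights:

```python
# Hypothetical sketch of a composite "health of the model" style score.
# The five indicator names and the weights are invented; the book does not
# list the real ones.
indicators = {
    "tests_passing":   0.30,   # weight
    "bug_find_rate":   0.20,
    "bug_fix_backlog": 0.20,
    "coverage":        0.20,
    "code_churn":      0.10,
}

def composite_score(readings: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-indicator readings, each already scaled 0-100."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[name] * readings[name] for name in weights)

this_week = {"tests_passing": 88, "bug_find_rate": 60, "bug_fix_backlog": 72,
             "coverage": 81, "code_churn": 55}
print(f"this week's composite score: {composite_score(this_week, indicators):.1f}")
```

  The catch described above is visible even in this toy: the composite is only comparable week over week if the indicator set stays fixed, and changing the weights midstream silently redefines what every historical score meant.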

  The RTL model achieves its full functionality only toward the end of the development cycle, and the validation team can no-holds-barred test only that mature model. When a bug is found and fixed, you should assume that the fix may break something that used to work, but you cannot repeat every test ever run on the project. Again, judgment is required to know how much additional testing is appropriate, given particular design errata.

  We picked a score of 95 as our tapeout target, knowing that upper management could eventually hold this score against us. As big, expensive chip development projects near completion, a kind of tapeout frenzy tends to break out in the design team as well as across the management chain. On the plus side, it inspires all concerned to do whatever it takes to finish the job. On the minus side, it encourages management to sometimes discount the opinions of technical people, especially opinions that they do not want to hear, such as, “This chip is still too buggy. It needs at least three more weeks of testing.” It is a management truism that a “shoot the engineers” phase of any design project is necessary, because without it engineers will continue polishing and tweaking well past the point of diminishing returns. By picking a fairly lofty target, we hoped we were placing it sufficiently out of reach so that we would never face the problem of having management wave our indicator at us and say, “You said this score was good enough to tape out, so you have no right to make it any higher now. Tape this thing out.”

  Metric Doldrums

  Our HOTM metric did not behave as intended. We had hoped that if we weighted the five indicators properly, the overall score would start low and climb linearly week by week toward the final score that would suggest our new RTL was tapeout ready. What actually happened was that the overall score did start low and climb for a while, but then it stubbornly parked for many weeks at an intermediate value that seemed much too low for anyone to accept as being of tapeout quality.

  After several weeks of watching the HOTM score languish, I began upping the pressure on my validation manager, Bob Bentley. Bob patiently but firmly reminded me of all the pieces built into the metric and showed me how the RTL status and new changes were affecting each one. That made sense in isolation, but we had created this metric so that we could feel comforted as the model’s quality visibly climbed toward acceptability, and now that it wasn’t climbing, my comfort level was dropping precipitously.

  Validation does not put the bugs in. They just find them and report them.

  Finally, at one of these meetings Bob said (not so patiently), “Okay. You are pushing on the wrong thing here by pressuring validation. We don’t put the bugs in. We just find them and report them.” He was absolutely right. I turned my attention to the design team, its coding methods, individual testing, and code-release criteria, and we made many changes that immediately began paying off in terms of overall design quality.

  The HOTM metric never did linearly climb to 95, but it started moving after that, so, in retrospect, I think it was a good exercise. Intel has since revised the HOTM many times to make it more useful as a project-guidance mechanism.

  COORDINATING WITH OTHER PROJECTS

  Another characteristic of the realization phase is that the project is no longer in a corporate vacuum. Its visibility has increased to the point that other projects are not just reading about it; they are comparing their own design projects to its official goals and methods. This exercise is vital, for several reasons.

  The first and most important is that customers like to think that a big company like Intel has put a lot of corporate thought into creating and coordinating its various product offerings. They expect that the overall road map will be coherent and will allow them to design systems that will form their own customers’ road maps later on. If Intel puts multiple products into the same product space, it confuses the customers, and if those customers are foolish enough to design competing systems around them, the customers of the customers will also get confused.

  Coordinating multiple design projects is also important because it can mean the difference between a potentially helpful interteam synergy and a destructive internecine turf war over influence with support groups, feature developments in common tool sets, and marketing campaign money. Big companies would like to believe that by fielding multiple teams, they are casting a bigger net over the design space; coordinating these teams is just collecting the best known methods from all of them and making those available to everyone.

  In large companies like Intel, multiple design teams concurrently operate in various locations worldwide. Few are ever commissioned to directly compete with one another in the same product space; more often, each team’s output is expected to fill some important hole in the product road map a few years hence. But the performance and feature sets of the various chips under development must be compared somehow, precisely so that the management and marketing teams can ensure that the overall corporate road map will make sense to the customers. This is a much more difficult task than it may appear.

  Comparing two chip developments gives rise to several first-order issues. One is performance estimation, which I look at in the next section. Another is methodology: the simulators used, how they work, and their sources of possible inaccuracies. Design teams will have different beliefs about what is “best,” and it’s a virtual certainty that what one team considers an absolute requirement in a tool or design methodology will be rejected as anathema by another. And never underestimate the unpredictability of human psychology, which can easily subdue any rational technical decision.

  Performance Estimation

  Early in the project, when all you have are grand plans and a little data, performance estimation is an uncertain art form. Nonetheless, it is an art form you must engage in. Early estimates are crucial, even if they show enough variability to span the scale from dismal to stellar.

  This uncertainty stems from many sources. When a design project is young, many of its major functional blocks are still in rough form. No one knows each block’s ultimate contribution to overall performance; in fact, no one yet knows if a given block is buildable at all. Interactions between the various blocks are not yet fully understood. At higher levels of abstraction, there may be compiler, software library, and operating system developments that ultimately will affect delivered performance, but for now all you can do is guess at them.
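  One common way to make that uncertainty visible, rather than collapsing it into a single optimistic number, is to carry a range for each uncertain factor and look at the spread of their combination. A minimal sketch, with invented blocks and ranges (none of these figures come from the book):

```python
import random

# Hypothetical sketch: combine independent, uncertain speedup estimates for a
# few major blocks by sampling each from a (low, high) range and examining
# the spread of the product. Blocks and ranges are invented for illustration.
block_ranges = {
    "branch_prediction": (1.02, 1.10),
    "renaming":          (1.05, 1.20),
    "caches":            (1.00, 1.15),
    "compiler":          (0.95, 1.10),   # software can help or hurt
}

samples = []
for _ in range(10_000):
    speedup = 1.0
    for low, high in block_ranges.values():
        speedup *= random.uniform(low, high)
    samples.append(speedup)

samples.sort()
print(f"10th pct: {samples[1000]:.2f}x   median: {samples[5000]:.2f}x   "
      f"90th pct: {samples[9000]:.2f}x")
```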

  The Overshooting Scheme. Dave Sager is a brilliant computer designer who was one of the principal architects of the Pentium 4. With many years of computer design experience, Dave has come to believe that the task of conceiving, refining, and realizing a computer microarchitecture is a process of judiciously overshooting in selective areas of the design, in the resigned but practical expectation that various unwelcome realities will later intrude. These surprises will take many forms, but the one common element is that they will almost never be in your favor. That clever new branch-prediction scheme you are so proud of will turn out to be a very poor fit to some newly emerged benchmarks. Your register-renaming mechanism, which looked so promising in isolated testing, will turn out to require much more die area than you had hoped, and the circuit folks will be engaged in hand-to-hand combat to make it meet the required clock speed.

  Given that surprises will occur and will not be in your favor, your overall plan had better be able to accommodate any required change. Dave’s overdesign approach assumes that you will eventually be forced to back off on implementation aggressiveness, or that you will realize the designs just do not work as well as you had hoped, either in isolation or in combination with other parts. He proposes that you not treat such thinking as a mere contingency plan; such eventualities are almost a certainty, given the complexity and schedule pressures of contemporary designs.

  In essence, Dave’s theory is that if the absolute drop-dead project performance goal is 1.3x some existing target, then your early product-concept microarchitecture ought to be capable of some much higher number, like 1.8x, to give you the necessary cushion to keep your project on track.

  With both P6 and Willamette, we did, in fact, go through a process much like Dave’s anticipated sequence:

  • Early performance estimates are optimistic, and as performance projections gradually get more accurate, they yield a net loss in expected product performance.

  • As the design team continues implementation, they constantly face implementation decisions that trade off performance, die size, schedule, and power. Over time, the cumulative effect of all these small tradeoffs is a noticeable net loss in performance.

  • Projected die size tends to grow continuously from the same root causes, so, eventually, the project faces a double threat with no degrees of freedom left from which to seek relief.
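  Taken together, these erosion mechanisms are the reason for Dave’s cushion. A back-of-the-envelope sketch, using the 1.3x goal and 1.8x concept figures from above and invented percentages for each erosion step:

```python
# Back-of-the-envelope sketch of the overdesign cushion, with invented
# erosion percentages: start from an aggressive concept and apply a few
# typical erosion steps.
concept_speedup = 1.80          # early product-concept target (the "overshoot")
goal            = 1.30          # drop-dead project performance goal

erosion = {
    "projections get more accurate": 0.90,   # -10%
    "die size/power tradeoffs":      0.88,   # -12%
    "schedule-driven feature cuts":  0.93,   # -7%
}

projected = concept_speedup
for reason, factor in erosion.items():
    projected *= factor
    print(f"after '{reason}': {projected:.2f}x")

print("still above goal" if projected >= goal else "below goal, no cushion left")
```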

  When we reached this state in P6, we basically suspended the project for two weeks, purposely placing the performance analysis team’s concerns on project center stage. Anyone not directly engaged in the performance war was tasked with finding ways to reduce die size. These “all hands on deck” exercises actually work quite well, and tend to give everyone a chance to step back from their daily grind and revisit the bigger picture. With that new vantage point, they can often spot things that would otherwise have gone unnoticed.

  Not every team can follow Dave’s suggestion. Sometimes, it is all a team can do to even get close to the requested performance target, let alone try to overshoot it. The schedule may not let you innovate in the microarchitecture area, or the team may consist solely of circuit experts unqualified to tackle architecture. Of course, the corollary to Dave’s thesis is that if you do not purposely try to overshoot, and the customary bad news does arrive at your project’s doorstep, you may end up with a nonviable product. Dave’s suggestion is to face that issue at the beginning of the project, before a lot of time and money have been invested in it.

  But even if every one of these design teams could successfully follow Dave’s mandate, the teams would still disagree on important issues, and these differences of opinion directly influence their respective performance projections. Different slots in the road map can and do emphasize different applications; the large transactional database applications that rule the server domain are anathema to the mobile laptop space, for example. So if two teams wanted to compare their designs so as to inform corporate management, neither would let the other dictate the benchmarks to be used in the competition.

  Server, desktop, and mobile product spaces do overlap, however, and it makes good sense to at least do sanity checks among them. If a part being designed for a different market segment happens to outshine the mainline part on some important application, that is as clear an early warning signal as any development project is ever going to get.

  Psychological Stuff. Although it seems to surprise the public every time it happens, well-intentioned and well-educated experts can start with the same basic set of facts and reach opposite conclusions. In the same sense, performance simulations, especially the early ones, require a great deal of interpretation and are thus subject to the whims and personalities of their wielders. Some design teams are conservative and will not commit to any target that they are not sure they can hit. (Our Pentium Pro team was one of these, at least partly because Randy Steck, Fred Pollack, and I believed in this approach so strongly.) Other Intel teams had internalized company politics and reflected that understanding in their performance goals: promise them what they ask for now; if you later fall short, they will have forgotten about it, and even if they haven’t, you can apologize later. Besides, tomorrow may never come. Still other teams would aggressively choose the highest numbers from any uncertainty ranges and, if necessary, stack these rose-colored-glasses numbers end to end, on the grounds that (a) they are very smart people, (b) there is plenty of time left before the project’s end, and (c) nobody can prove the numbers cannot turn out this way.

  You don’t have a problem with our project. You have a problem with theirs.

  We sometimes faced hostile management questioning along the lines of “Team Z has officially accepted a goal of delivering performance not much different from yours in less time, with less money. Why should I keep funding your project?” We tended to reply with “You don’t have a problem with our project. You have a problem with theirs.” A long moment would then ensue with said manager looking doleful and dubious. After all, he liked their story better than ours. Management would then direct us to go reconcile our differences with Project Z and report back as to which team had had to change their official plan-of-record (POR). Almost always, after a perfunctory attempt at reconciling the two points of view, neither team changed anything, and we would all forget about the episode until it repeated about six months later.

  I wish I could offer bulletproof, hard-won advice on this topic that anyone could follow and avoid the unpleasantness implied above, but I can’t. When I was a boy, my mother warned me that I could not change anyone else’s behavior, just my own. In the same way, I believe strongly that engineers must think straight and talk straight: tell themselves and their management the truth as best as they can.

  If some other project cheats on this process, your project leaders will get an opportunity to point out these discrepancies, and then upper management must act, or at least actively interpret each project’s predictions as needed.

  Simulator Wars

  Our P6 development demonstrated the value of simulators for performance projections to Intel management and technical ranks. The original Pentium chip was a relatively simple extrapolation from the earlier 486 chip, and its presilicon performance tool consisted of a spreadsheet combining some rules of thumb about the machine’s microarchitecture with other rules of thumb about the benchmarks.

  As I described earlier (see the section “A Data-Driven Culture” in Chapter 2), we began the P6 project by writing our own performance modeling tool, a simulator of sorts, called the dataflow analyzer (DFA). This program started out as a very general out-of-order-execution analyzer, but that in itself made it useful only to the P6 project. Those of us who wrote this tool and used it to make performance projections eventually achieved some level of confidence in it, but engineers outside the project had no real basis on which to understand it or use it for comparative performance studies in their own designs.
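  The book does not describe DFA’s internals, but the core idea of a general dataflow analysis can be sketched in a few lines: ignore structural limits and compute when each instruction could finish if only true data dependencies and operation latencies constrained it, giving a dataflow-limited lower bound on execution time. The instruction trace and latencies below are invented for illustration:

```python
# Minimal sketch of the idea behind a "dataflow analyzer": compute when each
# instruction could finish if only true data dependencies and latencies
# constrained it. The trace and latencies are invented.
from dataclasses import dataclass

@dataclass
class Instr:
    dest: str
    srcs: tuple
    latency: int

trace = [
    Instr("r1", ("r0",),      3),   # load
    Instr("r2", ("r1", "r5"), 1),   # add, depends on the load
    Instr("r3", ("r2",),      4),   # mul, depends on the add
    Instr("r4", ("r0",),      1),   # independent add
]

ready_at = {}          # cycle at which each register value becomes available
finish_last = 0
for instr in trace:
    start  = max((ready_at.get(s, 0) for s in instr.srcs), default=0)
    finish = start + instr.latency
    ready_at[instr.dest] = finish
    finish_last = max(finish_last, finish)

print(f"dataflow-limited execution time: {finish_last} cycles")
```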

  By the mid-1990s, we had real P6 silicon and could directly calibrate DFA’s predictions. But by then we were beginning the Willamette (Pentium 4) design and were finding that the basic differences between the Willamette and P6 engines were such that we had to invest a lot of software design effort to make DFA useful to Willamette and, in fact, ended up calling it NDFA, for “new” DFA. In hindsight, although DFA was a brilliant success for P6, for Willamette we should have dumped DFA and used a custom simulator; it would have been more efficient and accurate overall.

  Most of us were uncomfortable with how difficult Willamette’s NDFA was turning out to be, but few of us had the software skills to actually do something about the problem. One of us did: Mike Haertel complained long and loud about the problem, and when it became clear to him that we were going to try to stay with NDFA rather than formally commissioning what we believed would be an even riskier start-from-scratch simulator, he asked if he could write one himself. It is not uncommon for bright, creative engineers to become frustrated with their tools, to convince themselves that they could do much better, and to importune management to let them try. Sometimes, this is simply a case of hardware designers thinking software is easy. Most of the time, it seemed to me that they probably could do better, but it would take much more time than they were assuming, so it would be better for the project for them to continue tolerating the tool’s shortcomings than to drop their project responsibilities while they fixed it.

  But Mike Haertel was not the usual dissatisfied engineer. Given his outstanding skills in both hardware and software design, if he said he could write the simulator in a certain amount of time, then he probably could.

  I decided to let him proceed, but with the proviso that his new simulator had to work quickly or we would abandon it quickly. I reasoned that if his simulator ideas turned out to be unworkable, we would not have made a large investment, and his new simulator had to be accurate with respect to the Willamette microarchitecture for it to be worth doing at all.

  Mike pulled it off in eye-popping fashion. He called his new simulator “Willy” and convincingly showed its intrinsic advantages over NDFA. It even had a better user interface. The Willamette architects loved it.

 
