The Pentium Chronicles: The People, Passion, and Politics Behind Intel's Landmark Chips (Practitioners)
ECO Control and the Project POR. During one of the infamous Willamette project POR meetings, it occurred to me that some of the overhead and friction that the architecture team was experiencing on the Pentium 4 was due to our need to reconcile decisions at the marketing/general manager level with those at the implementation/ECO level. With rare exceptions, our general manager and our marketing organization were not required signatories on our implementation ECOs, yet some of those design changes impacted project-level goals such as die size, power dissipation, performance, and schedule. The way to reduce the friction between these two parallel streams of project perturbation was to put the project POR on its own ECO control. The engineers would have to approve all changes to a POR regardless of the source.
I was proud of this insight. It seemed such an obviously good idea that I relished presenting it at the next staff meeting.
When staffers really like an idea, it is obvious: They tear their eyes away from the email on the laptop they always bring to the staff meeting, vigorously nod their heads, and share their enthusiasm with the staffers on either side of them. When staffers really hate an idea, it is not as clear, but still readable, because they keep looking at their laptop screens, but they make a face as though something they ate is winning a debate with their stomach. After hearing my POR-under-ECO-control idea, the staffers gave no reaction at all. It was as though Bob-canceling headphones had temporarily appeared on all their heads and my words had been erased from the room’s airwaves.
Not understanding their reaction, I tried again a week later. And then again. After a few weeks of this, it dawned on me that I was seeing a form of collective pocket veto. The rest of the staff did not have to debate me on this topic; they had only to do nothing to keep the status quo. In effect, they were voting me down without having to explain their reasons. Once I realized that, I began to wonder why they considered the status quo better than putting the project POR under ECO control.
One of the silent status quo proponents finally took pity on me and explained what I had been missing. With the Thursday meeting project-planning scheme, the general manager felt he could directly make crisp, clear, timely decisions about the project. (I’m not saying he had that ability; he only felt he had it.) Because marketing ran these meetings, set their agendas, and circulated the results via meeting minutes, they, too, felt they already had a direct say in the project POR. Both the general manager and his marketing organization viewed my proposal as a threat because now the design team would have to agree to all proposed changes before those changes became the official project POR.
I couldn’t blame them. From their viewpoint, the status quo model was perfect: Simply dictate to the design team what you want, and they implement it, no questions asked. Why would considering the design team’s feedback before adopting the desired changes result in a better product? To the general manager and marketing, my proposal would result in an unnecessary power struggle. The prospect that someone several organizational levels below them could veto their pet feature was unthinkable.
Avoiding Mediocrity
All large companies have at least a few burnouts, folks who simply no longer engage in their work with the same passion and commitment they once did. Burned-out engineers resign themselves to simply accepting whatever requirements are imposed on them from above, knowing that, in the end, if things fall apart, they can always say, “Well, at least it wasn’t my fault” and shuffle over to another project to repeat the performance. With big projects, these people can still make a contribution because of the sheer workload to be distributed. But the people who really make the difference between a design that merely works and one that is truly stellar are those willing to defend the project against mediocrity.
Mediocrity can be imposed from above, by management that simply will not allow enough time to do the project right. It can arise from the design team’s poor execution. It can be preordained if the project’s goals are set too high or too low or the feature set is too timid or aggressive. Mediocrity is the default outcome of any project that does not value people with a total commitment to what they are designing. Management must do its part, but only the architects and design engineers who are actively involved in the project can accurately judge where mediocrity lies, and they must be given an opportunity to speak out whenever necessary to avoid its strong pull.
I consider that kind of reasoning and the attendant project structure paving stones on the road to disaster. Design teams that feel no tangible sense of ownership over what they design could, in principle, create a credible product, if they were exceptionally professional and skillful, but they are much more likely to yield an uninspired, insipid, loser product. Design teams that are emotionally engaged in their work and people who have committed themselves to making the world’s best product, no matter what it takes, will turn out a world-class product every time. Conversely, by treating a team like a job shop or guns for hire, project management removes the single best weapon any design project has: pride of ownership by the people in the best position to improve the design. There is a reason Steve Jobs’ phrase “insanely great products” resonates in the hearts of good designers.
All electrical engineers are taught communications theory. When digitizing a signal stream, there is a critical sampling rate above which the digitized output can theoretically be used to perfectly reconstruct the original signal. But below that rate, some really ugly distortions arise. One way to think of this is that once you have sampled a signal, that signal had better not do anything very interesting before the next sampling time because your digital stream will not reflect it. The signal is simply changing faster than your digitization process can accommodate. It often struck me while struggling with the ECO-project-POR issue that this is exactly what was happening: the project planning process was sending continuous project-change signals to a design team that could not sample them fast enough.
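The sampling argument can be made concrete with a small sketch. It assumes a single pure tone and uses hypothetical names and values (`signal_hz`, `rate_hz`, 9 Hz, 10 Hz) that are not from the original text. For a tone, the critical rate is twice the tone's frequency; sampled below that, the tone becomes indistinguishable from a slower "alias" tone.

```python
import math

def sample(freq_hz, rate_hz, n):
    """Return n samples of a unit-amplitude cosine at freq_hz, taken at rate_hz."""
    return [math.cos(2 * math.pi * freq_hz * k / rate_hz) for k in range(n)]

def alias_frequency(freq_hz, rate_hz):
    """Apparent frequency after sampling, folded into the range [0, rate_hz / 2]."""
    folded = freq_hz % rate_hz
    return min(folded, rate_hz - folded)

# Hypothetical values: a 9 Hz tone needs more than 18 samples per second,
# but here it is sampled only 10 times per second.
signal_hz = 9.0
rate_hz = 10.0

fast = sample(signal_hz, rate_hz, 20)
slow = sample(alias_frequency(signal_hz, rate_hz), rate_hz, 20)

# Sample for sample, the undersampled tone matches its slower alias.
indistinguishable = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(fast, slow)
)
print(alias_frequency(signal_hz, rate_hz), indistinguishable)
```

In the analogy, the design team plays the role of the sampler: changes arriving faster than the team can absorb them are not merely lost, they are misread as something else.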
Put your project POR under ECO control. It will not make your general manager cede any authority, and it does not cut marketing out of product planning. It just enforces the proper lines of communication and makes visible what should have been obvious anyway: The design team must be part of the planning process in guiding a product development to a successful conclusion.
THE BRIDGE FROM ARCHITECTURE TO DESIGN
The essential challenge in bringing a project into the refinement phase is to transfer the core ideas from the heads of the architects into the heads of the design team without serious distortion. This transference involves far more than block diagrams and pipeline drawings; the architects must pass on the overarching philosophy of the approach they have conceived. It is not enough to convey to the design team what you think is needed; it is essential to also impart why you think that.
After the P6 project had been underway for several months, and the architects had settled on an out-of-order speculative engine design, it was time to bring the design team core up to speed. We had written a preliminary document that at least superficially covered all the key ideas, but the design was still changing a lot nearly every week. We decided to host a series of luncheons at which the architects would give a live overview of various microarchitecture parts, and the designers would ask questions between mouthfuls.
Our initial attempt at this was a fiasco. I spoke first and gave an overview of the entire microarchitecture as we then conceived of it, pointing out where we were confident and what we were most worried about. The material was challenging, which taxed my memory, my understanding, and my ability to communicate to their limits, so I spent most of the first half hour facing the board, drawing diagrams. When I finally turned around to entertain questions, I was shocked at the sea of horrified faces.
After lunch, a design engineer confided that my overview had not quite met his expectations. Someone who called himself an architect ought to have answers, he explained. Instead, I seemed to be content merely to point out the questions, and what had he gotten himself into with this crazy project?
Over the next few months, we architects had to find ways to establish a useful working relationship with the design team, as well as keep the technical momentum going. We knew very well that architects could not be the repository of all knowledge. Our job was to conceive and maintain the project’s overall vision, incomplete and flawed as it was at times, and to earn the trust of the design team we were leading by exhibiting a willingness to do whatever it took for the project to succeed.
Meanwhile, the design team was growing, from the original 30 or so, to the eventual 200+. We quickly realized that luncheons would not suffice as the architecture/design communications channel because the next day, three new designers would appear, all of whom should have attended yesterday’s lunch.
We had to capture these sessions permanently without requiring the architects to spend the next several months writing. We chose to present the information as a series of videotaped lectures, including an extensive Q&A session with the audience at the end. Apart from my own introductory lecture, which we had to do twice because our professional camera crew failed to record any audio the first time, these videotape sessions went smoothly and the tapes became hot commodities within the project. Upon joining the P6 team, new designers would spend several days watching television, but after this intensive ramp-up, they knew the names of the chip’s various units, had at least a vague idea of what the units did, and had faces and names to which they could address their detailed questions.
For mainstream projects like P6 and Pentium 4, groups outside the main project have to understand the project’s direction as well as the microprocessor’s feature set, projected performance, and other development aspects. For the Pentium 4, that meant any groups designing related chip sets had to develop their products synergistically with the CPU. Compiler, operating system, and tools groups also had to understand the new microprocessor, sometimes in nearly as much detail as the designers. Such related groups found the videotaped lecture series extremely useful.
The videotapes were a record of important intellectual property, so we had to carefully control them. Although most groups requesting the tapes had a clear need to know, we encountered more than a few requests from the merely curious. Our rule was simple: If you would not have had access to the paper equivalent of the tapes’ contents, the request was denied.
Focus Groups
A project’s concept phase must be driven by a very small group of people but, typically, that small group cannot finish the project by themselves because there is too little time and too much to do. The exception is a startup company, in which the entire technical staff might be that small group. In those circumstances (in which Dave Papworth and I had previously found ourselves while at Multiflow Computer in the 1980s), the team must automate the design process wherever possible, give up all semblance of a life outside work, and cut whatever corners look nonfatal, so as to hit the competitive market window.
Intel cannot approach product design that way and does not need to. The schedule pressure still feels brutal (although readers who are not engineers might find that hard to believe, given that both P6 and Willamette took more than four years from start to production), but there are a lot of engineers available. The difficulty lies in teaching them what they need to know to help realize the design.
If the concept phase is too early to engage the design engineers, then when? We were not sure of the answer to this question as we architects emerged from the concept phase, but opportunity came knocking when Randy Steck pointed out that while we were conceptualizing, he had built the nucleus of the design team, and they were ready to go. Could we use them?
Well, sure, but not in the sense of “Here’s what we want, please go build us one.” Far from supplying the answers, which some design team members believed was our job as architects, we were mostly arming them with a long list of questions and nudging them in the general direction of a solution. Worse, Dave Papworth and I had only recently joined Intel, so we had neither a track record nor personal relationship with the designers. Luckily, Glenn Hinton had both, and his well-earned and well-deserved personal credibility went a long way toward assuaging the design team’s early worries over our general trustworthiness. At any rate, no one quit over the temporary loss of confidence. (They probably couldn’t resist sticking around to see where these clowns would take them next.)
We deputized about 30 design engineers as junior architects and split them into teams of five to seven. To each group, we assigned one concept architect who was commissioned to solve some subset of important questions. We then assigned the groups to respective functional subsets of the overall design, such as out-of-order core, frontside bus, and branch prediction. We found this group-to-subset mapping convenient because we were then assured of having at least one expert in key chip areas.
Having a concept architect in each group was critical because the architect could ensure that the focus group observed the fundamental assumptions built into the overall design. If pipeline stalls were to be handled in only one way, then each focus group had to assume that method in all they did.
Careful intergroup coordination was also critical. Focus groups are investigating and resolving open technical issues, and their results must be communicated to any focus groups the resolution would affect. Otherwise, a focus group could claim that its “solution” to a hard problem is to move it to some other group (where that problem might be even more difficult to resolve). This is not an unreasonable temptation. A complex design has many overarching problems that several focus groups could reasonably own. In the same sense that design errors tend to flourish between engineers, the big issues can fall into the cracks between groups that assume another unit will “take care of that.”
Finally, focus groups must have some global pacing mechanism because the project cannot tape out until the last unit is ready. Pacing is the fine art of ensuring that no group dives too deeply into an issue while still providing the necessary rigor so that decisions made will not have to be repealed and repaired later in the project, when the cost of change is much higher.
PRODUCT QUALITY
Humans make mistakes. Even well-trained, highly motivated engineers, working at the top of their games, make mistakes: big ones, small ones, funny ones, subtle ones, and bonehead ones. Some design errors are so subtle that even when revealed, they generate no reaction from other designers except sympathetic acknowledgment that they would have made the same mistake. Other errors are so blatant that even the design engineer in question cannot explain why they occurred. In 1987, I was working at Multiflow and had just completed the design for the floating-point divide/square-root unit. Because I had designed it, I knew where the design’s corner cases were, and my initial test suite made sure the unit got the right answers for those cases. I released the design for limited preproduction and made plans to take the weekend off for a short vacation. A few hours before I was to take my well-earned break, a performance analyst appeared in my office. With a puzzled look, he told me that his program appeared to be getting a wrong answer and that he had tracked it to the square-root unit. The unit got exactly the right answer unless the result was a perfect power of two, in which case it yielded a result that was exactly half the right answer. I pondered this for all of about 45 seconds before realizing that perfect-power-of-two answers are precisely those for which the accumulated mantissa bit pattern was all ones, including the rounding and guard bits. And when the mantissa overflows, you must adjust the exponent upward. If you do not adjust the exponent, you know what happens [22].
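The failure mode can be reproduced in a toy model. The sketch below is not the Multiflow design: the 8-bit mantissa width, the two extra bits, the simple round-half-up rule, and every name in it are assumptions chosen for illustration. It shows only that when an all-ones mantissa rounds up and carries out of the top bit, skipping the exponent adjustment yields exactly half the right answer.

```python
PRECISION = 8    # hypothetical mantissa width, including the leading 1 bit
EXTRA_BITS = 2   # guard and rounding bits carried below the mantissa

def round_and_normalize(ext_mantissa, exponent, adjust_exponent=True):
    """Round an extended mantissa to PRECISION bits, renormalizing on carry-out.

    Uses round-half-up for simplicity; real hardware rounding modes differ.
    """
    half_ulp = 1 << (EXTRA_BITS - 1)
    mantissa = (ext_mantissa + half_ulp) >> EXTRA_BITS
    if mantissa == 1 << PRECISION:    # rounding carried out of the top bit
        mantissa >>= 1                # shift back into PRECISION bits
        if adjust_exponent:
            exponent += 1             # the step the buggy unit omitted
    return mantissa, exponent

def to_float(mantissa, exponent):
    """Interpret the mantissa as 1.xxx scaled by 2**exponent."""
    return mantissa * 2.0 ** (exponent - (PRECISION - 1))

# An iterative square root of 4 approaches 2 from below: 1.1111... x 2**0,
# with every mantissa, guard, and rounding bit set.
all_ones = (1 << (PRECISION + EXTRA_BITS)) - 1

correct = to_float(*round_and_normalize(all_ones, 0))
buggy = to_float(*round_and_normalize(all_ones, 0, adjust_exponent=False))
print(correct, buggy)  # 2.0 1.0
```

A test suite built only from the designer's own list of corner cases would pass this model with the bug in place unless one of those cases happened to produce a power-of-two result.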
Another example of a classic bonehead error, and this one wasn’t my fault, caused NASA’s Mars Climate Orbiter spacecraft to crash into the planet instead of orbiting it. Embedded in its navigation sequence was confusion between metric and English units [10]. Oops!
Mismanaging Design Errors
It is a regrettable characteristic of human nature that we often learn what to do by first learning what not to do, sometimes by being extraordinarily counterproductive. Dealing with design errors seems destined to be one of these learn-by-not-doing areas. The following three strategies, in particular, almost never work.
Make an Example of the Offender. In this method, the engineer responsible is put in the corporate equivalent of stocks for public display and humiliation. All the other project engineers are sure to pay attention, which the boss wants them to do, but they will also do two things the boss does not want them to do. First, they will take no more risks, no matter how valuable the potential payoff. Second, they will waste intellectual energy hiding their tracks so that no one can trace any project issue that arises to their personal decisions or designs.
Hire Only Geniuses. Some companies simply do not hire any engineers who do not have a certified genius license. This strategy fails because a company should always strive to hire smart engineers anyway, and there is no such thing as a genius license. If there were, geniuses would probably make mistakes anyway, because they are the ones who would be most willing to take risks. If some of the things they try do not fail, they are probably not trying hard enough. It is not a risk if it cannot fail, and those who do not take risks will generally lose in the marketplace to those who do.
Flog Validation. When all else fails, blame it on validation. They let the bug get by them, didn’t they? And isn’t it their job to prevent that by looking carefully at the validation plan and its execution for holes?