The Pentium Chronicles: The People, Passion, and Politics Behind Intel's Landmark Chips (Practitioners)

by Robert P. Colwell

These meetings continued through all four years of the project. Every Thursday morning at 7 A.M., the supplicants would timidly enter the great hall, kneel before the king, and present their petitions. About the only thing missing from the general ambiance was a booming voice, bursts of flame, and the man behind the curtain. After due deliberation, which somehow always took exactly two hours no matter how simple or complicated the issue, the court would inform the petitioner that his or her issue had been duly considered and dismiss the hapless party. Sometimes, attendees actually got to see the meeting’s minutes, and even less frequently, the minutes indicated what, if any, decision the court had made.

All in all, this is not a good way to run a design project. I can think of at least four major problems.

First, a meeting-based product planning process does not gracefully accommodate the reality that engineers travel, a lot. It is true that coordinating meetings with multiple diverse attendees is extremely difficult, and the usual practice is to send a representative if you can’t attend. But some of these decisions required an extensive context that a representative could not be expected to bring.

Second, given that virtually all the proposals being considered for POR change will affect the silicon, it is clearly imperative that the design team consider the proposal for feasibility before anyone decides anything. Forcing a design team leader to make a snap judgment that the team is then expected to implement is dangerous. At the very least, the design representative should be able to request a one-week postponement on any issue so that she can consult the rest of the team. Such a time lag may also help throttle the rate at which project changes are made, and that is a good thing. It is very easy to generate project change requests at rates far higher than the design team can actually absorb, let alone implement.

Third, at least in Willamette, only the meeting’s attendees were informed of the meeting, and even they were not always informed of its outcome (if there was one). The meeting chair was reputed to have kept a red-cover POR document up to date, but to get such a document, you had to formally request it from document control and then surrender the old one. No one was willing to do this every week.

Fourth, meeting-based product planning does not allow time to deeply consider how a decision will impact the design itself. Changes appear easier, more enticing, more feasible, and less threatening than they really are. Many proposed changes were, in isolation, fairly simple and low-impact. But a substantial number interacted in subtle, entertaining ways with other requested changes or previously accepted features. Only a design team representative fully conversant with all these interactions could be expected to spot such problems, and possibly not even such an expert could.

The irony was that the design team had its own POR, and a means to carefully change it: our engineering change order (ECO) process. Modern microprocessor design teams make hundreds of important daily decisions, day after day, week after week, year after year. Each decision is a careful tradeoff between die real estate, power consumption, complexity, performance, schedule, design and project margin, risk, and the user-visible feature set. Each decision helps set the context for future choices; good prior decisions will afford the project good options when new issues arise. Conversely, cutting corners early or naively attempting to incorporate conflicting ideas will eventually lead to a day of reckoning that will at best severely damage the schedule. Our internal POR was the mechanism by which we kept score of project decisions made and those pending, and by which project designers were informed of the same.

Engineering Change Orders

ECOs have been around as long as engineering itself, but they used to be a piece of paper with a sign-off list, rather than today’s e-mail or Web-based mechanisms. Each signatory would sign a line to accept or reject it and route the document to the next signatory. It was mandatory to explain any rejection.

Some people believe that if you “properly” plan a project from the beginning, you will never need to change it en route. I have had the misfortune of working closely with such people, and a good time was had by none. Their (partly correct) argument is that most ECOs end up costing the design team more work and, therefore, negatively impact the project schedule. But just because ECOs are not free does not necessarily imply that all ECOs are bad. Nor do ECOs constitute tacit proof that the architects have no discipline. Some ECOs fix errors, others simplify the design, and still others simply reflect midcourse corrections that are reactions to new information about the competition or the customers. Some pushback on ECOs, and more as the project nears its end, is probably healthy for the project. Generalized antipathy toward the entire change process is not.

The Origin of Change. In Willamette, ECOs often came from the project’s general management and marketing meeting series and were aimed at managing POR changes. These were primarily ECOs for the design team, since the architects were expected to convert feature ideas from the great hall to implementable ideas. The P6 project did not suffer from this change stream because marketing did not own the POR then, and even if they had, their cubicles were only a few feet away from the design team. This proximity is important to nurture the feelings of being on the same side. Having marketing physically removed from the Willamette team led, I believe, to an almost automatic us-versus-them psychology on nearly all issues affecting the design.

Another stream of design changes on both Willamette and P6 stemmed from performance analysis. As the project design progresses, the performance analysis engineers gradually turn their attention from the abstract performance models on which the project design is based, to the RTL itself. Early RTL is useless for performance studies because it simply lacks the design detail to make performance analysis interesting and may not even implement enough of the instruction set to run the benchmark code. But the RTL gets more capable every day, and eventually becomes mature enough for the analysts to draw useful inferences. Typically, they find that many benchmarks run within the performance envelope as expected, which is great news. They will also find some benchmarks that run outside the envelope and will then work with a team of architects and design engineers to track down the performance shortfall’s root cause and propose design changes via an ECO to fix it.

Functional validation also generates a stream of high-priority ECOs to fix design errors that are causing incompatibility or wrong answers.

Finally, changes can occur just because the design is so complex. Architects are monitoring the project status, watching for design areas that are generating more than their share of errata, a sign that often means too much complexity has been assigned to that area. Architects must translate marketing POR changes into practical ECOs, watch the competition, and think about ideas that might have barely missed the cut for this chip but that could still be implemented. Any or all of these may be a source of additional design changes.

So given that there will be ECOs, despite best efforts, best intentions, and unenlightened attitudes to the contrary, the issue becomes how to handle them. Difficult questions and tradeoffs abound:

1. When does the project go “under ECO control”?

2. What parts of the design effort require ECOs?

3. Whose signatures are on the ECO sign-off list?

4. How can the process be kept timely, especially for controversial ECOs?

5. How do resolved (accepted or rejected) ECOs become visible to the project?

6. Given that other related projects such as tool development or chip-set design need to know about some ECOs or even have signatory input, who decides on the need to know and what process ensures necessary inclusion and notification?

When, Where, Who. As I mentioned at the beginning of this chapter, the refinement phase has a natural tension between the free-wheeling, all-things-considered mindset and the need to converge on select workable ideas in a timely way. Typically, the project architects will supply the force pointing toward new ideas and away from schedule, and the project managers and designers will provide the countervailing force to rein the architects back in. The project leadership must make a judgment call as to when the architects have done enough exploration. In the P6 project, we architects informed our management that we had arrived at a workable idea (out-of-order), but this was atypical. In general, the architects and project management must negotiate direction convergence.

The initial project direction is the concept phase’s output, and the team should formally document ideas that come from that phase to help synchronize and educate the project’s growing design team. This document should be available to all team members, as well as to select people outside the team, such as upper management and project leaders for related developments (tools, chip sets, and so on). As I argue in “ECO Control and the Project POR” on page 57, I believe this newly minted POR should go under ECO control for the rest of the project.

At this stage, the design itself is not under any ECO control. There is a point at which the amount of intellectual property that ECO control guards is worth the overhead of the ECO mechanism; before that, ECOs are worse than useless. Until the behavioral model has been designed and is returning useful results, the project is better off without ECOs. The program source codes, on the other hand, should still be kept under revision control, just to maintain sanity in a programming environment in which many people are contributing code, checking out and building models, and filing bug reports.

Eventually, the project will migrate from a basis that is mostly behavioral modeling to one that is mostly structural. That migration point is approximately the right time to consider placing the RTL model itself under ECO control.

As the project gains momentum and the tapeout deadline begins to tower menacingly over the team, the team’s engineers find themselves working longer and longer hours. A growing sense of fatigue combined with an emotional determination to succeed at this project can lead to a tendency to grab a broadsword and stride into the midst of the enemy, flailing wildly in all directions. But as project time remaining grows ever shorter, so does the risk that a change will inject an unnoticed functional or performance bug into the chip. Project leadership must continually tighten the screws on the project as time goes on. The ECO process becomes the tiller by which they steer the ship, so its importance grows as the project proceeds.

All ECOs will have a signature list on their front page. This list will typically have 10 to 20 names. Every ECO will have the ECO Czar’s (see “The ECO Czar” on page 57) name as a signatory so that he or she can begin tracking that ECO for timely disposition, and ensure that the ECO as submitted contains the necessary information for the other signatories to reasonably deal with it.

The ECO Czar also tries to make sure that the signature list is appropriate for each ECO. Most ECOs have a limited scope. If a design engineer realizes that there is a simpler way to achieve some function in her part of the design without compromising anything else, and her peers and supervisor agree, higher-level signatories need not be involved.

Some ECOs have obvious project-level impact, and must be considered by many people besides the engineer initiating the ECO. Suppose the project POR calls for the chip’s frontside bus to run at 1 GHz, but several months into the design, the bus designer realizes that this target is unattainable. The ECO he files requesting that the project POR be adjusted from 1 GHz down to, say, 500 MHz will require concurrence from project leaders, upper management, marketing, the chip-set team, performance analysts, and many others.

Communicating Change. The list of people who need to know what ECOs are under consideration, which have been accepted, and which have been rejected is much longer than the list of signatories. In our ECO system, e-mail was generated automatically whenever a new signature had been obtained (both acceptance and rejection require a signature), which helped prompt the next signatory to spend some time on a particular ECO, and also informed the larger audience of what was being considered and how earlier ECOs had fared.

Because the system was automatic, it was a reliable and timely way to collect signatures. For informing a list of final disposition, however, we probably could have used a few human comments about the motivation for the ECO, the reasons it was accepted or rejected, and the implications of that action. Assiduous ECO readers could infer most of that information by carefully reading the ECO itself, but miscommunications are among the biggest time-wasters in a development project. Avoiding them is well worth the trouble.
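The signature-driven notification described above can be sketched in a few lines. This is only an illustration under stated assumptions, not a description of the actual system: the `Eco` class, its field names, and the `notify` callback (a stand-in for the e-mail generator) are all hypothetical. It models three things the text does state: every acceptance or rejection is a signature, a rejection must be explained, and each new signature both prompts the next signatory and informs the larger audience.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Eco:
    """Hypothetical ECO record; names and fields are illustrative only."""
    title: str
    signatories: List[str]               # people who must accept or reject
    watchers: List[str]                  # larger audience that only needs to know
    notify: Callable[[str, str], None]   # stand-in for the automatic e-mail
    decisions: Dict[str, str] = field(default_factory=dict)

    def pending(self) -> List[str]:
        """Signatories who have not yet accepted or rejected, in list order."""
        return [s for s in self.signatories if s not in self.decisions]

    def sign(self, who: str, accept: bool, reason: str = "") -> None:
        """Record a signature, then notify watchers and the next signatory."""
        if who not in self.signatories:
            raise ValueError(f"{who} is not a signatory on this ECO")
        if not accept and not reason.strip():
            # Explaining a rejection is mandatory.
            raise ValueError("a rejection requires an explanation")
        self.decisions[who] = "accepted" if accept else f"rejected: {reason}"
        message = f"ECO '{self.title}': {who} {self.decisions[who]}"
        for person in self.watchers:
            self.notify(person, message)          # inform the larger audience
        waiting = self.pending()
        if waiting:
            self.notify(waiting[0], message)      # prompt the next signatory

    def status(self) -> str:
        """'rejected' if anyone rejected, 'accepted' if all signed, else 'open'."""
        if any(d.startswith("rejected") for d in self.decisions.values()):
            return "rejected"
        return "open" if self.pending() else "accepted"
```

A free-text comment field sent with the final disposition would be the natural place for the human commentary on motivation and implications that the automatic messages lacked.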

Timely Resolution, No Pocket Vetoes. Occasionally, when someone proposes a bad idea in the form of an ECO, simply ignoring it for a while causes the right thing to happen. Given time to consider the idea, its submitter comes to realize that it won’t work, logs onto his computer, and retracts the ECO. Problem solved. This does not mean that routinely ignoring an ECO is a good tactic.

ECO signatories must typically grapple with submitted ECOs directly, not just hope they will go away. Politicians can use a pocket veto to “assertively ignore” an issue. The politician has to deal with some popular legislation he would like to reject and would like it to die without having to go on record as formally being against it. So he sends it to an obscure committee that sits on the bill beyond some deadline that automatically kills it.

But although the pocket veto might be popular in politics, it has no place in engineering. Doing nothing inevitably hurts the project because it can derail a change and set off a domino effect of missed errata. If the ECO is a paper document and each signatory on a sequential list of, say, 15 names took an extra week to sign an ECO, the proposed change would not be part of the project for nearly four months. (With Web-based ECO tracking, all signatories can see who has accepted or rejected the ECO and can participate in any order, which is another reason for having e-tracking.) If the requested change was intended to fix a compatibility design error, whatever code had originally failed is still failing four months later, obscuring whatever other design errors are hiding behind the one that was found.

A rejection, on the other hand, can actually help the project focus attention where it might be most needed. By reading the accompanying explanation, the ECO submitter might gain insights into alternative ECOs or perhaps want to contact the rejecting party and clarify wording or address issues that person raised. Sometimes, signatories simply do not understand the ECO, or they have some other agenda. A rejection brings to light any misunderstandings or differing directions.
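The cost of a sequential paper sign-off in the 15-signatory example above is simple arithmetic, sketched here. The function names are hypothetical, and the parallel case rests on an assumption the text does not spell out: that with Web-based tracking the signatories actually review concurrently, so the slowest one sets the pace.

```python
WEEKS_PER_MONTH = 52 / 12  # about 4.33 weeks in an average month


def sequential_weeks(delays):
    """Paper ECO: each signatory waits for the previous one, so delays add up."""
    return sum(delays)


def parallel_weeks(delays):
    """Tracked ECO, assuming concurrent review: the slowest signer dominates."""
    return max(delays, default=0)


delays = [1] * 15  # 15 signatories, one extra week each
months = sequential_weeks(delays) / WEEKS_PER_MONTH
print(f"sequential: {sequential_weeks(delays)} weeks (~{months:.1f} months)")
print(f"parallel:   {parallel_weeks(delays)} week(s)")
```

Fifteen sequential weeks comes to roughly three and a half months, which is the "nearly four months" of the example. Under the concurrency assumption the same delays cost about one week.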

The Folly of the Preemptive Signature

Ironically, I was often the final ECO sign-off and, consequently, the ECO Czar’s frequent target. For a time, he attempted preemptive strikes by getting me to sign off before all the other signatories had weighed in. The folly of this approach became clear when the engineer who had submitted the ECO rejected it after I had signed it, a juxtaposition that he gleefully made certain I knew about. Apparently, the ECO submitter realized he had made a blunder that rendered the ECO clearly unworkable. By signing it, I had proven that either I had a remedial-level understanding of the issue or had barely read what I signed. I threw various objects at my heckler until he went away, and then resolved never to sign another ECO until the submitter, the implementer, and the implementer’s supervisor had already signed. Keeping the ECO stream moving is important, but keeping a design team’s confidence in its leadership is even more so, second only to maintaining your self-confidence as that leader.

The ECO Czar. Project leaders immediately and emphatically agree when you point out that no one should sit on an ECO and that signatories must either accept or reject it. But their good intentions do not automatically translate into the necessary actions. In the Willamette project, getting even noncontroversial ECOs through the sign-off loop was taking so long that designers had begun speculatively implementing them, thus subverting the entire process. To combat this, I appointed an ECO Czar, Warren Morrow, whose job was to prod extremely busy people by asking them to replace whatever they had thought was their highest priority with the ECO sign-off task. Keeping the ECO process moving along took a high degree of indefatigability and an uncommonly diplomatic touch. Warren could often be found waiting patiently in the offender’s cubicle until that person pondered the ECO and signed off on it. The embarrassment factor usually worked in Warren’s favor, as did the reality that sooner or later, any laggard’s refusal to sign an ECO for or against could and would be overridden by the laggard’s boss, who was sure to remember the episode at the next employee performance review.

In general, the surest way to prevent a task from falling into the cracks of a large project is to assign someone reliable to that task, and check on their progress every now and then. In the ECO Czar’s case, that meant asking if the task was getting done and if the Czar was going crazy. If I got a yes and no, or at least a yes and not quite yet, then all was well. Thank you, Warren.
