The Pentium Chronicles: The People, Passion, and Politics Behind Intel's Landmark Chips (Practitioners)
Randy understood deeply that a design team is no less a team than a group of professional basketball players who must be assigned their roles for maximum aggregate effectiveness. Teams are assemblages of individuals, each of whom has unique physical and intellectual capabilities, as well as individual circumstances in their personal lives that will impact their effectiveness throughout the game, project, or whatever other goal they must accomplish together. With basketball teams, it is easy to remember that players are not equally capable at every task. Clearly, the really tall players should be centers, and the short players who are good shooters should play the outside and handle the ball. If the coach botches these assignments, the same set of players will deliver a much different result because there’s a reason those really tall players seldom try to dribble the ball very far.
Design team members need the same kind of careful positioning, a reality that technical design managers often forget. They see a baccalaureate degree, coupled with hiring (passing the firm’s sanity check), which they take to mean that someone is at least minimally competent at some set of engineering or programming tasks and can learn more on the job.
But really good teams go way beyond this job assignment level, actively judging each design engineer so as to give her or him the best possible point of leverage on the project. Some engineers thrive in an assignment with high uncertainty and high pressure; they enjoy the challenge and feeling of being the first to wrestle with the concepts that must coalesce if the project is to succeed. These folks tend to gravitate toward architectural and performance analysis assignments. Others like the logic and stability of a design assignment. You tell them exactly what a unit has to do, give them constraints for die space, power dissipation, clock rates, connections to other units, and schedule, and they will apply their ingenuity to solving that N-dimensional problem. Still others live for technical arcana; they get an intellectual thrill out of knowing how many open page table entries any single x86 instruction can possibly touch, and the attendant implication for how the microprocessor must handle page misses in general. These folks are the microcoders.
Place people where they can excel at what they’re gifted at doing and you are on your way to a winning team. This seems obvious enough, but it’s surprising how many managers act as if they don’t get this. Treating engineers as if they are all alike, the human equivalent of field-replaceable units, is a sure recipe for mediocrity.
Roles and Responsibilities
After a few weeks of intense deliberation, we had a preliminary allocation of design engineering heads to known tasks. And we had also turned up some new and interesting issues about the relationship between design and architecture. Who did presilicon validation? Who did microcode development? Where was performance analysis to be performed? What was this new “MP” group and who owned it? Who owned the process to develop the register-transfer logic (RTL) model?
There were no useful corporate precedents. Previous projects like the 486 were so small (by P6 standards) that for them, any of several organizing methods would work equally well. The concurrent P5 project being developed in Santa Clara had separate architecture and design organizations, as we did with P6, but they arrived at that organizational structure too late for us to learn much from it. So we worked through the issues with a week of focused, daily meetings.
We ended up with an overall organization similar to that in Figure 2.3. The general manager of this division was originally Randy Young, who recruited me to this project. Within six months, however, he was replaced by Pat Gelsinger. A couple of years later, Will Swope and Dadi Perlmutter took over, then Dadi and Randy Steck, and then Randy Steck and Mike Fister. The general manager position has a great deal of authority and responsibility; thousands of engineers report to these folks, and they are the primary voice of a project to upper management, as well as the executives’ main communication channel back to the project.
The general managers had other projects besides P6, but P6 was their main focus and where most of their headcount resided. The P6 design manager had groups devoted to design tools and circuit libraries and expertise, but most of the design group was partitioned into clusters, with each cluster further subdivided into functional units.
The architecture organization consisted of microcode development, performance analysis, presilicon validation, frontside bus design, the microarchitects, and a group devoted to getting our multiprocessing story straight.
Other groups at the divisional staff level included the P6 chip-set development team, a postsilicon validation organization, a marketing group, a finance group, and several others.
Presilicon Validation. The two basic types of validation for microprocessors occur before and after silicon. The presilicon validation tests the register-transfer logic (RTL) model as it becomes available, and is aimed at preventing design errata from ever appearing in the actual chip. Postsilicon validation, a combination of two Intel groups, Compatibility Validation and System Validation, tests the actual CPU silicon against its specifications and ensures that the new CPU successfully runs all legacy software.
I felt strongly that presilicon validation had to be separate from, but embedded within, the design team. Design engineers often have invaluable knowledge of the corner cases of their unit, the places where they made the most complex tradeoffs or where the most confusion was present during design. For experienced validation engineers, such knowledge is extremely useful, because they know that where complexity and confusion reign, design errata follow. Dedicating validation engineers to the units they are testing also helps the validators overcome any us-against-them thinking on the part of the design engineers. Validators can more easily learn the design engineers’ lingo and become accepted team members. So it may seem logical to embed validators into the design team itself.
Figure 2.3. MD6 (Microprocessor Division 6) basic organizational chart.
Offsetting those advantages are the simple design logistics. Design projects are always late or under extreme schedule pressure. Surprises come along as the design progresses, unanticipated events that are never in the designer’s favor. If you ever do get close to hitting your schedule, the odds increase that your unit will be asked to contribute one of its design engineers to some other unit in worse trouble. This pressure gives rise to a tempting but self-defeating strategy: After accepting the validators onto a design team, absorb them as “temporary” extra design heads until the next schedule crunch has been averted. This syndrome seems very logical because, after all, you cannot test something until it has been designed. Also, the looming schedule crunch is not an isolated event (more are behind the one that is visible), and skimping on validation does not seem to hurt right away. Poor validation becomes painfully evident only much later in the design project.
Our final answer to organizing presilicon validation was to require that unit owners, the designers who actually wrote the RTL code, write sanity-check regression tests for their own code. Before the designer officially released a new, latest-and-greatest version of his unit’s code for use by other designers or validators, he had to have tested it to at least some minimal level. This turned out to be a really good idea. Full-chip models comprised several dozen units, and even a minor typo in any of them could result in a full-chip model that would not compile or run any useful code. Having a high-level wizard (more on wizards next) debug such problems at the full-chip level is possible and sometimes necessary, but it is never as efficient as having the unit owners do some testing before release.
We also reallocated a small fraction of the expected presilicon validation headcount to the various units, with the proviso that some unit testing would now be required. This requirement for at least minimal self-testing helped interpersonal relationships, too. Few things are as annoying to a too-busy design engineer as the feeling that someone else saved himself a little work at that engineer’s expense.
We slotted the rest of the presilicon validation heads into a team under the architecture organization. From that vantage point, validators could get the technical help they needed to track down malfunctions and could muster the authority needed to get always-overloaded unit owners or design engineers to fix things properly before moving on. Being officially in another organization inoculated the validators from possible conscription or from being sacrificed to the schedule gods.
Wizard Problem Solving. We ran the P6 project primarily with the “bigger wizard” stratagem. If a designer at a project “leaf node” had the wherewithal to resolve a problem, then she was expected and empowered to do so. If she could not, she would discuss the issue with her technical supervisor and often they would come up with an answer. If that did not resolve the problem, they went in search of a bigger wizard. This process would repeat until the issue ended up on the desk of the biggest wizards in the project, those who had to resolve the problem. Watching brilliant people like Dave, Glenn, Michael, Andy, and circuits wizard Paul Madland pull technical rabbits out of hats was always a thrill, and they came through for the project again and again. I was always left with the impression that when confronted by a really sticky technical problem, they would somehow mentally leave this galaxy, go check around in some other one, find the answer, and return home with the solution in tow.
Paul Madland deserves a special mention here. In the early concept days of the project, the basic structures we were considering for the microarchitecture were changing radically on almost a weekly basis. Paul had an uncanny ability to guess which structures were likely to survive our ruminations, and he would proactively design circuits that would be needed for such structures, so that the project would not get inadvertently oversubscribed. Paul’s creativity and eye for circuit detail were major reasons that P6 (and its progeny) hit their clock targets.
Making Microcode a Special Case. As Figure 2.3 shows, we kept the multiprocessor (MP) group under architecture, since it was clearly an architectural task to get all the corner cases of cache coherence straight and to figure out how to start up four CPUs with indeterminate amounts of time for each to race through its power-on self-test and become ready to run the operating system.
We also kept the microcode group under the architecture organization, but that decision was not as clear-cut as the MP group placement. Conceptually, microcode really is another design unit, just like the instruction decoder or the branch predictor, but it is special in many ways. In a chip-design sense, I think of x86 microcode as being similar to the “default” case in a C program switch statement.
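As a rough illustration of that analogy (not the author's original listing), the C sketch below routes a few events to the unit that handles them. The enum names and the `owner_of` and `owner_name` functions are hypothetical, invented only for this example. The decoder and branch predictor get explicit cases, and everything else (power-on, complex instructions, faults) falls through to microcode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds; illustrative only, not taken from the P6 design. */
enum event {
    EV_DECODE_SIMPLE_OP,
    EV_PREDICT_BRANCH,
    EV_POWER_ON,
    EV_COMPLEX_INSTRUCTION,
    EV_FAULT
};

/* Hypothetical unit identifiers. */
enum owner {
    OWNER_INSTRUCTION_DECODER,
    OWNER_BRANCH_PREDICTOR,
    OWNER_MICROCODE
};

/* Return the unit that must deal with an event. */
enum owner owner_of(enum event ev)
{
    switch (ev) {
    case EV_DECODE_SIMPLE_OP:      /* a unit with an identifiable boundary */
        return OWNER_INSTRUCTION_DECODER;
    case EV_PREDICT_BRANCH:        /* another explicitly named unit */
        return OWNER_BRANCH_PREDICTOR;
    default:                       /* whatever no other unit claims */
        return OWNER_MICROCODE;
    }
}

/* Map a unit identifier to a printable name. */
const char *owner_name(enum owner who)
{
    static const char *const names[] = {
        "instruction decoder", "branch predictor", "microcode"
    };
    size_t count = sizeof names / sizeof names[0];

    assert((size_t)who < count);   /* guard against an out-of-range value */
    return names[who];
}
```

Calling `owner_of(EV_FAULT)` or `owner_of(EV_POWER_ON)` returns `OWNER_MICROCODE`, because neither event has a case of its own. Any event added to the enum later also lands in microcode until some unit explicitly claims it.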
Other units are explicitly called out in the switch statement; they know where their locus of complexity lies. It’s not that those units are simple or easy to design; it’s only that their unit’s job has some identifiable boundaries, and it’s reasonable to assume that ignoring what lies beyond will be a safe tactic.
Microcode cannot assume anything. When the chip powers up, it starts executing code that is in the microcode ROM. Complex instructions, of which the x86 has many, rely on microcode for their implementation. Faulting instructions will end up in microcode. Floating point has an extensive footprint in x86 microcode.
Most insidious of all, however, is that as the project progresses, various units become silicon-area landlocked or power constrained. With the passage of project time, designers lose degrees of freedom in handling surprises: they don’t have the die space to include new circuitry, or they require the cooperation of neighboring units that lack space. Microcode, on the other hand, is a combination of a ROM storage area and the programming it contains. Experienced designers know to always leave themselves room to maneuver whenever possible and as they use up their margin, they look outside their local “box” for solutions. Microcode is so flexible that you can often use it to solve problems in surprising ways. Perhaps a design error is discovered late in the project in which unit X is not properly handling corner case Y. It is entirely possible that sufficiently clever microcode can prevent corner case Y from ever happening.
Exquisite judgment is required to play the microcode card properly. Used judiciously, microcode fixes can keep a project on track; used profligately, the microcode space will fill up, or the complexity injected by all those unrelated and unexpected fixes will cause more bugs in the microcode itself. It is also possible that a microcode fix could cause an unacceptable performance slowdown. It was primarily to help modulate this process that we kept microcode development on the architecture side of the project organization chart.
Cubicle Floorplanning
We quickly discovered that the design engineers’ physical proximity profoundly affected their mutual communication bandwidth. “Out of sight, out of mind” comes close but doesn’t quite capture the problem. It is certainly possible to design complex software or hardware in a geographically distributed way, as the Gnu Freeware development efforts demonstrated, but it is not optimal. After many years of designing and leading designs, I am convinced that all the same tribal inclinations that drive wars are present in any endeavor that involves multiple humans, including chip design. Simply put, people you see often are much less likely to end up on your subconscious us-versus-them list.
When I was a graduate student at Carnegie Mellon in the late 1970s, computing consisted of sitting in a room full of timeshare terminals connected to centralized minicomputers. I literally sat elbow-to-elbow with my fellow students. All of us grumbled about needing to walk to that room to use computers, but looking back, the proximity was extremely beneficial. If you had any questions about how things worked or what some error message meant, all you had to do was say it out loud. Brian Reid, creator of the Scribe word formatting system, or James Gosling, creator of the Unix Emacs editor and the Java programming language, or a dozen other computer science luminaries would immediately provide both the answer to your question and (intuiting what you were really trying to do) a better way to tackle your problem in the first place. They not only answered your question authoritatively, they solved your real problem, and they did so in such a way that you and everyone else in the room benefited.
The same kind of high-bandwidth communication is achievable in a design project for the same reason: carefully wrought physical proximity.
As part of organizing the P6 project, I had helped Randy figure out who would do what, at least initially. We took as a given that designers working in a single unit would have adjacent cubicles. But what would be the best cubicle floor plan for architecture as opposed to design? And where should we place the unit groups?
Coincidental to our cubicle floorplanning was the first cut at a P6 die layout that aimed to minimize long wiring so as to decrease die size and power dissipation and increase the clock rate. It occurred to me that a cubicle layout fashioned after the chip layout could achieve the same result. After all, units are made adjacent neighbors because they must have many interconnections to work properly. Likewise, placing the designers’ cubicles next to each other would tend to facilitate the one-on-one communication needed to get the chip’s interconnections right. This stratagem worked well for a few months, until we hired so many new people that we had to choose between neighbor and intraunit communication. But by then we had worked out the interfaces.
We chose to put the relatively small (60-person) architecture team in the middle of the design cubicles specifically to avoid the us-versus-them psychology and to make sure the architects felt the designers’ pain. Performance maximizing always comes at a cost. Balancing the win against that cost is the very essence of engineering, and architects must thoroughly understand both ends of the tradeoff before they make it. Architects who always try to resolve design issues with whatever alternative maximizes performance are not doing their jobs properly.
Architects, Engineers, and Schedules
Projects that require hundreds of people and last more than four years are extremely expensive. Corporate executives tend to notice this and consequently exert constant pressure on project leaders to find ways to shrink the overall schedule. Better tools, better methods, more engineers, more vehement tongue-lashings: no means to this end seem unreasonable. I cannot fault the executives for their basic motivations. Long projects are expensive, and the more distant production is, the more farsighted the architects must be and the higher the risk the project must bear.
The concept phase itself is an obvious point of high leverage on the schedule in at least two ways. In drawing up their plans and specifications for the overall design, architects are essentially writing checks that the design engineers must later cash. Any mistakes the architects make will eventually translate into schedule slips. On any given issue, the time spent getting the design right up front will typically more than offset the time required for a diving save to fix a conceptual error later.
But it is also true that a four-year project may spend a third of its overall schedule leveraging the efforts of only a few people, the architects. Improving their efficiency and shortening this time will have an immediate impact on overall project schedule.