But then I had a second thought. I said, “In order to speed things up, I’ll tell you what I’m doing, so you’ll know where I’m aiming. I want to know whether there’s the same lack of communication between the engineers and the management who are working on the engine as we found in the case of the booster rockets.”
Mr. Lovingood says, “I don’t think so. As a matter of fact, although I’m now a manager, I was trained as an engineer.”
“All right,” I said. “Here’s a piece of paper each. Please write on your paper the answer to this question: what do you think is the probability that a flight would be uncompleted due to a failure in this engine?”
They write down their answers and hand in their papers. One guy wrote “99-44/100% pure” (copying the Ivory soap slogan), meaning about 1 in 200. Another guy wrote something very technical and highly quantitative in the standard statistical way, carefully defining everything, that I had to translate—which also meant about 1 in 200. The third guy wrote, simply, “1 in 300.”
Mr. Lovingood’s paper, however, said,
Cannot quantify. Reliability is judged from:
• past experience
• quality control in manufacturing
• engineering judgment
“Well,” I said, “I’ve got four answers, and one of them weaseled.” I turned to Mr. Lovingood: “I think you weaseled.”
“I don’t think I weaseled.”
“You didn’t tell me what your confidence was, sir; you told me how you determined it. What I want to know is: after you determined it, what was it?”
He says, “100 percent”—the engineers’ jaws drop, my jaw drops; I look at him, everybody looks at him—“uh, uh, minus epsilon!”
So I say, “Well, yes; that’s fine. Now, the only problem is, WHAT IS EPSILON?”
He says, “10⁻⁵.” It was the same number that Mr. Ullian had told us about: 1 in 100,000.
I showed Mr. Lovingood the other answers and said, “You’ll be interested to know that there is a difference between engineers and management here—a factor of more than 300.”
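As a rough check on that factor, the four answers can be compared directly. This is only an illustrative sketch: the variable names are invented here, and the two "about 1 in 200" answers are taken at that rounded value.

```python
from fractions import Fraction

# Engineers' written answers, read as failure probabilities per flight.
# The first two were "about 1 in 200"; the rounded figure is assumed here.
engineer_estimates = [Fraction(1, 200), Fraction(1, 200), Fraction(1, 300)]

# Management's figure: 100 percent minus epsilon, with epsilon = 10^-5.
management_estimate = Fraction(1, 100_000)

# How many times larger each engineer's estimate is than management's.
ratios = [estimate / management_estimate for estimate in engineer_estimates]

# The most optimistic engineer gives the smallest gap.
smallest_ratio = min(ratios)

for ratio in ratios:
    print(f"engineer estimate is about {float(ratio):.0f} times management's")
```

Even the most optimistic engineer's answer, 1 in 300, is about 333 times larger than 1 in 100,000, which is where "a factor of more than 300" comes from.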
He says, “Sir, I’ll be glad to send you the document that contains this estimate, so you can understand it.”*
I said, “Thank you very much. Now, let’s get back to the engine.” So we continued and, just like I guessed, we went faster near the end. I had to understand how the engine worked—the precise shape of the turbine blades, exactly how they turned, and so on—so I could understand its problems.
After lunch, the engineers told me all the problems of the engines: blades cracking in the oxygen pump, blades cracking in the hydrogen pump, casings getting blisters and cracks, and so on. They looked for these things with periscopes and special instruments when the shuttle came down after each flight.
There was a problem called “subsynchronous whirl,” in which the shaft gets bent into a slightly parabolic shape at high speed. The wear on the bearings was so terrible—all the noise and the vibration—that it seemed hopeless. But they had found a way to get rid of it. There were about a dozen very serious problems; about half of them were fixed.
Most airplanes are designed “from the bottom up,” with parts that have already been extensively tested. The shuttle, however, was designed “from the top down”—to save time. But whenever a problem was discovered, a lot of redesigning was required in order to fix it.
Mr. Lovingood isn’t saying much now, but different engineers, depending on which problem it is, are telling me all this stuff, just like I could have found out if I went down to the engineers at Thiokol. I gained a great deal of respect for them. They were all very straight, and everything was great. We went all the way down to the end of the book. We made it.
Then I said, “What about this high-frequency vibration where some engines get it and others don’t?”*
There’s a quick motion, and a little stack of papers appears. It’s all put together nicely; it fits nicely into my book. It’s all about the 4000-cycle vibration!
Maybe I’m a little dull, but I tried my best not to accuse anybody of anything. I just let them show me what they showed me, and acted like I didn’t see their trick. I’m not the kind of investigator you see on TV, who jumps up and accuses the corrupt organization of withholding information. But I was fully aware that they hadn’t told me about the problem until I asked about it. I usually acted quite naive—which I was, for the most part.
At any rate, the engineers all leaped forward. They got all excited and began to describe the problem to me. I’m sure they were delighted, because technical people love to discuss technical problems with technical people who might have an opinion or a suggestion that could be useful. And of course, they were very anxious to cure it.
They kept referring to the problem by some complicated name—a “pressure-induced vorticity oscillatory wa-wa,” or something.
I said, “Oh, you mean a whistle!”
“Yes,” they said; “it exhibits the characteristics of a whistle.”
They thought the whistle could be coming from a place where the gas rushed through a pipe at high speed and split into three smaller pipes—where there were two partitions. They explained how far they had gotten in figuring out the problem.
When I left the meeting, I had the definite impression that I had found the same game as with the seals: management reducing criteria and accepting more and more errors that weren’t designed into the device, while the engineers are screaming from below, “HELP!” and “This is a RED ALERT!”
The next evening, on my way home in the airplane, I was having dinner. After I finished buttering my roll, I took the little piece of thin cardboard that the butter pat comes on, and bent it around in a U shape so there were two edges facing me. I held it up and started blowing on it, and pretty soon I got it to make a noise like a whistle.
Back in California, I got some more information on the shuttle engine and its probability of failure. I went to Rocketdyne and talked to engineers who were building the engines. I also talked to consultants for the engine. In fact one of them, Mr. Covert, was on the commission. I also found out that a Caltech professor had been a consultant for Rocketdyne. He was very friendly and informative, and told me about all the problems the engine had, and what he thought the probability of failure was.
I went to JPL and met a fellow who had just written a report for NASA on the methods used by the FAA* and the military to certify their gas turbine and rocket engines. We spent the whole day going back and forth over how to determine the probability of failure in a machine. I learned a lot of new names—like “Weibull,” a particular mathematical distribution that makes a certain shape on a graph. He said that the original safety rules for the shuttle were very similar to those of the FAA, but that NASA had modified them as they began to get problems.
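For readers who have not met the name, the Weibull distribution gives the chance that a part is still working after a time t as exp(-(t/scale)^shape). The sketch below only shows the shape of that curve; the scale and shape values are made up for illustration and are not figures from the JPL report or from NASA.

```python
import math


def weibull_survival(t, scale, shape):
    """Probability that a part survives beyond time t under a Weibull model.

    scale sets the characteristic life; shape controls whether the failure
    rate falls (shape < 1), stays constant (shape == 1), or rises (shape > 1).
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    return math.exp(-((t / scale) ** shape))


# Hypothetical numbers, chosen only to show how the curve behaves.
for minutes in (0, 50, 100, 200):
    print(minutes, round(weibull_survival(minutes, scale=100.0, shape=2.0), 4))
```

With a shape of 1 the curve reduces to ordinary exponential decay; a larger shape describes parts that wear out, so failures cluster around the characteristic life.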
It turned out that NASA’s Marshall Space Center in Huntsville designed the engine, Rocketdyne built them, Lockheed wrote the instructions, and NASA’s Kennedy Space Center installed them! It may be a genius system of organization, but it was a complete fuzdazzle, as far as I was concerned. It got me terribly confused. I didn’t know whether I was talking to the Marshall man, the Rocketdyne man, the Lockheed man, or the Kennedy man! So in the middle of all this, I got lost. In fact, all during this time—in March and April—I was running back and forth so much between California, Alabama, Houston, Florida, and Washington, D.C., that I often didn’t know what day it was, or where I was.
After all this investigating on my own, I thought I’d write up a little report on the engine for the other commissioners. But when I looked at my notes on the testing schedules, there was some confusion: there would be talk about “engine #12” and how long “the engine” flew. But no engine ever was like that: it would be repaired all the time. After each flight, technicians would inspect the engines and see how many cracked blades there were on the rotor, how many splits there were in the casing, and so on. Then they’d repair “the engine” by putting on a new casing, a new rotor, or new bearings—they would replace lots of parts. So I would read that a particular engine had rotor #2009, which had run for 27 minutes in flight such-and-such, and casing #4091, which had run for 53 minutes in flights such-and-such and so-and-so. It was all mixed up.
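The bookkeeping problem can be sketched as a small data structure in which each part carries its own running time. The class and field names below are hypothetical; only the two serial numbers and their minutes come from the notes described above.

```python
from dataclasses import dataclass


@dataclass
class Component:
    kind: str         # e.g. "rotor" or "casing"
    serial: str       # part serial number
    minutes_run: int  # accumulated running time for this part alone


# One "engine" is really whatever parts happen to be bolted together now.
engine_parts = [
    Component(kind="rotor", serial="2009", minutes_run=27),
    Component(kind="casing", serial="4091", minutes_run=53),
]

# Running time has to be reported per part, not per engine.
run_times = {part.kind: part.minutes_run for part in engine_parts}

# The most-used and least-used parts disagree about how long "the engine" ran.
oldest = max(engine_parts, key=lambda part: part.minutes_run)
newest = min(engine_parts, key=lambda part: part.minutes_run)

print(run_times)
print(f"'the engine' has run somewhere between {newest.minutes_run} "
      f"and {oldest.minutes_run} minutes, depending on the part")
```

Once parts are swapped between flights, there is no single number for how long "the engine" has run, only a running time for each part.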
When I finished my report, I wanted to check it. So the next time I was at Marshall, I said I wanted to talk to the engineers about a few very technical problems, just to check the details—I didn’t need any management there.
This time, to my surprise, nobody came but the three engineers I had talked to before, and we straightened everything out.
When I was about to leave, one of them said, “You know that question you asked us last time—with the papers? We felt that was a loaded question. It wasn’t fair.”
I said, “Yes, you’re quite right. It was a loaded question. I had an idea of what would happen.”
The guy says, “I would like to revise my answer. I want to say that I cannot quantify it.” (This guy was the one who had the most detailed answer before.)
I said, “That’s fine. But do you agree that the chance of failure is 1 in 100,000?”
“Well, uh, no, I don’t. I just don’t want to answer.” Then one of the other guys says, “I said it was 1 in 300, and I still say it’s 1 in 300, but I don’t want to tell you how I got my number.”
I said, “It’s okay. You don’t have to.”
An Inflamed Appendix
All during this time, I had the impression that somewhere along the line the whole commission would come together again so we could talk to each other about what we had found out.
In order to aid such a discussion, I thought I’d write little reports along the way: I wrote about my work with the ice crew (analyzing the pictures and the faulty temperature readings); I wrote about my conversations with Mr. Lamberth and the assembly workers; and I even wrote about the piece of paper that said “Let’s go for it.” All these little reports I sent to Al Keel, the executive officer, to give to the other commissioners.
Now, this particular adventure—investigating the lack of communication between the managers and the engineers who were working on the engine—I also wrote about, on my little IBM PC at home. I was kind of tired, so I didn’t have the control I wanted—it wasn’t written with the same care as my other reports. But since I was writing it only as a report to the other commissioners, I didn’t change the language before I sent it on to Dr. Keel. I simply attached a note that said “I think the other commissioners would be interested in this, but you can do with it what you want—it’s a little strong at the end.”
He thanked me, and said he sent my report to everybody.
Then I went to the Johnson Space Center, in Houston, to look into the avionics. Sally Ride’s group was there, investigating safety matters in connection with the astronauts’ experiences. Sally introduced me to the software engineers, and they gave me a tour of the training facilities for the astronauts.
It’s really quite wonderful. There are different kinds of simulators with varying degrees of sophistication that the astronauts practice on. One of them is just like the real thing: you climb up, you get in; at the windows, computers are producing pictures. When the pilot moves the controls, the view out of the windows changes.
This particular simulator had the double purpose of teaching the astronauts and checking the computers. In the back of the crew area, there were trays full of cables running down through the cargo bay to somewhere in the back, where instruments simulated signals from the engines—pressures, fuel flow rates, and so on. (The cables were accessible because the technicians were checking for “cross talk”—interferences in the signals going back and forth.)
The shuttle itself is operated essentially by computer. Once it’s lit up and starts to go, nobody inside does anything, because there’s tremendous acceleration. When the shuttle reaches a certain altitude, the computers adjust the engine thrust down for a little while, and as the air thins out, the computers adjust the thrust up again. About a minute later, the two solid rocket boosters fall away, and a few minutes after that, the main fuel tank falls away; each operation is controlled by the computers. The shuttle gets into orbit automatically—the astronauts just sit in their seats.
The shuttle’s computers don’t have enough memory to hold all the programs for the whole flight. After the shuttle gets into orbit, the astronauts take out some tapes and load in the program for the next phase of the flight—there are as many as six in all. Near the end of the flight, the astronauts load in the program for coming down.
The shuttle has four computers on board, all running the same programs. All four are normally in agreement. If one computer is out of agreement, the flight can still continue. If only two computers agree, the flight has to be curtailed and the shuttle brought back immediately.
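The agreement rule just described can be sketched as a simple vote count. This is a toy illustration, not the shuttle's actual software: the function name and return labels are invented, and the case where no two computers agree is marked as unspecified because the rule above does not cover it.

```python
from collections import Counter


def flight_status(outputs):
    """Apply the four-computer agreement rule to one set of outputs.

    outputs: a sequence of four hashable values, one per computer.
    """
    if len(outputs) != 4:
        raise ValueError("expected outputs from exactly four computers")

    # Size of the largest group of computers reporting the same value.
    largest_group = Counter(outputs).most_common(1)[0][1]

    if largest_group >= 3:
        # All four agree, or a single computer is out of agreement.
        return "continue"
    if largest_group == 2:
        # Only two computers agree: cut the flight short and come back.
        return "curtail"
    # No two computers agree; the description above does not cover this.
    return "unspecified"


print(flight_status(["a", "a", "a", "a"]))  # all four agree
print(flight_status(["a", "a", "a", "b"]))  # one computer disagrees
print(flight_status(["a", "a", "b", "c"]))  # only two agree
```

Counting the largest agreeing group, rather than comparing every computer against one chosen reference, avoids having to decide in advance which computer is the correct one.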
For even more safety, there’s a fifth computer—located away from the other four computers, with its wires going on different paths—which has only the program for going up and the program for coming down. (Both programs can barely fit into its memory.) If something happens to the other computers, this fifth computer can bring the shuttle back down. It’s never had to be used.
The most dramatic thing is the landing. Once the astronauts know where they’re supposed to land, they push one of three buttons—marked Edwards, White Sands, and Kennedy—which tells the computer where the shuttle’s going to land. Then some small rockets slow the shuttle down a little, and get it into the atmosphere at just the right angle. That’s the dangerous part, where all the tiles heat up.
During this time, the astronauts can’t see anything, and everything’s changing so fast that the descent has to be done automatically. At around 35,000 feet the shuttle slows down to less than the speed of sound, and the steering can be done manually, if necessary. But at 4000 feet something happens that is not done by the computer: the pilot pushes a button to lower the landing wheels.
I found that very odd—a kind of silliness having to do with the psychology of the pilots: they’re heroes in the eyes of the public; everybody has the idea that they’re steering the shuttle around, whereas the truth is they don’t have to do anything until they push that button to lower the landing gear. They can’t stand the idea that they really have nothing to do.
I thought it would be safer if the computer would lower the landing wheels, in case the astronauts were unconscious for some reason. The software engineers agreed, and added that putting down the landing wheels at the wrong time is very dangerous.
The engineers told me that ground control can send up the signal to lower the landing wheels, but this backup gave them some pause: what happens if the pilot is half-conscious, and thinks the wheels should go down at a certain time, and the controller on the ground knows it’s the wrong time? It’s much better to have the whole thing done by computer.
The pilots also used to control the brakes. But there was lots of trouble: if you braked too much at the beginning, you’d have no more brake-pad material left when you reached the end of the runway—and you’re still moving! So the software engineers were asked to design a computer program to control the braking. At first the astronauts objected to the change, but now they’re very delighted because the automatic braking works so well.
Although there’s a lot of good software being written at Johnson, the computers on the shuttle are so obsolete that the manufacturers don’t make them anymore. The memories in them are the old kind, made with little ferrite cores that have wires going through them. In the meantime we’ve developed much better hardware: the memory chips of today are much, much smaller; they have much greater capacity; and they’re much more reliable. They have internal error-correcting codes that automatically keep the memory good. With today’s computers we can design separate program modules so that changing the payload doesn’t require so much program rewriting.
Because of the huge investment in the flight simulators and all the other hardware, to start all over again and replace the millions of lines of code that they’ve already built up would be very costly.
I learned how the software engineers developed the avionics for the shuttle. One group would design the software programs, in pieces. After that, the parts would be put together into huge programs, and tested by an independent group.
After both groups thought all the bugs had been worked out, they would have a simulation of an entire flight, in which every part of the shuttle system is tested. In such cases, they had a principle: this simulation is not just an exercise to check if the programs are all right; it is a real flight—if anything fails now, it’s extremely serious, as if the astronauts were really on board and in trouble. Your reputation is on the line.
In the many years they had been doing this, they had had only six failures at the level of flight simulation, and not one in an actual flight.
So the computer people looked like they knew what they were doing: they knew the computer business was vital to the shuttle but potentially dangerous, and they were being extremely careful. They were writing programs that operate a very complex machine in an environment where conditions are changing drastically—programs which measure those changes, are flexible in their responses, and maintain high safety and accuracy. I would say that in some ways they were once in the forefront of how to ensure quality in robotic or interactive computer systems, but because of the obsolete hardware, it’s no longer true today.
I didn’t investigate the avionics as extensively as I did the engines, so I might have been getting a little bit of a sales talk, but I don’t think so. The engineers and the managers communicated well with each other, and they were all very careful not to change their criteria for safety.
I told the software engineers I thought their system and their attitude were very good.
One guy muttered something about higher-ups in NASA wanting to cut back on testing to save money: “They keep saying we always pass the tests, so what’s the use of having so many?”
'What Do You Care What Other People Think?' Page 15