Shuttle, Houston


by Paul Dye


  The same thing was true in mission planning. I expected people to front-load their work and get things done before the actual deadlines. This was because the goalposts frequently moved—payloads made changes, the program made changes, the timing of the mission changed, the rendezvous target changed—and all of them came after you had done your work. So that meant you had to do it over. The only way to deal with these late changes was to be early with your normal work so that you had extra capacity to work the new stuff when it came in.

  I worked a huge number of planning shifts in my days as a Shuttle Flight Director. I sort of liked the predictability of the process, even though the actual work we’d do could be very unpredictable. The one thing that all those shifts had in common—when I was leading them—was that I refused to ever be late. I expected my team to have their inputs done early, have their reviews done early, and be ready for uplink of the new daily plan early. That way, there was no final rush, no panic, and no missed deadlines. Be early and you won’t be late.

  The First Answer Is Always Wrong

  Because spaceflight is fast-paced, and because we always wanted to show everyone that we were on top of what was happening and had quick reactions (and minds), there was a tendency for flight controllers to want to solve every problem within milliseconds of seeing something happening in their system. This tendency toward fast answers is admirable as such—but it also leads to problems, because in most cases of real failures the first answer is always wrong. If you take a little time, look around, see what else is happening, and put what you see in context, you almost always find a better answer or better path to follow than you initially thought. Flight controllers got caught in this trap over and over again, which is why we tried to encourage fast thinking but slow action.

  Part of the problem with making snap judgments was the way that we trained. In part-task systems trainers or full-up integrated simulations, the instructor put in a failure and the student was expected to recognize it and react to it. There was generally a single answer to each problem, so the only measures of “goodness” were if you got the right answer and how quickly you got there. Hence, we “trained in” the trait of being fast, of having the right answer as quickly as possible. There is nothing wrong with this per se, but it isn’t how the real world of spaceflight works.

  Generally speaking, failures seen in flight are fuzzy—we see some symptoms that we have never seen before, and nothing bad happens right away. A pressure starts fluctuating, a temperature begins a funny cyclic behavior, or some digital bits start showing weird patterns. We look in our bag of failures that we had envisioned before the mission, and that we had experienced in the simulator, and we try to match what we see to what we trained for. Our brain then comes up with the closest match, spitting out an answer. The problem is that we are generally trying to fit a square peg in a round hole, because what we trained for is not what is actually happening. The strange pressure fluctuation isn’t a leak across a pair of seals in an accumulator but is simply the variation of pressure due to temperature changes induced by a nearby cooling line that is alternately freezing and thawing because a payload box mounted on the cooling system is operating intermittently. The temperature is cycling in a way we have never seen before, due to shading by a solar array on the Hubble Space Telescope temporarily mounted in the payload bay for repairs—and our attitude changes weren’t taking that into account at the altitude we are flying. And the funny digital bit pattern is there because someone programmed the wrong decoding table into the MCC workstation, so the data is gobbledygook—nothing is wrong on board.

  There are many ways in which you can be fooled into thinking that you know what you’re seeing. But if you come up with a quick answer, and if you spit it right out to the Flight Director (or the Flight Director tells the MMT what is happening too early), you will almost certainly be wrong. Being the first with the wrong answer is never a career-builder—and it doesn’t help the program, the mission, or the flight either.

  Now of course there are cases where you must act now, and you have to be right. That is simply the game in a high-risk, high-speed flight environment. If you have to take a course of corrective action immediately, then you want to try and take the most conservative action you can that will leave you options for recovery if (when) you come up with the real correct action. Head in the direction of “goodness” but be ready to change course when you gather a little more information. This is an awful lot like saying “take your best guess.” But sometimes the best first answer is an informed guess.

  The important thing to remember is not to be paralyzed by the fear that your answer will be wrong but to understand the tendency toward rushing an answer when we think we know what is going on—and to fight the tendency to be the first with a wrong answer. Know this is going to happen and temper your response, tailoring your quickness to the time you have available. Know when you need to act, and then take just the action needed—and always leave yourself an out so that you can take the correct action when you figure out what that is.

  Never Make the Right Decision Too Soon

  The sister axiom to “The first answer is always wrong” is to never make the right decision too soon. All problems have a time when their effect becomes critical. Unless it is something actually exploding, you almost always have a little time to figure out what is happening, what the effects will be on the vehicle (and the mission), and what the potential actions are that can fix (or mitigate) the problem. So the first thing you have to determine in the first milliseconds that you see a problem is “How much time do I have to solve this?”

  Being the first one to get to the wrong answer never gets you any points. Even if you look up and see that you are about to drive over a cliff, you probably have at least a little bit of time to figure out if it would be better to go right or left before you turn the wheel. Remember that old pilot saying that goes, “In an emergency, the first thing to do is to wind your watch”? The adage reminds us that we should never just react. In most cases, if you’re not already dead, you should be deliberate, figure out what is happening, gather some data and information, formulate choices, and only then decide. “Winding your watch” is a metaphor for taking this pause.

  Many folks see this piece of advice and ask, “Why not just choose an answer and get on with it? What’s wrong with being initially decisive?” Well, the bottom line is that we never have enough information to make any decision. By “enough” I mean all the relevant data that goes into choosing the best answer in any given situation. We can always collect more data, always ask for more clarification—right up until the time that the decision has to be made. Making good decisions is about knowing as much as you can about the alternatives, the desired outcome, and the place you find yourself in. Without building a good knowledge base before making a decision, what you are really doing is just guessing. A guess is really not a decision—it’s just a stab in the dark.

  By agreement with the Shuttle program, the flight control team (through the Flight Director) had the authority to take whatever action they felt was necessary for the safe and successful completion of the mission. This agreement was basically the flight rules document. If anything happened that was outside of what was covered in the preapproved rules, then one of two things occurred. If there was no time to consult anyone outside the flight control team, then the Flight Director made a good decision based on all the information they had at the time. But if a quick response was not required, then the decision got shuffled off to the MMT. The team had to put together all the necessary data, options, and recommendations and brief the MMT so that program management could decide what to do.

  The key thing to remember in either of these cases is that we should always take as much time as we have available to make the decision—especially if the decision is irrevocable—so that we don’t head down a dead-end path. What do we wait for? More information, new ideas, or a change in the situation. Any of those could affect what we eventually decide to do… so why make the right decision too early?

  There Are Always Alternatives (Maybe Not Good Ones)

  I always taught flight controllers that before they made any decision regarding a course of action, they should come up with at least one alternative. Sometimes, that alternative might be to do nothing; other times, it might be to do the opposite of what their first inclination might be. In any case, even if you are facing a problem that requires a fast response, taking just a moment to think of at least one alternative might just make you realize that you were headed down a poor path. Or, as I also told them, the alternative might be so stupid as to defy all logic—but that just proved your original idea probably wasn’t that bad, and you can proceed with confidence.

  If there are no alternatives to a decision that you are making, then truly it isn’t a decision at all, is it? You are simply executing the next step of an inevitable process. Stopping yourself to come up with an alternative will make you feel better either because you now realize that you have made a choice between multiple options or because you realize that the decision was out of your hands anyway.

  It was not uncommon in the Shuttle world for a discipline to have a problem, and while they were describing it to the Flight Director, another discipline would pipe up with an alternative because they saw other data that made it clear there were larger issues afoot. The MMACS (Mechanical Maintenance Arm and Crew Systems) Officer might report losing a heater on a hydraulic line, for instance, and that loss might tip off the EGIL (Electrical Generation and Illumination) controller that a power bus had failed—and that the failure was, in turn, going to affect other heaters that belonged to other operators. This type of discussion was important for the flight control team, because by exploring options they determined the real scope of the problem.

  The day that the Columbia broke up on entry, the first indication of the problem came from the MMACS console when he lost a tire pressure indication on the left landing gear. Usually, an indication like low tire pressure would be interpreted as an instrumentation failure: because there were multiple measurements for each tire, if one indicator went away suddenly, it was more likely that the instrument failed than the tire leaked. On that day, there were multiple instruments showing a loss of pressure. But the tire hadn’t actually leaked; the breach in the wing had burned through all of the wiring. It was quickly determined that more instrumentation was failing and, in this case, there were no alternatives. But the discussion that went on in that short period of time was exactly the kind of thing that made one think of alternatives—if it wasn’t the tire going flat, then what was it?

  The bottom line was that with a vehicle and operation as complex as the Shuttle, if you couldn’t think of at least one alternative to any course of action, then you probably weren’t creative enough to be working on the flight control team anyway.

  Trust Your Data—But Never Believe It

  Of course, all this talk of looking for alternatives, taking more time to look for better data, and not reacting too quickly depends on data. We have to trust our data, but we also need to be suspicious of our data. It’s a paradox, one that you lived with in the Shuttle program every day. You always had to ask yourself, “Could the data be lying to me?”

  Data from the vehicle was obtained through electronic means. A temperature measurement, for example, started as an analog voltage that would be converted to a digital word that computers could use, and that word would be inserted in a stream of data words that then were transmitted to the ground. The ground computers would then decode the data stream, breaking it into its component words. The different console display applications (programs) would then calibrate each word and display it. In this case, it would appear as a temperature to the controller. It was a long route for the data to take, and all sorts of things could happen to it along the way.
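
  To make that chain concrete, here is a minimal sketch in Python of the ground-side steps. It assumes an invented frame layout and calibration; the field positions, scale factors, and function names are illustrative only, not the actual Shuttle telemetry format.

```python
import struct

# Hypothetical calibration constants for one temperature channel:
# raw counts -> degrees F. Values are invented for illustration only.
CAL_SLOPE = 0.05
CAL_OFFSET = -40.0

def decode_frame(frame: bytes) -> list:
    """Break a downlinked frame into 16-bit data words (big-endian)."""
    count = len(frame) // 2
    return list(struct.unpack(f">{count}H", frame))

def calibrate_temperature(raw_word: int) -> float:
    """Convert a raw digital word back into engineering units for display."""
    return CAL_SLOPE * raw_word + CAL_OFFSET

# On board (simplified): an analog voltage has already been digitized into a
# 16-bit word and packed into the downlink stream with other measurements.
frame = struct.pack(">3H", 512, 2800, 1024)

# On the ground: decode the stream, pull out "our" word, calibrate, display.
words = decode_frame(frame)
display_value = calibrate_temperature(words[1])
print(f"Temperature: {display_value:.1f} deg F")  # what the controller sees
```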

  Maybe the temperature sensor itself was experiencing a mechanical failure. Perhaps it had come unstuck from whatever it was supposed to be measuring, or had been subject to corrosion that changed its electrical characteristics, such as its resistance. Or perhaps the voltage supplied to the sensor had varied, altering the accuracy of the output. Or maybe the black box that converted the voltage to a digital word was having a problem. All along the way, there could have been communications failures that froze the data on its last value. Whatever the failure, the flight controller looking at the screen might have seen a steady temperature, when in fact it was dropping or climbing.

  So the first rule of trusting your data was to never trust your data… at least not one single data point. The trick was in understanding how instrumentation could lie to you. A temperature measurement that went from 100 degrees to 0 degrees in an instant was much more likely to be a failed instrument than an actual temperature drop. If you watched it drop steadily over time, then it was more likely to be an actual temperature drop—most of the time. In order to make sure that critical data was being monitored, the normal design would include redundant temperature measurements—a second transducer right next to the first, with its own data channel. But there is an old saying that goes, “A man with two watches never knows what time it is.” In other words, two measurements could help, but you couldn’t always know which was correct unless one failed high or low.
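
  As a rough illustration of that habit of suspicion, the sketch below flags a single-sample step change as a likely instrument failure and uses a redundant channel as a cross-check. The threshold and the helper are invented for this sketch; they are only a caricature of the judgment a flight controller actually applied.

```python
# Invented threshold: the largest sample-to-sample change we would believe
# as real physics rather than a failing transducer.
MAX_PLAUSIBLE_STEP = 20.0  # degrees per sample

def assess_reading(previous, current, redundant=None):
    """Classify a new temperature sample as plausible or suspect."""
    step = abs(current - previous)
    if step <= MAX_PLAUSIBLE_STEP:
        return "plausible: keep watching the trend before acting"
    # A large instantaneous jump looks more like instrumentation than physics.
    if redundant is not None and abs(redundant - previous) <= MAX_PLAUSIBLE_STEP:
        return "suspect: redundant channel disagrees, likely a failed instrument"
    return "suspect: step too large for one sample, gather more data first"

print(assess_reading(100.0, 0.0, redundant=99.0))  # sudden drop, second sensor steady
print(assess_reading(100.0, 95.0))                 # gradual change, probably real
```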

  A lot of the time we were flying with new science or engineering payloads that had never been tried before. As a result, no one really knew what the data might look like before we got into space. In these cases, of course, we would generally be suspicious of everything and tend not to make snap decisions based on anything we saw until we gained some experience with it. This was fine for things that didn’t affect vehicle safety. But if vehicle safety was at issue, then we relied on our flight controllers to be experts in their field. They had to draw from a wide range of experiences and apply their personal knowledge about how their system might work. It was why we had engineers rather than technicians watching over the vehicle—with a new and complex machine, you never knew exactly what you were going to encounter.

  Overall, we found that the data was pretty reliable. But when we saw something we didn’t expect, we always needed to take a moment to ask how the data might be lying.

  In God We Trust—All Others Must Bring Data

  Even though we were always asking ourselves how the data was fooling us, the fact of the matter was that data was king. In any argument or problem-solving session, the person with good, hard, solid data was going to win over someone who was just speculating. The Mission Evaluation Room (which housed the engineering support teams for the missions) had a sign over the door that said “In God We Trust—All Others Must Bring Data!” We didn’t decide things based on guesswork.

  The Shuttle was a multi-billion-dollar program responsible for priceless national assets—and the lives of their crews. Guessing just didn’t cut it. It was okay to run up against a problem that no one had ever seen before, or that had no apparent solution. In such cases we did what we were paid to do—find the reason behind what was happening and come up with potential solutions. It was hard work, sometimes costly—and it could take a long time.

  A good example of this was the “summer of hydrogen leaks.” It began with a launch scrub due to a high concentration of hydrogen in the aft compartment of Columbia on the launch pad. The launch attempt was for the STS-35 mission carrying Astro-1 in May 1990. The fear (totally rational, backed up by testing) was that if you lit off the engines in that environment, you could have quite an explosion. The Shuttle was drained of fuel and the crew was sent home. The engineers scratched their heads looking for the source of the leak. It was not obvious—there were no smoking guns that showed where the leak had occurred. The first response was to tighten a few things and try again. New countdown—same result. And so began the long summer of troubleshooting that involved two Orbiters (Atlantis was tried after Columbia was rolled back to the hangar), numerous disassemblies, destackings, reassemblies, and restackings. There were tests, tests, and more tests. Thousands of engineers got involved, and massive fault trees were worked through.

  The problem was that they were dealing with multiple failure modes from a few different sources. The hydrogen leak was different on each vehicle. The sources were eventually found and repaired, with numerous process changes as a result, and the Shuttles returned to flying for a long time. But all the work was based on testing and analysis—not guessing, not wishing, and not hoping for a better result. The work was data driven. I remember sitting through countless meetings, with reams of data being presented, trying to sort out all the different variables. It took a large tiger team to eventually sort out the issues, but that is what engineering is all about.

  We expected no less of our operations engineers. While we had to make decisions at a fast pace when the Shuttle was in flight, we expected those decisions to be based on data—not guesses. That meant flight controllers had to study the history of their systems and remember all the various tests and results that occurred over the decades of Shuttle design, construction, and testing. There was no substitute for knowing your system intimately—it was the price you paid for walking through the Flight Control Room door.

 
