- SIX -
SCENARIOS
Systems construction and planning processes
He that builds before he counts the cost, acts foolishly;
And he that counts before he builds, finds he did not count wisely.
- Benjamin Franklin
Just as John started to renovate his house, TransCorp initiated a billion-dollar project. Why do house construction projects, start-ups or product development ventures take more time, money and effort than we expect? For example, one study found that of 3,500 projects executed, budgets were often exceeded by 40 to 200%. Studies by Planning Professor Bent Flyvbjerg found that about 9 out of 10 transportation infrastructure projects had cost overruns. His studies also found large deviations between forecast and actual traffic demand volumes. He gives us several examples: Boston's Central Artery Tunnel was 275%, or $11 billion, over budget in constant dollars. Costs for Denver's $5 billion International Airport were close to 200% higher than estimated, and passenger traffic in the opening year was half of that projected. Denmark's Great Belt underwater rail tunnel had cost overruns of 110%. Other examples of projects with cost overruns and benefit shortfalls are Bangkok's Skytrain, Los Angeles' convention center, Quebec's Olympic stadium, the Eurofighter military jet, the Channel Tunnel, the Pentagon Spy Satellite Program, and Athens' 2004 Olympics.
A project is composed of a series of steps, all of which must be achieved for success. Each individual step has some probability of failure. We often underestimate the large number of things that may happen in the future or all the opportunities for failure that may cause a project to go wrong. Human mistakes, equipment failures, technologies that don't work as planned, unrealistic expectations, biases including the sunk-cost syndrome, inexperience, wrong incentives, contractor failure, untested technology, delays, wrong deliveries, changing requirements, random events, and ignored early warning signals are all reasons for delays, cost overruns and mistakes. Often we focus too much on the specific project case and ignore what normally happens in similar situations (the base rate frequency of outcomes - our own and others'). Why should some project be any different from
the long-term record of similar ones? George Bernard Shaw said: "We learn from history that man can never learn anything from history."
The more independent steps that are involved in achieving a scenario, the more opportunities for failure and the less likely it is that the scenario will happen. We often underestimate the number of steps, people and decisions involved.
Add to this that we often forget that the reliability of a system is a function of the whole system. The weakest link sets the upper limit for the whole chain.
TransCorp wants to develop a new product.
To predict the probability of developing the new product, we need to know all the steps in the product development chain and the probability of each one. The project is composed of 6 steps, and each step is independent of the others. Each step has an 80% probability of success. Based on similar development programs performed under the same conditions, TransCorp estimates that 8 out of 10 times each step is successful. Two times out of 10, something happens that prevents the step from succeeding. Since the steps are independent, their probabilities must be multiplied together. The probability that the company finally succeeds in developing the product is 26% - meaning that TransCorp should expect success about one time out of four. So even if each step has an 80% probability of success, when combined, the probability of product success decreases to 26%.
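A minimal sketch of this arithmetic, assuming (as above) six independent steps, each with an 80% chance of success:

```python
# Probability that a project of independent steps succeeds:
# every step must succeed, so the step probabilities multiply.
def project_success_probability(step_probs):
    p = 1.0
    for step in step_probs:
        p *= step
    return p

# TransCorp example: 6 independent steps, each with an 80% chance of success.
p = project_success_probability([0.8] * 6)
print(f"Probability all 6 steps succeed: {p:.0%}")  # prints "26%"
```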
Every time we add another step to some system, the probability that the system works is reduced.
John is thinking of investing in a biotech start-up.
Professor and startup coach John Nesheim, who has been involved in some 300 plus startups, tells us in High Tech Startup that only six out of one million high tech ideas turn into a public company. This base rate frequency tells us there is a low prior probability of turning into a public company.
Take a biotech venture as an example. Studies show that of every 10,000 to 30,000 drug candidate molecules entering discovery, only 250 make it to preclinical evaluation, only 5 to 10 to clinical trials, and only one gets approved. So many things must go right before it becomes a business that generates money - factors like technological virtue, product safety, cost-effectiveness, manufacturing, patent issues, product stability, regulatory matters, market assessment, competitive position, financial need (and availability), etc. How can we put a probability number on all these factors? And even if we can, these factors must all work to achieve the desired scenario. Ask: What is the prior probability of success for this type of venture before I consider this specific case?
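As a rough illustration of the base rate implied by the funnel figures above (the stage midpoints used below are illustrative assumptions, not numbers from the studies):

```python
# Funnel figures quoted above; the midpoints (20,000 and 7) are
# illustrative assumptions, not numbers from the studies.
discovery, preclinical, clinical, approved = 20_000, 250, 7, 1

stages = [("discovery -> preclinical", preclinical / discovery),
          ("preclinical -> clinical trials", clinical / preclinical),
          ("clinical trials -> approval", approved / clinical)]

overall = 1.0
for name, rate in stages:
    overall *= rate
    print(f"{name}: {rate:.2%}")

print(f"Overall: about 1 in {round(1 / overall):,}")  # ~1 in 20,000
```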
Warren Buffett says on biotech:
How many of these companies are making a couple of hundred million dollars a year? It just doesn't happen. It's not that easy to make lots of money in a business in a capitalistic society. There are people that are looking at what you're doing every day and trying to figure out a way to do it better, underprice you, bring out a better product or whatever it may be.
The compensation we need for taking a risk is really a function of the wanted outcome in relation to all possible outcomes. Take rolling a die as an example. How likely is it that we get a six four times in a row? If we have to invest $1 to play this game once, we need to get back $1,296 just to break even. There are 1,296 possible outcomes and only one of them is favorable (6,6,6,6).
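A quick check of the die arithmetic, assuming four independent rolls of a fair die:

```python
# Four sixes in a row: independent rolls, so the probabilities multiply.
outcomes = 6 ** 4            # 1,296 equally likely sequences of four rolls
p = (1 / 6) ** 4             # only one sequence (6,6,6,6) is favorable
fair_payout = 1 / p          # what a $1 bet must return just to break even

print(f"Possible outcomes: {outcomes}")                # 1296
print(f"P(four sixes) = 1/{outcomes} = {p:.4%}")       # ~0.0772%
print(f"Break-even payout on $1: ${fair_payout:,.0f}") # $1,296
```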
The more negative things that can happen - or positive things that must happen - the better compensated we must be for taking on the risk. Ask: What can happen and what are the consequences? Anticipate unforeseen obstacles.
If you do venture investments, follow the advice of Warren Buffett:
You may consciously purchase a risky investment - one that indeed has a significant possibility of causing loss or injury - if you believe that your gain, weighted for probabilities, considerably exceeds your loss, comparably weighted, and if you can commit to a number of similar, but unrelated opportunities. Most venture capitalists employ this strategy. Should you choose to pursue this course, you should adopt the outlook of the casino that owns a roulette wheel, which will want to see lots of action because it is favored by probabilities, but will refuse to accept a single, huge bet.
We can demonstrate Buffett's advice mathematically. Suppose a start-up has a 40% probability of succeeding. The probability that 10 mutually independent start-ups (with the same probability of success) all succeed is 0.01%, but the probability that at least one succeeds is 99.4%. Here we assumed that the fate of each venture is independent of the fate of the others: that one start-up fails makes it no more likely that another start-up fails.
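A minimal sketch of that calculation, assuming ten independent ventures with identical odds:

```python
# Ten independent start-ups, each with a 40% chance of success.
p_success = 0.4
n = 10

p_all_succeed = p_success ** n              # every venture succeeds
p_at_least_one = 1 - (1 - p_success) ** n   # one or more succeed

print(f"P(all 10 succeed)        = {p_all_succeed:.2%}")   # ~0.01%
print(f"P(at least one succeeds) = {p_at_least_one:.1%}")  # ~99.4%
```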
"How can we fond this venture if we don't present a great fature?"
Consider bias from incentives. To sell an investor on a venture, expected returns are often overestimated. Warren Buffett says, "We expect all of our businesses to now and then have ups and downs. (Only in the sales presentations of investment banks do earnings move forever upward.)"
Systems failure and accidents
On July 25th, 2000, a Concorde bound from Paris to New York crashed shortly after takeoff. All 109 people on board were killed, along with 4 on the ground.
A stray metal strip on the runway, lost by another aircraft, caused the event. A tire burst as a result, and its explosion sent pieces of rubber into the fuel tank, causing a fuel leak and fire.
We underestimate how likely it is that an event happens when it may happen one way or another. Accidents happen if they have opportunities to happen.
Astronomy Professor Carl Sagan said in Carl Sagan: A Life in the Cosmos: "The Chernobyl and Challenger disasters remind us that highly visible technological systems in which enormous national prestige had been invested can nevertheless experience catastrophic failures."
System safety doesn't reside in one component but in the interactions of all the components. If one key component fails, the system may fail. Assume a space
shuttle is composed of 2,000 independent parts or smaller systems, each with a working probability of 99.9%. All parts need to work for the shuttle to work. The probability that at least one of the parts doesn't work, causing the shuttle to malfunction, is 86% (many parts means many opportunities for failure).
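A sketch of that calculation, assuming 2,000 fully independent parts as in the example:

```python
# A system of 2,000 independent parts, each working with probability 99.9%.
p_part_works = 0.999
n_parts = 2_000

p_all_work = p_part_works ** n_parts   # every single part works
p_system_fails = 1 - p_all_work        # at least one part fails

print(f"P(all parts work)          = {p_all_work:.1%}")      # ~13.5%
print(f"P(at least one part fails) = {p_system_fails:.0%}")  # ~86%
```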
The blackout began at 10:30 p.m. in the Omaha area, and the domino effect caused a cascade of electricity cuts throughout the U.S.
Some systems are more prone to accidents than others because of the number of components, their connections and their interactions. The more variables we add to a system, and the more they interact, the more complicated we make it and the more opportunity the system has to fail. Improving certain parts in highly interconnected systems may do little to eliminate future problems. There is always the possibility of multiple simultaneous failures, and the more complicated the system, the harder it is to predict all possible failures. The exception to this is adding systems that serve as a substitute if the present system breaks down. But we must ensure that backup systems don't cause unwanted effects or share the same defects as the parts they back up.
Distinguish between independent and dependent events. The probability that an airplane navigation system works is 99% and the probability that the backup navigation system works is 90%. The probability that the backup system fails is not influenced by whether the primary system fails or not. The probability that neither navigation system works is one tenth of a percent (0.01 x 0.1 = 0.001). Navigation system reliability is therefore 99.9% (at least one navigation system will work).
But if the systems are dependent - if the probability of the backup failing rises when the primary system fails - the overall probability of a system failure increases. We can't assume that events are independent of each other. What happens next in a chain of events may not be independent of the previous outcome. Subsystems may share something in common. For example, aircraft engines draw fuel from a common supply and a common pump. Dependence can also be caused by parts being of the same design or manufactured by the same company.
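A sketch contrasting the two cases. The independent numbers are from the text; the 40% conditional failure rate in the dependent case is an illustrative assumption, not a figure from the source:

```python
# Independent case from the text: primary fails 1% of the time,
# backup fails 10% of the time, and the two failures are unrelated.
p_primary_fails = 0.01
p_backup_fails = 0.10

p_both_fail = p_primary_fails * p_backup_fails
print(f"Independent: P(both fail) = {p_both_fail:.1%}, "
      f"reliability = {1 - p_both_fail:.1%}")            # 0.1% and 99.9%

# Dependent case (illustrative assumption): if the primary fails, the backup
# fails 40% of the time, e.g. because both share a common pump or defect.
p_backup_fails_given_primary_fails = 0.40
p_both_fail_dependent = p_primary_fails * p_backup_fails_given_primary_fails
print(f"Dependent:   P(both fail) = {p_both_fail_dependent:.1%}, "
      f"reliability = {1 - p_both_fail_dependent:.1%}")  # 0.4% and 99.6%
```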
Unlikely things happen if enough time passes. An event that has one chance in 20 of happening in any given year (assume that the probability stays the same over time) is nearly certain to happen over 50 years (92.3%). If we reduce the probability to one chance in 40, the probability of the event happening at least once over 50 years decreases to 71.8%.
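A sketch of the at-least-once arithmetic, assuming the annual probability stays constant and the years are independent:

```python
# Probability that an event with a fixed annual chance happens at least once.
def p_at_least_once(p_per_year, years):
    return 1 - (1 - p_per_year) ** years

print(f"1-in-20 chance per year, 50 years: {p_at_least_once(1/20, 50):.1%}")  # ~92.3%
print(f"1-in-40 chance per year, 50 years: {p_at_least_once(1/40, 50):.1%}")  # ~71.8%
```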
Suppose there are 40 independent ways for a nuclear accident to happen in any given year, each with a probability of 1 in 1000. The probability that an accident happens in any given year is 3.9%. The probability that at least one nuclear accident happens during the next 10 years is 33%.
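The same reasoning applied to the nuclear example, assuming the 40 failure modes and the years are independent:

```python
# 40 independent ways an accident can happen each year, each with probability 1/1000.
p_per_way = 1 / 1000
n_ways = 40
years = 10

p_accident_per_year = 1 - (1 - p_per_way) ** n_ways
p_accident_over_years = 1 - (1 - p_accident_per_year) ** years

print(f"P(accident in any given year)      = {p_accident_per_year:.1%}")    # ~3.9%
print(f"P(at least one accident, 10 years) = {p_accident_over_years:.0%}")  # ~33%
```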
We might reduce the probability of accidents, but not eliminate them.
At 3:42 p.m., San Francisco was shaken by a major earthquake.
Based on frequency and scientific data, scientists estimated in 2003 that there is a 62% probability (the error range is anywhere from 38 to 87%) of at least one magnitude 6.7 or greater earthquake striking somewhere in the Bay region before 2032. The probability of a major earthquake happening in any given year is therefore about 3.2% (assuming the probability stays the same from year to year). The probability that a major earthquake will happen at least once during the next 5 years is 15%.
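A sketch of how the annual figure follows from the forecast, assuming the 2003-2032 window is treated as roughly 30 equal, independent years:

```python
# 62% probability of at least one major quake over the roughly 30 years 2003-2032.
p_over_period = 0.62
years_in_period = 30   # assumption: treating the forecast window as ~30 equal years

# Solve 1 - (1 - p_year) ** years = p_over_period for the annual probability.
p_year = 1 - (1 - p_over_period) ** (1 / years_in_period)
p_next_5_years = 1 - (1 - p_year) ** 5

print(f"Implied annual probability:          {p_year:.1%}")          # ~3.2%
print(f"P(at least one in the next 5 years): {p_next_5_years:.0%}")  # ~15%
```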
Regardless of the factors that are considered in predicting earthquakes, chance plays a role in whether a large earthquake happens.
Can we predict the time, location and magnitude of a future earthquake? Berkeley Statistics Professors David Freedman and Philip Stark say in their report "What is the Chance of an Earthquake?" that a large earthquake in the Bay Area is inevitable, and imminent in geologic time: "Probabilities are a distraction. Instead of making forecasts, the USGS [U.S. Geological Survey] could help to improve building codes and to plan the government's response to the next large earthquake. Bay Area residents should take reasonable precautions, including bracing and bolting their homes as well as securing water heaters, bookcases, and other heavy objects. They should keep first aid supplies, water, and food at hand. They should largely ignore the USGS probability forecast."
"Our technology was foolproof How could this happen?"
Many systems fail because they focus on the machines, not the people that use them. For example, a study of anesthesiologists found that human error was involved in 82% of preventable accidents. The remainder was due to equipment failure.
Even if the probability that some technology works is 99.99%, human fallibility makes the system less reliable than technological reliability alone. Humans are involved in design, execution and follow-up. Excluding ignorance and insufficient knowledge, given the complexity of human and non-human factors interacting, there is a multitude of ways in which things can go wrong.
In 1983, Korean Air Lines Flight 007 was shot down after straying into Russian territory, for violating Russian air space. All 269 people on board were killed.
The plane had deviated close to 360 miles from its predetermined track. It was later shown that a chain of accidental events led the plane off track. It started when the plane left Anchorage, Alaska. The captain and crew were tired when the plane took off. A series of small events, each trivial, combined to cause a catastrophe.
"Sorry, I left a metal instrument in your abdomen. "
Doctors sometimes make mistakes - both in diagnosing and treating patients. For example, a surgeon leaving a metal instrument in a patient's abdomen, a patient having the wrong leg amputated, a cardiac surgeon bypassing the wrong artery, a doctor prescribing the wrong drug, a missed diagnosis of complete heart block, a missed colon cancer, a wrong diagnosis of pulmonary embolism, or a blood product being mislabeled.
In the Harvard Medical Practice Study (1991), a random sample of 30,000
patients from 51 hospitals in the state of New York was selected. Medical records were examined to detect evidence of injuries caused by medical mismanagement. The study showed that 3.7% of patients (negligence accounting for 1%) had complications that either prolonged their hospital stay or resulted in disability. Later studies have shown that medical errors account for between 44,000 and 98,000 deaths in the U.S. every year and that medication errors are the eighth leading cause of death.
A study of 100 cases of diagnostic errors in internal medicine showed that system-related factors contributed in 65% of the cases and cognitive factors in 74%. Cognitive and system-related factors often co-occurred. The single most common cause of cognitive-based errors was the tendency to stop considering other possible explanations after reaching a diagnosis.
Studies of autopsies show that a U.S. institution with an autopsy rate of 5% can expect to misdiagnose a principal underlying disease, or the primary cause of death, about 24% of the time. Even a hospital that does autopsies on everyone should expect an error rate of about 8%.
Henry Ford said: "Don't find fault, find a remedy." Don't assign blame. Look for causes and preventive methods. Often it is better to prevent future errors by designing safety into systems than to punish individuals for past errors. Blame does little to improve safety or prevent others from making the same mistake. For example, aviation assumes that errors of judgment happen and that it is better to seek out causes than assign blame. That is why the Federal Aviation Administration (FAA) has an Aviation Safety Reporting System (ASRS) for analyzing and reporting aviation incidents. The FAA utilizes NASA as a third party to receive aviation safety reports. This cooperation invites pilots to report to NASA actual or potential deficiencies involving aviation safety. Having NASA as the receiver ensures the confidentiality and anonymity of the reporter and all parties involved in an incident. There has been no breach of confidentiality in more than 20 years of the ASRS under NASA management. Pilots who report an incident within ten days have automatic immunity from punishment.
Safety factor
"We always consider variability and unpredictability when setting safety factors. We act as if we were building a
bridge. We are very conservative. "
Ancient Rome used incentives in the design and construction of safe bridges. The designer of the bridge had to stand under it after completion while chariots drove over the top. This put both the designer's life and the lives of those who used the bridge at risk, and it increased the probability that designers made sure the bridge held up. Engineers and architects add a safety factor to accommodate uncertainty. This factor depends on the consequences of failure, how well the risks are understood, system characteristics and degree of control.
Assume that accidents will happen and prepare for when people and technology don't work as planned. Systems should be designed to eliminate the probability of bad events or limit their consequences if they happen. We can borrow an idea from aviation where incidents are thoroughly investigated to learn what went wrong and how to do better next time - critical incident analysis. Ask: How do specific accidents evolve? What major factors contribute? Are there any common patterns? We need to add a factor of safety for known and unknown risks. We have to consider break points, build in defense systems and contingency plans. We must also simplify and standardize equipment and processes and use checklists to
decrease the likelihood of operator errors.