by Kim, Gene


  I groan, thinking about the wasted time and effort. I keep listening as Patty continues, “Someone else said that she couldn’t implement her change because there was an outage in progress. And a bunch of other people said, um…”

  She looks uncomfortable, so I prompt her to continue. “Well, they said they needed Brent for a portion of their changes, and he wasn’t available,” she says reluctantly. “In some cases, Brent’s involvement was planned. But in other cases, they discovered they needed his help only after they started implementing and had to abort when Brent wasn’t available.”

  Before Patty is even finished speaking, I’m seeing red.

  “Goddamnit! Brent again? What is going on? Just how has Brent managed to wedge himself into everyone’s path?

  “Oh, shit!” I exclaim when it hits me what’s happening. “Did we create this problem by focusing Brent solely on Phoenix? Is this new policy a mistake?”

  She says after a moment, “You know, that’s an interesting question. If you genuinely believe that Brent should only be working on the most important projects, then I think the new policy is correct, and we shouldn’t change it back.

  “I think it’s also important to note that until recently, Brent was helping people implement their changes, without that dependency recorded anywhere. Or rather, he’d try to. But he’d invariably be too busy to help everyone, so many of these changes wouldn’t have been completed, even in the old way.”

  I pick up my phone and speed-dial Wes, telling him to join us.

  When he arrives a couple of moments later, he takes a seat and then looks at my old laptop, saying, “Jeez. You still carrying that thing around? I’m sure we have a couple of newer eight-year-old laptops that you could use.”

  Ignoring his comment, Patty quickly brings him up to speed. His reaction to her revelation isn’t much different than mine.

  “You’ve got to be kidding me!” he says angrily, slapping his palm on his forehead. “Maybe we should allow Brent to help people make changes?”

  I quickly say, “No, that can’t be the answer. I suggested that, too. But Patty pointed out that this would imply that the blocked changes are more important than Phoenix. Which they aren’t.”

  I think aloud, “Somehow, just like we’re breaking the habits of people asking Brent to help with break-fix work, we need to do the same with change implementation. We’ve got to get all this knowledge into the hands of people actually doing the work. If they can’t grok it, then maybe we have a skills problem in those teams.”

  When no one says anything, I tentatively add, “How about we take those same level 3 engineers that are dedicated to protect Brent from break-fix to help with these change issues?”

  Wes quickly responds, “Maybe. But it’s not a long-term fix. We need the people doing the work to know what the hell they’re doing, not enable more people to hoard knowledge.”

  I listen to Wes and Patty brainstorm ideas to reduce yet another dependency on Brent when something starts to bother me. Erik called WIP, or work in process, the “silent killer,” and that inability to control WIP on the plant floor was one of the root causes for chronic due-date problems and quality issues.

  We just discovered that sixty percent of our changes didn’t complete as scheduled.

  Erik had pointed to the ever-growing mountain of work on the plant floor as an indication that the plant floor managers had failed to control their work in process.

  I look at the mountain of change cards piled up on today’s date on the calendar, as if a giant snowplow had pushed them all forward. Suddenly, it’s starting to seem like the picture Erik painted on the plant floor eerily describes the state of my organization.

  Can IT work really be compared to work on a plant floor?

  Patty interrupts my deep contemplation as she asks, “What do you think?”

  I look back up at her. “For the last couple of days, only forty percent of the scheduled changes were completed. The rest are being carried forward. Let’s assume that this continues for a bit longer, while we figure out how to disseminate all the Brent knowledge.

  “We have 240 incomplete changes this week. If we have four hundred new changes coming in next week, we’ll have 640 changes on the schedule next week!

  “We’re like the Bates Motel of changes,” I say in disbelief. “Changes go in but never come out. Within a month, we’ll have thousands of changes that we’ll be carrying around, all competing to get implemented.”

  Patty nods, “That’s exactly what’s bothering me. We don’t have to wait a month to see thousands of changes—we’re already tracking 942 changes. We’ll cross over one thousand pending changes sometime next week. We’re running short of space to post and store these change cards. So why are we going through all this trouble if the changes aren’t even going to get implemented!”

  I stare at all the cards, willing them into giving me an answer.

  An ever-growing pile of inventory trapped on the plant floor, as high as the forklifts could stack it.

  An ever-growing pile of changes trapped inside of IT Operations, with us running out of space to post the change cards.

  Work piling up in front of the heat treat oven, because of Mark sitting at the job release desk releasing work.

  Work piling up in front of Brent, because of…

  Because of what?

  Okay, if Brent is our heat treat oven, then who is our Mark? Who authorized all this work to be put in the system?

  Well, we did. Or rather, the CAB did.

  Crap. Does that mean we did this to ourselves?

  But changes need to get done, right? That’s why they’re changes. Besides, how do you say no to the onslaught of incoming work?

  Looking at the cards piling up, can we afford not to?

  But when was the question ever asked whether we should accept the work? And on what basis did we ever make that decision?

  Again, I don’t know the answer. But, worse, I have a feeling that Erik may not be a raving madman. Maybe he’s right. Maybe there is some sort of link between plant floor management and IT Operations. Maybe plant floor management and IT Operations actually have similar challenges and problems.

  I stand up and walk to the change board. I start thinking aloud, “Patty is alarmed that more than half our changes aren’t completing as scheduled, to the extent that she’s wondering whether this whole change process is worth the time we’re investing in it.

  “Furthermore,” I continue, “she points out that a significant portion of the changes can’t complete because Brent is somehow in the way, which is partially because we’ve directed Brent to reject all non-Phoenix work. We think that reversing this policy is the wrong thing to do.”

  I take a mental leap, following my intuition. “And I’d bet a million dollars that this is the exact wrong thing to do. It’s because of this process that, for the first time, we’re even aware of how much scheduled work isn’t getting done! Getting rid of the process would just kill our situational awareness.”

  Feeling like I’m getting on a roll, I say adamantly, “Patty, we need a better understanding of what work is going to be heading Brent’s way. We need to know which change cards involve Brent—maybe we even make that another piece of information required when people submit their cards. Or use a different color card—you figure it out. You need to inventory what changes need anything from Brent, and try to satisfy it instead with the level 3 engineers. Failing that, try to get them prioritized so we can triage them with Brent.”

  As I’m talking, I’m more confident that we’re heading down the right path. At this point, we might not be fixing the problem, but at least we’ll be getting some data.

  Patty nods, her expression of concern and despair now gone. “You want me to get my arms around the changes that are heading to Brent, indicating them on the change cards and maybe even requiring this information on all new cards. And to get back to you when we know how many changes are Brent-bound, what the changes are, and so forth, along with a sense of what the priorities are. Did I get that right?”

  I nod and smile.

  She types away on her laptop. “Okay, I’ve got it. I’m not sure what we’ll find out, but it’s better than anything I came up with by a long shot.”

  I look over at Wes, “You look concerned—anything you want to share?”

  “Uh…” Wes says eventually. “There’s not much to share, really. Except that this is a very different way of working than anything I’ve seen in IT. No offense, but did you switch medication recently?”

  I smile wanly, “No, but I did have a conversation with a raving madman on a catwalk overlooking the manufacturing plant floor.”

  But if Erik was right about WIP in IT Operations, what else was he right about?

  Chapter 12

  • Friday, September 12

  It’s 7:30 p.m. on Friday, two hours after the Phoenix deployment was scheduled to start. And things are not going well. I’m starting to associate the smell of pizza with the futility of a death march.

  The entire IT Operations team was assembled in preparation for the deployment at 4 p.m. But there was nothing to do because we hadn’t received anything from Chris’ team; they were still making last minute changes.

  It’s not a good sign when they’re still attaching parts to the space shuttle at liftoff time.

  At 4:30 p.m., William had stormed into the Phoenix war room, livid and disgusted that no one could get all of the Phoenix code to run in the test environment. Worse, the few parts of Phoenix that were running were failing critical tests.

  William started sending back critical bug reports to the developers, many of whom had already gone home for the day. Chris had to call them back in, and William’s team had to wait for the developers to send them new versions.

  My team wasn’t just sitting around, twiddling our thumbs. Instead, we were frantically working with William’s team to try to get all of Phoenix to come up in the test environment. Because if they couldn’t get things running in a test environment, we wouldn’t have a prayer of being able to deploy and run it in production.

  My gaze shifts from the clock to the conference table. Brent and three other engineers are huddling with their QA counterparts. They’ve been working frantically since 4 p.m., and they already look haggard. Many have laptops open to Google searches, and others are systematically fiddling with settings for the servers, operating systems, databases, and the Phoenix application, trying to figure out how to bring everything up, which the developers had assured them was possible.

  One of the developers had actually walked in a couple of minutes ago and said, “Look, it’s running on my laptop. How hard can it be?”

  Wes started swearing, while two of our engineers and three of William’s engineers started poring through the developer’s laptop, trying to figure out what made it different from the test environment.

  In another area of the room, an engineer is talking heatedly to somebody on the phone, “Yes, we copied the file that you gave us… Yes, it’s version 1.0.13… What do you mean it’s the wrong version… What? When did you change that?… Copy it now and try again… Okay, look, but I’m telling you this isn’t going to work… I think it’s a networking problem… What do you mean we need to open up a firewall port? Why the hell didn’t you tell us this two hours ago? Goddamnit!”

  He then slams the phone down hard, and then pounds the table with his fist, yelling, “Fucking developers!”

  Brent looks up from the developer laptop, rubbing his eyes with fatigue. “Let me guess. The front-end can’t talk to the database server because someone didn’t tell us we need to open a firewall port?”

  The engineer nods with exhausted fury, and says, “I cannot freaking believe this. I was on the phone with that jackass for twenty minutes, and it never occurred to him that it wasn’t a code problem. This is FUBAR.”

  I continue to listen quietly, but I’m nodding in agreement at his prognosis. In the Marines, we used the term FUBAR, meaning “fucked up beyond all recognition.”

  Watching tempers fray, I look at my watch: 7:37 p.m.

  It’s time to get a management gut check from my team. I round up Wes and Patty and look around for William. I find him staring over the shoulder of one of his engineers. I ask him to join us.

  He looks puzzled for a moment, because we don’t normally interact, but he nods and follows us to my office.

  * * *

  “Okay, guys, tell me what you think of this situation,” I ask.

  Wes speaks up first, “Those guys are right. This is FUBAR. We’re still getting incomplete releases from the developers. In the past two hours, I’ve already seen two instances when they’ve forgotten to give us several critical files, which guaranteed that the code wouldn’t run. And as you’ve seen, we still don’t know how to configure the test environment so that Phoenix actually comes up cleanly.”

  He shakes his head again. “Based on what I’ve seen in the last half hour, I think we’ve actually moved backward.”

  Patty just shakes her head with disgust and waves her hand, adding nothing.

  I say to William, “I know we haven’t worked much together, but I’d really like to know what you think. How’s it looking from your perspective?”

  He looks down, exhaling slowly and then says, “I honestly have no idea. The code is changing so fast that we’re having problems keeping up. If I were a betting man, I’d say Phoenix is going to blow up in production. I’ve talked with Chris a couple of times about stopping the release, but he and Sarah ran right over me.”

  I ask him, “What do you mean by you ‘can’t keep up’?”

  “When we find problems in our testing, we send it back to Development to have them fix it,” he explains. “Then they’ll send back a new release. The problem is that it takes about a half hour to get everything set up and running, and then another three hours to execute the smoke test. In that time, we’ll have probably gotten three more releases from Development.”

  I smirk at the reference to smoke tests, a term circuit designers use. The saying goes, “If you turn the circuit board on and no smoke comes out, it’ll probably work.”

  He shakes his head and says, “We have yet to make it through the smoke test. I’m concerned that we no longer have sufficient version control—we’ve gotten so sloppy about keeping track of version numbers of the entire release. Each time they fix something, they’re usually breaking something else. So, they’re sending single files over instead of the entire package.”

  He continues, “It’s so chaotic right now that even if by some miracle Phoenix does pass the smoke test, I’m pretty sure we wouldn’t be able to replicate it, because there are too many moving parts.”

  Taking off his glasses, he says with finality, “This is probably going to be an all-nighter for everyone. I think there’s genuine risk that we won’t have anything up and running at 8 a.m. tomorrow, when the stores open. And that’s a big problem.”

  That is a huge understatement. If the release isn’t finished by 8 a.m., the point of sale systems in the stores used to check out customers won’t work. And that means we won’t be able to complete customer transactions.

  Wes is nodding. “William is right. We’re definitely going to be here all night. And performance is worse than even I thought it would be. We’re going to need at least another twenty servers to spread the load, and I don’t know where we can find so many on such short notice. I have some people scrambling to find any spare hardware. Maybe we’ll even have to cannibalize servers in production.”

  “Is it too late to stop the deployment?” I ask. “When exactly is the point of no return?”

  “That’s a very good question.” Wes answers slowly. “I’d have to check with Brent, but I think we could stop the deployment now with no issues. But when we start converting the database so it can take orders from both the in-store POS systems and Phoenix, we are committed. At this rate, I don’t think that will be for a couple of hours yet.”

  I nod. I’ve heard what I’ve needed to hear.

  “Guys, I’m going to send out an e-mail to Steve, Chris, and Sarah to see if I can delay the deployment. And then I’m going to find Steve. Maybe I can get us one more week. But, hell, even getting one more day would be a win. Any thoughts?”

  Wes, Patty, and William all just shake their heads glumly, saying nothing.

  I turn to Patty. “Go work with William to figure out how we can get some better traffic coordination in the releases. Get over to where the developers are and play air traffic controller, and make sure everything is labeled and versioned on their side. And then let Wes and team know what’s coming over. We need better visibility and someone to keep people following process over there. I want a single entry point for code drops, controlled hourly releases, documentation… Get my drift?”

  She says, “It would be my pleasure. I’ll head up to the Phoenix war room for starters. I’ll kick down the door if that’s what it takes and say, ‘We’re here to help…’”

  I give them all a nod of thanks and head to my laptop to write my e-mail.

  From: Bill Palmer

  To: Steve Masters

  Cc: Chris Anderson, Wes Davis, Patty McKee, Sarah Moulton, William Mason

  Date: September 12, 7:45 PM

  Priority: Highest

  Subject: URGENT: Phoenix deployment in major trouble—my recommendation: 1 week delay

  Steve,

  First off, let me state that I want Phoenix in production as much as anyone else. I understand how important it is to the company.

  However, based on what I’ve seen, I believe we will not have Phoenix up by tomorrow’s 8 AM deadline. There is SIGNIFICANT RISK that this may even impact the in-store POS systems.

 
