The Phoenix Project
“I’m experimenting with putting kanbans around our key resources. Any activities they work on must go through the kanban. Not by e-mail, instant message, telephone, or whatever.
“If it’s not on the kanban board, it won’t get done,” she says. “And more importantly, if it is on the kanban board, it will get done quickly. You’d be amazed at how fast work is getting completed, because we’re limiting the work in process. Based on our experiments so far, I think we’re going to be able to predict lead times for work and get faster throughput than ever.”
That Patty is now sounding a bit like Erik is both unsettling and exciting.
“What I’ve done,” she continues, “is take some of our most frequent service requests, document exactly what the steps are and which resources can execute them, and time how long each operation takes. Here’s the result.”
She hands me a piece of paper proudly.
It’s titled, “Laptop replacement queue.” On it is a list of everyone who’s requested either a new or replacement laptop or desktop, along with when they submitted the request and the projected date they’ll receive it. They’re sorted with the oldest requests first.
I’m apparently fourteenth in line, with my laptop projected to arrive four days from now.
“You actually believe this schedule?” I say, trying to be skeptical. However, it really would be fantastic if we could actually publish this to everyone, and be able to hit those dates.
“We worked on this all weekend long,” she replies. “Based on the trials we’ve done since Friday, we’re pretty confident that we understand the time required to go from start to finish. We’ve even figured out how to save a bunch of steps by changing where we’re doing disk mirroring. Between you and me, based on the time savings we’re generating, I think that we’ll beat these dates.”
She shakes her head. “You know, I did a quick poll of people we’ve issued laptops to. It usually takes fifteen turns to finally get them configured correctly. I’m tracking that now, and trying to drive this down to three. We’re putting in checklists everywhere, especially when we do handoffs within the team. It’s really making a difference. Error rates are way down.”
I smile and say, “This is important. Getting executives and workers the tools they need to do their jobs is one of our primary responsibilities. I’m not saying I don’t believe you, but let’s keep these time estimates to ourselves for now. If you can generate a week’s track record of hitting the dates, then let’s start publishing this to all the requesters and their managers, okay?”
Patty smiles in return. “I was thinking the same thing. Imagine what this will do to user satisfaction if we could tell them when they make the request how long the queue is, tell them to the day when they’ll get it, and actually hit the date, because we’re not letting our workers multitask or get interrupted!
“My plant supervisor friend also told me about the Improvement Kata they’ve adopted. Believe it or not, Erik helped them institute it many years ago. They have continual two-week improvement cycles, each requiring them to implement one small Plan-Do-Check-Act project to keep them marching toward the goal. You don’t mind that I’ve taken the liberty of adopting this practice in our group to keep us moving toward our own goals, right?”
Erik had mentioned this kata term and the continual two-week improvement cycles before. Once again, Patty is at least one step ahead of me.
“This is great work, Patty. Really, really well done.”
“Thanks,” she modestly responds, but she’s grinning from ear to ear. “I’m really excited by what I’m learning. For the first time, I’m seeing how we should be managing our work, and even for these simpler service desk tasks, I know it’s going to make a big difference.”
She points at the change board at the front of the room. “What I’m really looking forward to is starting to use these techniques for more complex work. Once we figure out what our most frequently recurring tasks are, we need to create work centers and lanes of work, just like I did for my service requests. Maybe we can even get rid of some of this scheduling, and create kanban boards instead. Our engineers could then take any card from the Ready column and move it to Doing until it’s Done!”
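Patty’s Ready, Doing, Done flow, combined with the limit on work in process she mentions earlier, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not her actual board: the `KanbanBoard` class, its `wip_limit` parameter, and the card names are hypothetical, and the sketch assumes a card may be pulled into Doing only when Doing has room.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class KanbanBoard:
    """Hypothetical three-column board: Ready -> Doing -> Done."""

    wip_limit: int  # assumed cap on the number of cards in Doing at once
    ready: Deque[str] = field(default_factory=deque)
    doing: List[str] = field(default_factory=list)
    done: List[str] = field(default_factory=list)

    def add(self, card: str) -> None:
        # Work enters only through the board, at the back of Ready.
        self.ready.append(card)

    def pull(self) -> Optional[str]:
        # Pull the oldest Ready card, but only if Doing is under the limit.
        if not self.ready or len(self.doing) >= self.wip_limit:
            return None
        card = self.ready.popleft()
        self.doing.append(card)
        return card

    def finish(self, card: str) -> None:
        # A card leaves Doing only once its work is complete.
        self.doing.remove(card)
        self.done.append(card)


board = KanbanBoard(wip_limit=2)
for request in ("laptop-1", "laptop-2", "laptop-3"):
    board.add(request)

print(board.pull())  # prints laptop-1
print(board.pull())  # prints laptop-2
print(board.pull())  # prints None: Doing is at its limit
board.finish("laptop-1")
print(board.pull())  # prints laptop-3
```

Refusing to pull a third card while two are in progress is what keeps an engineer from multitasking, and the oldest-first order mirrors Patty’s laptop queue.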
Unfortunately, I can’t visualize it. “Keep going. Just make sure you’re working with Wes on this, and that he’s on board, okay?”
“Already on it,” she replies quickly. “In fact, I have a meeting with him later today to discuss putting a kanban around Brent, to further isolate him from our daily crises. I want to formalize how Brent gets work and increase our ability to standardize what he’s working on. It’ll give us a way to figure out where all of Brent’s work comes from, both on the upstream and downstream sides. And of course, it will give us one more line of defense from people doing drive-bys on Brent.”
I give her a thumbs-up, and get ready to leave. “Wait, the change board looks different. Why are the cards different colors?”
She looks at the board and says, “Oh, I haven’t told you? We’re color-coding the cards to help us get ready for when we lift the project freeze. We’ve got to have some way to make sure we’re working on the most important things. So, the purple cards are the changes supporting one of the top five business projects; otherwise, they’re yellow. The green cards are for internal IT improvement projects, and we’re experimenting with allocating twenty percent of our cycles just for those, as Erik recommended we do. At a glance, we can confirm that there’s the right balance of purple and green cards in work.”
She continues, “The pink sticky notes indicate the cards that are blocked somehow, which we’re therefore reviewing twice a day. We’re also putting all these cards back into our change tracking tool, so we’re putting the change IDs on each of the cards, too. It’s a bit tedious, but at least now part of the tracking is automated.”
“Wow, that’s…incredible,” I say, with genuine awe.
Later that day, I’m sitting down at another conference table with Wes and Patty to figure out how we’re going to turn the project faucet back on slowly enough so we can drink but don’t end up drowning.
“As Erik pointed out, we actually have two project queues that we need to sequence: business and internal projects,” Patty says, pointing to the thin stapled set of papers in front of us. “Let’s do the business projects first, because they’re easier. We have the top five most important projects identified, as ranked by all the project sponsors. Four of these will require some work from Brent. When the freeze lifts, we propose that we only release these five projects.”
“That was easy,” Wes laughs. “I can’t believe how much arguing, posturing, horse-trading, and backstabbing went on to get the top five projects identified. It was worse than Chicago politics!”
He’s right. But in the end, we got our prioritized list.
“Now to the hard part. We’re still struggling on how to prioritize our own seventy-three internal projects,” she says, her expression turning glum. “There’s still way too many. We’ve spent weeks with all the team leads trying to establish some sort of relative importance level, but that’s all we’ve done. Argue.”
She flips to the second page. “The projects seem to fall into the following categories: replacing fragile infrastructure, vendor upgrades, or supporting some internal business requirement. The rest are a hodgepodge of audit and security work, data center upgrade work, and so forth.”
I look at the second list, scratching my head. Patty is right. How does one objectively decide whether “consolidating and upgrading e-mail server” is more or less important than “upgrading thirty-five instances of SQL databases”?
I run my fingers down the page, trying to see if anything jumps out at me. It’s the same list I saw during my first week on the job, and they still all look important.
Realizing that Wes and Patty have spent almost a week with this list, I try to elevate my thinking. There’s got to be some simple way to prioritize this list that doesn’t look like moving a bunch of boxes around.
Suddenly, I remember how Erik described the importance of preventive work, such as the monitoring project. I say, “I don’t care how important everyone thinks their project is. We need to know whether it increases our capacity at our constraint, which is still Brent. Unless the project reduces his workload or enables someone else to take it over, maybe we shouldn’t even be doing it. On the other hand, if a project doesn’t even require Brent, there’s no reason we shouldn’t just do it.”
I say assertively, “Give me three lists. One for projects that require work from Brent, one for projects that increase Brent’s throughput, and the last one for everything else. Identify the top projects on each list. Don’t spend too much time ordering them. I don’t want us spending days arguing. The most important list is the second one. We need to keep Brent’s capacity up by reducing the amount of unplanned work that hits him.”
“That sounds familiar,” Patty says. She digs up the list of fragile services that we created for the change management process. “We should make sure we have a project to replace or stabilize each one of these. And maybe we suspend indefinitely any infrastructure refresh project for anything that’s not fragile.”
“Now hang on a minute,” Wes says. “Bill, you said it yourself. Preventive work is important, but it always gets deferred. We’ve been trying to do some of these projects for years! This is our chance to get caught up.”
Patty says quickly, “Didn’t you hear what Erik told Bill? Improving something anywhere not at the constraint is an illusion. You know, no offense, but you sort of sound like John right now.”
Despite my best attempts to keep a straight face, I laugh.
Wes turns red for a moment, and then laughs loudly. “Ouch. Okay, you got me. But I’m just trying to do the right thing.”
“Doh!” he says, interrupting himself. “I did it again.”
We all laugh. It makes me wonder how John is doing. To the best of my knowledge, no one has seen him all day.
While Wes and Patty are scribbling notes, I scan the list of internal projects again. “Hey, why is there a project for upgrading the BART database even though it’s going to be decommissioned next year?”
Patty peers down at her list and then looks embarrassed. “Oh, jeez. I didn’t see that because we never reconciled the business and IT projects with each other. We’re going to have to scrub the lists one more time to find dependencies like this. I’m sure there are others.”
Patty thinks for a moment. “It’s strange. Even though we have so much data on projects, changes, and tickets, we’ve never organized and linked them all together this way before.
“Here’s another thing we can learn from manufacturing, I think,” she continues. “We’re doing what Manufacturing Production Control Departments do. They’re the people that schedule and oversee all of production to ensure they can meet customer demand. When they accept an order, they confirm there’s enough capacity and necessary inputs at each required work center, expediting work when necessary. They work with the sales manager and plant manager to build a production schedule so they can deliver on all their commitments.”
Again, Patty is way ahead of me. This answers one of the first questions that Erik tasked me with before I quit. I make a note for us to visit MRP-8 to see their production control processes.
I get a creeping suspicion that “managing the IT Operations production schedule” should be somewhere in my job description.
Two days later, I’m surprised to see a new laptop in my office. My old laptop has been disconnected and moved to the side.
I look at my clipboard, flipping back to the laptop/desktop replacement schedule that Patty gave me earlier this week.
Holy crap.
Patty had promised laptop delivery for Friday, and I’m receiving it two days early.
I log on to make sure it’s been configured properly. All the applications seem to be there, all my data have been transferred, e-mail is working, the network drives show up like before, and I can install new applications.
I feel tears of gratitude welling up when I see how fast my new laptop is. Grabbing Patty’s schedule, I go next door. “I love the new laptop. Two days ahead of schedule, even. Everyone ahead of me got their systems, too, right?”
Patty grins. “Yep. Every single one of them. A couple of the early ones we delivered had a few configuration errors or were missing something. We’ve corrected it in the work instructions, and we seem to be batting one hundred percent delivering correct systems for the past two days.”
“Great work, Patty!” I say, excitedly. “Go ahead and start publishing the schedule. I want to start showing this off!”
CHAPTER 23
• Tuesday, October 7
As I drive into work the following Tuesday morning, I get an urgent phone call from Kirsten. Apparently, Brent is now almost a week late delivering on another Phoenix task—allegedly something that Brent said would only take an hour to do. Once again, the entire Phoenix testing schedule is in jeopardy.
On top of that, several of my group’s other critical tasks are late, putting even more pressure on the deadline. This is genuinely dispiriting to hear. I thought all our recent breakthroughs would solve these due-date performance issues.
How can we unfreeze more work if we can’t even keep up now?
I leave Patty a voicemail. To my surprise, it takes her three hours to call me back. She tells me that something is going terribly wrong with our scheduling estimates and that we need to meet right away.
Once again, I’m in a conference room, with Patty at the whiteboard, and Wes scrutinizing the printouts she’s taped up.
“Here’s what I’ve learned so far,” Patty says, pointing at one of the sheets of paper. “The task that Kirsten called about is delivering a test environment to QA. As she said, Brent estimated that it would take only forty-five minutes.”
“Sounds about right,” Wes says. “You just need to create a new virtualized server and then install the OS and a couple of packages on it. He probably even doubled the time estimate to be safe.”
“That’s what I thought, too,” Patty says, but she’s shaking her head. “Except it’s not just one task. What Brent signed up for is more like a small project: there are over twenty steps involving at least six different teams! You need the OS and all the software packages, license keys, a dedicated IP address, special user accounts set up, mount points configured, and then you need the IP addresses to be added to an ACL on some file server. In this particular case, the requirements say that we need a physical server, so we also need a router port, cabling, and a server rack where we have enough space.”
“Oh…,” Wes says, sounding exasperated, reading what Patty is pointing at. He mumbles, “Physical servers are such a pain in the ass.”
“You’re missing the point. This would still be happening, even if it were virtualized,” Patty says. “First, Brent’s ‘task’ turns out to be considerably more than just a task. Second, we’re finding that it’s multiple tasks spanning multiple people, each of whom has their own urgent work to do. We’re losing days at each handoff. At this rate, without some dramatic intervention, it’ll be weeks before QA gets what they need.”
“At least we don’t need a firewall change,” Wes says, snidely. “Last time we needed one of those, it took John’s group almost a month. Four weeks for a thirty-second change!”
I nod, knowing exactly what Wes is referring to. The lead time for firewall changes has become legendary.
Wait. Didn’t Erik mention something like this? For a firewall change, even though the work only required thirty seconds of touch time, it still took four weeks of clock time.
That’s just a microcosm of what’s happening with Brent. But what’s happening to us right now is much, much worse, because there are multiple handoffs.
With a groan, I put my head on the conference table.
“You okay?” Patty asks.
“Give me a second,” I say. I walk up to the whiteboard and struggle to draw a graph with one of the markers. After a couple of tries, I end up with a graph that looks like this:
[Graph: wait time versus percentage of time a resource is busy]
I tell them what Erik told me at MRP-8 about how wait times depend upon resource utilization. “The wait time is the ‘percentage of time busy’ divided by the ‘percentage of time idle.’ In other words, if a resource is fifty percent busy, then it’s fifty percent idle. The wait time is fifty percent divided by fifty percent, so one unit of time. Let’s call it one hour. So, on average, our task would wait in the queue for one hour before it gets worked.
“On the other hand, if a resource is ninety percent busy, the wait time is ‘ninety percent divided by ten percent,’ or nine hours. In other words, our task would wait in queue nine times longer than if the resource were fifty percent idle.”
I conclude, “So, for the Phoenix task, assuming we have seven handoffs, and that each of those resources is busy ninety percent of the time, the task would spend in queue a total of nine hours times the seven steps…”
“What? Sixty-three hours, just in queue time?” Wes says, incredulously. “That’s impossible!”
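The arithmetic Bill walks through can be checked with a short Python sketch. It encodes only the rule of thumb as stated (wait time equals percentage busy divided by percentage idle, in relative units where a resource that is fifty percent busy yields a wait of one), and it assumes every handoff goes to a resource with the same utilization and ignores touch time entirely. The function names `wait_time` and `total_queue_time` are hypothetical.

```python
from typing import Sequence


def wait_time(busy_fraction: float) -> float:
    """Relative queue wait for one resource: %busy / %idle.

    Assumption: units are relative, so a resource that is 50% busy
    yields a wait of 1 (the "one hour" in the text).
    """
    if not 0.0 <= busy_fraction < 1.0:
        raise ValueError("busy_fraction must be in [0, 1)")
    return busy_fraction / (1.0 - busy_fraction)


def total_queue_time(busy_fractions: Sequence[float]) -> float:
    """Sum the queue wait at every handoff; touch time is not included."""
    return sum(wait_time(b) for b in busy_fractions)


# The curve from the whiteboard: wait grows sharply as idle time shrinks.
for busy in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"{busy:.0%} busy -> wait {wait_time(busy):.1f}")

# Seven handoffs, each to a resource that is 90% busy.
print(round(total_queue_time([0.9] * 7), 1))  # prints 63.0
```

The wait at ninety-nine percent busy comes out roughly eleven times the wait at ninety percent, which is why the curve climbs so steeply at the right-hand end.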
Patty says with a smirk, “Oh, of course. Because it’s only thirty seconds of typing, right?”
“Oh, shit,” Wes says, staring at the graph.
Suddenly, I recall my conversation with Wes right before Sarah and Chris decided to deploy Phoenix at Kirsten’s meeting. Wes complained about tickets related to Phoenix bouncing around for weeks, which delayed the deployment.
It was happening then, too. That wasn’t a handoff between IT Operations people. That was a handoff between the Development and IT Operations organizations, which is far more complex.
Creating and prioritizing work inside a department is hard. Managing work among departments must be at least ten times more difficult.
Patty says, “What that graph says is that everyone needs idle time, or slack time. If no one has slack time, WIP gets stuck in the system. Or more specifically, stuck in queues, just waiting.”