by Gene Kim
Maggie thanks him and Justine, applauding with the crowd. She then calls up Mark, the lead developer for the Parts Unlimited mobile app. He’s a tall man in his mid-thirties. His laptop is so covered in stickers of technologies and vendors that you can’t even tell what kind of laptop it is. “Good morning, and I’d like to just answer the question that you’re probably thinking. The answer is, yes, we’re the team that built the current mobile apps—both of them. We’re not proud, and we’re just glad users can’t rate an app with zero stars.”
People laugh. The Parts Unlimited app has been an embarrassment for years. “There’s so much we wanted to fix, but we were all put on other projects, so until recently, there’s been no full-time developers on the mobile apps. But as Maggie said, that has changed. Mobile is how our customers want to interact with us, so we’ve reconstituted the team, with a persona-driven approach that focuses on what our customers want,” he continues. “We’ve been working closely with the product owners to generate some quick wins and taking full advantage of what the Narwhal team has done.
“We’ve never had access to a store’s inventory levels before. We loved the idea of showing which stores nearest to the customer have a particular part in stock. We can use the geolocation data from the customer’s device, or we can have them put in a US zip code. Here’s what the page looks like now …”
He brings up an iPhone simulator and the Parts Unlimited app on the screen. “Getting inventory information from Narwhal was incredibly easy. So, when we click into the product page, you can see the item availability for all the stores around them. They can now reserve the item, so it’ll be guaranteed to be there for them to pick up, which again was made entirely possible by Narwhal. And now, we’re collecting information on how parts availability affects purchasing so we can compute how much of an effect this has.”
Wow. Maxine is impressed. She hasn’t seen any of this work before, and she loves what they’ve created.
And even though Mark had apologized about the app, Maxine thinks it looks really good. She’s always amazed by how great most mobile applications can look, presenting an incredibly rich amount of information—even the Parts Unlimited app. She’s used to engineering prototypes that she and other developers build, which look more like 1990s-era websites. It’s clear that the mobile app team had professional designers working on it. This polish is something that consumers now demand. If an app looks shabby, they likely won’t use it, let alone open it a second time.
“All these changes have already been pushed out to the app stores. All we need to do to enable it for customers is flip a switch,” he says. “We’re also logging a ton more data back to Narwhal to help the Marketing teams perform experiments. We’re especially interested in what exactly should and shouldn’t be presented to the user in the search results and on the product pages to increase conversion rates. Narwhal performance is awesome—none of it slows down the user experience.”
He continues, “We’ve done hundreds of iterations internally, and we’re ready to use all the user telemetry to perform experiments with real customers. We’ve never been able to do anything like this before. This has been a fantastic experience for me and my team. Keep up the great work!”
Maggie thanks Mark and everyone claps in appreciation, then she turns to address the room again. “You’ve just seen demos of the progress we’re making. All these give us confidence that we’ll be able to execute some very exciting Thanksgiving promotions.
“We spent the month trying to come up with the best promotions, slicing and dicing the data in many different ways,” she continues. “We were able to spin up a bunch of compute resources in the cloud to do the necessary computations. We start the recommendations reporting run every evening and spin up hundreds of compute instances until we’re done, and then we turn them off. We’ve been doing this for the past four days, and it’s working well—really well. Right, Brent? Right, Shannon?”
Brent and Shannon are sitting at the front of the room, and they are beaming. Maxine is delighted that Brent, in particular, is so invested in the outcome. She’s never seen him so happy and having so much fun, which makes her think of the Second Ideal. And Shannon is rightly proud of getting Panther off the ground. There is absolutely no way that the teams could have generated these promotions without this new platform.
Panther was already making a huge difference in how teams worked with data. Errors in data uploads were being caught right away through automated tests. The teams could easily access any data from across the organization, and easily add new data, contributing to the entire collective knowledge that could be tapped to experiment and try out new ideas. It’s enabled scores of new reports and analyses to be conducted, leveraging an incredible variety of tools, many that Maxine has never heard of.
And to Maxine’s amazement, even the output of these discoveries and experiments is making it back into the Panther data platform, further enriching the data already there. Seeing and spreading learnings, as per Erik’s Third Ideal, Improvement of Daily Work.
Maggie shows a slide with a bunch of products on it. “These are the Unicorn promotions generated for my customer account. As you can see, it’s looked at my buying history and is letting me know that snow tires and batteries are fifteen percent off. I actually went to our website and purchased both because I need them. The company just made money because those are all items that we have excess inventory of and that have high profit margins.
“And here are the Unicorn promotions for Wes,” she continues, going to the next slide with a smile. “Looks like you got a discount on racing brake pads and fuel additives. That of any interest to you?”
“Not bad,” hollers out Wes.
“Given the incredible success of these initial experiments, here’s my proposal,” Maggie says. “As planned, I’d like to do an email campaign to one percent of our customers to see what happens. If everything goes well, we’ll go full blast on Black Friday.”
Maggie looks at the Ops leadership. “Sounds like a great plan,” Bill says. “Wes, is there any reason why we shouldn’t do this?”
From the front of the room, Wes says, “From an Ops perspective, I can’t think of any. All the hard work has already been done. If Chris, William, and Marketing have confidence that the code is working, I say go for it.”
Maggie cheers and says, “Everyone, we have a plan. Let’s make it happen!”
Maxine is cheering, along with everyone else. Suddenly curious, she looks around—again, Sarah is nowhere to be seen. You’d think she’d want to be here at a time like this, if anything to take all the credit. Her absence is conspicuous. And it makes Maxine nervous.
CHAPTER 15
• Tuesday, November 25
Despite the jubilant mood, everyone knows they’re a long way from being fully prepared for the Black Friday promotions. As Maggie said, the plan is to do a trial run against a small subset of customers to test their readiness for the full-scale campaign on Friday—so at eleven a.m. they will conduct a campaign to just one percent of their customers. They’re doing this in the middle of the day, when everyone will already be in the office and able to quickly respond to emergencies. This will help them find vulnerabilities and weaknesses in the process so they can fix them before Friday.
To Maxine, this decision alone shows how much has changed in the organization. A couple of months ago, they would not have conducted any trials. And they would surely have scheduled the campaign to start at midnight, requiring the teams to be in the office throughout the entire night.
At nine a.m. everyone is in the war room furiously dealing with last-minute details in preparation for the one percent mini-launch. The Orca team is still fine-tuning the customer offers. Maxine is a little alarmed to learn that they’re still deciding which one percent of the customer base they’re targeting—but if they aren’t panicking, she won’t panic either. They’ve earned that level of trust over the last several weeks.
Even though they’re sending an email to only one percent of their customer mailing list, the stakes are still huge. They’ll be sending nearly one hundred thousand emails to all the persona profiles, not just the Meticulous Maintainers and the Catastrophic Late Maintainers, to learn how each segment responds.
Countless things could still go wrong. If the response rate isn’t in the same ballpark as in their early experiments, all the hopes and dreams riding on the Unicorn Project will be dashed. If they promote the wrong items, or if those items are not in stock, or if they screw up the fulfillment, they will anger their customers.
This campaign represents many firsts for Parts Unlimited. It is the first time that emails will open up the mobile app if they are being read on a mobile phone. It’s the first time they’re presenting promotions through the app—people with the app installed will get a notification about this limited-time offer, which the Promotions team believes will have higher response rates than even their carefully designed emails.
Over the last week, they’ve been continuously performing experiments in their mobile app, zeroing in on what maximizes conversion rates, such as presenting promoted items differently, using different pictures, picture sizes, typography, and copywriting. Those lessons and learnings were then considered for the email campaigns too.
The results of all these experiments were being poured back into Panther to guide the next round of experiments and trials, along with all customer activity within the app. It was a lot of data, but it had the Analytics team salivating for more. People’s appreciation for the Panther data platform kept growing.
The mobile app team has also been working around the clock to make sure that things display properly and that the buttons actually do what they are supposed to, but they are also trying to streamline the purchasing process as much as possible. Noticing that many customers dropped off when prompted for a credit card, they licensed some technology to input this information by using their phone camera and offering different payment options like PayPal and Apple Pay in the hopes that one of these might reduce order abandonment rates.
The big gamble is that all this investment in their mobile app will result in significantly higher sales than just using the mobile phone browser. It’s a gamble, but a well-informed gamble, made by an organization that is obviously and constantly learning.
But preparation and practice time is over; now it’s game time, Maxine thinks. She sees many of the technology teams starting to assemble, but the Narwhal data team is already huddled around their screens, going through checklists and whispering back and forth, making sure that everything can handle the traffic they’re expecting. Over the last week, Brent and his team have been stress-testing the entire system, routinely causing parts of the technology landscape to blow up. And then in a blameless post-mortem, they’d all work together to figure out how to fix things so that they’ll survive the actual launch.
These “Chaos Engineering” exercises resulted in some surprising things breaking. But everyone has been working diligently, trying to ensure that they are as prepared as they can be for the big launch event. A few days ago, a small test run of the offer generation process kept crashing because they didn’t increase the limits for an external service they used. They had gotten in the habit of scaling everything down to save on costs, and someone had forgotten to scale it up before the test.
We still have so much to learn before we’re experts at this, Maxine thinks.
At times, it’s difficult to know who works on which team, because people are moving so fluidly between them. When everyone knows what the goals are, as Erik predicted, teams will self-organize to best achieve those goals. To Maxine, it’s been amazing to see how people are acting and reacting to each other, especially when compared to the big Phoenix launch two months ago. People across different disciplines—Dev, QA, Ops, Security, and now even Data and Analytics—are working together daily as fellow teammates instead of adversaries. They are working toward a common goal. They realize that they are on a journey of learning and exploration, and that making mistakes is inevitable. Creating ever-safer systems and continual improvement is now viewed as part of daily work.
This is worthy of the Third Ideal of Improvement of Daily Work that Erik painted many weeks ago, Maxine thinks.
Thanks to the pioneering work by Data Hub, code is now being promoted into production multiple times per day, smoothly, quickly, and mostly without incident, with any issues being resolved quickly and without blame or undue crisis. Even now, Maxine sees that there are production deployments happening, as teams are pushing out last-minute changes to ensure the success of the mini-launch.
Twenty minutes ago, someone noticed that one API was returning a bunch of “500” HTTP errors. Apparently, yesterday, someone had committed a code change that accidentally misclassified “400” user-caused errors as “500” server-caused errors. Wes pulled together a huddle, and Maxine was astonished when Wes recommended pushing out a fix, even though it was less than an hour before the mini-launch.
“If we don’t fix it, these errors could potentially hide important signals if we have a real outage,” he said. “We’ve proven repeatedly that we can push out these one-line changes safely.”
The best part was it was a developer who detected the error and who pushed out the fix. We finally trust developers, she thinks. If someone had told her a month ago that Wes would support something like this, she would have never believed it.
And best of all, Maxine’s worst fears about developers going amok and ruining the integrity of the data in the Narwhal platform never materialized. Left to their own devices, development teams will often optimize everything around themselves. This is just the parochial and selfish nature of individual teams. And that’s why you need architects, thinks Maxine.
Because they provided access to the data through versioned APIs, things remained very controlled and teams were able to keep working independently. Maxine is not just relieved—she’s elated. They designed these platforms to optimize for the system as a whole and ensure the safety and security of the entire organization.
“Sending emails and notifications to the mobile apps in 3, 2, 1 … now. Here we go, everyone,” says the marketing launch coordinator in a calm voice. Maxine looks at her watch. It’s 11:12 a.m. Emails and mobile app notifications are now going out to a hundred thousand customers.
The launch is starting twelve minutes late because of a couple unforeseen issues—a configuration problem was found in the Narwhal systems and someone noticed that there were too many email addresses in the campaign, requiring a recalculation and regeneration of the email list in Panther. Maxine gave a thumbs-up to Shannon when the Unicorn teams quickly generated and uploaded the new data in record time.
On the one hand, Maxine is mildly irritated that these details were caught so late in the launch process. But on the other hand, that’s what rehearsals are for and why everyone is assembled in the war room. Everyone needed to make these types of last-minute calls is in the room and everyone agreed that it made sense. Maggie, Kurt, the team leads, and many others are assembled here, as well as Wes and key Ops people.
Maxine looks around. Again, Sarah is nowhere to be seen. Maxine wonders if she’s the only one who is suspicious that Sarah might be up to no good.
She turns her attention to what everyone else in the room is watching—the large monitor hanging on the wall. Everyone is holding their breath. On the screen are a bunch of graphs, dominated by the number of emails sent and the order funnel: this shows how many people viewed a product page, added a product to the shopping cart, hit the checkout button, had their order processed, and had their order fulfilled. The bottom shows where the most drop-offs are occurring, as well as the number of orders and revenue booked.
Underneath those graphs are the technical performance metrics: CPU loads for all the various compute clusters, number of transactions being processed by the services and databases, network traffic, and much, much more.
She could see several spikes associated with the massive calculations enabled by Panther. But now, most of the graphs are at zero. Several of the CPU graphs are at twenty percent. Those are the services that need to stay warm to make sure they don’t go to sleep. In one of their launch rehearsals, they were horrified when this happened to a key system, requiring six minutes for the system to wake up and scale out.
Nothing happens. One minute goes by. Another minute goes by. Maxine is starting to get worried that the launch was a complete dud. Maybe something terrible has happened in their infrastructure. Or maybe something terrible has happened that prevented the emails from being received. Or maybe their worst fears about terrible recommendations had come true, and they had accidentally sent offers of snow tires to people who don’t live near snow.
Maxine audibly sighs in relief when the graph for product page views suddenly jumps to ten, twenty, fifty … and keeps going up.
Everyone cheers, including Maxine. She is staring at the technical metrics, praying that the infrastructure doesn’t fall over like during the Phoenix release. She’s relieved when the CPU loads start to climb across the board, showing that things are actually being processed.
Minutes later, almost five thousand people are in various stages of the order funnel. So far, so good, Maxine thinks, watching the numbers continue to creep up. Again, people cheer as the number of processed orders continues to climb … Ten orders completed, then twenty, and it still continues to climb. To her excitement, the revenue generated from this campaign surges past $1,000.