by Jim DuBois
39
Myth: public cloud is less secure.
This was once true. Today it is easy to show that public clouds are at least as secure as anything a company can do efficiently on its own. Consider the significant additional sensors cloud providers deploy, and the benefit of analyzing security events across millions of customers, backed by huge investments in data science to improve security. No company can keep up alone. Build a shared responsibility model for security with your cloud providers. You are still accountable and own some tasks, but you share many tasks with capable cloud providers.
40
Myth: public cloud is more expensive.
There are many who still believe this. When I dig in, I always find they are not looking at total cost. Include all costs, not just what is in your budget. Stop analyzing incremental costs such as virtual machine (VM) rates. Instead, change to a cloud perspective. Save more by eliminating the need for VMs rather than reducing their rate. Leverage cloud capabilities to materially reduce total costs while dramatically accelerating delivery.
Chapter Five
Move to the Public Cloud Faster
When Satya took over running Microsoft’s cloud business a couple of years before he was promoted to CEO, he implemented some significant changes, as he discusses in his book, Hit Refresh. One thing he wanted was better direct customer feedback. As a proxy, he asked me to attend his weekly product review meetings and play the role of enterprise customer. This meant expanding my connections with my IT peers at our customers around the world, and driving our own internal deployments faster so we could provide the feedback he wanted. As a very early adopter of the public cloud, we made a lot of mistakes. There were few other companies to learn from deeply. We started moving before the Microsoft cloud was ready. In hindsight, that push helped make it ready.
We made a lot of assumptions in the beginning, and only some of them held true over time. By pushing the edge, we made progress but also stubbed our toes and learned a ton. In the summer of 2016, when we were 60 percent migrated, I remember asking Rick Stover, who led our shared infrastructure services team, “We have a schedule to migrate faster this year, but didn’t we already do the easy stuff? Isn’t the rest going to be more difficult?” He reminded me that when we started, not everything worked. He said, “We do have the hard stuff left, but we’ll go faster now that we know what we’re doing.” And he was right. The migration accelerated, getting to over 90 percent in less than a year and beating both our aggressive schedule and our cost savings goals.
A good example of our learning was around preproduction environments. Early on, the team told me we were closing an old data center that housed only test environments. I thought I was brilliant to insist we move it all to Azure instead of keeping it on premise in space they’d allocated in one of our new data centers. We moved it fast, closed the data center, and saved a lot of money. Then we discovered that we’d moved things that didn’t need to be moved, and our costs were higher than expected. We turned off what wasn’t being used to cut the waste. Then we saved even more by recognizing that test environments weren’t used around the clock. We leveraged the snooze feature in Azure to automatically turn whole environments off in the evening and back on the next morning. We cut our costs by a third. And that was only the start of what we learned.
In trying to get the incentives right, I moved accountability for the Azure spend to the application teams that owned the services, rather than keeping it with the central infrastructure team, which could influence only some of the costs. I didn’t move the budget or cross-charge, but tracked costs as part of our migration dashboard. Teams had to reduce spending in other areas if they overspent their cloud forecast, and they got to reallocate any savings if they could drive more efficiency in their cloud usage. This opened the floodgates for more innovation.
Soon teams were using test automation to provision cloud capacity, execute the tests, and then turn everything back off. This eliminated the need for any persistent test environments. We also focused on production usage, where further savings came from turning off or reducing capacity provisioned for peak loads or disaster recovery. New automation allowed scaling up when needed. If a test environment was required to validate a fix, a team could take a snapshot of the production environment, deploy the fix, test it, and then even point production at this new environment and turn off the previous production capacity. The agility teams gained from using the cloud was amazing. And the net effect on total costs? After effectively migrating everything, our total Azure bill was less than the sum of our premigration spend on running data centers and what we saved on infrastructure operations. The hardware costs that we spent so much time comparing in the beginning were now essentially free. We documented what we did, and what the team continues to learn. Detailed write-ups of these lessons can be found at Microsoft.com/itshowcase.
41
Identify appropriate workloads to move first.
Don’t spend too much time analyzing before you start. Leverage a good catalog of apps and services. Start by moving simple apps or services: low regulatory impact, less mission critical, fewer interfaces, less latency-sensitive. Also consider those that need new hardware or have larger nonproduction environments. These return the largest savings and have less friction to move. Get started and learn. Save money from initial migrations to accelerate the rest.
42
Go. Cloud migration can fund itself.
Don’t wait for permission. Start full cloud migration now. Move packaged apps to SaaS, removing unnecessary customizations. Convert apps and capabilities to PaaS wherever possible. Containerize legacy apps and move them to IaaS where converting them isn’t practical yet. Move workloads with clear savings first. Learn, and use the migration savings to move more. Use the savings to fund both the migrations and the refactoring of the portfolio to take advantage of the cloud.
43
Move budget accountability to application teams.
Maybe the best decision I made to improve migration was moving the accountability for Azure spend to the teams that could most directly affect that spend. It changed the incentives. Deep reporting on cloud utilization allowed teams to optimize their cloud spend. The incentive to drive the optimization increased when savings allowed them to accelerate other progress for their business partners. This incentive drove amazing innovation, leading to many other lessons.
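The showback rule behind this can be sketched in a few lines of Python. The rule tracks spend against each team's forecast, with no budget moves or cross-charges. This is a minimal illustration under stated assumptions, not the actual migration dashboard. The `TeamCloudSpend` record, the team names, and the figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TeamCloudSpend:
    """Hypothetical showback record: one application team's cloud forecast vs. actual spend."""
    team: str
    forecast: float  # cloud spend the team planned for the period
    actual: float    # cloud spend the team actually incurred

    @property
    def variance(self) -> float:
        # Positive: under forecast, savings the team may reallocate.
        # Negative: over forecast, spend the team must offset elsewhere.
        return self.forecast - self.actual


def showback_report(records: list[TeamCloudSpend]) -> dict[str, str]:
    """Summarize each team's position; no budget is moved and nothing is cross-charged."""
    report: dict[str, str] = {}
    for r in records:
        if r.variance >= 0:
            report[r.team] = f"savings to reallocate: {r.variance:,.0f}"
        else:
            report[r.team] = f"overspend to offset: {-r.variance:,.0f}"
    return report


if __name__ == "__main__":
    print(showback_report([
        TeamCloudSpend("payroll", forecast=100_000, actual=80_000),
        TeamCloudSpend("sales-portal", forecast=50_000, actual=65_000),
    ]))
```

Keeping the budget in place and reporting only the variance matches the approach described above. The point is to make the consequences visible to the team that can act on them.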
44
Cloud migration isn’t about cost savings.
The real value of public cloud migration is in agility and the new capabilities that cloud scale makes possible, not cost savings. We did have savings, and we used them to accelerate the migration, based on a budget that assumed a fast migration. But what we learned and invented during the migration dramatically improved our agility and let us deliver value we could not have delivered before the cloud. Adjustable capacity removed friction. With effectively all our on-premise environment migrated, significant new opportunities arose.
45
Measure migration progress through on-premise reduction.
As we migrated, we learned that counting in the cloud is different from counting on-premise. After migrating almost 55K on-premise VMs, we used fewer than 20K Azure VMs. The rest moved to SaaS apps or PaaS services, or simply weren’t needed anymore with our new agility. The best way to measure migration is against where we started on-premise: how much of that footprint is left. At 95% migrated, only 5% of our original on-premise capacity was left.
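The arithmetic is simple, but it is worth writing down because the naive cloud-side count gives a very different answer. In this Python sketch, the 55,000 starting VMs are rounded from the figure above. The 2,750 remaining VMs are illustrative, chosen to match 95% migrated. Both function names are hypothetical.

```python
def percent_migrated(original_on_prem: int, remaining_on_prem: int) -> float:
    """Migration progress measured against the starting on-premise footprint."""
    if original_on_prem <= 0:
        raise ValueError("original_on_prem must be positive")
    if not 0 <= remaining_on_prem <= original_on_prem:
        raise ValueError("remaining_on_prem must be between 0 and original_on_prem")
    return 100.0 * (original_on_prem - remaining_on_prem) / original_on_prem


def naive_cloud_ratio(original_on_prem: int, cloud_vms: int) -> float:
    """Misleading alternative: cloud VM count as a share of the original VM count."""
    return 100.0 * cloud_vms / original_on_prem


if __name__ == "__main__":
    print(percent_migrated(55_000, 2_750))            # 95.0
    print(round(naive_cloud_ratio(55_000, 20_000)))   # 36
```

The naive ratio suggests barely a third of the way there. This happens because workloads that moved to SaaS or PaaS, or were retired, never show up as Azure VMs.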
46
Eliminate static test environments through automation.
Where preproduction environments are still needed, we learned not to leave them on outside working hours. The Azure snooze feature could automate shutdown and restart as necessary. Then we learned to automate provisioning the test environments as part of a test run and delete everything after the test, so no permanent preproduction environments are needed. This became the standard. Less cost and more agility.
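The provision, test, and delete pattern that became the standard can be sketched in Python as a context manager whose teardown always runs, even when tests fail. The `provision` and `delete` functions and the in-memory `ACTIVE_ENVIRONMENTS` set are hypothetical stand-ins for whatever provisioning templates or cloud SDK calls a team actually uses. No Azure API is shown here.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

# Hypothetical stand-in for a cloud provisioning API: tracks which test
# environments currently exist (and would therefore be costing money).
ACTIVE_ENVIRONMENTS: set[str] = set()


def provision(name: str) -> str:
    """Hypothetical: create a full test environment from templates."""
    ACTIVE_ENVIRONMENTS.add(name)
    return name


def delete(name: str) -> None:
    """Hypothetical: tear down every resource in the environment."""
    ACTIVE_ENVIRONMENTS.discard(name)


@contextmanager
def ephemeral_environment(name: str) -> Iterator[str]:
    """Provision an environment for a single test run and always delete it afterward."""
    env = provision(name)
    try:
        yield env
    finally:
        # Runs on success, failure, or exception, so nothing is left running.
        delete(env)


def run_tests(name: str, tests: Callable[[str], bool]) -> bool:
    """Provision, execute the tests, then delete: no permanent preproduction environment."""
    with ephemeral_environment(name) as env:
        return tests(env)


if __name__ == "__main__":
    passed = run_tests("fix-validation", lambda env: env in ACTIVE_ENVIRONMENTS)
    print(passed, ACTIVE_ENVIRONMENTS)  # True set()
```

Putting teardown in a `finally` clause is the key design choice. A crashed test run would otherwise leave exactly the kind of forgotten environment that drove up costs.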
47
Rethink your definition of an application.
Savings can quickly fund the new investments needed to refactor some apps with cloud services. For example, we replaced an app that employees used to update their personal information with a process initiated by the employee: it starts a cloud service, accepts input, writes updates through Azure Integration Services directly to our HR module in SAP, and then turns everything off. No persistent app. No separate database to reconcile. New microservices like this, along with consistent APIs, helped us simplify.
48
Infrastructure teams need new cloud goals.
After migration, significant infrastructure work goes away or becomes self-service for app teams. Give infrastructure people opportunities in new security and app roles. App teams need some infrastructure skills to design provisioning automation. Those who remain can re-engineer wireless networks for a better modern workplace, make these networks internet-only, or govern cloud usage for application teams, advising them where to improve and leveraging templates to simplify adoption while ensuring security controls.
49
Empowerment drives greater application team responsibility.
Application teams need new skills to best handle their new empowerment. Historically, infrastructure teams provisioned new capacity, built databases, and managed storage and backups for the application teams. To effectively go fast in the cloud, application teams must automate these capabilities themselves, seamlessly using templates from infrastructure advisory teams to automate the provisioning, resiliency, security, and other capabilities for their cloud applications and services.
50
After migration, continue to eliminate IaaS.
Ideally, you don’t need to lift and shift anything from compute and storage on-premise to the equivalent commodity compute and storage in the cloud. In reality, you will initially just move some legacy applications because they are integrated with something else you want to move. Leverage containers where possible, and come back later to optimize these and get more value through SaaS migration, new PaaS capabilities, and simple microservices.
Chapter Six
Modernize Legacy Practices to Simplify Acceleration
In 2011, two years before I was asked to be Microsoft’s CIO, I got an offer from Amazon for what was the equivalent of their CIO role. Fortunately, Amazon recruiting took so long to get the actual offer to me that Bryan Valentine, the Jeff Bezos direct report I would have worked for, felt bad enough to drive the offer letter to my house personally.
Meanwhile, I’d told Tony Scott, Microsoft’s CIO at the time, that I was getting an offer from Amazon. Tony and Kevin arranged for me to meet Steve Ballmer to get his perspective on the opportunity. I’d been in a lot of meetings with Steve over the years, but never a 1:1 about me.
I was scheduled for a twenty-minute meeting in Steve’s office on a Saturday morning. I told Steve right away why I was even looking at an offer from Amazon. Beyond my family’s passion for traveling the world, we had no desire to ever leave our Seattle home, and I loved being in IT in the software industry, where our role included product feedback and customer engagement. If I was ever going to leave Microsoft, my choices were limited. Amazon was on a very short list at the time. I also explained that, for all the reasons I loved Microsoft, we could learn some lessons from Amazon. They had better customer focus, they integrated IT into their product teams and treated it as part of them, and they had modernized many legacy practices. Steve was curious about my perspective. We debated some myths and talked about how legacy simplification could accelerate our progress. Ninety minutes later, he realized he was going to be late for a commitment with his wife, but he had more questions. He asked me to walk with him to his car. Then, still asking questions, he had me get in his big white SUV so he could drive me around to my car. He committed to champion the issues I raised if I would stay and help drive them.
I decided to stay. I was skeptical whether Steve would follow through, but I was determined to make a difference for the better anyway. A couple of weeks later, I got a call from J Ritchie, who led Microsoft’s employee compensation process. J told me that he was driving a project for Steve to improve employee compensation. He wasn’t sure why, but Steve had told him that he needed to take my input on the plan. I realized if Steve was doing this, I had all the support I needed to drive hard for modernization. Microsoft misses Steve’s unbridled enthusiasm, but Satya continues to push for a lot of the changes I was hoping to see in the culture at Microsoft: looking at everything from a customer’s perspective, focusing on inclusion of all perspectives, working together as one company, and providing people opportunities to make a positive impact on the world.
I knew that IT had an opportunity to help the company make radical improvements for the good, but to do that we needed to change ourselves quickly. We looked at modern practices, not just for developing, integrating, and implementing software, but also for incorporating design thinking to enhance what we were doing. We also looked at how to simplify adoption for our users by deploying smaller, bite-size changes more often rather than huge releases that required massive change management. Satya also helped me realize that we needed to stop feeling good about the brilliant engineering we did to make Microsoft products work for us, or with other third-party products. Instead, he asked me to drive better accountability with the appropriate product teams. If we needed to add glue code to make something work, then many of our customers would need to add the same, which was bad.
We had a lot to transform. Modernizing how we worked not only helped us go faster with better results. Eliminating layers and fixing bad processes also reduced our costs and increased the trust the rest of the company placed in us.
51
First consolidate or eliminate everything possible.
With each data center closure, we found material capacity we could just turn off. We learned to monitor usage more proactively to simplify first. Don’t just consolidate data centers to the cloud. Also, streamline teams doing similar work, and eliminate duplicate application instances and infrastructure processes. Containerize or convert legacy apps and services. Expect at least 20 to 30% savings just from this simplification. Also, eliminating this technical debt improves speed.
52
Recognize Agile requires a mindset transformation.
There are many books that teach Agile practices. I’ll just focus on the lessons we learned from adopting them. First, Agile is itself a culture change. Real Agile adapts to changing priorities and learns as you go. Build a minimum viable product, and then iterate. New learnings can be prioritized immediately into the backlog you are grooming. Cultivate a mindset to experiment. Fail fast, learn, and use telemetry data to make decisions.
53
Implement real Agile, not shorter waterfalls.
Transformation is not going faster with the same historical IT practices, or getting more done by simply working longer hours, or adding capacity with more lower-cost resources. Don’t just overlay what teams knew historically on top of Agile practices. Using shorter waterfalls completely misses the point. Fundamental change is required. Dedicated coaches can help teams really transform faster.
54
Telemetry provides data to make decisions.
Possibly the most important thing I learned in our transformation was the importance of telemetry. Implemented everywhere, you see immediately when something isn’t working, so you can fail fast. You can see how customers and all users are interacting with your system to quickly iterate improvements. You see the results of your experiments. I don’t know how we would have thrived without telemetry. You can do this while protecting privacy by collecting data about behaviors, not necessarily who did what.
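The following is a minimal Python sketch of behavior-only telemetry. Callers may pass a user identifier, but it is discarded. Only counts per feature and outcome are kept, which is still enough to spot a failing feature quickly. The `BehaviorTelemetry` class and the `update-profile` feature name are hypothetical; a real system would sit on an actual telemetry pipeline.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


class BehaviorTelemetry:
    """Hypothetical sketch: aggregate what users do, not who did it."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, feature: str, outcome: str, user_id: Optional[str] = None) -> None:
        # user_id is accepted for caller convenience but deliberately never stored.
        self._counts[(feature, outcome)] += 1

    def failure_rate(self, feature: str) -> float:
        """Share of recorded attempts for a feature that failed (0.0 if none recorded)."""
        ok = self._counts[(feature, "success")]
        failed = self._counts[(feature, "failure")]
        total = ok + failed
        return failed / total if total else 0.0

    def snapshot(self) -> Dict[Tuple[str, str], int]:
        """Everything that is retained: counts keyed by (feature, outcome)."""
        return dict(self._counts)


if __name__ == "__main__":
    t = BehaviorTelemetry()
    for user, outcome in [("alice", "success"), ("bob", "failure"),
                          ("carol", "success"), ("dan", "success")]:
        t.record("update-profile", outcome, user_id=user)
    print(t.failure_rate("update-profile"))  # 0.25
    print(t.snapshot())
```

Dropping the identifier at the point of collection, rather than filtering it out later, means there is nothing sensitive to protect downstream.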
55
Cultivate a DevOps mindset, not roles.
DevOps isn’t a separate role or separate team. It is the way everyone needs to think and act going forward. With a DevOps mindset, everyone plans how something will run, while building it, not after the fact. They drive for automation, security, and integration. They design telemetry to ensure it runs well. They understand live site reviews and maniacally drive to root cause. They prioritize the most important operational changes into Agile sprint backlogs.
56
Agile everywhere, not just software development.
Agile practices can apply to all teams, not just software development. To simplify everywhere, we applied Agile practices, combined with streamlined meeting conduct, to infrastructure teams, security teams, live site reviews, and business stakeholder backlog reviews. Patrick Lencioni’s book, Death by Meeting, is great for overhauling meetings, including keeping meetings engaging and separating tactical from strategic topics.
57
Consider where standardizing across teams accelerates.
Look for standardization that optimizes across teams, while still allowing for empowerment. For example, you can simplify estimating by making sprint team sizes consistent. And enable dependent releases by implementing consistent release schedules across teams. We planned dependent integrations to line up quarterly while allowing individual sprints to last two or three weeks. Also, standardizing development tools allows loaning of sprint teams to the highest priority business needs.