An Elegant Puzzle: Systems of Engineering Management


by Will Larson


  The fixed cost of creating and maintaining a policy is high enough that I generally don’t recommend writing policies that do little to constrain behavior. In fact, that’s a useful definition of bad policy. In such cases, I instead recommend writing norms, which provide nonbinding recommendations. Because they’re nonbinding, they don’t require escalations to address ambiguities or edge cases.

  4.1.2 Exception debt

  Once you’ve supported your goals through constraints, all you have to do is consistently uphold your constraints. This is easy to say, but consistency requires no little bravery. Even with the best intentions, I’ve often gone astray when it was time for me to support my own policies.

  The two reasons that applying policy consistently is particularly challenging are:

  Accepting reduced opportunity space. Good constraints make trade-offs that deliberately narrow your opportunity space. Some of the opportunities that you’ll encounter within that space will be exceptionally good, and it’s hard to stay true when faced with concrete consequences.

  Locally suboptimal. Satisfying global constraints inevitably leads to local inefficiency, sometimes forcing some teams to deal with deeply challenging circumstances in order to support a broader goal that they may experience little benefit from. It’s hard to ask folks to accept such circumstances, harder to be someone in one of those local inefficiencies, and hardest yet to stick to the decisions at real personal cost to the folks you’re impacting.

  When we’ve picked thoughtful constraints to allow us to accomplish important goals, we need the courage to bypass these opportunities and accept these local inefficiencies. If we don’t summon and maintain that courage, we incur most of the costs and receive few of the benefits.

  Policy success is directly dependent on how we handle requests for exception. Granting exceptions undermines people’s sense of fairness, and sets a precedent that undermines future policy. In environments where exceptions become normalized, leaders often find that issuing writs of exception—for policies they themselves have designed—starts to swallow up much of their time. Organizations spending significant time on exceptions are experiencing exception debt. The escape is to stop working the exceptions, and instead work the policy.

  4.1.3 Work the policy

  Once you’ve invested so much time into drafting policy, you have to avoid undermining your work, and yourself, with exceptions. That said, you can’t simply ignore escalations and exceptions requests, which often represent incompatibilities between the reality you designed your policy for and the reality you’re operating in. Instead, collect every escalation as a test case for reconsidering your constraints.

  Once you’ve collected enough escalations, revisit the constraints that you developed in the original policy, merge in the challenges discovered in applying the policy, and either reaffirm the existing constraints or generate a new series of constraints that handle the escalations more effectively.

  This approach is powerful because it creates a release valve for folks who are frustrated with rough edges in your current policies—they’re still welcome to escalate—while also ensuring that everyone is operating in a consistent, fair environment; escalations will only be used as inputs for updated policy, not handled in a one-off fashion. The approach also maintains working on policy as a leveraged operation for leadership, avoiding the onerous robes of an exceptions judge.

  When you roll out a policy, it’s quite helpful to declare a future time when you’ll refresh it, which ensures that you’ll have the time to fully evaluate your new policy before attempting revision. It’s fairly common for folks to modify good, effective policy due to concerns that arise before the policy has had time to show its effect. At a sufficiently high rate of change, policy is indistinguishable from exception.

  The next time you’re about to dive into fixing a complicated one-off situation, consider taking a step back and documenting the problem but not trying to solve it. Commit to refreshing the policy in a month, and batch all exceptions requests until then. Merge the escalations and your current policy into a new revision. This will save your time, build teams’ trust in the system, and move you from working the exceptions to working the policy.

  4.2 Saying no

  Some years back, I was sitting in a room with my manager, our CTO, and a crisis. An engineer on my team had mishandled two alerts, which had cascaded into plausibly the worst production incident that the company had experienced to date. There were three root causes: alert fatigue, a lack of velocity context for out-of-disk-space alerts, and our reliance on a centralized database with little support for vertical scaling. At that moment, though, we were no longer talking about root causes. We were discussing whether to fire the on-call engineer, and I was saying no.

  It was in that era of my career that I came to view management as, at its core, a moral profession. We have the opportunity to create an environment for those around us to be their best, in fair surroundings. For me, that’s both an opportunity and an obligation for managers, and saying no in that room with my manager and CTO was, in part, my decision to hold the line on what’s right. However, there was a second no in that room, and it’s one you’ll use routinely even under the best of circumstances. That no is an expression of what is possible for the team you lead to do. I felt that the decision would be wrong, but also that the precedent of firing people for on-call mistakes would irreparably damage the morale of a team who already saw their phone batteries drained before the end of a 12-hour on-call shift.

  This no is explaining your team’s constraints to folks outside the team, and it’s one of the most important activities you undertake as an engineering leader.

  4.2.1 Constraints

  Folks who communicate a no effectively are not the firmest speakers, nor do they make frequent use of the word itself. They are able to convincingly explain their team’s constraints and articulate why the proposed path is either unattainable or undesirable.

  Articulating your constraints depends on the particulars of the issue at hand, but I find that two topics are frequent venues of disagreement. The first is velocity: Why is this taking so long when it should take a couple of hours? The other is prioritization: Why can’t you work on this other, more important, project?

  Let’s dig into how to have those conversations constructively.

  4.2.2 Velocity

  When folks want you to commit to more work than you believe you can deliver, your goal is to provide a compelling explanation of how your team finishes work. Finishes is particularly important, as opposed to does, because partial work has no value, and your team’s defining constraints are often in the finishing stages. The most effective approach that I’ve found for explaining your team’s delivery process is to build a kanban board[2] describing the steps that work goes through, and documenting who performs which steps. You don’t have to switch to using a kanban system, although I’ve found it very effective for debugging team performance; you just have to populate the board once to expose your current constraints.

  Using this board, you’ll be able to explain what the current constraints are for execution, and help your team narrow suggestions for improvement down to areas that will actually help. If you don’t provide this framework, people tend to start making suggestions everywhere across your process, which at best means many ideas won’t reduce load where it’s most helpful, and at worst may inadvertently increase load.
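
  As a rough illustration of populating such a board once, the sketch below records each step, who performs it, and how many items it finishes in a typical week, then picks out the step that limits finished work. The field names, the sample steps, and the idea of measuring each step by weekly throughput are assumptions made for this example, not a method prescribed in the text.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str                 # column on the board
    owner: str                # who performs this step
    finished_per_week: float  # items that leave this step in a typical week


def slowest_step(board):
    """Return the step with the lowest weekly throughput."""
    return min(board, key=lambda step: step.finished_per_week)


# Hypothetical board; the steps, owners, and numbers are made up.
board = [
    Step("design review", "tech lead", 6),
    Step("implementation", "engineers", 5),
    Step("code review", "two senior engineers", 2),
    Step("deploy", "on-call engineer", 4),
]
constraint = slowest_step(board)
```

  With these made-up numbers, suggestions that speed up implementation would not increase finished work, because items would still queue at the slowest step.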

  You want to focus your team on your core constraint, the single inefficient component that’s slowing down your throughput of finished work. Once you’ve focused the conversation on your core constraint, the next step is explaining what’s preventing you from solving for it. At many technology companies this comes down to technical debt or toil. However, the specters of technical debt and toil have been used to shirk so much responsibility that simply naming them tends to be unconvincing.

  Instead, you have to translate the problem into something resembling data. If you’re following a consistent project management methodology, this can be as easy as explaining how you decide the number of story points for each sprint, along with how that number has trended over time. If not, I’ve found it useful to borrow the approach of a sampling profiler: for a week, check what your team is working on at a few random moments across the day, and use that as an approximation of how time is being spent.
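
  The sampling-profiler idea can be sketched in a few lines. This is a minimal illustration, assuming a five-day week, an eight-hour day starting at 09:00, and three check-ins per day; the function names, the categories, and the observations are hypothetical and would be recorded by hand in practice.

```python
import random
from collections import Counter
from datetime import datetime, timedelta


def pick_sample_times(week_start, days=5, samples_per_day=3,
                      start_hour=9, workday_hours=8, seed=None):
    """Return random check-in times for each working day of one week."""
    rng = random.Random(seed)
    times = []
    for day in range(days):
        day_start = week_start + timedelta(days=day, hours=start_hour)
        # Distinct random minute offsets within the working day, in order.
        offsets = sorted(rng.sample(range(workday_hours * 60), samples_per_day))
        times.extend(day_start + timedelta(minutes=m) for m in offsets)
    return times


def summarize(observations):
    """Turn the categories noted at each check-in into a share of samples."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Hypothetical Monday and hand-recorded observations, for illustration only.
check_ins = pick_sample_times(datetime(2024, 1, 1), seed=7)
observed = ["toil", "toil", "feature work", "incident"]
shares = summarize(observed)
```

  The shares are only an approximation, as with any sampling profiler: more samples per day give a steadier picture at the cost of more interruptions.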

  Once you’re able to explain your constraints and how time is being spent, then you’re having a useful conversation about whether you can shift time from other behaviors toward your constraints. The final stage comes next, which is the discussion around adding capacity.

  There are two ways to add capacity: move existing resources to the team (away from what they’re currently doing) or create new resources (typically through hiring). Neither is a panacea, and they are explored in “A Case against Top-Down Global Optimization”[3] and “Productivity in the Age of Hypergrowth,”[4] respectively.

  Putting it all together, the best outcome of a discussion on velocity is to identify a reality-based approach that will support your core constraint. The second-best outcome is for folks to agree that you’re properly allocated against your constraints and to shift the conversation to prioritization. (Those are the only good outcomes.)

  4.2.3 Priorities

  Although shifting from a discussion about velocity to one about prioritization is a good outcome, expressing your priorities convincingly can be a difficult, daunting task. I recommend breaking it down into three discrete steps: document all your incoming asks, develop guiding principles for how work is selected, and then share subsets of tasks you’ve selected based on those guiding principles. Hopefully, documenting your incoming asks is as straightforward as auditing your team’s tickets, but it’s pretty common for some of the most important asks to be undocumented. What I’ve found effective is to blend existing planning artifacts (typically quarterly/annual plans) and your tickets into a list, and then test it against your most important stakeholders. Keep asking those who routinely have dependencies on your team, “Does this seem like the right list of tasks?” The result will be a fairly accurate artifact.

  From there, you have to pick the guiding principles that you’ll use for selecting tasks. How you’ll do that will depend on your team and your company (infrastructure teams will pick different guides[5] than product teams will), but they’ll likely be grounded in your company’s top-level plans and will intersect with your team’s mission. (The most controversial guides tend to be statements about the value of current work versus future work, for example, doing investment work today that will pay off in two years but limit value in the short term. One technique that I’ve found useful for this particular scenario is specifying quotas for both immediate and long-term work.)

  The last step is sitting down with your team and applying your guiding principles to the universe of incoming asks, then finding the subset to prioritize. You’ll continuously get more requests to do work, so it’s important that this process is lightweight enough that you can rerun it periodically.

  Which, it so happens, is exactly what you should do when a stakeholder disagrees with your priorities. Understand their requests, and sit down with them to test their ask against your guiding principles and your currently prioritized work. If their request is more important than your current work, shift priorities at your next planning session. (To limit churn created by shifting priorities, it’s useful to wait for the next planning session instead of making these changes immediately. This does mean that you’ll need to be refreshing your plan at least monthly.)
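
  To make the three steps concrete (document the asks, apply guiding principles, select a subset), here is a minimal sketch that also uses the quota technique for immediate versus long-term work. Everything specific in it is an assumption for illustration: the `Ask` fields, the single numeric score standing in for your guiding principles, the effort units, and the 60/40 split.

```python
from dataclasses import dataclass


@dataclass
class Ask:
    name: str
    horizon: str  # "immediate" or "long_term"
    score: int    # hypothetical rating against the guiding principles
    cost: int     # estimated effort, in arbitrary units


def select_asks(asks, capacity, quotas):
    """Greedily pick the highest-scoring asks that fit each horizon's quota."""
    selected = []
    for horizon, share in quotas.items():
        budget = capacity * share
        candidates = sorted(
            (a for a in asks if a.horizon == horizon),
            key=lambda a: a.score,
            reverse=True,
        )
        for ask in candidates:
            if ask.cost <= budget:
                selected.append(ask)
                budget -= ask.cost
    return selected


# Hypothetical list of incoming asks; names and numbers are made up.
asks = [
    Ask("fix pager noise", "immediate", 9, 3),
    Ask("customer export", "immediate", 7, 4),
    Ask("dashboard polish", "immediate", 3, 2),
    Ask("storage migration", "long_term", 8, 3),
    Ask("new build system", "long_term", 5, 4),
]
plan = select_asks(asks, capacity=10, quotas={"immediate": 0.6, "long_term": 0.4})
plan_names = [ask.name for ask in plan]
```

  Because the whole selection is one cheap function call, rerunning it at each planning session, or with a stakeholder’s new ask added to the list, stays lightweight.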

  4.2.4 Relationships

  If you’ve poured time into explaining both your velocity and priorities but your perspective still isn’t resonating, then it’s fairly likely that you have a relationship problem to address. In that case, the next step isn’t investing more energy in explaining your constraints, but instead working on how you partner with your individual stakeholders.

  4.3 Your philosophy of management

  For me, the first few years of management were a wild frenzy. Every situation was brand-new, and I puzzled through each decision from the ground up. Over time, I developed some rules of thumb and guidelines, but only the experience of managing managers has truly refined my thoughts on management.

  When I started managing, my leadership philosophy was simple:

  The Golden Rule[6] makes a lot of sense.

  Give everyone an explicit area of ownership that they are responsible for.

  Reward and status should derive from finishing high-quality work.

  Lead from the front, and never ask anyone to do something you wouldn’t.

  These have served well as a foundation, but by applying them repeatedly over time and circumstance, I’ve seen them fray on the edge cases. Learning forward, I’ve started to weave in a number of additional ideas in the vain quest for a unified theory of management.

  4.3.1 An ethical profession

  I believe that management, at its core, is an ethical profession. To see ourselves, we don’t look at the mirror, but rather at how we treat a member of the team who is not succeeding. Not at the mirror, but at our compensation philosophy. Not at the mirror, but at how we pitch the roles to candidates. Whom we promote. How we assign raises. Provide growth opportunities. PTO requests. Working hours.

  We have such a huge impact on the people we work with—and especially on the people who work “for” us—and taking responsibility for that impact is fundamental to good management.

  This doesn’t always mean being your team’s best friend. Sometimes it means asking them to make personal sacrifices, letting go of a popular member of the team, or canceling a project the team is excited about. It’s remembering that you leave a broad wake, and that your actions have a profound impact on those around you.

  4.3.2 Strong relationships > any problem

  I believe that almost every internal problem can be traced back to a missing or poor relationship, and that with great relationships it is possible to come together and solve almost anything.

  Technical disagreements become learning opportunities for everyone. Setbacks are now a shared experience that offers the opportunity to gel together as a team.

  Even with great relationships, there are still real challenges! You have a limited budget for giving raises and can’t satisfy everyone. If your customers don’t love your product, camaraderie can’t pay salaries. Some technical problems are genuinely novel, without an obvious solution, and sometimes the obvious solution is cost-prohibitive.

  That said, I try to start debugging problems from the relationship angle, and I find this technique pretty effective.

  4.3.3 People over process

  A few years back, one of the leaders I worked with told me, “With the right people, any process works, and with the wrong people, no process works.”

  I’ve found this to be pretty accurate.

  Process is a tool to make it easy to collaborate, and the process that the team enjoys is usually the right process. If your process is failing somehow, it’s worth really digging into how it’s failing before you start looking for another process to replace it.

  As you start homing in on the problem (maybe it’s you!), honestly ask yourself if a different process would address it, or if you’re moving around the food on your plate. My experience is that a different process probably isn’t the solution you’re looking for.

  4.3.4 Do the hard thing now

  In this profession, we’re often asked to deal with difficult situations. No set of rules can guide you safely through every scenario, but I have found that postponement is never the best solution.

  Instead of avoiding the hardest parts, double down on them.

  If you have a poor relationship with your manager or a member of your team, spend even more time with them. Meet with them every day, or have dinner with them. If two engineers are struggling to work together, before you separate them onto different teams, get them to spend more time together trying to understand each other’s perspective. (There are some obvious exceptions here, but if two people truly cannot work together, is there something else there that you’ve been avoiding dealing with?)

  As a leader, you can’t run from problems; engage ’em head-on.

  4.3.5 Your company, your team, yourself

  Lately, I’ve come to have something of a mantra for guiding decision-making: do the right thing for the company, the right thing for the team, and the right thing for yourself, in that order. This is pretty obvious on some levels, but I’ve found it to be a useful thinking aid.

  First, all thinking should start from a company perspective, and you should make sure that what you’re doing is not creating negative externalities for the company or the other teams you work with. For example, you may be really excited about trying out a new programming language in a project, but you should also make sure that you’ve considered the additional maintenance cost for the rest of the company.

  Next, make sure that your choices are being made on behalf of your team, not on your own behalf. This might mean pushing back on a timeline that would force your team into a death march, even though it’s uncomfortable to have that conversation with your manager or your product partner.

 
