Smart Baseball

by Keith Law


  Don’t get hung up on the coefficients in that formula; you’re not going to be quizzed at the end of the chapter, and they’re out of date anyway. But notice what they do tell us about the relative values of these events:

  • A home run is worth 1.4 runs, which means its power to score other runners on base is worth 0.4 runs on top of the one run it’s automatically worth because the batter scores.

  • Whereas slugging percentage values a homer at four times that of a single, this Batting Runs formula has that ratio at three, meaning that slugging percentage actually overvalues home runs—and all extra-base hits, in fact.

  • A single is worth about 30 percent more than a walk. This makes sense, since a single advances all runners and can advance some runners more than one base, while a walk can only advance runners who are forced to move up.

  • A batter who comes to the plate four times and hits three singles has produced more value than the batter who comes to the plate four times and hits one home run. The first batter has produced 1.13 runs of value (.46*3 –.25*1), while the second batter has produced .65 runs of value (1.40*1 –.25*3). This may sound counterintuitive—how could a batter homer and yet produce less than one run of value on the day?—but it reinforces my earlier argument that a batter making an out has actually destroyed value, reducing the team’s chances of scoring in that particular inning.
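
  To make that arithmetic concrete, here is a minimal sketch in Python of a linear-weights calculation. The coefficients are the example values quoted above (they are out of date, as noted); the walk value of 0.33 is an assumption back-solved from “a single is worth about 30 percent more than a walk,” and the event names are mine.

    # Illustrative linear-weights ("Batting Runs") calculation.
    # Coefficients are the chapter's example values, not current ones;
    # the walk weight is inferred from the text, not quoted by it.
    WEIGHTS = {"single": 0.46, "home_run": 1.40, "walk": 0.33, "out": -0.25}

    def batting_runs(events):
        # Sum the run value of each event the hitter produced.
        return sum(WEIGHTS[e] for e in events)

    # The two hypothetical four-plate-appearance days from the bullet above:
    print(round(batting_runs(["single"] * 3 + ["out"]), 2))    # 1.13
    print(round(batting_runs(["home_run"] + ["out"] * 3), 2))  # 0.65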

  The individual coefficients change over time as the run-scoring environment changes—for example, a home run is worth more in years when offense is down than when it’s up—but the concept remains the same. A hitter does things. Those things have value. Add up the values of the things and you get the total value he produced.

  Somehow this became controversial, because that’s not How We’ve Always Done Things Around Here, Son, but over the last ten years, it’s become standard within the industry to look at players this way, and that change has gradually (albeit not completely) come through to the media and to a portion of the overall fan base. There are still people shouting on TV and radio about nerds and newfangled statistics, but their numbers are declining and they are increasingly becoming punch lines.

  Palmer and Thorn used stolen bases and caught stealing in their master formula and, in The Hidden Game, broke them out into a separate Base Stealing Runs number. The indispensable site Fangraphs has updated this formula by adding Ultimate Base Running (UBR), which also includes the value baserunners create or destroy with their actions on the base paths—such as going from first to third on a hit, or failing to take an extra base, or scoring/not scoring from third on a flyball. Fangraphs combines its weighted stolen base values with UBR and a smaller factor, wGDP, which weighs a hitter’s propensity for hitting into double plays (or his ability to avoid them) against the league-average rate, to get a total Base Running Runs number that is the most complete measure publicly available for a player’s running value.
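
  A minimal sketch of how those pieces combine, with invented numbers standing in for a real player’s components:

    # Fangraphs-style Base Running Runs: weighted stolen-base runs (wSB),
    # Ultimate Base Running (UBR), and double-play runs (wGDP) added together.
    # The example values below are made up purely for illustration.
    def base_running_runs(wsb, ubr, wgdp):
        return wsb + ubr + wgdp

    print(base_running_runs(wsb=2.1, ubr=3.4, wgdp=0.5))  # 6.0 runs of baserunning value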

  To give you some sense of what a good Batting Runs total would be, here are the leaders, using Fangraphs’ formula (excluding baserunning), for 2015, along with their avg/obp/slg triple-slash line:

  These Batting Runs numbers are “park-adjusted,” meaning that they’ve been modified slightly to reflect the hitters’ home ballparks. Goldschmidt plays half his games in Chase Field, a relatively good hitter’s park (it’s about one thousand feet above sea level), while Trout plays in one of the majors’ best pitchers’ parks in Angel Stadium.

  And, just for kicks, the worst hitters in MLB in 2015:

  In a little bit of baseball slapstick, the Diamondbacks made a big trade in the off-season to replace Owings . . . with Segura.

  All five players on this list play in the middle of the field, with Ramos a catcher and the others all shortstops by trade, meaning that they play positions where teams will often sacrifice some offense to gain value on defense—or will accept awful offense because they can’t find anyone more capable to fill those positions. This is why shortstops and catchers are highly valued in the market for players—the draft, international free agency, MLB free agency, and trades. There are never really enough to go around, and that’s how you end up with Alcides Escobar having one of the worst offensive seasons in the majors . . . on the team that won the World Series that same year.

  For a pitcher, we have a couple of ways to approach the problem of value, depending on how much you want to consider the benefit (or harm) a pitcher gets from the defense behind him or just plain ol’ luck. A pitcher’s fundamental job is to get outs; the more hitters he retires, the better for his team, right? If a pitcher retired every batter he faced, he’d be the best pitcher ever. So we could simply base our valuation of a pitcher’s performance on how many hitters he faced, how many he retired, and what the hitters he didn’t retire ended up doing—hits, extra-base hits, walks, hit batsmen, and so on.

  I slid right by something in that last paragraph, though. Is a pitcher’s fundamental job to get outs, or to prevent runs? If a pitcher allows 12 hits in a shutout, did he do his job? If a pitcher retires 27 of 29 hitters but gives up two runs in a complete game, did he do a better job than the first guy? Disentangling pitching performance from defense and luck while also considering how much to weigh sequencing remains a contentious subject, because there is merit on both sides of the argument.

  By sequencing, I mean the order in which things happen to a pitcher; flyout, walk, walk, homer, groundout is not the same as homer, walk, walk, groundball double play, flyout. Those two sequences have the same five events, but the first one scores three runs with two outs recorded while the second one scores one run with three outs recorded. Sequence matters. And to some extent there’s a skill involved in this; there are pitchers who are worse with runners on base because it requires them to pitch from the stretch or slide-step rather than the windup. Some pitchers lose a mile an hour or so on their fastballs from this; some just have a harder time maintaining their mechanics (and thus command) when in the less comfortable delivery.

  But sequencing isn’t always very predictive. We can look at a pitcher’s walk rate as a percentage of batters faced, and it’ll tell us how likely he is to walk batters going forward. The same is true of his strikeout rate, which implies his contact rate—and since pitchers by and large tend to give up fairly consistent batting averages on balls they allow in play, we can somewhat predict the rate at which he’ll allow hits in the future, too. Home run rates show more variance, but for many pitchers they stay within a small range around those pitchers’ flyball rates (the percentage of balls in play they allow that are classified as flyballs rather than groundballs or line drives). What those rates together don’t tell us is whether those negative events will be clustered together or not; we can estimate how many runs the pitcher will allow over a large enough sample, but even a full season isn’t enough of a sample to smooth out all of the noise we might find in a pitcher’s performance. And even if it were, defense matters—how good the fielders behind a pitcher are and how well they were positioned—because no pitcher can do his job in a vacuum. Two pitchers could generate the same groundball to shortstop, but the one in front of Andrelton Simmons sees it converted into an out while the one in front of Derek Jeter sees it go past him into center field. Life ain’t fair, but our stats should be.

  Because there is no consensus on which of these two approaches is actually the better way to value a pitcher, I’m going to continue to discuss both for the remainder of this chapter. You can find WAR based on a pitcher’s runs allowed on Baseball-Reference, and WAR based on a pitcher’s performance on a per-batter basis on Fangraphs, although anyone could calculate WAR either of these ways, or use some kind of hybrid or smoothing approach to split the baby (if, say, you wanted to calculate a new WAR for Salomon Torres).

  Regardless of method, we want to value pitchers based on what they prevented, whether that something is a run or a hit. Sticking with the run-based approach for the moment, a pitcher with a 5.00 ERA is obviously less valuable than one with a 3.00 ERA, but how do we determine just how much less? If they have the same number of innings pitched, we can look at the difference between their earned runs allowed and say that the second pitcher prevented N more runs than the first one. So if both pitchers threw 180 innings, the pitcher with the 5.00 ERA gave up 100 runs while the pitcher with the 3.00 ERA gave up 60 runs, and we could say that the second pitcher prevented 40 runs when compared to the first one. That’s the value he provided over what the first pitcher provided (or destroyed, since a 5.00 ERA is kind of terrible unless you pitch your home games on the moon).
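
  That back-of-the-envelope comparison, sketched in the same style (the 180-inning, 5.00-versus-3.00 figures are the ones from the example above):

    # Earned runs implied by an ERA over a given number of innings.
    def earned_runs(era, innings):
        return era * innings / 9

    innings = 180
    pitcher_a = earned_runs(5.00, innings)  # 100 earned runs
    pitcher_b = earned_runs(3.00, innings)  # 60 earned runs
    print(pitcher_a - pitcher_b)            # 40.0 runs prevented by pitcher B vs. pitcher A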

  This approach puts an implicit cap on the value a reliever can provide, because relievers today typically max out around 80 innings a year. An average starting pitcher who throws about 200 innings will be worth more than most of the relievers in baseball, even the best ones, because he gets another 120 innings in which to prevent runs—that is, to provide more value. This is why starters are paid more in free agency, are worth more in trade, and, in my personal opinion, are better choices for the All-Star Game than all but the absolute elite of the relievers.

  To give you some sense of how relievers measure up to starters, let’s look at Matt Moore, who was the closest thing to a perfectly league-average starter in MLB in 2016. Moore’s ERA was 4.07 and his FIP was 4.17; the major-league ERA for the whole season was 4.18 and its FIP was 4.19. (Moore was traded in July from the AL’s Tampa Bay Rays to the NL’s San Francisco Giants, so he spent slightly more time in the DH league, which tends to have higher run-scoring.) Moore threw 198 innings across 33 starts, and both major sites calculate his Wins Above Replacement at 2.2.

  There were 132 relievers across all of MLB in 2016 who threw at least 50 innings, the bare minimum I’d consider a full season’s worth of work. How many of those relievers were more valuable than good ol’ League-Average Matt Moore? Baseball-Reference says twelve. Fangraphs says seven. Either way, it means an elite reliever, performing well into the top 10 percent of all relievers in that season, is only worth as much in run prevention as a league-average starter.

  Many more relievers than that posted ERAs or FIPs below Moore’s, but they only threw a third as many innings—and there’s value in just throwing league-average innings to a major-league team, because league-average pitching is hard to find. The distribution of baseball talent is such that you will find fewer pitchers in any season who perform at an above-average level than below average, because guys who perform below the average tend to lose their jobs, often to other guys who also perform below the average but at least do it with a different name on the back of the uniform. Average is never an insult in baseball—it makes you better than most, and teams are willing to pay a lot of money for a player who can be average and handle the workload of a full season.

  If I told you you had a choice of two pitchers, one who would be dead average for 200 innings and one who would be just a shade above average for 66 innings, which would you take? What if the second pitcher were comfortably above average? Well above average? Elite? Somewhere those two lines intersect, where you’d rather have the reliever and figure you’ll make up the missing innings somewhere else, but even without resorting to numbers, you should know intuitively that the reliever has to be a lot better in his shorter workload to match the value of the 200-inning starter. It turns out that a full-season starter whose run prevention (ERA, FIP, pick your poison) is at the 50th percentile is more valuable than a full-season reliever whose run prevention is at the 90th percentile.

  However, even a run-based approach should probably look beyond innings pitched, because the inning isn’t really the right unit here. An inning means three outs, but doesn’t always mean the same number of batters—and doesn’t even mean the pitcher himself retired three batters. An inning must have three batters at a minimum, but can have up to six batters without a run scoring. We measure batters on a per plate appearance basis, because it’s obvious that that is the correct fundamental unit for a batter—it’s one discrete event for him. For a pitcher, however, it’s the same thing: Each batter he faces is also a discrete event. Yes, we think about a pitcher by how many innings he pitches over the course of a season, or even within a game, but an inning is an atom, and we can break an atom down further into smaller parts that help explain how matter behaves and even what matter is. If baseball has its superstring, it’s the plate appearance.

  If we evaluate pitchers on a runs-prevented framework, which Palmer and Thorn called Pitching Runs, there’s still disagreement over how best to isolate a pitcher’s contribution (which I discussed in the chapter on ERA), and even then there are different ways to skin the proverbial cat of pitching valuation.

  There is one basic tenet in common among all methods of valuing pitchers: it’s about runs prevented. The pitcher threw some innings and gave up some stuff—hits, walks, homers—that led to some runs. How many runs would an average pitcher have given up in those innings?

  If I tell you a pitcher gave up 4 runs for every 9 innings he pitched in a season, is that good? Well, if the league RA9 (run average, or runs allowed per 9 innings) is 3, then no, it’s probably not good. But if I tell you that same pitcher pitched half his home games in Denver, a mile above sea level, or on the surface of the moon, then okay, maybe it’s not so bad. If that pitcher played in front of the best defensive unit in baseball history, though, maybe it’s not so good after all. And so on. Context matters, which is why all good baseball metrics use a set baseline, like the average, and compare players to that.

  So a good pitching value metric starts with runs allowed, which you can then compare to a baseline level of runs allowed for an average pitcher or a replacement-level pitcher to determine how many runs the pitcher prevented. You can simply use the pitcher’s actual runs allowed, which is the method used by Baseball-Reference.com (which I used very heavily in writing this book), or you can estimate his hypothetical runs allowed given his performance on a per-batter basis, which is the method Fangraphs uses. I see merit in both methods, and I look at both sites when considering a pitcher’s value, because each number tells me something—and so does the difference between the two.*

  Once you settle on your RA9—again, the pitcher’s runs allowed per 9 innings—you have to try to tweak it for the environment, which includes adjusting it for the ballparks in which the pitcher pitched. I mentioned the sad soul who throws half his starts in Denver at Coors Field, but other pitchers have to pitch there, too, with pitchers in the Colorado Rockies’ division, the NL West, pitching there more often than pitchers in other divisions. Baseball-Reference also adjusts the RA9 by the strength of the opposing offenses the pitcher faced, and assesses a small penalty to relievers, whose average RA9 is typically about .15–.20 runs lower than the average for starters. Yes, it’s a lot of adjusting, but I can’t just hand-wave it away, because it’s necessary to get some level of precision in our result.

  Take the RA9 that comes out of all that adjusting and compare it to the league average RA9. That difference, multiplied by the pitcher’s innings-pitched total and divided by 9, gives you an estimate of the number of runs the pitcher prevented over the course of that time period, which is now called Runs Above Average but might as well be called Pitching Runs or Runs Prevented, as it all amounts to the same thing: this is what the pitcher was worth, measured in runs.
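
  A minimal sketch of that final step, assuming the park, opponent, and reliever adjustments have already been folded into the pitcher’s RA9:

    # Runs Above Average: runs prevented relative to a league-average pitcher
    # over the same workload. Both RA9 figures are assumed to be adjusted already.
    def runs_above_average(pitcher_ra9, league_ra9, innings):
        return (league_ra9 - pitcher_ra9) * innings / 9

    # e.g., an adjusted 3.00 RA9 over 180 innings against a 4.50 league average:
    print(runs_above_average(3.00, 4.50, 180))  # 30.0 runs above average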

  Let’s look at more tables! Temper your excitement, please; it’s a little unbecoming. First, here are the top five pitchers in MLB for 2015, using the actual runs allowed approach of Baseball-Reference:

  Now, the same ranking, using Fangraphs’ approach, which is based on a pitcher’s walks, strikeouts, and home runs allowed, and uses a league-wide BABIP (batting average on balls in play) to estimate runs allowed:

  Sale has been an underrated pitcher for most of his big-league career, as he plays in a fairly homer-friendly ballpark at New Comisk—I mean, U.S. Cellular Field, and has generally not played in front of good defensive units. His FIP, the base runs-allowed figure Fangraphs uses to calculate its RAA here, is 2.73, so they’re saying that his defense/bullpen/bad luck cost him about two-thirds of a run in his ERA.
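
  Since FIP keeps coming up, here is its standard published formula, sketched in the same style. The per-season constant (roughly 3.1 in most years) is set so that league FIP matches league ERA, and the stat line in the example is invented:

    # Fielding Independent Pitching: built only from home runs, walks,
    # hit batters, and strikeouts, ignoring what happens on balls in play.
    def fip(hr, bb, hbp, k, ip, constant=3.10):
        return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant

    # Invented example: 15 HR, 45 BB, 5 HBP, 230 K over 200 innings.
    print(round(fip(hr=15, bb=45, hbp=5, k=230, ip=200), 2))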

  Also notable is that Kershaw went from third in the majors to first, a swing of 29 runs prevented in value. Imagine that you’re a general manager asked to decide how much to pay Kershaw for the upcoming season. You have two analysts working for you. One says Kershaw was worth 42 runs prevented in 2015, and the other says he was worth 71. Do you just fire them both and pull a salary out of a hat?

  The debate between these two methods of valuing pitchers is part philosophical and part analytical. Many people dislike the idea that pitchers have no control over the results of balls put into play, and it’s possible, given improved data that MLB is providing to teams via its Statcast product, that we’ll find out they have a little more control than we’d believed for the last decade. But for forecasting, you’ll get slightly better results using an approach based on the pitcher’s “peripheral” stats, as Fangraphs does, than you will using an RA- or ERA-based approach.

  But what’s good for forecasting isn’t necessarily good for valuing past production. Zack Greinke’s 1.66 ERA was the lowest any starting pitcher had posted in twenty years, the lowest in a nonstrike season in thirty years, and the eighth-lowest of any starting pitcher since the live-ball era began in 1921. (Two of the seasons above him on the list were Bob Gibson and Luis Tiant in 1968, the “year of the pitcher,” in response to which MLB lowered the height of the mound.) Whether we think Greinke was lucky, or helped by his defense, or aided by a demon summoned from the sixth circle of hell, the 1.66 ERA means that while he was pitching or responsible for the runners on base, very few guys actually scored. You’d pay more to get Greinke’s 1.66 ERA than Kershaw’s 2.13 ERA, even though you’d bet on Kershaw being better than Greinke in the following season (which turned out to be the case in 2016).

  If it sounds like I’m refusing to take sides here, well, I am. I see merit in both ways of looking at player value and I use both methods myself when writing about big leaguers. If pushed to use one over the other, I’d take the peripheral-based approach that Fangraphs uses, but I recognize its imprecision as well and would hate to lose the information that the RA-based approach contains.

 
