There's still plenty of work left to do on examining the strike zone and how it is actually called by umpires in baseball. Clearly, there is no one single strike zone. Its size and placement vary with circumstances, though exactly which ones is not yet fully determined. Before delving into all that though, I turned my attention to one of the goals of this strike zone endeavor: using it to augment analysis of pitchers and catchers.
If a pitcher throws a pitch that should be a strike and isn't called a strike, is the pitcher at fault? Or the catcher? Is it entirely the umpire? Should the pitcher get some sort of credit? How much? These are the class of questions that I hope to peek into over time.
The model that I currently have of the strike zone — split into two depending on the side of the hitter — isn't perfect. Further refinements will need to be made to incorporate other variables attached to each pitch. However, I do think that it's a solid enough platform to build some temporary scaffolding on and I now wanted to look at things from the catcher's perspective.
What I did was go through every pitch in my database that was called a ball or strike and recorded by PITCHf/x. I separated and grouped them by year, pitching team and catcher. Then, using the appropriate strike zone for the side the hitter batted from, I put each pitch into one of four buckets based on whether it was inside or outside the strike zone and whether it was called a strike or a ball.
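A minimal sketch of that bucketing step, in Python. The rectangular zone edges, the field names and the sample pitches are placeholder assumptions for illustration only, not the actual zone model or data behind these numbers.

```python
from collections import Counter

# Hypothetical zone edges in feet (left, right, bottom, top), one per batter side.
# These values are placeholders, not the fitted zones used for the numbers here.
ZONES = {
    "L": (-1.20, 0.80, 1.75, 3.40),
    "R": (-1.00, 1.00, 1.75, 3.40),
}

def bucket(px, pz, stand, call):
    """Return one of zStr, zBall, oStr, oBall for a single called pitch."""
    left, right, bottom, top = ZONES[stand]
    in_zone = left <= px <= right and bottom <= pz <= top
    prefix = "z" if in_zone else "o"
    suffix = "Str" if call == "strike" else "Ball"
    return prefix + suffix

# Made-up called pitches: (year, team, catcher, px, pz, stand, call)
pitches = [
    (2012, "SEA", "Catcher A", 0.10, 2.50, "R", "strike"),
    (2012, "SEA", "Catcher A", 0.90, 2.00, "R", "ball"),
    (2012, "SEA", "Catcher A", 1.30, 2.20, "R", "strike"),
    (2012, "SEA", "Catcher A", -1.10, 3.00, "L", "ball"),
]

# Tally buckets per (year, pitching team, catcher) grouping.
counts = Counter()
for year, team, catcher, px, pz, stand, call in pitches:
    counts[(year, team, catcher, bucket(px, pz, stand, call))] += 1
```

Keying the counter on year, team and catcher mirrors the grouping described above; a catcher who changed teams mid-season would get one line per team.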
I want to stress first that the values below should not be treated as gospel. As mentioned, the strike zone needs to be more finely determined and the methods used to arrive at these catching numbers are not designed to isolate only the catcher's influence. The catcher is simply used as a grouping point. There's no attempt to control for the pitchers, the umpires, the counts, or anything other than which side the hitter stood on.
Nevertheless, I think the numbers are interesting and probably not too greatly prone to error. My belief is that the home plate umpire is the greatest possible source of bias here, but umpires get rotated around with pretty even frequency so I'm comfortable going forward for now.
What I managed to end up with is a breakdown for each catcher, on each team, in each year (2007-present) that includes the following stats:
The number of called pitches caught total (N)
The number of called balls caught within the strike zone (zBall)
The number of pitches caught within the strike zone (nZ)
The number of called strikes caught outside the strike zone (oStr)
The number of pitches caught outside the strike zone (nO)
The percentage of pitches, caught within the strike zone, called a ball (zBall / nZ = zBall%)
The percentage of pitches, caught outside the strike zone, called a strike (oStr / nO = oStr%)
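As a rough illustration, the stats in that list fall straight out of the four bucket counts. The Python sketch below uses a hypothetical helper, catcher_line, and invented counts; only the stat names come from the list above.

```python
def catcher_line(z_str, z_ball, o_str, o_ball):
    """Turn the four bucket counts into the stats listed above."""
    n_z = z_str + z_ball   # pitches caught inside the zone
    n_o = o_str + o_ball   # pitches caught outside the zone
    return {
        "N": n_z + n_o,
        "zBall": z_ball,
        "nZ": n_z,
        "oStr": o_str,
        "nO": n_o,
        # Guard against a catcher with no pitches in one of the regions.
        "zBall%": z_ball / n_z if n_z else float("nan"),
        "oStr%": o_str / n_o if n_o else float("nan"),
    }

# Invented counts for illustration only.
line = catcher_line(z_str=850, z_ball=150, o_str=140, o_ball=1860)
```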
With those, I was able to calculate a few additional numbers. Subtracting the MLB average oStr% (7.2% in 2012) from the catcher's oStr% and multiplying by the number of pitches the catcher caught outside the strike zone (nO) yields [(catcher's oStr% - MLB oStr%) * nO], the number of extra strikes the catcher saw compared to the league average on out-of-zone pitches. How much influence you want to give the catcher for that is up to you.
Comparably, subtracting the MLB average zBall% (14.5% in 2012) from the catcher's zBall% and multiplying by the number of pitches the catcher caught inside the strike zone (nZ) yields [(catcher's zBall% - MLB zBall%) * nZ], the number of extra balls the catcher saw compared to the league average on pitches in the zone. How much influence you want to give the catcher for that is up to you.
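Those two league-relative numbers can be written as a pair of small Python functions. The function names and the sample catcher are hypothetical; the 2012 league rates are the ones quoted above.

```python
# League-average rates for 2012, as quoted in the text.
MLB_OSTR_PCT = 0.072
MLB_ZBALL_PCT = 0.145

def extra_strikes_outside(o_str, n_o, league=MLB_OSTR_PCT):
    """(catcher oStr% - MLB oStr%) * nO"""
    return (o_str / n_o - league) * n_o

def extra_balls_inside(z_ball, n_z, league=MLB_ZBALL_PCT):
    """(catcher zBall% - MLB zBall%) * nZ"""
    return (z_ball / n_z - league) * n_z

# Invented catcher: 2,000 pitches caught outside the zone, 1,000 inside.
print(round(extra_strikes_outside(o_str=164, n_o=2000), 1))
print(round(extra_balls_inside(z_ball=135, n_z=1000), 1))
```

Algebraically each one is just the catcher's raw count minus what a league-average rate would have produced on the same number of pitches, so this invented catcher comes out about 20 strikes ahead outside the zone and about 10 balls under inside it.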
A quick aside here to bring up another issue. The umpire calls and the strike zone are treated as binary (either 0 or 1) here. For the umpire's call (ball or strike) that's fine; there's no gray area. But for whether the pitch was within the strike zone or not, it's simplistic. The picture being used here is a zone with hard edges, where a pitch is either in or out.
The strike zone isn't called with such a massive drop off, however. The odds that a pitch is called a strike degrade gradually as it moves farther away from the center.
A better representation of the catcher's numbers would be attained by using the probability that a pitch caught in its location would be ruled a strike. Unfortunately, that's currently beyond my resources/coding know-how, though I continue to work toward that aim.
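For what it's worth, the probabilistic version would have roughly this shape: swap the in/out flag for a strike probability at each pitch location, then credit the difference between the actual call and that probability. The Python below is purely illustrative. The logistic falloff, its steepness and the zone dimensions are invented placeholders, not a fitted model.

```python
import math

def strike_prob(px, pz, half_width=1.0, center_z=2.5, half_height=0.85, steepness=8.0):
    """Hypothetical smooth zone: probability falls off with distance past the edge."""
    # Distance beyond the edge on each axis (negative means inside the zone).
    dx = abs(px) - half_width
    dz = abs(pz - center_z) - half_height
    overshoot = max(dx, dz)
    return 1.0 / (1.0 + math.exp(steepness * overshoot))

def extra_strikes(pitches):
    """Sum of (actual call - expected strike probability) over called pitches."""
    return sum((1.0 if called_strike else 0.0) - strike_prob(px, pz)
               for px, pz, called_strike in pitches)

# Invented pitches: (px, pz, called_strike)
sample = [
    (0.0, 2.5, True),    # middle of the zone, almost no credit for the strike
    (1.0, 2.5, True),    # right on the edge, a coin flip in this toy model
    (1.5, 2.5, False),   # well outside, almost no debit for the ball
]
total = extra_strikes(sample)
```

Under this kind of scheme nearly all of the credit and debit lands on the border pitches, which is where framing would be expected to matter.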
Digression aside, adding the two relative totals above together (after first multiplying the extra balls by -1 to turn them into extra strikes) gives the number of extra strikes, call it +Str, that the catcher played a role in, either by having strikes called on pitches outside the zone or by having pitches in the zone not get called a ball.
Dividing that by the total number of called pitches (+Str/N) yields a rate of how often a catcher saw a positive switch on the call from expected. Finally, multiplying that rate by 78 (the average number of called pitches per game) gives an expected number of positive switches per full game caught.
Let me run through that all again, using an actual example of Miguel Olivo this past season. Olivo in 2012 caught 4,929 pitches that a hitter didn't swing at; 1,693 (nZ) of them were deemed within the strike zone and should (simplistically) have resulted in 1,693 strike calls. Instead, 254 (zBall) of them were called a ball, giving a zBall% of 15.0%. Outside the zone, 3,236 (nO) pitches were caught. That should (simplistically) have resulted in 0 strike calls. Instead, 192 (oStr) were ruled a strike, resulting in an oStr% of 5.9%.
Taking Olivo's 5.9% oStr%, subtracting the league average 7.2% and multiplying by how many pitches Olivo caught outside the zone (3,236) yields -41 extra strikes, or alternatively written, +41 missing strikes, on those pitches compared to the rest of the league.
For the pitches inside the zone, I take Olivo's 15.0% zBall%, subtract the league rate of 14.5% and multiply by the number of pitches caught inside the zone (1,693) to get 8 extra balls compared to the rest of the league. Extra balls are just the inverse of extra strikes, so I then multiply by -1 to turn it into -8 extra strikes, alternatively written as +8 missing strikes.
Now I can add those two (41 missing strikes and 8 missing strikes) together to get 49 missing strikes (a +Str of -49) when judged against the MLB average. That total divided by the 4,929 (N) total called pitches gives Olivo a rate of 0.01 switches per called pitch. Multiplying that by the 78 average called pitches per game finally results in 0.78 estimated calls per full game in 2012 that went against the Mariners while Olivo was catching.
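The whole Olivo chain can be checked in a few lines of Python. The counts and league rates are the ones quoted above; the variable names are just illustrative.

```python
# Miguel Olivo, 2012, counts as quoted in the text.
N, n_z, z_ball, n_o, o_str = 4929, 1693, 254, 3236, 192

MLB_OSTR_PCT, MLB_ZBALL_PCT = 0.072, 0.145   # 2012 league averages from the text
CALLED_PITCHES_PER_GAME = 78

extra_strikes_out = (o_str / n_o - MLB_OSTR_PCT) * n_o   # extra strikes outside the zone
extra_balls_in = (z_ball / n_z - MLB_ZBALL_PCT) * n_z    # extra balls inside the zone
plus_str = extra_strikes_out - extra_balls_in            # extra balls count as negative strikes
per_game = plus_str / N * CALLED_PITCHES_PER_GAME        # switches per full game caught

print(f"{extra_strikes_out:.1f} {extra_balls_in:.1f} {plus_str:.1f} {per_game:.2f}")
# prints: -41.0 8.5 -49.5 -0.78
```

Carrying the unrounded 8.5 extra balls through gives about 49.5 missing strikes rather than 49, which leaves the per-game figure unchanged at two decimals.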
Now, if you haven't read Mike Fast's groundbreaking research on pitch framing, go read it. In it, he not only attempts to quantify framing by catcher, but he also digs out some great scouting and identifies possible causes and techniques that influence umpire calls. His goal in quantifying catchers was the same as mine, but his methodology (probably better than mine, but requiring more computational effort than I had to give) is different enough that the two could provide a good sanity check against each other.
That they do. Fast's piece was written last year and so didn't include 2012 data, but he identified Jose Molina, Russell Martin, Yorvit Torrealba, Jonathan Lucroy and Yadier Molina as the five best catchers from 2007-11 and Ryan Doumit, Gerald Laird, Jorge Posada, Jason Kendall and Kenji Johjima as the five worst.
I recorded 78 different catchers that caught at least 1,000 called pitches in 2012. My top five came out as Jose Molina (+2.47 per game), David Ross, Chris Stewart, Jonathan Lucroy and Erik Kratz. Ross ranked 14th by Fast's measure and while neither Stewart nor Kratz had enough playing time to qualify back in 2011, both ranked well in their small samples then.
My bottom five included Ryan Doumit (-2.97 per game), John Hester, Rob Johnson, Mike Napoli and Carlos Santana, with Gerald Laird just missing, at sixth-worst. Again that showed broad and sometimes very specific agreement with Fast's rankings, which weren't even covering 2012.
In an area I've yet to explore, Dan Turkenkopf, back in 2008, came up with a figure of 0.133 runs per switched call on a close pitch. If that's accurate, a catcher like Jose Molina would have added 26 runs of value if (monumental if!) the entirety of his +2.47 rating came from him alone. That probably won't hold up as accurate, but I think it's worth stating.
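A back-of-the-envelope Python sketch of that conversion, assuming every extra strike is worth the same flat 0.133 runs. The function names and the full_games parameter are hypothetical, and Molina's actual workload isn't stated here, so the second function simply inverts the formula to see what workload the 26-run figure would imply.

```python
RUNS_PER_SWITCHED_CALL = 0.133   # Turkenkopf's 2008 estimate, as quoted in the text

def runs_from_framing(extra_strikes_per_game, full_games):
    """Runs implied if every extra strike were worth the same flat amount."""
    return extra_strikes_per_game * full_games * RUNS_PER_SWITCHED_CALL

def implied_full_games(total_runs, extra_strikes_per_game):
    """Invert the formula: how many 78-pitch games a run total corresponds to."""
    return total_runs / (extra_strikes_per_game * RUNS_PER_SWITCHED_CALL)

per_game_runs = runs_from_framing(2.47, 1)   # runs per full game at +2.47
games = implied_full_games(26, 2.47)         # full-game equivalents behind 26 runs
```

On these assumptions a +2.47 rating works out to about a third of a run per full game, and 26 runs corresponds to roughly 79 full games' worth of called pitches.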
To the larger baseball community that's probably a giant who cares. Fast's method is probably sounder and he published it over a year ago. This isn't new. For me though, this represents a big plus for a couple of reasons. First is that, hooray, I can run these numbers whenever I want. I'm always a fan of having data under my control, and so should you be, because I can more easily share with you that which I can generate, while Fast now works for a team and probably isn't publicly sharing updates.
Secondly, hooray, 2012 numbers!
Thirdly, as I get a finer and finer model of the strike zone I think these numbers will get more accurate. Most importantly, eventually I'll incorporate the full probabilistic model rather than just the binary 0 for should be ball, 1 for should be strike and we'll be able to credit/debit fractions of pitches for those tough border pitches that ultimately are what we're after. And once I can do that, I can also start introducing more and more controls, such as factoring out the umpires as well.
That's all future stuff though. For now, here are the totally unsurprising figures for the Mariners troika of catchers in 2012:
Adam Moore: -2.00 extra strikes per game.
Rob Johnson: -2.52 extra strikes per game.
Josh Bard: +0.39 extra strikes per game.
Rob Johnson: -2.67 extra strikes per game.
Kenji Johjima: -2.25 extra strikes per game.
Kenji Johjima: -1.92 extra strikes per game.
Jamie Burke: -0.43 extra strikes per game.
See? Jesus Montero is just trying to fit in. Like with that fish in the commercial!