Evaluating 2023 Pitching Projections

This article reviews the predictive performance of preseason projections from several eminent projection systems (THE BAT, ZiPS, Steamer, and PECOTA), plus mine. I skipped the aggregate projection systems and kept only those downloadable from FanGraphs, plus PECOTA (downloaded Feb. 16th), because those were the ones I had remembered to download in the preseason. This was mainly a selfish exercise to see if, and how badly, I got my ass kicked by the more experienced projectors, to help me think about where I might improve in the offseason.

Although I’ve been doing peak projections for a few years, my 2023 projections were the first I published that incorporated Stuff+, under the name pitching+ projected ERA (ppERA), for Eno Sarris over at The Athletic. I also wanted to see how my 2023 projections did both with and without Stuff+, as both versions are built on the same aging curves, major league equivalencies, regression amounts, park factors, etc., that I use for my peak projections (PLIVE+/-) at Prospects Live. If my current projections perform reasonably well each year, the peak projections should also perform reasonably well, since they are just current projections plus extra aging.

Method

As a guide for this exercise, I dug up an old, seminal methodological article by Tom Tango, MLBAM Senior Data Architect. Tom has also generously provided me with invaluable feedback over various exchanges. I will briefly outline the methodology again here to help anyone wishing to evaluate projections in the future, and perhaps especially as a reference for my future self.

Steps:

  1. Grab all the 2023 statistics for every pitcher (download them from the FanGraphs leaderboard!).

  2. Download all the preseason projections and merge them with the 2023 statistics leaderboard. For missing projections, leave them blank for now—more on this later.

  3. Rescale all of the projections so that they assume the same league environment. Projectors generally focus on predicting player performance while holding league context constant (e.g., at last year’s environment). Guessing changes to the ball before the season starts, for instance, is probably a fool’s errand unless you know a manufacturing insider in Costa Rica. Rule changes announced in advance are an exception: it of course takes projection skill to correctly incorporate the new shift rules or the new stolen base rules. Still, the impact of these changes on league-relative performance is fairly minor, and evaluating who did best at guessing, e.g., the new stolen base environment, is best reserved for a separate exercise.

  4. Here’s how to rescale, taking Steamer ERA projections as an example. Look at all of the 2023 MLB pitchers with a Steamer projection. Take the average of their ERA projections weighted by the number of batters they faced in 2023. For Steamer this was 4.09. Next, pick an ERA that you want to rescale every projection to—I went with 4.33 as that was the league average MLB ERA in 2023. There are two ways to rescale to a league environment with a 4.33 average ERA:

    1. You can subtract 4.09 from every Steamer ERA projection and then add 4.33 (also known as adding .24 to each, for the math wizards out there!), or

    2. you can divide every Steamer ERA projection by 4.09 and then multiply by 4.33. If you are under strict orders to convert ERA to implied wOBA against, divide the Steamer projected ERA by the Steamer league average ERA, take the square root, then multiply by the league average wOBA (.318 in 2023). I used the first approach, as it is the one used in the article I followed. The two approaches work similarly anyway: they produced nearly identical root mean square errors (RMSEs) in my sample, and I confirmed with Tango that it doesn’t matter much which you use. The process is the same for each of the component statistics (e.g., K%, BB%, etc.).

  5. Almost done. Now you need to address the missing players. There are a few options here, each with its own biases (there are no selection-bias-free studies in baseball!). First, you can focus only on players with a projection from every system. Note that some projection systems cover a broader population of players (e.g., ZiPS), while others focus on relatively more established players (e.g., THE BAT). Second, you can project all missing players to be league average: the MARCEL approach. Third, since missing players tend to be slightly below average, you can project them all to be slightly below average: the MARCEL approach with a twist. None of these is perfect. My preference is not to leave out any data, so I used the third approach, again matching the article I followed. For each statistic, I chose the slightly-below-league-average value that minimized the average RMSE of the five projection systems (THE BAT, Steamer, ZiPS, ppERA, PECOTA). The majority of players with 2023 MLB data have projections from every system, so the results are generally consistent regardless of which approach you choose. The appendix shows the first approach, using only players with a projection from every system, and it yields results similar to the other approaches.

  6. Now you are ready to score the projections. Calculate the RMSE (or mean absolute error) of each statistic for each projection system, weighting each pitcher’s squared error by his 2023 total batters faced. RMSE is the square root of the average squared residual (equal to the residuals’ standard deviation when they average to zero), a measure of the typical distance between a forecast and an actual result. Tango recommends a further step: compare each projection with a naive projection that assumes everyone projects to be the same. I’ll henceforth refer to this naive projection as ‘samezies.’ At the very least, you should aim for your projection to outperform samezies. To calculate the difference between your projection and samezies, square both RMSEs, subtract the smaller from the larger, then take the square root.
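The six steps above can be sketched in Python. This is a minimal, hypothetical sketch, not the code behind the tables in this article: it assumes plain Python lists in which each position is one pitcher, `None` marks a missing projection, and the weights are 2023 total batters faced. All function names are illustrative, the grid search in `best_fill_value` is only one assumed way to find the fill value that minimizes the average RMSE across systems, and the toy example assumes samezies projects everyone at the rescaled league average.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical helpers: each list position is one pitcher, None = missing.


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Average weighted by 2023 total batters faced (TBF)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def rescale(projections: Sequence[Optional[float]], weights: Sequence[float],
            target: float, method: str = "additive") -> List[Optional[float]]:
    """Step 4: move one system's projections into a common league environment.

    The system's league level is the TBF-weighted mean over the pitchers it
    projected; missing projections stay None until step 5.
    """
    present = [(p, w) for p, w in zip(projections, weights) if p is not None]
    system_mean = weighted_mean([p for p, _ in present], [w for _, w in present])
    if method == "additive":  # option 1: subtract system mean, add target
        return [None if p is None else p - system_mean + target for p in projections]
    if method == "ratio":  # option 2: divide by system mean, multiply by target
        return [None if p is None else p / system_mean * target for p in projections]
    raise ValueError(f"unknown method: {method}")


def era_to_woba(era: float, league_era: float, league_woba: float) -> float:
    """Implied wOBA against: sqrt(ERA / league ERA) * league wOBA."""
    return math.sqrt(era / league_era) * league_woba


def fill_missing(projections: Sequence[Optional[float]], fill: float) -> List[float]:
    """Step 5: give every missing projection the same fill value."""
    return [fill if p is None else p for p in projections]


def weighted_rmse(projections: Sequence[float], actuals: Sequence[float],
                  weights: Sequence[float]) -> float:
    """Step 6: square root of the TBF-weighted mean squared error."""
    squared_errors = [(p - a) ** 2 for p, a in zip(projections, actuals)]
    return math.sqrt(weighted_mean(squared_errors, weights))


def best_fill_value(systems: Dict[str, Sequence[Optional[float]]],
                    actuals: Sequence[float], weights: Sequence[float],
                    candidates: Sequence[float]) -> float:
    """Step 5 (assumed grid search): the candidate fill value with the lowest
    average RMSE across all systems."""
    def average_rmse(fill: float) -> float:
        return sum(weighted_rmse(fill_missing(proj, fill), actuals, weights)
                   for proj in systems.values()) / len(systems)
    return min(candidates, key=average_rmse)


def difference_vs_samezies(model_rmse: float, samezies_rmse: float) -> float:
    """Square both RMSEs, subtract the smaller from the larger, take the root."""
    return math.sqrt(abs(samezies_rmse ** 2 - model_rmse ** 2))


if __name__ == "__main__":
    # Made-up toy numbers for illustration only, not real 2023 data.
    steamer = [3.59, 4.09, 4.59]
    tbf = [600, 600, 600]
    rescaled = rescale(steamer, tbf, target=4.33)
    print([round(x, 2) for x in rescaled])  # [3.83, 4.33, 4.83]

    actual_era = [3.20, 4.60, 5.10]
    model = weighted_rmse(rescaled, actual_era, tbf)
    samezies = weighted_rmse([4.33] * len(actual_era), actual_era, tbf)
    print(round(difference_vs_samezies(model, samezies), 3))
```

With equal weights, the toy projections average 4.09, so the additive option shifts each one up by .24, mirroring the Steamer example in step 4. Passing `method="ratio"` gives the multiplicative version instead.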

How did the various projection systems fare in 2023? The tables below summarize their predictive performance.

Table 1. RMSE of preseason projections for 2023 performance. (Lower is better predictive accuracy)

Table note: n=803 non-hitter pitchers. RMSE weighted by 2023 total batters faced. DRA was the best-performing ERA estimator for PECOTA, FIP was best for ZiPS, and BABIP-neutral ERA was best for ppERA and ppERA traditional.

Table 2. Difference between each projection and a naive projection. (Higher is better predictive accuracy)

Overall, the projections grouped closely together, all outperforming the samezies model similarly. Steamer had an excellent year overall, leading the way in predicting ERA, wOBA against, HR/9, and K%. If one felt compelled to declare a “winner” for 2023, it would certainly be Steamer. THE BAT led the way in predicting BABIP, while ppERA traditional led the way in predicting BB%.

Turning to my projections, ppERA traditional led all systems in BB%, and my traditional BB% projection did not include Location+ while the ppERA version did. Accordingly, I will drop Location+ from the ppERA model next year as well. For all other metrics, ppERA with Stuff+ outperformed ppERA traditional (without Stuff+). This fulfilled an important offseason goal of mine: to incorporate Stuff+ into a traditional projection in a way that improved the traditional projection’s predictive accuracy. Generally, I am relieved to see ppERA and ppERA traditional both holding their own reasonably well against the more established systems. Still, there is plenty of room for improvement. For example, this offseason I plan to add some pitch-level metrics, like swinging strike %, to try to close the K% gap between me and the K% leaders, Steamer and ZiPS.

What about rookie projections? The tables below summarize how the various projection systems performed at projecting rookies in 2023.

Table 3. RMSE of preseason projections for 2023 rookie performance. (Lower is better)

Table note: n=233 non-hitter rookie pitchers. RMSE weighted by 2023 total batters faced. DRA was the best-performing ERA estimator for PECOTA.


Table 4. Difference between each projection and a naïve projection, rookies. (Higher is better)

The rookie projections also grouped fairly closely together, but with a broader spread and higher RMSEs than the overall projections. Higher RMSEs are to be expected, as rookies are typically the hardest group to project. Steamer led the way in projecting rookie wOBA against and BABIP, while ppERA had the best rookie ERA projections. PECOTA had the strongest rookie K% projections, Steamer led the way for BB%, and ppERA traditional was best for HR/9. For my ppERA traditional HR/9 projection, I project a pitcher’s fly ball rate and then assume a league average HR/FB rate, following xFIP-style logic. The models generally outperformed the MARCEL approach for rookies (with MARCEL projecting everyone to be the same). However, MARCEL outperformed PECOTA, ZiPS, and THE BAT for HR/9, and for BABIP, MARCEL outperformed PECOTA and basically tied ppERA traditional (unsurprising, as ppERA traditional also uses the MARCEL approach, projecting the same BABIP for all rookies). Altogether, I can’t complain too much here, seeing ppERA and ppERA traditional among the leaders this year in projecting rookies.
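As a rough illustration of the xFIP-style HR/9 logic described above, here is a hypothetical Python sketch. It assumes two inputs, a projected fly ball rate (fly balls per batted ball) and a projected number of batted balls per nine innings; how ppERA traditional actually derives its inputs isn’t described here, so the function and its parameters are illustrative only.

```python
def xfip_style_hr9(fb_rate: float, batted_balls_per_9: float,
                   league_hr_per_fb: float) -> float:
    """Hypothetical HR/9: projected fly balls per 9 innings times league HR/FB.

    Only the fly ball rate is pitcher-specific here; every pitcher gets the
    same league-average HR/FB rate, as in xFIP.
    """
    fly_balls_per_9 = fb_rate * batted_balls_per_9
    return fly_balls_per_9 * league_hr_per_fb


# Made-up inputs for illustration only.
print(round(xfip_style_hr9(fb_rate=0.35, batted_balls_per_9=27.0,
                           league_hr_per_fb=0.12), 2))  # 1.13
```

Holding HR/FB at the league rate means the sketch ignores each pitcher’s own HR/FB history, which tends to be noisy; that design choice is what makes the approach xFIP-style.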


Appendix


Table A1. RMSE of preseason projections for 2023 performance. (Lower is better). n=594 pitchers with projections from every projection system.