Player Valuation Tip #7: Use the best projection systems

Tip #1: Where do player values come from?
Tip #2: Set your Hit/Pitch split
Tip #3: Value your picks and make preseason trades
Tip #4: Draft with tiers
Tip #5: Use xFantasy, the xStats projection system
Tip #6: Use aging curves for keeper/dynasty leagues

With part seven of my preseason player valuation series, we arrive at one of the most important decisions of the preseason: deciding which projection system(s) to use. As a testament to how important this is, people have been asking me about this piece for weeks. Wait no longer!

Evaluating projection systems is well-trodden ground, as documented by Will Larson over at the Baseball Projection Project. This year, the most notable projection analysis I've seen again came from BTBS (Beyond the Box Score), though it was not specifically fantasy-focused. They largely found that Steamer was the best, followed closely by PECOTA, and they also examined some interesting differences in how each system performs with players of certain ages. Each projection system changes and iterates its methodology year over year, so we can always learn more by analyzing the most recent year's results.

In this study, I'll focus on the most commonly used projections, the same ones that appear in the Big Board: Steamer, PECOTA, ZiPS, FanGraphs Depth Charts, FanGraphs Fans, and Clay Davenport projections. The categories of interest are the typical 5x5 categories: HR/R/RBI/SB/AVG for hitters and W/SO/SV/ERA/WHIP for pitchers. I'll look at each system's ability to project these categories in total as well as on a per-PA/per-IP basis. For fantasy purposes we only care about the relative projections made by each system (i.e., we only need to know that Kershaw is the best SP in baseball, not exactly what his ERA will be), so I'll primarily use R squared to evaluate how well the projections correlated with actual results, but this year I'll also include RMSE to show the absolute error in each projection system. The most common fantasy leagues draft about 300 players, broken out into 180 hitters, 90 SP, and 30 RP, so I've gone through each system to find the consensus top 300 players as projected in the 2016 preseason, and I'll only evaluate the systems on their projections of those 300 players. One final adjustment: to reduce the effect of outliers, hitters who didn't reach 400 PA and pitchers who didn't reach 35 IP have been thrown out of the sample this year.
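
To make the evaluation steps concrete, here is a minimal Python sketch of dropping players below the playing-time cutoffs and scoring a system with R squared. It assumes R squared is computed as the squared Pearson correlation between projected and actual values, which fits the "relative projections" goal because correlation ignores scale and offset. The function names, field names, and toy numbers are illustrative assumptions, not the actual data or code behind this study.

```python
# Cutoffs from the text; everything else here is illustrative.
MIN_PA = 400  # hitters below this many actual PA are dropped
MIN_IP = 35   # pitchers below this many actual IP are dropped


def r_squared(projected, actual):
    """Squared Pearson correlation between projections and actual results.

    Correlation ignores scale and offset, so a system that ranks players
    correctly scores well even if its raw totals run high or low.
    """
    n = len(projected)
    mean_p = sum(projected) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(projected, actual))
    var_p = sum((p - mean_p) ** 2 for p in projected)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        return 0.0  # no spread to correlate
    return cov * cov / (var_p * var_a)


def qualified(players, key, minimum):
    """Keep only players whose actual playing time reached the cutoff."""
    return [p for p in players if p[key] >= minimum]


# Toy example (made-up numbers): hitter D is dropped for low PA, and
# projections that are uniformly 2 HR low still get a perfect R sq. of 1.0.
hitters = [
    {"name": "A", "actual_pa": 650, "proj_hr": 30, "actual_hr": 32},
    {"name": "B", "actual_pa": 610, "proj_hr": 20, "actual_hr": 22},
    {"name": "C", "actual_pa": 580, "proj_hr": 10, "actual_hr": 12},
    {"name": "D", "actual_pa": 250, "proj_hr": 25, "actual_hr": 5},
]
sample = qualified(hitters, "actual_pa", MIN_PA)
score = r_squared([h["proj_hr"] for h in sample], [h["actual_hr"] for h in sample])
```

The same scoring would be run on per-PA (or per-IP) rates by dividing each projected and actual total by the corresponding playing time before calling `r_squared`.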

Finally, for both hitters and pitchers, I'll present the weighting factors used to create the best possible projection from a combination of the various 2016 projections. This year the blend is called "Steacotaps" for both hitters and pitchers, which, as you might guess, means it's a combo of Steamer, PECOTA, and ZiPS. This combined projection will be the default one I set for the Big Board as we head into draft season.
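
The post doesn't say how the weighting factors were found. As an assumption, one simple way to search for them is a coarse grid over non-negative weights that sum to 1, keeping whichever blend has the highest R squared against actual results. This sketch blends raw projected totals for a single category, and all names are hypothetical.

```python
def r_squared(projected, actual):
    """Squared Pearson correlation (0.0 if either side has no spread)."""
    n = len(projected)
    mean_p = sum(projected) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(projected, actual))
    var_p = sum((p - mean_p) ** 2 for p in projected)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        return 0.0
    return cov * cov / (var_p * var_a)


def best_blend(steamer, pecota, zips, actual, steps=20):
    """Grid-search non-negative weights (summing to 1) for a three-system blend.

    Returns (best R sq., (w_steamer, w_pecota, w_zips)). With steps=20 the
    grid moves in 5% increments, the granularity of 40-35-25 or 45-40-15.
    """
    best_score, best_w = -1.0, None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j
            w = (i / steps, j / steps, k / steps)
            blend = [w[0] * s + w[1] * p + w[2] * z
                     for s, p, z in zip(steamer, pecota, zips)]
            score = r_squared(blend, actual)
            if score > best_score:
                best_score, best_w = score, w
    return best_score, best_w
```

In practice you would likely score each candidate on all five categories (for example, their average R sq.) rather than one, and keep in mind that a single season's sample of around a hundred players makes the exact weights noisy.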

Hitters

After applying the caveats listed above, I end up with 142 hitters in my data set for 2016. In addition to each of the standard projections, I've included a custom mix which I'll call 'Steacotaps' - 40% Steamer, 35% PECOTA, 25% ZiPS. This custom mix also uses weighted playing time, with the same percentage weights. Rate stats like AVG were also evaluated as part of the 'total' projections by using a playing-time weighted value indicated by an 'n' (e.g. "nAVG").
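
A minimal sketch of how a hitter blend with weighted playing time could be built. The 40-35-25 weights come from the text; the assumption (not stated in the post) is that each counting stat is blended as a per-PA rate and then scaled by the blended PA. Function and key names are hypothetical.

```python
# Steacotaps hitter weights from the text; the same weights are used for
# playing time (PA) and for the per-PA rates.
HITTER_WEIGHTS = {"steamer": 0.40, "pecota": 0.35, "zips": 0.25}


def weighted_mean(values, weights):
    """Weighted average of values[system] using weights[system]."""
    total_w = sum(weights.values())
    return sum(weights[s] * values[s] for s in weights) / total_w


def blend_hitter(projections, categories=("HR", "R", "RBI", "SB"),
                 weights=HITTER_WEIGHTS):
    """Blend one hitter's projections from several systems.

    projections maps system name -> {"PA": ..., "HR": ..., ...}.
    Assumption: each counting stat is blended as a per-PA rate and then
    scaled by the blended PA, so each system's playing-time opinion and
    its skill opinion are combined separately.
    """
    pa = weighted_mean({s: projections[s]["PA"] for s in weights}, weights)
    blended = {"PA": pa}
    for cat in categories:
        rates = {s: projections[s][cat] / projections[s]["PA"] for s in weights}
        blended[cat] = weighted_mean(rates, weights) * pa
    return blended
```

For example, with made-up inputs of 600 PA/30 HR (Steamer), 500 PA/20 HR (PECOTA), and 400 PA/20 HR (ZiPS), the blend comes out to 515 PA and about 23.9 HR.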

Starting with playing time, nearly every projection system struggled. Between injuries, lineup spots, and role changes, playing time is just plain difficult to peg. In previous years I've given a hat-tip to the hand-curated playing time over at FanGraphs, but this year it failed to give the Steamer and FGDepth systems notably better playing-time projections. At first glance ZiPS did well for hitter playing time, but there is a caveat: ZiPS makes no effort to project accurate playing time. If you extend the dataset to all active players, you'll see that ZiPS actually did quite poorly. By combining the Steamer, PECOTA, and ZiPS opinions of playing time, however, we get the best of both worlds, and Steacotaps comes up with a significantly better projection of PAs.

Homers and steals were again the easiest offensive categories to project, with no system getting much of an edge on a per-PA basis. Steals are helped by the low- or no-speed players who are accurately projected for almost no steals. Steamer had a strangely poor showing in SB projection this year, but on closer inspection that's largely due to a badly inaccurate Jonathan Villar projection (320 PA and 20 SB projected vs. 679 PA and 62 SB actual); Steamer was among the most bearish on him going into 2016. Runs were very hard to project, but there was also quite a spread, from Steamer at the top to ZiPS/Clay at the bottom. RBI were projected relatively well by all systems. Average is understandably difficult to project given the year-to-year fluctuation in that category, and all systems besides Fans had an R sq. between .31 and .34. The net result for each system is expressed as the percentage above (or below) average across the five categories, and is also plotted below.
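
The post doesn't spell out exactly how the "percentage above average" is computed. A minimal sketch under one plausible assumption: in each category, divide a system's R sq. by the mean R sq. of all systems, average those ratios across the categories (each category weighted equally), and express the result as a percentage, so 0% is exactly average. Names and numbers here are hypothetical.

```python
def pct_above_average(r2_by_system):
    """Express each system's R sq. relative to the field, in percent.

    r2_by_system maps system -> {category: R sq.}. Assumption: in each
    category a system's R sq. is divided by the mean R sq. of all systems,
    and those ratios are averaged over the categories; 0% is exactly average.
    """
    systems = list(r2_by_system)
    categories = list(r2_by_system[systems[0]])
    cat_mean = {
        c: sum(r2_by_system[s][c] for s in systems) / len(systems)
        for c in categories
    }
    result = {}
    for s in systems:
        ratios = [r2_by_system[s][c] / cat_mean[c] for c in categories]
        result[s] = 100.0 * (sum(ratios) / len(ratios) - 1.0)
    return result


# Toy, made-up R sq. values for two hypothetical systems:
example = pct_above_average({"A": {"HR": 0.6, "SB": 0.4},
                             "B": {"HR": 0.4, "SB": 0.4}})
# -> A is about +10.0, B is about -10.0
```

Using ratios rather than raw differences keeps a hard-to-project category (low R sq. for everyone) from being drowned out by an easy one.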

It's good to see that one of the common fantasy baseball adages holds true: averaging projections together gives you better results. In the case of hitters, my combined Steacotaps projection noticeably beats all the others, topping the average five-category R sq. by about 9%. On a per-PA basis, Steamer comes close, with FGDepth/PECOTA not far behind, but the improved playing time gives Steacotaps the big advantage. ZiPS had a rough year, tumbling all the way to the bottom of the list on a per-PA basis. I am losing confidence in Dan Szymborski's system at this point, though it is clearly doing some things right, as it helped Steacotaps improve noticeably.

RMSE: As alluded to in the introduction, I'm including root mean square error (RMSE) this year. This gives you an idea of the typical error in each category for each system.
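
RMSE is the square root of the mean squared difference between projected and actual values, so it is expressed in the category's own units (home runs, steals, and so on). A minimal sketch, with a made-up example showing how it differs from R sq.: projections that are uniformly 2 HR too low have a perfect correlation, but RMSE reports the 2-HR miss.

```python
import math


def rmse(projected, actual):
    """Root mean square error, in the category's own units (e.g. home runs)."""
    n = len(projected)
    squared_errors = [(p - a) ** 2 for p, a in zip(projected, actual)]
    return math.sqrt(sum(squared_errors) / n)


# Toy example (made-up numbers): every projection is 2 HR too low.
error = rmse([30, 20, 10], [32, 22, 12])  # 2.0
```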

As it turns out, RMSE shows that the systems are perhaps a bit closer to each other than R sq. would have you believe, but the conclusions are largely the same: Steacotaps is the best or second-best in every category. The Fans, somehow, did very well in predicting home runs (maybe the home run surge pushed the actual totals up toward their rose-tinted expectations?). The FanGraphs Depth Charts come out of this looking much better. Again, I'm drawing my primary conclusions from the R sq. data above, but this is nevertheless another interesting perspective on the data.

Verdict: Steacotaps wins out, a 40-35-25 combo of Steamer-PECOTA-ZiPS. But if you're in a hurry, the straight Steamer projections will be 90% as good.

Pitchers

Last year, I separated SPs and RPs, which simplified the selection of players for the data set, but this year I found a better way to organize the data, so I'm leaving them combined. Note that the R sq. values will be noticeably higher because of this (pooling starters and relievers widens the spread of values in categories like IP and SV), so they aren't directly comparable to the '16 version of my analysis. After removing non-qualifiers, I'm left with 114 pitchers: 79 starters and 35 relievers. As with the hitter projections, weighted rate stats are indicated by an 'n' (e.g., "nERA"). I'll also include another custom mix, still called 'Steacotaps'; this time it's 45% Steamer, 40% PECOTA, 15% ZiPS. The pitcher version of Steacotaps uses a 50-50 split of Fans and PECOTA playing-time projections, as those are by far the two best systems at projecting IP.
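
The pitcher blend differs from the hitter blend in that playing time comes from a different set of systems than the rates. A minimal sketch, using the weights from the text; the assumptions (not stated in the post) are that counting stats are blended per IP and scaled by the blended IP, while ERA and WHIP, already per-inning rates, are blended directly. Names are hypothetical.

```python
# Steacotaps pitcher weights from the text: rates come from a 45-40-15
# Steamer-PECOTA-ZiPS blend, while IP comes from a 50-50 Fans-PECOTA split.
RATE_WEIGHTS = {"steamer": 0.45, "pecota": 0.40, "zips": 0.15}
IP_WEIGHTS = {"fans": 0.50, "pecota": 0.50}


def weighted_mean(values, weights):
    """Weighted average of values[system] using weights[system]."""
    total_w = sum(weights.values())
    return sum(weights[s] * values[s] for s in weights) / total_w


def blend_pitcher(projections, counting=("W", "SO", "SV"),
                  rates=("ERA", "WHIP")):
    """Blend one pitcher's projections from several systems.

    projections maps system name -> {"IP": ..., "SO": ..., "ERA": ...}.
    Assumptions: counting stats are blended per IP and scaled by the blended
    IP; ERA and WHIP, already per-inning rates, are blended directly.
    """
    ip = weighted_mean({s: projections[s]["IP"] for s in IP_WEIGHTS}, IP_WEIGHTS)
    blended = {"IP": ip}
    for cat in counting:
        per_ip = {s: projections[s][cat] / projections[s]["IP"] for s in RATE_WEIGHTS}
        blended[cat] = weighted_mean(per_ip, RATE_WEIGHTS) * ip
    for cat in rates:
        blended[cat] = weighted_mean({s: projections[s][cat] for s in RATE_WEIGHTS},
                                     RATE_WEIGHTS)
    return blended
```

Keeping the IP weights separate from the rate weights lets the blend borrow the Fans' playing-time projections without using their rate projections at all.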

Last year, Clay's IP projections for SPs were impressively good, but perhaps that was a blip. I had also always heard that the Fans were good at projecting IP, but that didn't show up last year. It looks like they're back on form this year, though! Fans and PECOTA separate themselves from the pack on IP projections.

Steamer continues to be the champion of pitcher projections, but Steacotaps takes the top spot by improving marginally on Steamer in each category. Strikeouts were relatively easy for all systems to project, and wins were very hard. Overall, you'll notice a much wider spread in performance in the pitching categories than you saw above for the hitting categories. Again, the net results are plotted below:

The difference between Steamer and PECOTA is not huge overall: Steamer's advantage in W, ERA, and WHIP is balanced by PECOTA's advantage in K and IP. Either is a fine starting point, but combining them, along with a minor contribution from ZiPS, into Steacotaps gives you a projection set that performed 25% above average on a per-IP basis. Again you can see ZiPS tumbling to the bottom with Clay. When the Fans are beating your system for both hitters and pitchers, maybe it's time to re-evaluate!

RMSE: The root mean square error for each pitching category...

Based on RMSE, the differences between the systems are not that vast, but here we can see PECOTA's clear advantage in IP projection. You can also see that the error in ERA is huge, so in your fantasy leagues you should pay for strikeouts and WHIP, not for ERA. Wins are also a total shot in the dark, although they look less bad on a per-inning basis (the number one predictor of wins is IP per game, which is something of a chicken-and-egg scenario). All in all, Steamer and PECOTA are quite good, but combining them in Steacotaps is better.

Verdict: Steacotaps kicks some butt, using a 45-40-15 combo of Steamer-PECOTA-ZiPS, with a 50-50 split of PECOTA-Fans for playing time.