#1
AL and NL Cy Young
The requisites for establishing a reasonable prediction of the likely Cy Young candidates were similar to what I did with the MVP. Of course different statistics needed to be applied, but the essence of the framework remained the same. Find numbers of consequence and regress.
But it doesn't take long to notice, after an inventory of the candidates from prior years, that voting points go to starters as well as closers, so the two have to be handled through an arbitrary re-configuration of statistics. Since closers are measured by accumulating saves, an equalizer of saves and wins serves to join closers and starters in the same regression. Relievers who are not closers are S-O-L. I can assure you, with a fair amount of certainty, that a middle reliever will not win the Cy Young.

The aforementioned requires some degree of mathematical ingenuity. The term "arbitrary" is useful in describing that process, in addition to being thrown around in describing similar applications. I find it a very flexible word. I don't think it presumptuous to say the term has a uniquely universal privilege for relating the vicissitudes of the day. When you walk to some particular location, the steps you take are not so much planned as they are the result of having authority over them for the convenience of arriving at the desired objective. The route can therefore be considered arbitrary, subject to the judgment of the walker. In mathematics, arbitrary refers to a constant with an undetermined value. A formula can therefore determine the value and act as the constant itself.

During the MVP regression process, I felt it practical to give a certain weight to players on teams that make the playoffs (since a playoff appearance has a positive correlation to voting points) by adding the square root of team wins to team wins. Given team wins (TW), solving for weighted team wins (TWx):

$$\mathrm{TWx} = \mathrm{TW} + \sqrt{\mathrm{TW}}$$

The results were adequate for the purpose, and I didn't feel it necessary to experiment further.
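The team-win weighting described above is simple enough to sketch in a few lines of Python. The function name `weighted_team_wins` is my own label for illustration, and the win totals in the loop are made-up inputs, not data from the regression.

```python
import math


def weighted_team_wins(team_wins: float) -> float:
    """Return TWx = TW + sqrt(TW), the weighting described in the post."""
    if team_wins < 0:
        raise ValueError("team wins cannot be negative")
    return team_wins + math.sqrt(team_wins)


if __name__ == "__main__":
    # Illustrative win totals only (hypothetical, not from the data set).
    for tw in (64, 81, 100):
        print(tw, weighted_team_wins(tw))
```

Because the square root grows slowly, the bonus in absolute terms is only slightly larger for a 100-win team than for a 64-win team, so the transformation nudges the team-win variable rather than reshaping it.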
Obviously the weight will be incommensurate with the reality of the situation, and it is not free of the risk of skewing the odds. I did not deliberate much on the matter, simply because the resulting odds appeared agreeable to reason.

To create a formula that assimilates starters and closers into one equation, I partitioned the two types of pitchers and found the correlations for each. Closers are well represented in the voting practices of the writers in both leagues. I was initially surprised to find that team wins and playoff appearances had a negative correlation to voting points. A playoff appearance cost a pitcher around six points. But my vexation was swiftly pacified by recalling some of the past winners. Call it the Roy Halladay effect or, equally, the Lincecum effect.

More in line with reason, ERA and WAR were highly correlated with the voting trends, for starting pitchers and closers alike. Saves were likewise consequential for closers, and pitcher wins for starters. However, that is basically it. Unexpectedly, strikeouts, strikeout ratio, and strikeout/walk ratio showed very little relationship, positive or negative. FIP and other advanced pitching metrics are still seen as invidious creations of new-age sabermetricians, and the old habits of the writers persist. (Though with the recent influx of some prominent sabermetricians into the BBWAA, hard-core baseball statistics and advanced metrics may yet displace conventional numbers.)

Now what is left is a painstaking gap to fill, that of saves and wins. How can the two be reconciled within the limits of the sample size? Left to the devices of my own ingenuity, it took only seven to ten minutes of concerted thought to find a resolution. Using the fundamentals of the simple formula concocted for the MVP regression, demonstrated above, I merely tweaked the basics until my humble vanity was content.
Wins (W), Saves (SV), solving for SVW (Saves to Wins):

[formula image not available]

Closer wins are putatively byproducts of blown saves, so that is more or less a neutral statistic. One major difference on this occasion: I decided the result should be anchored to something with a rationale, so I retrofitted the formula to land within a reasonable interval of average wins. I simply averaged the wins of starters, and then found the average wins of closers after applying the formula. I did this for both leagues:

AL: SP average wins = 18.18, CP average wins = 15.87
NL: SP average wins = 17.83, CP average wins = 14.23

The re-adjustment of saves to wins naturally combats the asymmetry between SP and CP ERA, so the weight of the coefficients is equalized. And to further validate the formula, Eric Gagne remained the 2003 winner when the post-regression numbers were applied. The invention left me with a sense of great satisfaction, which I do not hesitate to admit.

The arbitrary formula is, pragmatically, rather banal outside the scope of this particular regression. It's a meaningless scale with no general utility for evaluating players. WAR and WPA are sufficient for comparison across subsets of positions.

Here are the regression coefficients and odds.

American League
Code:
      Source |       SS       df       MS              Number of obs =      66
-------------+------------------------------           F(  3,    63) =   47.04
       Model |  165575.187     3  55191.7289           Prob > F      =  0.0000
    Residual |  73919.2691    63  1173.32173           R-squared     =  0.6914
-------------+------------------------------           Adj R-squared =  0.6767
       Total |  239494.456    66  3628.70387           Root MSE      =  34.254
------------------------------------------------------------------------------
     votepts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         war |   7.365279   2.725402     2.70   0.009     1.918998    12.81156
         era |  -29.23094   5.821598    -5.02   0.000    -40.86448   -17.59741
         svw |      5.355   1.420173     3.77   0.000     2.517011     8.19299
------------------------------------------------------------------------------
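To make the American League coefficients concrete, here is a minimal Python sketch that turns a pitcher's line into predicted vote points. It assumes the fit has no constant term, since none is listed in the output. The function name and the sample pitcher are hypothetical, and the `svw` input has to come from the saves-to-wins adjustment, whose formula is not reproduced here.

```python
# Coefficients copied from the American League regression output.
# Assumption: no intercept, because no constant row appears in the table.
AL_COEF = {"war": 7.365279, "era": -29.23094, "svw": 5.355}


def predicted_vote_points(war: float, era: float, svw: float,
                          coef: dict = AL_COEF) -> float:
    """Linear prediction: sum of coefficient * statistic, no constant."""
    return coef["war"] * war + coef["era"] * era + coef["svw"] * svw


if __name__ == "__main__":
    # Hypothetical pitcher line, for illustration only.
    print(predicted_vote_points(war=6.0, era=2.50, svw=18.0))
```

Note how heavily ERA weighs in: each full run of ERA costs roughly 29 predicted points, about as much as four wins' worth of WAR adds.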
National League regression results:
Code:
      Source |       SS       df       MS              Number of obs =      64
-------------+------------------------------           F(  3,    61) =   49.35
       Model |  210353.074     3  70117.6914           Prob > F      =  0.0000
    Residual |  86668.9258    61  1420.80206           R-squared     =  0.7082
-------------+------------------------------           Adj R-squared =  0.6939
       Total |      297022    64  4640.96875           Root MSE      =  37.694
------------------------------------------------------------------------------
     votepts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         war |   9.735335   2.944883     3.31   0.002     3.846677    15.62399
         svw |   6.053746   1.591225     3.80   0.000     2.871894    9.235598
         era |   -37.1176   7.397198    -5.02   0.000    -51.90921   -22.32599
------------------------------------------------------------------------------
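For anyone who wants to sanity-check the summary lines, the fit statistics in both tables can be recomputed from the sums of squares and degrees of freedom. This is only a sketch. The dictionary layout and function name are mine, and the numbers are copied from the two outputs.

```python
import math

# Sums of squares and degrees of freedom copied from the two tables.
TABLES = {
    "AL": {"model_ss": 165575.187, "resid_ss": 73919.2691,
           "total_ss": 239494.456, "model_df": 3, "resid_df": 63},
    "NL": {"model_ss": 210353.074, "resid_ss": 86668.9258,
           "total_ss": 297022.0, "model_df": 3, "resid_df": 61},
}


def summary_stats(t: dict) -> dict:
    """Recompute R-squared, F, and root MSE from an ANOVA table."""
    model_ms = t["model_ss"] / t["model_df"]   # mean square, model
    resid_ms = t["resid_ss"] / t["resid_df"]   # mean square, residual
    return {
        "r2": t["model_ss"] / t["total_ss"],
        "f": model_ms / resid_ms,
        "root_mse": math.sqrt(resid_ms),
    }


if __name__ == "__main__":
    for league, table in TABLES.items():
        print(league, summary_stats(table))
```

The recomputed values agree with the reported R-squared, F, and Root MSE to the displayed precision, so the tables are internally consistent.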
At this point there is nothing to compare the odds to. They haven't been released offshore or in Vegas. At length I'll revisit with some thoughts on the matter.
__________________
"Nobody goes there anymore, it's too crowded." --Yogi Berra "Always tell the truth, that way you won't have to remember what you said." --Mark Twain *=$50,000
#2
For a reference point, here are the top ten from the ESPN Cy Young Predictor.

AL:
CC Sabathia
David Price
Carl Pavano
Rafael Soriano
Clay Buchholz
Jon Lester
Trevor Cahill
Joakim Soria
Jonathan Papelbon
Neftali Feliz

NL:
Adam Wainwright
Roy Halladay
Ubaldo Jimenez
Heath Bell
Billy Wagner
Tim Hudson
Chris Carpenter
Brian Wilson
Mat Latos
Josh Johnson

I honestly think my top ten more closely resembles the likely voted-on top ten at this point in the season.