
Capping All handicapping, betting systems, spreadsheets, mathematics & quantitative technicapping.

  #1  
Old 08-14-2010, 11:21 PM
Hall of Fame
 
Join Date: Oct 2005
Location: Salem, VA
Posts: 22,450
Rewards: 1,800
MVP Odds for Baseball

Any eight-year-old baseball fanatic can isolate the handful of players in each league who are likely to win the MVP, and it is reasonable to expect, with a high degree of certainty, that the winner will come from that handful. Popularity is obviously a major indicator of likely MVP consideration. But I am deeply suspicious of any line of reasoning derived from untested opinion or visceral assumptions rather than some form of empiricism. This calls for a regression model, using the voting trends from the last ten years (2000-2009).

Qualitative measures are hard to analyze. Let’s use Carlos Gonzalez as an example. Because he plays for the mid-market Rockies, he loses qualitative points by default, both for the perceived Coors Field offensive inflation factor and for not being in a major market. His stats may be considered deceiving, and for him to have a realistic chance, even though the raw numbers indicate otherwise, the Rockies will probably have to win the NL West. This immediately invites a comparison to Matt Holliday, who played for the Rockies when they won the Wild Card in 2007. Holliday finished 2nd in the MVP voting that year despite leading the league in total bases, RBI, batting average, and doubles, tying for 2nd in the league in OPS, and posting a top-5 WAR. The eventual winner, Jimmy Rollins, playing for the major-market Philadelphia Phillies, had an underwhelming OPS+ (119), a WAR rated three spots lower than Holliday’s, and the luxury of playing in a ballpark that had yet to reach its current consensus status as a bandbox.

I’m not saying Rollins was undeserving; I’m merely trying to create a proxy for the categorical factors that may dim certain players’ chances of receiving the votes necessary to win.

Certainly there are other intangibles that can manifest themselves spontaneously, and at times erroneously, because ultimately the votes accumulated for the MVP award are a product of the writers’ subjectivity.

To expand on that: a fuller analysis would summarize each player’s popularity and his team’s standing in the minds of the writers responsible for the voting, and break those down into components that can be quantified. But that would take too long.

The goal here is to search for value. The best approach is to create my own odds and compare them to the actual odds, and then to evaluate the players from a qualitative standpoint, which I won’t address here. But as the season reaches its end I will update the prevailing consensus concerning the likely top MVP candidates, and adjust my odds accordingly.

I ran the multivariate analysis in Stata, a program that gets a lot done with very little effort and keeps the process flexible.

I found the correlation between each key stat and the overall voting points, which are what decide the MVP winner. Of course, I separated the American and National Leagues, since each has its own, though similar, MVP selection process.

I stripped pitchers from the MVP equation, and will revisit them later when I assess the Cy Young odds.

The variables used were team wins, WAR, batting average, home runs, RBI, runs, and stolen bases. These were chosen after some trial and error, but each one can be justified on logical grounds. WAR may be considered redundant, both in its information content and in its overlap with the other player variables. However, WAR has its advantages because of the all-inclusive nature of the statistic: position adjustments, defense, sophisticated offensive metrics. It is an overall measure of player value that has yet to hit the mainstream, and a faction of writers may not even consider it in their voting. Since the regression fits the tendencies of the voters by way of voting points, the most common statistics must be included, and redundancies are to be expected. Runs and HRs are not independent variables, and neither are home runs and slugging, or even home runs and batting average. All of these events happen simultaneously.

One more thing before I proceed with both leagues: I created an arbitrary though sufficient way of including a playoff variable. Teams primed for a playoff position were given extra weight in wins, since team wins have a positive, albeit stunningly slight, correlation with voting points. The weight was calculated by adding the square root of projected team wins to the projected wins themselves. So a team on pace for 100 wins would be credited with 110 if it is likely to make the playoffs; conversely, a team projected for 90 wins with no post-season prospects would not be weighted. It’s a sliding scale. Again, arbitrary, but after messing around with the correlations, this seemed to be a solid method of including the post-season factor.
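
For anyone who wants to replicate the playoff weight, here is a minimal Python sketch of the rule described above. The function name and the boolean playoff flag are illustrative choices of mine, not part of the original Stata work, and deciding which teams count as "likely" playoff teams remains a judgment call.

```python
import math


def adjusted_team_wins(projected_wins, playoff_likely):
    """Return the 'twx' value: projected wins, plus sqrt(projected wins)
    when the team is judged likely to make the playoffs."""
    if playoff_likely:
        return projected_wins + math.sqrt(projected_wins)
    return projected_wins


# The two cases from the post: a 100-win playoff team and a 90-win non-contender.
print(adjusted_team_wins(100, True))   # 110.0
print(adjusted_team_wins(90, False))   # 90
```

Because the bonus is a square root, it grows slowly: an 81-win bubble team would get 9 extra wins, a 100-win team 10.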

American League

Here is the correlation matrix for the American League (‘twx’ indicates adjusted team wins; everything else is labeled appropriately):


Spoiler:

Code:
             |  votepts      twx      war       hr      rbi        r       ba      ops      obp      slg       sb
-------------+---------------------------------------------------------------------------------------------------
     votepts |   1.0000
         twx |   0.2800   1.0000
         war |   0.5355   0.0654   1.0000
          hr |   0.3809  -0.0676   0.2404   1.0000
         rbi |   0.4573  -0.0265   0.2614   0.8230   1.0000
           r |   0.4026   0.0392   0.5326   0.2707   0.3019   1.0000
          ba |   0.3341  -0.0767   0.3763  -0.2653  -0.0843   0.1744   1.0000
         ops |   0.4841  -0.0397   0.4874   0.6781   0.5803   0.2022   0.3519   1.0000
         obp |   0.3946   0.0610   0.5200   0.2222   0.1924   0.1843   0.5332   0.8006   1.0000
         slg |   0.4560  -0.0863   0.3952   0.8119   0.6937   0.1798   0.2019   0.9496   0.5726   1.0000
          sb |  -0.0891   0.0456   0.1202  -0.4882  -0.5237   0.3247   0.0853  -0.4473  -0.1977  -0.5093   1.0000
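
Each entry in a matrix like the one above is a pairwise Pearson correlation between two columns. As a rough sketch of what is being computed, here is the calculation in plain Python. The function name and the sample numbers are made up purely for illustration and are not taken from the actual voting data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical home run totals and voting points for five players.
home_runs = [40, 32, 25, 18, 45]
vote_points = [310, 150, 90, 40, 280]
print(round(pearson(home_runs, vote_points), 4))
```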


Stolen bases having a negative correlation may seem surprising. At this point it may be appropriate to refer to Gould’s ‘bio-mechanical limit’ theory to explain it. However, I won’t rhapsodize further (or even at all), for fear of inarticulate rambling and a mind-bending digression that is completely unnecessary.

The coefficients:

Spoiler:

Code:
      Source |       SS       df       MS              Number of obs =     212
-------------+------------------------------           F(  5,   206) =   48.97
       Model |  1058497.82     5  211699.563           Prob > F      =  0.0000
    Residual |  890627.995   206  4323.43687           R-squared     =  0.5431
-------------+------------------------------           Adj R-squared =  0.5320
       Total |  1949125.81   211  9237.56309           Root MSE      =  65.753

------------------------------------------------------------------------------
     votepts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         twx |   2.060374   .3189046     6.46   0.000     1.431638    2.689109
         war |   11.34832   3.032244     3.74   0.000     5.370106    17.32653
          hr |   2.330891    .661193     3.53   0.001     1.027318    3.634464
          ba |   1362.608   226.1278     6.03   0.000     916.7862    1808.429
          rp |   1.278251   .4477948     2.85   0.005     .3954025    2.161099
       _cons |  -782.4358   79.27622    -9.87   0.000    -938.7326   -626.1391
------------------------------------------------------------------------------
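
To show how these coefficients turn a stat line into predicted voting points, here is a small Python sketch. The coefficients and intercept are copied from the table above; the function name and the example stat line are hypothetical and invented only to demonstrate the arithmetic.

```python
# Coefficients copied from the AL regression table above.
AL_COEFFICIENTS = {
    "twx": 2.060374,   # adjusted team wins
    "war": 11.34832,
    "hr": 2.330891,
    "ba": 1362.608,
    "rp": 1.278251,    # runs produced
}
AL_INTERCEPT = -782.4358


def predict_votepts(stats, coefficients=AL_COEFFICIENTS, intercept=AL_INTERCEPT):
    """Linear prediction: intercept + sum of (coefficient * stat)."""
    return intercept + sum(coef * stats[name] for name, coef in coefficients.items())


# Hypothetical stat line, not a real player's projection.
example_player = {"twx": 100.0, "war": 7.0, "hr": 35, "ba": 0.320, "rp": 200}
print(round(predict_votepts(example_player), 1))
```

Note how large the intercept is: a player needs strong numbers across the board before the prediction even turns positive.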


The variable ‘rp’ indicates Runs Produced, which is simply a combination of runs and RBI. AL MVP voting, in a league that is more offensively geared than the NL, leans towards including runs scored as well as RBI. For simplicity I inserted Runs Produced in place of the two, and regressing voting points on either set of variables gives virtually identical descriptive statistics.

Now, with the data above, it’s easier to construct a workable scale of probability. Later the scale can be adjusted for the general circumstances of the players involved (what team they play for, etc.).
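
The post does not spell out how predicted voting points were turned into odds, so the following Python sketch is only one plausible approach, under the assumption that each player's win probability is his share of the total positive predicted points. The function name and the player labels are hypothetical.

```python
def points_to_fair_odds(predicted_points):
    """Convert predicted voting points to fair decimal odds.

    Assumption (not confirmed by the original post): a player's win
    probability is his share of all positive predicted points.
    Players with non-positive predictions are dropped.
    """
    positive = {name: pts for name, pts in predicted_points.items() if pts > 0}
    total = sum(positive.values())
    if total <= 0:
        raise ValueError("no player has positive predicted points")
    # Fair decimal odds are the reciprocal of the probability share.
    return {name: total / pts for name, pts in positive.items()}


# Hypothetical predictions for four candidates.
predictions = {"Player A": 300.0, "Player B": 150.0, "Player C": 50.0, "Player D": -20.0}
for name, odds in points_to_fair_odds(predictions).items():
    print(name, round(odds, 2))
```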

Here are the odds for the AL MVP, using the players for whom the above coefficients and current season stats produce voting points. ‘Current’ in this case means current projection/pace. All statistics were purely flat projections, except for batting average, which was regressed to .285 over each player’s likely remaining ABs (.285 is based solely on my preference, and applies to the NL as well).
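
As a sketch of the projection step, here is how a flat pace projection and the batting average regression could be computed in Python. The function names, the 162-game season default, and the sample numbers are my own assumptions for illustration; how the remaining ABs were estimated is not stated in the post.

```python
def flat_projection(current_total, games_played, season_games=162):
    """Scale a counting stat to a full season at the current pace."""
    return current_total * season_games / games_played


def regressed_batting_average(hits, at_bats, remaining_at_bats, regress_to=0.285):
    """Project a final batting average, assuming the player hits
    `regress_to` over his remaining at-bats."""
    expected_hits = hits + regress_to * remaining_at_bats
    return expected_hits / (at_bats + remaining_at_bats)


# Hypothetical player: 25 HR through 115 games, hitting .357 (150 for 420)
# with an estimated 140 at-bats left.
print(round(flat_projection(25, 115), 1))
print(round(regressed_batting_average(150, 420, 140), 3))
```

The effect is that a hot hitter's projected average drifts down toward .285 and a cold hitter's drifts up, with the size of the drift depending on how many at-bats remain.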


Spoiler: [AL MVP odds table; image not preserved]


Once the real odds are released I’ll revisit the table and strip away any players who are unlikely to win. A brief survey identifies about half a dozen who have a very slim chance of being considered. The odds will be adjusted accordingly.

National League

The NL data has one irreducible problem: Barry Bonds. I removed Bonds for obvious reasons: he skews the entire process. After removing him I re-allocated the voting points to the other candidates, with reasonable results in the end, but I didn’t put too much thought into it, so it could have been better.

The rest of the process was the same as for the AL. For the NL, the best-fitting regression selected a slightly different set of variables, as the results below show. In both leagues, the four major variables (batting average, RBI, HR, wins) show the expected strong relationship with the distribution of voting points.

The NL correlation matrix:

Spoiler:

Code:
      
             |  votepts      twx      war       hr      rbi        r       ba      ops      obp      slg       sb
-------------+---------------------------------------------------------------------------------------------------
     votepts |   1.0000
         twx |   0.2844   1.0000
         war |   0.5730   0.1194   1.0000
          hr |   0.5342  -0.1629   0.3935   1.0000
         rbi |   0.5187  -0.1536   0.3689   0.8242   1.0000
           r |   0.4161  -0.1116   0.5578   0.4451   0.4117   1.0000
          ba |   0.2885  -0.0937   0.4192  -0.0546   0.0701   0.1110   1.0000
         ops |   0.5379  -0.1334   0.5907   0.6584   0.5781   0.2957   0.6023   1.0000
         obp |   0.3718  -0.0757   0.5785   0.2745   0.2970   0.2247   0.6984   0.8366   1.0000
         slg |   0.5562  -0.1467   0.5269   0.7705   0.6488   0.2956   0.4834   0.9629   0.6579   1.0000
          sb |  -0.0993  -0.0047  -0.0051  -0.3482  -0.3985   0.2915  -0.1109  -0.3776  -0.2489  -0.3972   1.0000


Here are the regression results:

Spoiler:
Code:
      Source |       SS       df       MS              Number of obs =     216
-------------+------------------------------           F(  6,   209) =   54.96
       Model |  1696461.17     6  282743.528           Prob > F      =  0.0000
    Residual |  1075285.46   209  5144.90649           R-squared     =  0.6121
-------------+------------------------------           Adj R-squared =  0.6009
       Total |  2771746.63   215  12891.8448           Root MSE      =  71.728

------------------------------------------------------------------------------
     votepts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         twx |   3.310143     .38732     8.55   0.000     2.546588    4.073697
         war |   8.539037   3.239998     2.64   0.009     2.151772     14.9263
          hr |   4.478822   .8344577     5.37   0.000     2.833789    6.123855
          ba |   1313.972   239.9688     5.48   0.000     840.9021    1787.041
         rbi |   1.012889    .395473     2.56   0.011     .2332621    1.792517
          sb |   1.402283   .4044013     3.47   0.001     .6050547    2.199512
       _cons |  -916.8518   91.12105   -10.06   0.000    -1096.486   -737.2176
------------------------------------------------------------------------------
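
For readers less familiar with this kind of output, the summary figures in the top panel follow directly from the sums of squares. Here is a short Python sketch that recomputes them from the NL table; the function name is my own, and the formulas are the standard OLS ones rather than anything specific to this model.

```python
import math


def regression_summary(model_ss, residual_ss, n_obs, n_predictors):
    """Recompute R-squared, adjusted R-squared, F and root MSE
    from the sums of squares reported in an OLS table."""
    total_ss = model_ss + residual_ss
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    r_squared = model_ss / total_ss
    adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid
    f_stat = (model_ss / df_model) / (residual_ss / df_resid)
    root_mse = math.sqrt(residual_ss / df_resid)
    return {
        "r_squared": r_squared,
        "adj_r_squared": adj_r_squared,
        "f_stat": f_stat,
        "root_mse": root_mse,
    }


# Values copied from the NL regression table above (216 obs, 6 predictors).
nl = regression_summary(1696461.17, 1075285.46, 216, 6)
for key, value in nl.items():
    print(key, round(value, 4))
```

The recomputed values should agree with the table to rounding, which is a quick sanity check when copying output by hand.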


NL MVP Odds:

Spoiler: [NL MVP odds table; image not preserved]


Obviously one couldn't anticipate Pujols being given odds any greater than 2.5 or 3; the same goes for the AL leader, Hamilton. Once I remove the Jeters and Harts from the process, the odds will likely settle at a more realistic number. Add some overround and the odds will probably be released somewhere in the 1.5 to 2.5 range.
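
To illustrate the "add some overround" step, here is a minimal Python sketch that shortens fair odds by a bookmaker margin. It assumes the margin is spread proportionally across all candidates, which is only one of several ways books do it; the function name and the sample probabilities are hypothetical.

```python
def apply_overround(fair_probabilities, overround):
    """Turn fair win probabilities into offered decimal odds.

    Assumption: the book inflates every probability by the same factor
    (1 + overround), so the offered book sums to 1 + overround.
    """
    return {
        name: 1.0 / (prob * (1.0 + overround))
        for name, prob in fair_probabilities.items()
    }


# Hypothetical fair probabilities for a trimmed candidate list.
fair = {"Favorite": 0.50, "Second": 0.30, "Longshot": 0.20}
offered = apply_overround(fair, 0.10)
for name, odds in offered.items():
    print(name, round(odds, 2))
```

With a 10% overround, a fair price of 2.00 on the favorite shrinks to roughly 1.82.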

I gave teams on the playoff bubble (i.e. Red Sox, Rockies) the benefit of the doubt in both leagues. If value is to be found, it can only be found through the strictest possible calculation of probability.

The range of the calculated odds is tied to the number of players considered as candidates. With time and some elementary baseball logic comes greater accuracy. As the season progresses I can strive for more precision.

I’ll revisit as players vault in and out of MVP candidacy.
__________________
"Nobody goes there anymore, it's too crowded." --Yogi Berra

"Always tell the truth, that way you won't have to remember what you said." --Mark Twain


*=$50,000

Last edited by uva3021; 08-15-2010 at 04:06 AM.
  #2  
Old 08-14-2010, 11:46 PM
Hall of Fame
 
Join Date: Oct 2005
Location: Salem, VA
Posts: 22,450
Rewards: 1,800
If I have any feel for the futures market at all, I would think that after removing some noise and isolating the top 5, Adrian Gonzalez may have the best overall value in either league. I can't see the odds on him being placed any lower than 8/1; it'll probably be around 10/1. And him winning is not outside the realm of possibility, especially if he closes the season well and Votto's numbers dwindle a little.
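
To make "value" concrete: a price is only worth taking if the model's probability beats the probability implied by the odds. Here is a quick Python sketch of that comparison for fractional odds like 8/1 or 10/1. The function names are mine, and the 15% model probability is a made-up placeholder, not an output of the regression.

```python
def implied_probability(numerator, denominator=1):
    """Break-even win probability for fractional odds numerator/denominator."""
    return denominator / (numerator + denominator)


def expected_value_per_unit(model_probability, numerator, denominator=1):
    """Expected profit per unit staked at fractional odds,
    given the bettor's own estimate of the win probability."""
    payout = numerator / denominator
    return model_probability * payout - (1.0 - model_probability)


print(round(implied_probability(8), 4))    # 8/1
print(round(implied_probability(10), 4))   # 10/1
# Hypothetical model probability of 15% at 10/1.
print(round(expected_value_per_unit(0.15, 10), 2))
```

At 10/1 the break-even probability is about 9.1%, so any model estimate above that would count as value.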
  #3  
Old 08-15-2010, 09:43 AM
Hall Of Fame '11
 
Join Date: Aug 2004
Location: Philadelphia
Posts: 35,968
Rewards: 475
agree with A Gon
  #4  
Old 08-16-2010, 01:00 AM
Registered User
 
Join Date: Apr 2010
Posts: 111
Rewards: 171
I guess it's time to finally grow a pair and get Stata which also means I'll need to get over myself and start learning how to write some actual code. Brilliant post, as always UVA, but I do have to question if analyzing the MVP futures market is the best use of your efforts/knowledge considering (1) limits and (2) you told me in an e-mail not too long ago that you didn't have time or inclination to study NFL pre-season which would likely yield more equity per time invested than MLB MVP voting (although my NFLX teasers got shelled over the weekend so it may have been a good choice).

Nonetheless, it's another great post (one that might inspire me to throw a few logs on Gonzalez) and a good lesson in how to quantitatively analyze markets using regression - the first step towards any intrinsic-value seeking model.

Last edited by IrishTim; 08-16-2010 at 01:02 AM.
  #5  
Old 08-16-2010, 05:22 AM
Hall of Fame
 
Join Date: Oct 2005
Location: Salem, VA
Posts: 22,450
Rewards: 1,800
Just in general, I can't envision constructing a model that finds any sort of advantage in NFL statistics. It's the NFL. All teams are built to be even, and all teams pretty much are, except at two positions: QB and LT/RT.

I think the best thing to do is play off lines at BetUS and SIA until week 12-13 then maybe I can decipher some of the data, using accumulated lines.
  #6  
Old 08-16-2010, 08:20 AM
Registered User
 
Join Date: Apr 2010
Posts: 111
Rewards: 171
I don't disagree with you re: NFL at all. Matter of fact, that is precisely the strategy I will employ. And even if you do come up with a +EV model, you might not turn a profit for 5 years because of the small sample and by that time the market will have changed drastically. NFLX teasers are a different story...
  #7  
Old 08-16-2010, 08:58 AM
Hall of Fame
 
Join Date: Oct 2005
Location: Salem, VA
Posts: 22,450
Rewards: 1,800
Sorry, the WAR values are wrong; I forgot to use projected WAR, which would change the distribution of odds somewhat. Pujols and Votto are still the considerable favs.
  #8  
Old 08-16-2010, 09:05 AM
Registered User
 
Join Date: Dec 2006
Location: philly
Posts: 24,075
Rewards: 533
wow, pretty legit stuff here uva ... nicely done.
  #9  
Old 08-16-2010, 01:49 PM
Hall of Fame
 
Join Date: Oct 2005
Location: Salem, VA
Posts: 22,450
Rewards: 1,800
fixed odds, forgot to project WAR, and I had a typo in my excel sheet in odds calculation so odds were off




Last edited by uva3021; 08-16-2010 at 01:53 PM.
  #10  
Old 08-16-2010, 02:46 PM
Registered User
 
Join Date: Apr 2010
Posts: 111
Rewards: 171
Where do you see the best prices (lowest vig) on this market offshore?
  #11  
Old 08-16-2010, 08:44 PM
Hall of Fame
 
Join Date: Oct 2005
Location: Salem, VA
Posts: 22,450
Rewards: 1,800
best price and lowest vig can be mutually exclusive

from my experience, however, BetUS, Bodog, and SIA, which you would expect to have nice longshots, are typically stingy and add an insane amount of overround; i've seen BetUS throw up 40% overround before on player futures
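
If you want to check a book's overround yourself, sum the implied probabilities of every price in the market and subtract 1. A small Python sketch follows; the function name and the sample prices are hypothetical, not actual lines from any of the books mentioned.

```python
def market_overround(decimal_odds):
    """Overround of a market: sum of implied probabilities minus 1.
    A result of 0.40 means a 40% overround."""
    return sum(1.0 / odds for odds in decimal_odds) - 1.0


# Hypothetical five-player futures market in decimal odds.
prices = [2.5, 4.0, 5.0, 6.0, 8.0]
print(round(market_overround(prices) * 100, 1), "% overround")
```

This only works if you include every selection the book is offering, including a "field" option when there is one.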

Pinnacle obviously would have the lowest vig, not sure about some of the best long shot prices, it varies

Greek has been fair in the past, so has betjm
  #12  
Old 08-16-2010, 10:23 PM
Registered User
 
Join Date: Apr 2010
Posts: 111
Rewards: 171
lol I guess those were two separate questions I mushed into one.

1) Who has the best price on A Gonzalez?
2) Which book offers the best prices/lowest vig across the board?
  #13  
Old 08-17-2010, 02:23 AM
Hall of Fame
 
Join Date: Oct 2005
Location: Salem, VA
Posts: 22,450
Rewards: 1,800
MVP prices aren't out yet, my preliminary strategy is to put some on pujols as insurance for A Gonzo
  #14  
Old 08-17-2010, 04:08 AM
Registered User
 
Join Date: Mar 2005
Posts: 2,483
Rewards: 318
very impressed with your stats but pujols will win it, the media loves him