#1
MLB Moneyline to Win Correlation
I’ve been preoccupied recently with applying the lines posted throughout the year to an aggregated set of seasonal data. I was able to work this concept into college basketball, to some consequential end. Taking each team’s average line and comparing it to the rest of the league shows where a team falls according to the linesmakers’ perception and ratings. It measures the “DNA” of the league as a whole, with the teams acting as corresponding points on the structure. With NCAAB, I took the average line each team accumulated over the conference season, compared it to efficiency statistics such as differential and tempo, and formulated not only a predicted score set by the Vegas number on each game, but also each team’s expected W/L record and against-the-spread W/L record. The results were more than satisfactory, and proved to be of immediate practical use when filling out my bracket for the tournament.
The MLB season starts in a few days, so I’ve put together a sample of data spanning the last five regular seasons and am applying methods similar to those I used for NCAAB. One difference in working out a team’s average line is dealing with moneylines instead of spreads. While baseball has the standard 1.5 runline, which is essentially the assigned line for each game, I judge the moneyline to be the more practical determinant of seasonal trends.

So now, as we move from the general collection of data to the particulars, how should the moneyline be averaged? Averaging American Odds (-150 / +150) directly is impractical: the scale skips from -100 to +100, so the plain mean of -150 and +150 is 0, which is not a line at all. The logical thing to do is convert the American Odds to the more manageable Decimal Odds (-150 / +150 becomes roughly 1.67 / 2.50). Decimal Odds, commonly referred to as European Odds, express the same chance of winning, just in a different fashion.

Here is the simple formula to convert Decimal to American:

American = decD(100) - 100
American = 100/(1 - decF)

Where ‘dec’ is Decimal, ‘D’ is the Dog, and ‘F’ is the Favorite. The favorite’s result comes out negative, as American Odds for a favorite should.

The formula can be reversed to convert American to Decimal:

Decimal = (amerD + 100)/100
Decimal = (amerF + 100)/amerF

‘amer’ represents American, ‘D’ is the Dog, and ‘F’ is the absolute value of the Favorite.

After figuring out the average line for each team, I then measure how that number relates to expected win total. For basketball, because the average spread often mirrors or closely resembles a team’s efficiency margin (which is a testament to the linesmakers), I converted the spread to a score, using the same system applied to an efficiency score. Then, using the formulated score, I calculated expected wins and losses via the Pythagorean method.
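A minimal sketch of the American/Decimal conversions above, in Python. The function names are hypothetical, and averaging in decimal space before converting back to American is an assumption about how the averaging step works, not something taken from the spreadsheet.

```python
def american_to_decimal(american: float) -> float:
    """Convert American odds (e.g. +150 or -150) to decimal odds."""
    if abs(american) < 100:
        raise ValueError("American odds must be <= -100 or >= +100")
    if american > 0:
        # Underdog: (amerD + 100) / 100
        return (american + 100) / 100
    # Favorite: (amerF + 100) / amerF, with amerF as an absolute value
    fav = abs(american)
    return (fav + 100) / fav


def decimal_to_american(decimal: float) -> float:
    """Convert decimal odds back to American odds."""
    if decimal <= 1:
        raise ValueError("Decimal odds must be greater than 1")
    if decimal >= 2:
        # Underdog (or even money): decD * 100 - 100
        return decimal * 100 - 100
    # Favorite: 100 / (1 - decF), which yields a negative number
    return 100 / (1 - decimal)


def average_line(american_lines):
    """Average American lines by way of decimal odds (assumed approach).

    Returns (mean decimal odds, the same mean expressed in American odds).
    """
    decimals = [american_to_decimal(a) for a in american_lines]
    mean_decimal = sum(decimals) / len(decimals)
    return mean_decimal, decimal_to_american(mean_decimal)
```

For -150 and +150, `average_line([-150, 150])` gives about 2.083 in decimal, or roughly +108 in American.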
(I also tried to incorporate a normal distribution to arrive at an expected figure, but on second thought a normal distribution is hypersensitive to the magnitude of the data, and essentially to reality, and therefore not very practical.)

Calculating expected wins by way of average moneylines is a different animal. The score of the game is not a variable here. After much thoughtful and sensible deliberation, I found two reasonable methods of calculating expected wins via the moneyline.

One is simple and direct, and is independent of the league itself: take the average moneyline and find the implied win probability. This is explained in sufficient detail by Romanowski here. To briefly explain:

For the underdog: 100/(100+D)
For the favorite: F/(100+F)
+105 = 48.8%, -105 = 51.2%

The two percentages sum to 100%. Of course, this example does not account for the 10% vig given to each side. The even vigorish spectrum is a myth, and does not exist in the world of sports gambling unless someone has the disposition to lie about such things. Regardless, this is a pretty direct way to calculate win expectancy.

A more systematic approach correlates the line to seasonal trends throughout the league. If you compare each team’s line to its win total, there should in all probability be a relatively strong linear relationship. This is in fact the case across the data for the previous five years. It is very interesting how the two interact. The accumulated linear correlation from 2005-2009 is a little over 80%. That number describes how tightly teams’ actual wins cluster around the trend line for that set of data, not the share of teams that land on it.

The reason for using win total as opposed to Winning Percentage is to factor in games played as a variable in a team’s season win total.
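The direct method described above can be sketched in a few lines of Python. The function names and the 162-game default are illustrative assumptions. Like the +105 / -105 example, the sketch makes no adjustment for vig.

```python
def implied_probability(american: float) -> float:
    """Implied win probability from an American moneyline (no vig adjustment)."""
    if abs(american) < 100:
        raise ValueError("American odds must be <= -100 or >= +100")
    if american > 0:
        # Underdog: 100 / (100 + D)
        return 100 / (100 + american)
    # Favorite: F / (100 + F), with F as an absolute value
    fav = abs(american)
    return fav / (100 + fav)


def direct_wins(avg_american: float, games: int = 162) -> float:
    """Expected wins: implied probability times games played."""
    return implied_probability(avg_american) * games
```

The same probability can also be read straight off the decimal odds, since 100/(100+D) and F/(100+F) both reduce to 1/Decimal.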
For a Winning Percentage, the number of games played is a non-factor, and the percentage is essentially relative to infinity. I found that when comparing a 162-game schedule to a percentage of an undetermined number of games, the discrepancies on average fall within a 3-5% differential, which is not conducive to accurate data analysis. And to remain consistent, everything I calculate is in relation to Wins and not Winning Percentage.

To show how moneylines correlate, here is a graph for the ocular:

[Graph: five-year correlation between average moneyline (Decimal Odds) and wins, 2005-2009]

The titles of the graph and its x and y axes should be sufficient to explain the data used. This is the five-year trend for the correlation between moneyline and wins. The line marks a slightly above 80% correlation (the exact number being 80.3). The equation shown is what is used to determine expected wins, where x = Decimal Odds.

What is remarkable is the average aggregate moneyline for each season. The average in every season except 2005 was +103, and for 2005 it was +104. (Immaculately disgusting consistency displayed by the linesmakers. How are we supposed to compete with that?) The correlating wins using each season’s average line is 81 (for the five-year total and for every season), as would be expected, yet still an astounding convenience.

The blueprint being laid out, my epistolary efforts now dangerously reaching the point of fatal exhaustion, and having explained the process as well as is within my capacity, here are the tables for the 2008 and 2009 seasons, with the Pythagorean wins for each year, season win futures from 5dimes for the following season, and PECOTA projections. Some of the ancillary data is not shown here in order to fit within normal pixel dimensions. All the 2005-2009 season data is here, with all columns exposed to view. I’ll offer a quick rundown below the tables explaining what is apparent, along with the in-season assortment of data, and then revisit on occasion throughout the season.
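For readers who want to reproduce the linear method, here is a rough Python sketch of fitting wins against average Decimal Odds by ordinary least squares. The sample numbers are MADE UP for illustration and are not from the 2005-2009 data, and the fitted coefficients are not the equation on the graph. The post does not say whether 80.3 is r or r squared, so the sketch reports both.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept, plus Pearson r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


def linear_wins(decimal_odds, slope, intercept):
    """LW: expected wins from the fitted line, where x = Decimal Odds."""
    return slope * decimal_odds + intercept


# Hypothetical team seasons: (average decimal odds, wins). NOT real data.
sample_dec = [1.80, 1.95, 2.05, 2.20, 2.40]
sample_wins = [95, 86, 80, 74, 62]

slope, intercept, r = fit_line(sample_dec, sample_wins)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
print(f"LW at 2.03: {linear_wins(2.03, slope, intercept):.1f}")
```

With Decimal Odds on the x axis the slope comes out negative, since a higher decimal number means a bigger underdog, so any "80%" figure would be the magnitude of the relationship.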
[Tables: 2008 season and 2009 season]

Line/Win Correl = Linear relationship between moneyline and wins
Avg Line = Average American Odds
DEC = Decimal Odds
LW = Linear Wins resulting from the linear formula
DW = Direct Wins, converting the moneyline to a percentage and multiplying by the number of games played
PW = Pythagorean Wins
F = Future win total for next season
P = PECOTA projections for next season
DIFF = Differential between the adjacent win expectations and the actual
GREEN = Expected above actual
RED = Expected below actual

I threw the units into the table for flavor. (Units won are ultimately subjective: different books post different numbers, and getting the best odds on a game can considerably enhance your final ROI.) The lines used are closing numbers as given by Statfox. Different books may permit a slight variation in the delineation of season trends, but over the five-year sample the differences in available moneylines should even out to a collective equilibrium across all markets.

It is obvious that a team’s ROI rises by degrees as it exceeds the expectation set by its average moneyline, and the opposite is true as well. Therefore teams displayed in GREEN underperformed the market, and have a mostly commensurate ROI. It’s also reasonable to assume that the teams carrying the highest moneylines during the regular season would have secured a playoff position by season’s end. Eight of the top 12 made the playoffs in 2009, and eight of the top 14 in 2008. Refer to the link to the sheet for any further evaluations from 2005-2009.

The dynamics of a team can be drastically reconfigured through free agency, injuries, and other factors from one season to the next, so as a predictor the linear correlation would not seem to be a reliable basis for future projections.
However, if you are a proponent of the natural regression of probability to the mean, then perhaps an overly conspicuous differential between performance and expectation can strike interest. For example, in 2008 Seattle, San Diego, Detroit and Atlanta vastly underperformed the market, and all went over their future win total as posted by 5dimes the following season. Washington was the only team with a differential over 10% to go under its future win total from 2008 to 2009. The opposite condition, a -10% expectation versus performance, did not produce similar results: teams were 2-2 in projecting to the next year using LW versus their assigned future. In 2010, Washington, Pittsburgh, Kansas City, Cleveland, and Arizona may undergo an upward progression to the mean, which may or may not be a legitimate prospect for future bettors to consider.

For this season, I have assembled an Excel sheet to grab the appropriate data by way of Statfox, for convenience. It’s imperative that I mention how valuable Statfox is for extracting the data and easily transferring it to a table of my own creation, similar to what is displayed above, using the tools that OpenOffice or Excel provide. The data is set up to update automatically once the spreadsheet is opened, without any substantial exertion on my part, unlike the effort previously spent setting up the table and evaluating the data from 2005-2009.
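A small Python sketch of the DIFF and GREEN/RED bookkeeping, plus the 10% screen mentioned above. Two things here are assumptions, since the post does not spell them out: DIFF is taken as expected minus actual, and the percentage is taken relative to expected wins. The function name `market_gap` and the `flag` labels are hypothetical.

```python
def market_gap(expected_wins, actual_wins, threshold=0.10):
    """Compare expected wins (e.g. LW) with actual wins.

    diff  : expected minus actual (assumed sign convention)
    pct   : diff relative to expected wins (assumed definition)
    color : GREEN if expected is above actual, RED if below
    flag  : "up" or "down" if |pct| exceeds the threshold, else None
    """
    diff = expected_wins - actual_wins
    pct = diff / expected_wins

    if diff > 0:
        color = "GREEN"   # team underperformed the market
    elif diff < 0:
        color = "RED"     # team outperformed the market
    else:
        color = None

    if pct > threshold:
        flag = "up"       # candidate for upward regression
    elif pct < -threshold:
        flag = "down"     # candidate for downward regression
    else:
        flag = None

    return {"diff": diff, "pct": pct, "color": color, "flag": flag}
```

For a team expected to win 80 that actually won 70, this gives a diff of 10, a pct of 0.125, GREEN, and an "up" flag.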
__________________
"Nobody goes there anymore, its too crowded." --Yogi Berra
"Always tell the truth, that way you won't have to remember what you said." --Mark Twain
*=$50,000
Last edited by Romanowski; 03-31-2010 at 08:23 PM.
#2
looks great :)
__________________
The only bridge I've ever burned along this legacy I dance is the one that linked the cities of prosperity and chance
Check out Technicapping for quantitative sport analysis
#3
I'm very impressed ! Keep it coming.
PLP