Capping - All handicapping, betting systems, spreadsheets, mathematics & quantitative technicapping.
#1
Creating your own MLB line - Pythagorean Method
Anytime you are trying to analyze a market, you need a frame of reference: a way to compare some set of data to what the market formulates, unless you enjoy throwing darts. I like to keep objective, simplified data organized in the most aggregated form possible, meaning a collection of team and league data over the course of a regular season. As the season progresses, you can start to make a basic calculation of where the line should be set and then compare it to the number posted by the linemakers.
Early in the season, your only real sources of information are projections, forecasts, and your own opinion of how things might play out based on previous seasons. Estimating the unobservable through forecast simulations is arguably the most efficient way to make a team-by-team comparison. MLB has the luxury of many sophisticated projection systems that have proven quite sound in their team assessments. PECOTA, CHONE, and ZiPS, among others, are the more prominent sabermetric systems, each producing an unbiased statistical simulation of an entire season. We are fortunate to have access to the data required to set the various factors in place, thanks to the efforts of Baseballprojections.com and the Replacement Level Yankees Weblog. Using the individual and team projections from those websites, you can come up with a line for each game by isolating the starting-pitcher matchup: how would each team be expected to perform if its starting pitcher pitched every inning of every game? Doing this saves time by finding immediate value, without spending additional resources on precision that would ultimately be superfluous. Here is an example from a game I chose at random, using CHONE forecasts of individual and team statistics (you can use whatever projection system you prefer).
April 8th, 2010: Cleveland Indians (Masterson) @ Chicago White Sox (Floyd)
Open: Chicago -138 / Cleveland +130
Close: Chicago -115 / Cleveland +108
(Open and Close are purely for reference against my calculated line below; numbers are from Pinnacle.)

Extracting data from Baseball Projections, here are the ERAs of the two starting pitchers:
Masterson 4.36
Floyd 4.33

And the projected runs scored:
Cleveland 816
Chicago 757

I'll place the starting pitchers in a vacuum and calculate the exclusive runs against. ERA is runs allowed per nine innings, so ERA x 162 gives the runs a team would allow over a 162-game season if that starter pitched every inning:
Masterson 4.36 x 162 = 706.32
Floyd 4.33 x 162 = 701.46

Now that we have runs scored and runs against, apply the Pythagorean method to each team to calculate its expected winning percentage:
Cleveland PythW% = 816^2 / (816^2 + 706.32^2) = 57.17%
Chicago PythW% = 757^2 / (757^2 + 701.46^2) = 53.80%

Now apply the log5 formula, add in home-field advantage, and convert to a line:
Cleveland win probability = (.5717 - .5380 * .5717) / (.5717 + .5380 - 2 * .5717 * .5380)
Cleveland W% = 53.4%
Chicago W% = 46.6%

Home-field advantage is somewhat tricky. One comprehensive analysis going back 100 years found the 1991-2002 average to be a 7% increase in wins compared to road winning percentage, or 2.84 wins, with averages shown for each earlier decade. Diamond-mind.com took a different approach: it took the average home winning percentage and subtracted .500 (a 50% winning percentage, the standard 81-win average), and found the difference to be .042, or 4.2%. But then the visiting team has to be adjusted by the same amount in the other direction, which equates to a .084 advantage. Which number you use is simply preference. I tend to be conservative and assume the smallest plausible advantage when determining value. It's a fairly arbitrary way of doing things, but nonetheless, I'll give Chicago a .035 HFA rather than .042.
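The calculation above can be sketched in Python. This is a minimal illustration, not the author's spreadsheet: it assumes the exponent of 2 that the thread later confirms, and it treats home-field advantage as a flat additive shift to the home team's probability, the way the post applies .035. The function names are hypothetical.

```python
def season_runs_allowed(era: float, games: int = 162) -> float:
    """Runs allowed over a season if this starter pitched every inning (ERA = runs per 9 IP)."""
    return era * games


def pythagorean_wpct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Bill James' Pythagorean expectation; exponent 2 is the simplified value."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def log5(p_a: float, p_b: float) -> float:
    """log5 estimate of the probability that team A beats team B."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


def with_home_field(p_home: float, hfa: float) -> float:
    """Flat home-field shift (assumption: additive, as with the post's .035)."""
    return p_home + hfa


if __name__ == "__main__":
    cle = pythagorean_wpct(816, season_runs_allowed(4.36))  # Masterson starting for Cleveland
    chw = pythagorean_wpct(757, season_runs_allowed(4.33))  # Floyd starting for Chicago
    p_cle_neutral = log5(cle, chw)                          # Cleveland before HFA
    p_chw_home = with_home_field(1 - p_cle_neutral, 0.035)  # Chicago is the home team
    print(f"Pyth: Cleveland {cle:.4f}, Chicago {chw:.4f}")
    print(f"Neutral Cleveland {p_cle_neutral:.3f}; with HFA Chicago {p_chw_home:.3f}, "
          f"Cleveland {1 - p_chw_home:.3f}")
```

Run as-is, it reproduces the figures in the post: 57.17% and 53.80% Pythagorean, 53.4% for Cleveland on a neutral field, and a 49.9% / 50.1% split once Chicago gets the .035.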
With HFA:
Cleveland W% = 49.90%
Chicago W% = 50.10%

Converting this final number to a line with the spreadsheet formula [=IF(W%>0.5;-(W%/(1-W%))*100;-(-((1-W%)/(W%))*100))]:
Cleveland +100
Chicago -100

Add the 10% juice [=IF(line>0;line-(line*0.1);line+(line*0.1))]:
Cleveland -110
Chicago -110
(The favorite goes from -100 to -110. The underdog goes from +100 to +90, and a positive quote below +100 is expressed as a negative line: +90 is about -111, close to the -110 shown.)

These are 20-cent lines. For 10-cent lines, use [=IF(line>0;line-(line*0.05);line+(line*0.05))]:
Cleveland -105
Chicago -105
(+95 likewise restates as roughly -105.)

Looking at the Pinnacle opener (Cleveland +130), it's apparent that against my line of Cleveland -110, the opener offered 40 cents on the dollar of value, which is a huge advantage and over time will show a solid rate of return if you can consistently beat the market this way. Once the line moved to its closing number of Chicago -115, the value had diminished greatly, and the game was reduced to nothing more than a slight lean rather than an investment, in my opinion. The Indians went on to win the game, conveniently enough.

Getting the best line possible is crucial in baseball because of market fluctuation. Whatever book you use, it may help to strip the juice from both the market number and your own number for an easier comparison, then wager accordingly.

As I said, I've completely removed the impact of the bullpen from the equation, which unfortunately is not how baseball operates. You can calculate a team's bullpen ERA by averaging the CHONE ERAs of its individual relief pitchers, then determine the expected effect on a particular game from the IP/g projected for each starting pitcher. The runs surrendered by the bullpen can then be calculated from that ERA. In this case, considerable value had already been established from the starting pitchers alone, and I think it is reasonable to assume that including the bullpen would not change the fact that the Indians had value versus the opener. There's no reason to exert additional effort measuring unnecessary variables.
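The line arithmetic can be sketched the same way. This is a hedged illustration: `fair_line` and `add_juice` mirror the post's spreadsheet formulas (with the stray `C5` read as `line`), while `as_american`, `strip_juice`, and `expected_value` are hypothetical helpers assumed here for the comparison the post recommends. Stripping juice uses simple proportional normalization, which is only one of several no-vig methods.

```python
from typing import Tuple


def fair_line(p: float) -> float:
    """Juice-free American line for win probability p (same logic as the post's IF formula)."""
    if p > 0.5:
        return -(p / (1 - p)) * 100  # favorite: negative line
    return ((1 - p) / p) * 100       # underdog or pick'em: positive line


def add_juice(line: float, pct: float = 0.10) -> float:
    """The post's juice formula with C5 read as `line`: 0.10 for 20-cent, 0.05 for 10-cent lines."""
    if line > 0:
        return line - line * pct
    return line + line * pct


def as_american(line: float) -> float:
    """Hypothetical helper: restate a positive quote below +100 as the equivalent negative line."""
    if 0 < line < 100:
        return -10000 / line
    return line


def implied_prob(line: float) -> float:
    """Break-even win probability implied by an American line (juice included)."""
    if line < 0:
        return -line / (-line + 100)
    return 100 / (line + 100)


def strip_juice(line_a: float, line_b: float) -> Tuple[float, float]:
    """Proportional no-vig: scale both implied probabilities so they sum to 1."""
    pa, pb = implied_prob(line_a), implied_prob(line_b)
    total = pa + pb
    return pa / total, pb / total


def expected_value(p: float, line: float) -> float:
    """Expected profit per 1 unit staked at an American line, given your win probability p."""
    profit_if_win = line / 100 if line > 0 else 100 / -line
    return p * profit_if_win - (1 - p)


if __name__ == "__main__":
    p_cle, p_chw = 0.499, 0.501  # win probabilities after HFA, as in the post
    cle_line = as_american(add_juice(fair_line(p_cle)))
    chw_line = as_american(add_juice(fair_line(p_chw)))
    print(f"20-cent line: Cleveland {cle_line:.0f}, Chicago {chw_line:.0f}")
    chw_open, cle_open = strip_juice(-138, 130)
    print(f"No-vig opener: Chicago {chw_open:.3f}, Cleveland {cle_open:.3f}")
    print(f"EV at +130: {expected_value(p_cle, 130):.3f}, at +108: {expected_value(p_cle, 108):.3f}")
```

With the post's 49.9% for Cleveland, this prints about -111 and -110 for the juiced line, a no-vig opener of roughly 57.1% Chicago / 42.9% Cleveland, and an expected profit of about 0.148 units per unit staked at +130 versus about 0.038 at the +108 close. The 40-cent gap in the line is a different quantity from the expected return.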
Once the season has produced a meaningful sample, maybe by June or July, actual results can displace sophisticated speculation, and lines can be determined more easily and with more certainty. Some of the same methods can still be applied. I'll probably revisit this over the course of the season. It might be a good idea to save some of these formulas, making your own functions and templates to use in Excel if you feel compelled to build a spreadsheet.
__________________
"Nobody goes there anymore, it's too crowded." --Yogi Berra "Always tell the truth, that way you won't have to remember what you said." --Mark Twain *=$50,000
Last edited by Romanowski; 04-09-2010 at 08:05 AM.
#2
What Pythagorean exponent are you using here?
__________________
The only bridge I've ever burned along this legacy I dance is the one that linked the cities of prosperity and chance. Check out Technicapping for quantitative sport analysis.
#3
Nice work, UVA. The only piece I don't get is why you dismiss the bullpen so easily.
IMO it is the bullpen that will give you an edge a lot of the time, because it is the weakest point of every team and the resources are limited. If you also consider that some middle relievers or closers may not be available for a game, that could explain a gap between your line and the market. If you don't account for it, isn't it possible that your own line is off and you're assuming an edge where there isn't one?
#4
Noted, Zoom. However, as I mentioned, 40 cents on the dollar is a tremendous fundamental advantage, and whatever effect the bullpen might have, I doubt it is enough to erase the value I found on the game versus the opener.
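For anyone who wants to test that claim, the bullpen adjustment described in the opening post can be sketched as below. This is a minimal sketch under stated assumptions: the reliever ERAs and the starter's 6.0 IP/g are made-up placeholders, not CHONE numbers, and the starter and bullpen are assumed to split a nine-inning game by innings.

```python
from statistics import mean
from typing import Sequence


def bullpen_era(reliever_eras: Sequence[float]) -> float:
    """Plain average of relievers' projected ERAs, as the opening post suggests."""
    return mean(reliever_eras)


def game_runs_allowed(starter_era: float, starter_ip: float, pen_era: float,
                      innings: float = 9.0) -> float:
    """Expected runs in one game: starter covers his IP/g, bullpen covers the remaining innings."""
    pen_ip = max(innings - starter_ip, 0.0)
    return starter_era * starter_ip / 9 + pen_era * pen_ip / 9


if __name__ == "__main__":
    # Hypothetical inputs: Masterson's 4.36 ERA from the post, placeholder 6.0 IP/g and bullpen.
    pen = bullpen_era([3.90, 4.20, 4.50, 4.80])
    per_game = game_runs_allowed(4.36, 6.0, pen)
    print(f"bullpen ERA {pen:.2f}, runs/game {per_game:.3f}, season {per_game * 162:.1f}")
```

The season figure (per-game runs x 162) can stand in for the ERA x 162 runs-against number in the Pythagorean step. A plain mean gives a mop-up man the same weight as the closer; weighting each reliever by projected innings is one possible refinement.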
#5
The exponent I used for the Pythagorean method was 2, just the simplified number laid out by Bill James.
#6
Quote:
#7
Is ERA really a good measure of a pitcher? I wonder how using OPS against in this context (if we could find a way to insert it into the equation) would sharpen the results.
|
|
#8
ERA just gets you total runs allowed, which is a component of the Pyth equation.
UVA provided a link if you're curious about the Pyth equation; it's a good measure for many sports.
#9
I'm actually working on creating a conversion from OPS against to runs allowed per inning. I think it would be more accurate than ERA for estimating total runs, and it has the added benefit of using numbers that are less "public."
|
|
#10
Awesome...
Here's another method that should come out close to true ERA; if there's a delta greater than 0.5, it's cause to look further:
xERA (expected ERA) = (0.575 * hits allowed per nine) + (0.94 * HRs allowed per nine) + (0.28 * walks allowed per nine) - (0.1 * strikeouts per nine) - 2.68
If a pitcher's xERA is higher than his true ERA, then he was fortunate (lucky) and will regress. Conversely, if his true ERA is higher than his xERA, he pitched better than his ERA indicates.
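The xERA check translates directly into code. A minimal sketch, using the coefficients exactly as posted (not verified here against any published source) and the 0.5 threshold from the post; the function names and labels are hypothetical.

```python
def xera(h9: float, hr9: float, bb9: float, k9: float) -> float:
    """Expected ERA from per-nine rates, with the coefficients given in this post."""
    return 0.575 * h9 + 0.94 * hr9 + 0.28 * bb9 - 0.1 * k9 - 2.68


def luck_check(era: float, expected: float, threshold: float = 0.5) -> str:
    """Flag pitchers whose actual ERA differs from xERA by more than the threshold."""
    delta = expected - era
    if delta > threshold:
        return "lucky"    # ERA well below xERA: likely to regress upward
    if delta < -threshold:
        return "unlucky"  # ERA well above xERA: pitched better than his ERA shows
    return "in line"


if __name__ == "__main__":
    # Hypothetical rates: 9 H/9, 1 HR/9, 3 BB/9, 7 K/9, with an actual ERA of 3.00.
    x = xera(9.0, 1.0, 3.0, 7.0)
    print(f"xERA {x:.3f}: {luck_check(3.00, x)}")
```

For the made-up pitcher above, xERA works out to 3.575, more than half a run above his 3.00 ERA, so he is flagged as lucky.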
Last edited by Romanowski; 04-09-2010 at 04:50 PM.
#11
This is a very good explanation of the Pythag method. One question, though, UVA: why do you include the juice when making your own line? Unless you're booking action, I don't really see the point.
Additionally, the number I use for MLB HFA is 2.4%. I looked at 8 or so years of data and found that playing at home adds 2.4% to the home team's win percentage. Maybe somewhere down the road we could do a combined study to find the HFA of each individual team/park.
#12
That's interesting that you use 2.4%. We had started with a higher number and have since gone down to 3.5%.
How are bases treating you thus far?
#13
I'm doing very well but will admit I've been on the good side of many breaks (ATL tonight being one of them). I expect it to even out but I do think I have a +EV model for the time being. Yourself?
|
|
#14
I started out of the gate very well, with about a 20% return after only a week.
It's looking like that's down to 16% after tonight; it's hard to expect that kind of growth rate, though. I'll contact you in the next few days to see what we could collaborate on.
#15
Quote:
How did you fellas come up with your HFA numbers?
All times are GMT -5.










