  #1  
Old 04-09-2010, 06:24 AM
Hall of Fame
 
Join Date: Oct 2005
Location: Salem, VA
Posts: 22,450
Rewards: 1,800
Creating your own MLB line - Pythagorean Method

Anytime you are trying to analyze a market, you need a frame of reference: a way to compare some set of data to what the market formulates, lest you enjoy throwing darts. I like to keep objective, simplified data organized in the most aggregated form possible, meaning a collection of team and league data over the course of a regular season. As the season progresses, you can start to make a basic calculation of where the line should be set and then compare it to the number posted by the linesmakers.


Early in the season, basically the only information you have to work with comes from projections, forecasts, and your own opinion of how things might play out based on previous seasons. Running the unobservable through forecast simulations is probably the most efficient way to get a team-by-team comparison. MLB has the luxury of myriad sophisticated projection systems that have proved to be quite solid in their team assessments. PECOTA, CHONE, and ZiPS, among others, are the more prominent ‘Sabermetric-ated’ measures of an unbiased statistical simulation of an entire season. We are fortunate enough to have access to the data required to set the various factors in place, thanks to the efforts of Baseballprojections.com and the Replacement Level Yankees Weblog.


So using the individual and team projections from the aforementioned websites, you can come up with a line for each game by placing the pitching matchup in a vacuum: basically, how would each team be expected to perform if its starting pitcher pitched every inning of every game? Doing this saves a little time by surfacing immediate value, so you don't have to expend additional resources chasing precision that would ultimately be superfluous to the result.



Here is an example from a game I chose at random, using CHONE forecasts of individual and team statistics (You can use whatever projection system you prefer).


April 8th , 2010
Cleveland Indians (Masterson)
@
Chicago White Sox (Floyd)


Open: Chicago -138 / Cleveland +130
Close: Chicago -115 / Cleveland +108


(Open and Close are purely for a reference to my calculated line below, numbers from Pinnacle)


Extracting data from Baseball Projections, here are the ERAs of the two starting pitchers:


Masterson 4.36
Floyd 4.33


And the Projected Runs scored:


Cleveland 816
Chicago 757



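I’ll place the starting pitchers in a vacuum and calculate the exclusive runs against. ERA is earned runs per nine innings, so multiplying by 162 games gives a season total as if the starter pitched every inning: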


Masterson 4.36 x 162 = 706.32
Floyd 4.33 x 162 = 701.46


Now that we have runs scored and runs against, apply the Pythagorean method to each team to calculate expected team winning percentage.


Cleveland PythW% = 57.17%
Chicago PythW% = 53.80%
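
For anyone who would rather script this step than keep it in a spreadsheet, here is a rough Python sketch. The function name is just illustrative, and the exponent of 2 (the simple Bill James form) is an assumption. With the inputs above, Cleveland works out to about 57.17% and Chicago to about 53.80%.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


GAMES = 162

# Starter "in a vacuum": projected ERA treated as runs allowed per game
cle_runs_against = 4.36 * GAMES  # Masterson
chi_runs_against = 4.33 * GAMES  # Floyd

# Projected runs scored come from the team projections quoted above
cle_pyth = pythagorean_win_pct(816, cle_runs_against)
chi_pyth = pythagorean_win_pct(757, chi_runs_against)

print(f"Cleveland PythW% = {cle_pyth:.2%}")
print(f"Chicago PythW% = {chi_pyth:.2%}")
```

Keeping the exponent as a parameter makes it easy to try other values later without touching the rest of the calculation.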


Now we apply the log5 formula, add in home field advantage, and convert to a line:


Cleveland win probability = (.5717 - .5717 * .5380) / (.5717 + .5380 - 2 * .5717 * .5380)


Cleveland W% = 53.4%
Chicago W% = 46.6%
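
The log5 step can be sketched the same way. This is a minimal version under the assumption that the two inputs are the Pythagorean percentages from the previous step; it uses .5717 for Cleveland and .5380 for Chicago, which are the values that reproduce the 53.4% / 46.6% split shown above.

```python
def log5(p_a, p_b):
    """Probability that team A beats team B on a neutral field, given each
    team's expected winning percentage against an average opponent."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


cle_neutral = log5(0.5717, 0.5380)
chi_neutral = 1 - cle_neutral

print(f"Cleveland W% = {cle_neutral:.1%}")
print(f"Chicago W% = {chi_neutral:.1%}")
```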


Home-field advantage is somewhat tricky. This comprehensive analysis dating back 100 years in baseball found the average from 1991-2002 to be a 7% increase in wins compared to road percentage, or 2.84 wins, with averages displayed for each earlier decade. Diamond-mind.com took a different approach and simply took the average home winning percentage and subtracted .500 (a winning percentage of 50%, or the standard average of 81 wins) and found the number to be .042, or 4.2%. But then the visiting team has to be adjusted down by the same amount, which equates to a .084 gap between home and road. Whichever number you use is simply preference. I tend to lean toward the more conservative way of seeing things and try to account for the lowest relative scenario in order to determine value. It's a pretty arbitrary way of doing things, but nonetheless. Therefore I’ll give Chicago a .035 HFA rather than .042.



With HFA:
Cleveland W% = 49.90%
Chicago W% = 50.10%
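
Since the HFA number is a judgment call, it can help to see how much the result moves with it. A small sketch, assuming the flat adjustment used above (subtract the HFA from the road team and give the remainder to the home team); the function name is made up for illustration.

```python
def apply_hfa(away_p, hfa):
    """Shift a neutral-field away win probability by a flat home-field edge.
    Returns (away, home) probabilities that still sum to 1."""
    away_adj = away_p - hfa
    return away_adj, 1 - away_adj


# 0.534 is Cleveland's neutral-field number; Cleveland is the road team here
for hfa in (0.035, 0.042):
    cle, chi = apply_hfa(0.534, hfa)
    print(f"HFA {hfa:.3f}: Cleveland {cle:.2%}, Chicago {chi:.2%}")
```

With .035 this gives the 49.90% / 50.10% split above; with .042 Chicago moves to roughly 50.8%.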


Converting this final number to a line with the spreadsheet formula [=IF(W%>0.5;-(W%/(1-W%))*100;-(-((1-W%)/(W%))*100))]:


Cleveland +100
Chicago -100
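
The same conversion outside a spreadsheet might look like the sketch below. It mirrors the IF formula above (favorites get a negative line, everyone else a positive one); the function name is an assumption, not anything standard.

```python
def prob_to_moneyline(p):
    """Convert a win probability (0 < p < 1) to a no-juice American line."""
    if p > 0.5:
        return -(p / (1 - p)) * 100  # favorite: negative line
    return ((1 - p) / p) * 100       # underdog or pick'em: positive line


for team, p in (("Cleveland", 0.499), ("Chicago", 0.501)):
    print(f"{team} {prob_to_moneyline(p):+.0f}")
```

Both numbers land within half a cent of even money, which is why the line rounds to 100 on each side.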


Add the 10% juice [=IF(line>0;line-(line*0.1);line+(line*0.1))]:


Cleveland -110
Chicago -110


These are obviously 20-cent lines. For 10-cent lines [=IF(line>0;line-(line*0.05);line+(line*0.05))]:


Cleveland -105
Chicago -105


Looking at the Pinnacle opener (Cleveland +130), it's apparent that my number of Cleveland -110 represents an expected value of 40 cents on the dollar, which is a huge advantage and in the long run will show a solid rate of return if you can beat the market this way consistently. Once the line moved to its closing number of Chicago -115, the value had diminished greatly, and the game was reduced to nothing more than a slight lean rather than an investment, in my opinion. The Indians did go on to win the game, conveniently enough.


Getting the best line possible is crucial in baseball because of market fluctuation. Whatever book you use, it may be a viable option to strip the juice from the market number and your own number for an easier comparison, then wager accordingly.
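
One way to do that juice-stripping is sketched below. It assumes the simplest approach, proportional normalization of the two implied probabilities (there are other ways to remove the vig), and the function names are made up for illustration. The market inputs are the Pinnacle opener from the example, and 0.499 is my Cleveland number with HFA.

```python
def implied_prob(line):
    """Raw implied probability from an American line (juice still included)."""
    if line < 0:
        return -line / (-line + 100)
    return 100 / (line + 100)


def strip_juice(line_a, line_b):
    """Scale the two implied probabilities so they sum to 1."""
    p_a, p_b = implied_prob(line_a), implied_prob(line_b)
    total = p_a + p_b
    return p_a / total, p_b / total


chi_fair, cle_fair = strip_juice(-138, +130)  # Pinnacle opener
my_cle = 0.499                                # my number with HFA applied

print(f"Market no-vig: Chicago {chi_fair:.1%}, Cleveland {cle_fair:.1%}")
print(f"Edge on Cleveland: {my_cle - cle_fair:+.1%}")
```

Comparing probabilities this way sidesteps the awkwardness of subtracting lines that sit on opposite sides of even money.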


Like I said, I’ve completely removed the impact of the bullpen from the equation, which unfortunately is not how baseball operates. You can calculate a team’s bullpen ERA by averaging the CHONE ERAs of its individual relief pitchers, then determine the expected effect on a particular game from the IP/g projected for each starting pitcher. The runs surrendered by the bullpen can then be calculated from that ERA.
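
A rough sketch of that bullpen adjustment, assuming a simple innings-weighted blend of starter and bullpen ERA over a nine-inning game. The 6.0 IP/g and the 4.10 bullpen ERA are hypothetical placeholders, not projections from any system; only Masterson's 4.36 comes from the example.

```python
def blended_runs_per_game(starter_era, starter_ip_per_game, bullpen_era, innings=9.0):
    """Innings-weighted runs allowed per game: the starter covers his
    projected IP/g and the bullpen covers the remainder."""
    starter_share = starter_ip_per_game / innings
    return starter_era * starter_share + bullpen_era * (1 - starter_share)


# HYPOTHETICAL inputs: 6.0 IP/g and a 4.10 bullpen ERA are invented for illustration
cle_rpg = blended_runs_per_game(starter_era=4.36, starter_ip_per_game=6.0, bullpen_era=4.10)
cle_season_ra = cle_rpg * 162  # feeds the Pythagorean step in place of ERA x 162

print(f"Blended runs/game: {cle_rpg:.2f}, season runs against: {cle_season_ra:.1f}")
```

The blended season figure would replace the starter-only runs against before applying the Pythagorean method.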

In this case, considerable value has already been established based on the exclusive condition of the starting pitchers, and I think it reasonable to assume that including the variable of the bullpen would not change the fact that the Indians had value versus the opener. No reason to exert additional effort in measuring unnecessary variables.


Once the season reaches a statistical flow, maybe June/July, then reality can displace sophisticated speculations and lines can be more easily determined while carrying more certainty. And some of the same methods can be applied. I’ll probably revisit over the course of the season.


Might be a good idea to save some of these formulas, making your own functions and templates to use in Excel if you feel compelled to make a spreadsheet.
__________________
"Nobody goes there anymore, its too crowded." --Yogi Berra

"Always tell the truth, that way you won't have to remember what you said." --Mark Twain


*=$50,000

Last edited by Romanowski; 04-09-2010 at 08:05 AM.
  #2  
Old 04-09-2010, 07:17 AM
the straightshooter
 
Join Date: Aug 2004
Location: an onrushing tram, on collision course
Posts: 46,820
Rewards: 855
what pythagorean exponent are you using here?
__________________
The only bridge I've ever burned along this legacy I dance is the one that linked the cities of prosperity and chance

Check out Technicapping for quantitative sport analysis
  #3  
Old 04-09-2010, 07:51 AM
Green before it was cool
 
Join Date: Apr 2007
Location: b-town, VT
Posts: 23,061
Rewards: 2,959
nice work uva. the only piece i don't get is why you dismiss the bullpen so easily.
IMO it is the bullpen that will give you an edge a lot of the time, because it is the weakest point of every team and the resources are limited. If you also consider that some middle relievers or closers may not be available for a game, then it would explain a possible gap between your line and the market. If you don't account for it, is it possible that your own line is off and you assume an edge where there isn't one?
  #4  
Old 04-09-2010, 07:59 AM
Hall of Fame
 
Join Date: Oct 2005
Location: Salem, VA
Posts: 22,450
Rewards: 1,800
Noted, Zoom. However, as I mentioned, 40 cents on the dollar is a tremendous fundamental advantage, and whatever effect the bullpen might have, I doubt it is enough to override the fact that I found value on the game vs. the opener.
  #5  
Old 04-09-2010, 08:00 AM
Hall of Fame
 
Join Date: Oct 2005
Location: Salem, VA
Posts: 22,450
Rewards: 1,800
Quote:
Originally Posted by Romanowski
what pythagorean exponent are you using here?
The exponent I used was 2 for the Pythagorean method, just the simplified number laid out by Bill James.
  #6  
Old 04-09-2010, 08:05 AM
Hall of Fame
 
Join Date: Oct 2005
Location: Salem, VA
Posts: 22,450
Rewards: 1,800
Quote:
Originally Posted by ZoomVT
nice work uva. the only piece i don't get is why you dismiss the bullpen so easily.
IMO it is the bullpen that will give you an edge a lot of the time, because it is the weakest point of every team and the resources are limited. If you also consider that some middle relievers or closers may not be available for a game, then it would explain a possible gap between your line and the market. If you don't account for it, is it possible that your own line is off and you assume an edge where there isn't one?
Hey Zoom, once the season starts rounding into form, Romanowski and I will start posting ERA splits and bullpen variables along with the line, and try to find other aspects that hold substantial weight.
  #7  
Old 04-09-2010, 01:09 PM
Registered User
 
Join Date: Apr 2004
Location: Redwood City, CA
Posts: 6,515
Rewards: 764
Is ERA really a good measure of a pitcher? I wonder how using OPS against in this context (if we could find a way to insert it into the equation) would sharpen the results.
  #8  
Old 04-09-2010, 01:17 PM
the straightshooter
 
Join Date: Aug 2004
Location: an onrushing tram, on collision course
Posts: 46,820
Rewards: 855
ERA just gets you total runs allowed, which is a component of the Pyth equation

UVA provided a link if you're curious about the Pyth equation, it's a good measure for many sports
  #9  
Old 04-09-2010, 01:28 PM
Registered User
 
Join Date: Apr 2004
Location: Redwood City, CA
Posts: 6,515
Rewards: 764
Quote:
Originally Posted by Romanowski
ERA just gets you total runs allowed, which is a component of the Pyth equation

UVA provided a link if you're curious about the Pyth equation, it's a good measure for many sports
I'm actually working on creating a conversion from OPS against to runs allowed per inning. I think it would be more accurate than ERA for estimating total runs, and it has the added benefit of using numbers that are less "public."
  #10  
Old 04-09-2010, 01:35 PM
the straightshooter
 
Join Date: Aug 2004
Location: an onrushing tram, on collision course
Posts: 46,820
Rewards: 855
Awesome...

here's a method too. it should be close to true ERA, but if there's a delta greater than 0.5, it's cause to look further

xERA = expected ERA

xERA = (0.575 * Hits allowed per nine) + (0.94 * HRs allowed per nine) + (0.28 * Walks allowed per nine) - (0.1 * Strikeouts per nine) - 2.68

if a pitcher's xERA is higher than his true ERA, then he was fortunate or lucky (and will regress)

or conversely, if his true ERA is higher than his xERA, he pitched better than his numbers indicate
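
For anyone who wants to run the xERA check over a list of pitchers, it could be sketched like this. The coefficients are the ones in the formula above; the function names and the sample rate stats are hypothetical, made up only to show the mechanics.

```python
def xera(h9, hr9, bb9, k9):
    """Expected ERA from per-nine rates, using the coefficients quoted above."""
    return 0.575 * h9 + 0.94 * hr9 + 0.28 * bb9 - 0.1 * k9 - 2.68


def needs_closer_look(actual_era, expected_era, threshold=0.5):
    """Flag a pitcher when actual ERA and xERA differ by more than the threshold."""
    return abs(actual_era - expected_era) > threshold


# HYPOTHETICAL pitcher: these per-nine rates and the 4.33 ERA pairing are invented
est = xera(h9=9.0, hr9=1.0, bb9=3.0, k9=7.0)
print(f"xERA = {est:.3f}, flag = {needs_closer_look(4.33, est)}")
```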

Last edited by Romanowski; 04-09-2010 at 04:50 PM.
  #11  
Old 04-20-2010, 08:32 PM
Registered User
 
Join Date: Apr 2010
Posts: 111
Rewards: 171
This is a very good explanation of the Pythag method. One question though, UVA, why do you include the juice when making your own line? Unless you're booking action, I don't really see the point.

Additionally, the number I use for MLB HFA is 2.4%. I looked at 8 or so years of data and found that playing at home adds 2.4% to the home team's win percentage. Maybe we could do a combined study to find the HFA of each individual team/park somewhere down the road.
Reply With Quote
  #12  
Old 04-20-2010, 09:54 PM
the straightshooter
 
Join Date: Aug 2004
Location: an onrushing tram, on collision course
Posts: 46,820
Rewards: 855
that's interesting that you use 2.4%. we started with a higher #, and have since gone down to 3.5%

how are bases treating you thus far?
  #13  
Old 04-20-2010, 10:04 PM
Registered User
 
Join Date: Apr 2010
Posts: 111
Rewards: 171
I'm doing very well but will admit I've been on the good side of many breaks (ATL tonight being one of them). I expect it to even out but I do think I have a +EV model for the time being. Yourself?
  #14  
Old 04-20-2010, 10:07 PM
the straightshooter
 
Join Date: Aug 2004
Location: an onrushing tram, on collision course
Posts: 46,820
Rewards: 855
I started out of the gates very well, with about a 20% return after only a week

looks like I'm down to 16% after tonight, hard to expect that kind of growth rate to continue though

I'll contact you in the next few days to see what we could collaborate on
  #15  
Old 04-21-2010, 10:35 PM
Registered User
 
Join Date: Apr 2010
Posts: 111
Rewards: 171
Quote:
Originally Posted by Romanowski
that's interesting that you use 2.4%. we started with a higher #, and have since gone down to 3.5%

how are bases treating you thus far?
I was looking at a few things tonight and decided to recheck this. Going back to '03, home teams have won 54.5% of the games. Assume that over 17k+ games in the sample, that strength of schedule and all that evens out. So that would mean that playing at home adds about 2.25% to the home team's win percentage (and thus subtracts 2.25% from the away team's).

How did you fellas come up with your HFA numbers?