|
|||||||
| Capping All handicapping, betting systems, spreadsheets, mathematics & quantitative technicapping. |
![]() |
|
|
LinkBack | Thread Tools | Display Modes |
|
#1
|
|||
|
|||
|
MLB Linemaker spreadsheet
Process explained in some of the previouss posts about bullpen usage and creating a line. Here is the excel sheet that uses all those calculations to automatically create a line, it's pretty comprehensive and i'll answer questions here.
After downloading, go to MAIN sheet and refresh all the data by the data>>refresh tab on the toolbar. Then after it updates, hit both buttons that are present only once. Please only push the SORTSTATS button once after update or you'll probably have to start over. The lines are extracted from Pinny, you can refresh that data as often as you can, just abide by the 60 second refresh rule that Pinny allows, lest you want your IP to be banned. In the MAIN sheet there will be a list of games with their respective lines. At the top of the page, there are two drop down boxes. The top dropdown indicates what factor you want to use to determine the starting pitcher's runs allowed. For now its either CHONE preseason projections (in season are not yet available) and ZiPS in season projections which I update every Monday and Friday. THe second drop down is offense, either Runs Projected or Runs Pace. Runs projected is also CHONE projections but updated April 30th. Runs pace is current runs scored factored to 162 games. The two lines on the right are nonbullpen and bullpen adjusted, which I explained at length a few threads down. Some sheets and columns are hidden from view, feel free to explore, but unless you know what you are doing I wouldn't touch the formulas or macros. There is a fun little SIM tab that for now just simulates the win/loss of each game, a user determined number of times. I have a sheet that simulates a score but its in alpha status and don't think I should upload it yet. Post here if you have suggestions, questions, comments. ENABLE MACROS
__________________
"Nobody goes there anymore, its too crowded." --Yogi Berra "Always tell the truth, that way you won't have to remember what you said." --Mark Twain *=$50,000 |
|
#2
|
|||
|
|||
|
Hey uva,
This is really exceptional stuff and I'd just like to thank you for all your hard work in sharing this. I have a couple of quick questions if you wouldn't mind answering them: 1. I've noticed that the projected lines can really vary depending on which projections you use (CHONE vs. ZiPS and Runs Projected vs. Puns Pace). In some cases, the projections will conflict depending on which system you choose (a team may be projected as a favorite using CHONE but as a dog using ZiPS). Which of the four iterations is most reliable? And come June, will you update the spreadsheet to use "real" data? 2. How exactly do you interpret the projected money lines? Is a 25 cent differential enough or do you look for more? 3. Is there an easy tutorial for writing the macro to get Excel to extract web data? 4. Any thoughts on using starter WHIP or xERA instead of starter ERA or some other defense-adjusted pitching measure? Thanks again! Look forward to your response. Last edited by Panekkkk; 05-10-2010 at 11:41 AM. |
|
#3
|
|||
|
|||
|
thanks for the response
come June I"ll probably resort to real pitching data in lieu of CHONE seeing as how they aren't doing projections the runs scored data is YTD adjusted to 162 games (PACE), or CHONE projections updated from May 1 (PROJ) I might add xFIP or xERA, or some other derivative of a pitching statistics, but not sure, don't really think its practical the MLs are highlighted green when my created line (converted to winning percentage) is at least 3% higher than the pinnacle line (winning percentage), which typically equates to around 15-20 cents but by degrees rises as the winning percentages rise because new pitchers are added and the macros aren't functioning properly, after you do an update and sort the stats, go to the PLAYER PITCHING and format the remaindingi pitchers, then resort by arranging alphabetically A-Z
__________________
"Nobody goes there anymore, its too crowded." --Yogi Berra "Always tell the truth, that way you won't have to remember what you said." --Mark Twain *=$50,000 |
|
#4
|
|||
|
|||
|
can't get it to open
|
|
#5
|
|||
|
|||
|
google "office 2010 beta", free download from microsoft, works perfect and not on the market yet
__________________
The only bridge I've ever burned along this legacy I dance is the one that linked the cities of prosperity and chance Check out Technicapping for quantitative sport analysis |
|
#6
|
|||
|
|||
|
Nice work, UVA. I wouldn't be in any hurry to use this season's stats. In fact I'd go so far as to say that in most cases, the pre-season projections are more predictive of how a player will perform going forward than the current season's stats.
|
|
#7
|
|||
|
|||
|
fan graphs projections update and a minor fix in CHONE pitching projections (which I barely use)
__________________
"Nobody goes there anymore, its too crowded." --Yogi Berra "Always tell the truth, that way you won't have to remember what you said." --Mark Twain *=$50,000 |
|
#8
|
|||
|
|||
|
minor tweaks, some of the projections had to be readjusted to accommodate my defensive equalizer add on, and all the macros should be in working order
__________________
"Nobody goes there anymore, its too crowded." --Yogi Berra "Always tell the truth, that way you won't have to remember what you said." --Mark Twain *=$50,000 |
|
#9
|
|||
|
|||
|
Thanks for making this workbook available. Great work that has really gotten me interested in quantitative analysis of sports, baseball in particular. I am relatively new to this type of analysis when it comes to sports but I have a decent knowledge of statistics (always liked working with numbers) and have been actively using Excel for 15 years and have built many models, predominantly to analyze companies, the stock market, investment strategies and the like. Turns out that betting on sports have a lot of similarities with the type of investing I like to do.
After having spent quite a few hours wrapping my head around how your workbook works, I think have found a few things (formulas) that might not work exactly as I believe you intended and a few adjustments that would make it work. Assuming that you want this type of input, should I post my comments here in this thread or is there another way that I can communicate this to you since I honestly believe that my comments will not be of interest to anyone other than you, as the creator of the workbook. Tom |
|
#10
|
|||
|
|||
|
your comments are welcome right here in this thread, you can even attach your adjusted workbook
anyone and all who have downloaded the workbook, or might in the future woudl benefit welcome tom
__________________
The only bridge I've ever burned along this legacy I dance is the one that linked the cities of prosperity and chance Check out Technicapping for quantitative sport analysis |
|
#11
|
|||
|
|||
|
I second the earlier comments... Just to be able to see the work flow the way you organize the data has given me some ideas on how to tidy up my own stuff.
Thank you. Last edited by C.S.; 05-17-2010 at 04:58 PM. Reason: added thanks |
|
#12
|
|||
|
|||
|
A few comments with potential solutions
I apologize in advance since this got a little bit more complicated than I wanted to but hopefully it is useful. The first part of my comment refers the current issues I have found. Each of them is quick to fix. However, the last part of my comments is regarding how to make the workbook easier to manage, expand with additional data and audit (to verify that the results are correct). The latter obviously will take a bit more effort and I would be happy to help if there is interest. I don't want to provide a workbook with my adjustments since I don't claim to understand (yet at least) all the calculations that are included and since uva is updating the data and possibly adding more calculations over time and might have reasons for why my suggested changes don't make sense,I believe he is better equipped to do any changes than I am. I would be happy to provide specific examples of how the changes I am suggesting could be implemented, if there is interest.
My comments: MAIN sheet: Cell F1 has a space in it, so the formula in column P doesn't work properly. It is looking for "ZiPS" in cell H1 while cell H1 will only be either "CHONE" or "ZiPS " (note the space at the end. "ZiPS " does not match "ZiPS" so column P will always pull in the data for the CHONE projections. MAIN sheet: VLOOKUP function without requiring an exact match will not prompt you that a pitcher is missing. One example is L. Atilano (Washington Nationals). The numbers that are being pulled in from the ZiPS sheet for L. Atilano actually belong to K. Wood (the last item on letter K) and data pulled in from J. Ely actually belong to J. Duchscherer. This will happen anytime the pitcher is missing from the "ZiPS" sheet using the current VLOOKUP formula. To force Excel to look for an exact match, the VLOOKUP formula should look like VLOOKUP(C5,pitchers!$A$2:$V$1514,15, FALSE) rather than VLOOKUP(C5,pitchers!$A$2:$V$1514,15) which is how it is currently. This example was taken from column K but applies to all VLOOKUP formulas used. MAIN sheet: The VLOOKUP formula to get data from ZiPS sheet only includes the range A2:P383, while the data in the ZiPS sheet goes to row 401 so some pitchers will not be found because of that. Applies to formulas in columns K, L, M, N, O, P The basic issues we have when trying to retrieve data is 1. Missing pitcher data. Can't fix that if no projection exists for that pitcher but we can make sure that we don't pull in the wrong pitchers data. 2. Duplicate pitcher names. Can manage this but forcing Excel to pull in data from a specified row where the row number is a manually entered integer, in those instances where there is duplicate pitchers (rare but happens). There are several ways of doing this and I can provide an example if anyone wants me to. 3. Inconsistent naming of pitchers e.g Jer. Weaver vs J. Weaver. Manually specifying the row on those rare instances would solve this as well. My suggestion would also be to break out the calculations so that each relevant data point that is picked up from the individual sheets has its own column, rather than using multiple VLOOKUP formulas in a cell. That way it becomes much easier to verify that the data that is being pulled in is correct and easier to audit when Excel shows an error e.g. when data for a specific pitcher is missing from the projection sheets. Whenever I can, I try to use the same formula when getting data from the same sheet or table. The reason for this is that if I have to change the formula, I only need to change it once and copy it to all other cells. As a result, I am not a fan of using a VLOOKUP formula where the column number is hardcoded since if the data source is changed e.g. columns moved, all column numbers in all VLOOKUP formulas would have to be changed. A more flexible way would be to find the column for a specific header, e.g. "xERA", in the data source sheet using MATCH function (MATCH will return an integer representing the position of the item you are looking for in the specified range). Another problem with VLOOKUP is that if columns are added to the data source sheet e.g. by adding additional columns of relevant calculations, the range used in the VLOOKUP function would have be expanded to be able to pick up the new columns of data. If we have a lot of data and using a lot of VLOOKUP functions, the workbook can be slow to recalculate properly. A way to solve this would be to use a combination of functions. Use MATCH to find the row for the specified pitcher. Since this number doesn't change, store this in its own column and refer back to this in subsequent formulas where the pitcher row is needed. This is better than to include the MATCH function in each cell, both from an auditing and performance standpoint. Use MATCH to find the correct column for the data point we are looking for, e.g. "xERA". Store the result of that formula in its own row. Then we can use INDEX function to find the specific data point we are looking for, using the row and column numbers that we have previously determined using the MATCH function. This way, anytime we are looking to pull in data from a specific sheet, we can use the same formula, regardless of which row/column combination that we need to get the data from. Lastly (if anyone got this far) is the ZiPS data updated with the “ZiPS (Update)” data available at FanGraphs.com (or similar source) or is that not advisable? |
|
#13
|
|||
|
|||
|
thanks thom for the comments and interest
sorry I haven't updated or reuploaded here, I had fixed those issues and added a few more things along with a ZiPS update few days ago but never re-uploaded to cappersmall, maybe a mod will allow me to just update it to the first post instead of having to make continuous post everytime I update if I use match and offset and the index formula it would seem if I had to adjust the lookup sheets or values for any reason I would have the same issues as before, meaning I would have to change the reference cells in the formula on the main page match and index may be faster, though, you are probably right and yes the numbers come straight from fangraph (Update not ROS)
__________________
"Nobody goes there anymore, its too crowded." --Yogi Berra "Always tell the truth, that way you won't have to remember what you said." --Mark Twain *=$50,000 Last edited by uva3021; 05-18-2010 at 12:21 PM. |
|
#14
|
|||
|
|||
|
Here is the latest update
__________________
"Nobody goes there anymore, its too crowded." --Yogi Berra "Always tell the truth, that way you won't have to remember what you said." --Mark Twain *=$50,000 |
|
#15
|
|||
|
|||
|
Thanks for update uva - Did you get my reply regarding? Let me know if that did it
|
![]() |
| Bookmarks |
«
Previous Thread
|
Next Thread
»
| Thread Tools | |
| Display Modes | |
|
|
All times are GMT -5. The time now is 05:51 PM.








Linear Mode
