  #1  
Old 07-27-2010, 03:40 AM
@R_Magillicutty
 
Join Date: Oct 2005
Location: Salem, VA
Posts: 22,413
Rewards: 1,614
Extracting and Surveying Line Movement to measure a Model

First, a quick and efficient way of extracting lines, before I delve into how I’m attempting to use line movement as a measure of success.

Excel macros are easy to learn by recording a macro and then editing the recorded or built-in code. If you understand basic looping in any language (for Excel macros the language is the manageable Visual Basic), you can have a lot of power at your fingertips.

Here is my code to extract some archived closing lines, taken indirectly from Pinnacle by way of SBRForum. Some may already have had the foresight to parse the Pinnacle lines daily at close. I was either too lazy or never thought about it, but there are ways around such neglectful and myopic behavior.


Code:
Sub MLBfromSBR()
'
' This macro extracts and cleans the scoreboard from SBR
' Each sheet is given the name of the desired date
' After the data is stripped, the sheets contain line open, close, matchup, and score

'
On Error Resume Next 'SpecialCells raises an error when no cells match; skip those steps instead of halting
Dim Datenum As Integer
Dim sht As String
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual

Datenum = 725 'Month and day of the starting date (725 = July 25) in the SBR link structure; the loop decrements it before first use

Dim i As Integer
For i = 1 To Range("archivedates").Rows.Count 'archivedates refers to the Range of dates labeled in the workbook

sht = Range("archivedates").Cells(i, 1).Value
Worksheets.Add().Name = sht
Sheets(sht).Select
Datenum = Datenum - 1 'Count down or Count up from a given date

    With ActiveSheet.QueryTables.Add(Connection:= _
        "URL;http://www.sbrforum.com/Scores/MLB+Odds/20100" & Datenum & ".aspx", Destination:= _
        Range("$A$1"))
        .Name = "20100" & Datenum
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .WebSelectionType = xlEntirePage
        .WebFormatting = xlWebFormattingNone
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = False
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With

    ' Delete every row whose column A cell is text or blank, leaving only the numeric line data
    With Range("A2", Cells(Rows.Count, 1).End(xlUp))
        .SpecialCells(xlCellTypeConstants, xlTextValues).EntireRow.Delete
        .SpecialCells(xlCellTypeFormulas, xlTextValues).EntireRow.Delete
        .SpecialCells(xlCellTypeBlanks).EntireRow.Delete
    End With

Next i

Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
End Sub
The date information can obviously be changed, though you might have to get creative to handle data that spans different months. There is probably an easy formula waiting to be found to convert sequential dates into the SBRForum format (perhaps take a typical date format [yyyy/mm/dd] and strip the “/” within the code, replacing it with “”), but the result must fit the link structure of whatever website you use. My earlier extraction posts using statfox have different link formatting. Adapt to the site. This worked wonders for me.
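As a rough illustration of the yyyymmdd idea, here is a small sketch in Python (not VBA, just to check the logic outside Excel). The link in the macro ends in 2010 + month + day, so formatting a real date handles month boundaries without any arithmetic on the raw number. The names `sbr_url` and `urls_counting_down` are made up for this example, and the URL pattern is simply copied from the macro; whether the site still serves it is not something this sketch checks.

```python
from datetime import date, timedelta

# URL pattern copied from the macro; assumed, not verified against the live site.
BASE = "http://www.sbrforum.com/Scores/MLB+Odds/{token}.aspx"


def sbr_url(day: date) -> str:
    """Build the archive link for one day using a yyyymmdd token."""
    return BASE.format(token=day.strftime("%Y%m%d"))


def urls_counting_down(start: date, n_days: int) -> list[str]:
    """Links for n_days consecutive days, walking backward from start."""
    return [sbr_url(start - timedelta(days=k)) for k in range(n_days)]


if __name__ == "__main__":
    # Crosses the August/July boundary without special handling.
    for url in urls_counting_down(date(2010, 8, 2), 4):
        print(url)
```

Inside the macro itself, VBA's Format function with a "yyyymmdd" pattern applied to a date cell should give the same token, though I have not rewired the macro to use it.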


Now to measuring the success of a model.

It’s hard to evaluate line movement. What moves a line, and how much faith do we have in market efficiency versus rational human behavior? For any given game, one can assume a 50% probability of the line moving with you and a 50% probability of it moving against you. That appears to be a rather simplified view given the nature of line movement, but in theory it’s justified.

Let’s look at using line movement to assess the sophistication and accuracy of a model. Taking my last four days of MLB wagers (governed by my MLB model), giving a sample size of 27, I’ve made a graph of my wagered line versus the Pinnacle close (I think this is practical because I typically bet the Pinnacle line or better due to my myriad sportsbook options).

In order to do this, the lines have to be converted to a more manageable number. The US line format isn’t conducive to a comparative dataset. Based on my preference, I converted each line to its respective implied win probability, as outlined here; feel free to use my odds calculator.
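For anyone who wants to see the conversion spelled out, here is a minimal Python sketch using the standard US-odds formulas. I am assuming the odds calculator does the same thing; the function name `implied_prob` is just a label for this example, and the bookmaker's margin is not removed.

```python
def implied_prob(us_line: float) -> float:
    """Convert a US (moneyline) price to its implied win probability.

    Favorites (negative lines): |line| / (|line| + 100)
    Underdogs (positive lines): 100 / (line + 100)
    The bookmaker's margin is left in; this is the raw implied figure.
    """
    if us_line < 0:
        return -us_line / (-us_line + 100.0)
    return 100.0 / (us_line + 100.0)


if __name__ == "__main__":
    for line in (-110, -112, 100, 150):
        print(f"{line:+d} -> {implied_prob(line):.4f}")
```

Putting every price on a 0 to 1 scale is what makes the bet line and the closing line directly comparable.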

First, here is the graph, with the Pinnacle close in red superimposed over my line in blue.



What this graph tells us is that line movement is basically in stasis after I submit my wagers. With the blue placed almost exactly on the same path as the red, there is not much one can draw from the graph other than that the average difference probably approaches zero. I should point out that I am constrained by the sample size. Four days’ worth of bets against the five months of the season observed so far is hardly enough data for measurement. But to lay the framework, experimenting with a smaller sample makes it easier to work through scenarios that can be tested in the future using similar methods.
Keep in mind the concept of a 50/50 probability of the line moving with or against you. It’s therefore rational to assume that the average difference between the closing line and the line at which the wager was placed should approach zero. The methods for normally distributed data are sufficient to produce reasonable results.

Percentage > 0%: 52%

This means that for 52% of the sample (n = 27) the line movement, however slight, was in my favor; that is, the difference between the close and my line was greater than zero.

For example, say the Team A line was -110 at the time I made the bet and the Pinnacle closing line for Team A was -112. That is a difference of -2 cents, or roughly 0.4% when converted to implied probability.

Applying that approach to the entire sample, and assigning a boolean (T | F = 1 | 0) to each difference being above zero or not, the line moved ‘with’ me 52% of the time.
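The 1 | 0 tally described above can be sketched in a few lines of Python. The sample below is made up for illustration (it is not my 27 wagers), and the function names are just labels for this example; each pair is the line when bet and the closing line for the side wagered.

```python
def implied_prob(us_line: float) -> float:
    """Standard US-odds to implied-probability conversion (margin not removed)."""
    if us_line < 0:
        return -us_line / (-us_line + 100.0)
    return 100.0 / (us_line + 100.0)


def share_moved_with(bets):
    """Return per-bet differentials (close minus bet, in probability)
    and the share of bets whose differential is above zero."""
    diffs = [implied_prob(close) - implied_prob(bet) for bet, close in bets]
    flags = [1 if d > 0 else 0 for d in diffs]  # T | F = 1 | 0
    return diffs, sum(flags) / len(flags)


if __name__ == "__main__":
    # Hypothetical (line_when_bet, closing_line) pairs, illustration only.
    sample = [(-110, -112), (120, 125), (-105, -105), (140, 133)]
    diffs, share = share_moved_with(sample)
    for (bet, close), d in zip(sample, diffs):
        print(f"bet {bet:+d}, close {close:+d}: {d * 100:+.2f}%")
    print(f"moved with me: {share:.0%}")
```

Note that a line that does not move at all counts as 0 here, since the test is strictly greater than zero.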
The descriptive statistics of the four-day sample are below, showing the mean, standard deviation, and standard error, labeled appropriately.



Surveying the table, one can conclude that my betting habits revolve around a devotion to orthogonality (breaking even). The average implied probability of the teams I wager on hovers around even, with very little fluctuation in the line from the time I bet to the close. I should note that these wagers are typically placed between 9:00 am and 10:00 am EST. I try to stay consistent in that respect, to limit the inconsistencies that come from keeping a database of lines as posted when I acted on them. One would expect a mutual understanding that line movement is rationed in relation to information and time. To some degree that is why some sportsbooks choose to appropriate overnight lines. If this were not the case, books would leave themselves vulnerable to night vultures, terrorizing the market during the night with a sagacious eye for arbitrage. So some natural constraining mechanism in the market has to be at work.

As I mentioned, the data has an asymptotic quality: it will likely tend to regress toward the mean as the sample size approaches infinity. This applies to the line differential, not to the actual line I wagered or the closing line. Because line movement up or down is expected to be equi-probable, the central limit theorem pushes the average differential toward a normal distribution. Note that I used the term ‘likely tendency’ rather than constructing a guarantee. If one truly creates a +EV model, the model’s efficacy will be reflected in the line movement, unless one is just extremely lucky. Distinguishing luck from efficacy is the very essence of the model survey.
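One way to put a number on luck versus efficacy is an exact binomial tail under the 50/50 assumption. The Python sketch below is my own illustration, not part of the spreadsheet. It assumes the 52% figure corresponds to 14 of 27 wagers (14/27 is about 51.9%); that count is an inference from the percentage, and `prob_at_least` is a name made up for this example.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Exact probability of k or more 'line moved with me' results in n bets
    if each bet independently moves my way with probability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


if __name__ == "__main__":
    n = 27
    for k in (14, 17, 20):
        print(f"P(at least {k} of {n}) = {prob_at_least(k, n):.4f}")
```

With an odd sample of 27, getting 14 or more by pure chance happens exactly half the time, so the four-day sample cannot separate a working model from a coin flip. It takes something like 20 of 27 before chance alone becomes an unlikely explanation.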

The standard error is the next aspect of digesting a tiny sample size. It estimates how far the sample average is likely to fall from the true average given the sample size. The point of emphasis is the Average Differential (Avg D). As one knows (Wikipedia), the standard deviation of a normal distribution indicates how far each data point falls from the mean. In a normal distribution, roughly two-thirds (about 68%) of the data fall within one standard deviation of the mean, one standard deviation here being 1.13%. A standard error of 0.22% may seem minuscule. However, take into account that a percentage probability translates to money, in this case most likely cents. 0.22% is about $1 for every $100 bet on two possible outcomes of one equi-probable event with very subtle line movement. So a 51% probability versus a 50% probability equals $1. After making 500-1000 wagers, that $1 has a habit of accumulating.
Zero falling within the 95% confidence interval (CI, the mean ± 1.96 standard errors) meets the conditions laid out by our prior assumption, that the line movement differential on wagers placed centers on zero.
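For reference, the standard error and the interval can be reproduced with a short Python sketch. The differentials in the example are made up, the function name `summarize` is just a label, and I am assuming the sample standard deviation (n - 1 denominator); the spreadsheet may have used the population version. With only 27 bets a t critical value would be slightly wider than 1.96.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(diffs):
    """Mean, sample SD, standard error, and a normal-approximation 95% CI
    for a list of line differentials (close minus bet, in percent)."""
    n = len(diffs)
    m = mean(diffs)
    s = stdev(diffs)      # sample standard deviation
    se = s / sqrt(n)      # standard error of the mean
    ci = (m - 1.96 * se, m + 1.96 * se)
    return {"n": n, "mean": m, "sd": s, "se": se,
            "ci95": ci, "zero_inside": ci[0] <= 0 <= ci[1]}


if __name__ == "__main__":
    # Figures quoted in the post: SD of 1.13% over 27 wagers.
    print(round(1.13 / sqrt(27), 2))  # 0.22, matching the quoted standard error

    # Hypothetical differentials in percent, illustration only.
    print(summarize([-1.0, 0.5, 0.2, 1.1, -0.4]))
```

The quoted standard error is consistent with the quoted standard deviation: 1.13% divided by the square root of 27 rounds to 0.22%.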

The great thing about a conservative model is that its risks are tethered to its orthogonal nature. I will in all probability rarely see a stream of wagers destroy the bankroll or massively reduce its growth potential. However, the more desirable outcomes at the opposite extreme will meet a similar fate: a regression to some break-even point.
__________________
"Nobody goes there anymore, its too crowded." --Yogi Berra

"Always tell the truth, that way you won't have to remember what you said." --Mark Twain


*=$50,000

Last edited by uva3021; 07-27-2010 at 03:43 AM.
  #2  
Old 07-27-2010, 04:16 AM
@R_Magillicutty
 
Quote:
So a 51% probability versus a 50% probability equals $1
This should read a 100 line versus a -101 line (roughly 0.22% probability) equals $1
  #3  
Old 07-31-2010, 11:44 AM
Registered User
 
Join Date: Apr 2010
Posts: 111
Rewards: 171
The line movement test is a very effective way to measure a model's validity in my opinion, even though some sharper guys disagree. If you're winning at 55% but the market is only agreeing 50%, you're probably getting lucky and due for a correction at some point.