Go Back   Sports Handicapping Forum > Welcome Forums > Main Street > Capping

Capping All handicapping, betting systems, spreadsheets, mathematics & quantitative technicapping.

Reply
 
LinkBack Thread Tools Display Modes
  #1  
Old 06-27-2010, 11:15 PM
Hall of Fame
 
Join Date: Oct 2005
Location: Salem, VA
Posts: 22,450
Rewards: 1,800
Extracting Data/Stats from Statfox for College Football

Some people have been asking on how to build a database or get information (scrape data). Well this post explains one method, often seen as cheating, but nonetheless gets the job done and the programming is easier to absorb.


College Football season is shortly finding its way into our interests once again, the season is only about ten weeks away. For me, this spells chaos since I suck at betting college football. And I’m determined to find a way to get better, or at least be somewhat serviceable. I need to start building a database.


How does one do this without going step by step, page by page copy and paste or the excel table import function, taking hours and even days to get all the information you want?


Loops.


Open excel and record a macro of yourself extracting a web table using the built in function(Data>>Source>>From Web). Once the data is imported, stop the macro and briefly survey the VBA code. You’ll get a good idea how the code is formatted, and the objects/properties involved in processing information by way of macro.


Find a nice list of teams from somewhere, preferably Statfox so as to remain consistent with the pages to be included. Fortunately enough Statfox formats their page links with identical parameters surrounding the name of the team. All spaces in the team names are simply replaced with “+”.


After having put a list of teams together in excel, replacing any spaces with “+”, name the range whatever, maybe something like “teams”, and stick a FOR…NEXT loop around the VBA code. Then wait.


Code:
 Sub GetDataFromStatfox()
'
' Grabbing CFB Data from Statfox Macro
'
'
Dim i As Integer
Dim cnstr As String
i = 1
For i = 1 To Range("teams").Rows.Count
cnstr = "URL;http://www.statfox.com/cfb/gamelog~teamid~" &  Range("teams").Cells(i, 1).Value & ".htm"
Sheets.Add.Name = Range("teams").Cells(i, 1).Value
With ActiveSheet.QueryTables.Add(Connection:=cnstr,  Destination:=Range("$A$1"))
.Name = Range("teams").Cells(i, 1).Value
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlEntirePage
.WebFormatting = xlWebFormattingNone
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = True
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
ActiveWindow.SmallScroll Down:=0
Next i
End Sub
My computer is piss poor therefore it takes about 3-5 minutes for the process to arrive at its completion. A fast computer should complete the task within 20 or 30 seconds. Now if you want to take it a step further and extract more data (a severe invasive extraction, sorry statfox), maybe from previous years, or get all stats from 1997 to 2009 (Statfox only goes back to 1997), then what do you do? Stick another FOR…NEXT loop in there and this time, you want to add new workbooks as well as new worksheets within those workbooks.


Quote:
Dim yearstart as Integer
Dim year end as Integer
Dim k as Integer
Dim wbname as String

yearstart = 1997
yearend = 2009
k = 0
For k = 0 to (yearend - yearstart)
wbname = yearstart + k & ".xls"
Workbook.Add
ActiveWorkbook.SaveAs Filename: = wbname , FileFormat:=Normal _
ReadonlyRecommended:=False, Createbackup:=False
Windows(wbname).Active
'Code from above
Next k

Once again you can even take it a step further and write a macro that only finds the rows that contain a particular piece of data. Most games that will be displayed on the statfox page will correspond with a date. The macro elements ‘IsDate’ or ‘InStr(string1, string2, date)’, or ‘IsNumber’ if you just want the line, wrapped around an If…Then statement within the inner FOR…NEXT loop, and whatever information or set of statistics that you want to operate with can be manipulated.
__________________
"Nobody goes there anymore, its too crowded." --Yogi Berra

"Always tell the truth, that way you won't have to remember what you said." --Mark Twain


*=$50,000
Reply With Quote
  #2  
Old 06-28-2010, 11:30 AM
the straightshooter
 
Join Date: Aug 2004
Location: an onrushing tram, on collision course
Posts: 46,820
Rewards: 855
<tup>
__________________
The only bridge I've ever burned along this legacy I dance is the one that linked the cities of prosperity and chance

Check out Technicapping for quantitative sport analysis
Reply With Quote
  #3  
Old 06-28-2010, 11:48 AM
Registered User
 
Join Date: Apr 2010
Posts: 111
Rewards: 171
Another great contribution from CM's best poster. The most annoying thing about gathering CFB data is the inconsistencies with the naming i.e. Miami, Fl vs. Miami vs. Miami (Fl.) vs. Miami FL
Reply With Quote
  #4  
Old 06-28-2010, 08:21 PM
Registered User
 
Join Date: Oct 2008
Location: Minnesota
Posts: 3,992
Rewards: 82
thanks for the info UVA!
Reply With Quote
  #5  
Old 07-01-2010, 08:04 PM
Registered User
 
Join Date: May 2010
Posts: 11
Rewards: 295
Include the following code directly after the variable declarations (Dim statements) at the beginning of the code (right before "i = 1")

Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual

And then set the calculation back to automatic by adding

Application.Calculation = xlCalculationAutomatic

at the end right before the End Sub statement (directly after "Next i").

Now the code should run a lot faster.
Reply With Quote
  #6  
Old 07-02-2010, 05:48 AM
Hall of Fame
 
Join Date: Oct 2005
Location: Salem, VA
Posts: 22,450
Rewards: 1,800
thanks thom, I had the autocalculation toggle in my NBA Playoff simulator, forgot to include here

You seem to be privy to programming language, how deep is ur knowledge because I could use some help
__________________
"Nobody goes there anymore, its too crowded." --Yogi Berra

"Always tell the truth, that way you won't have to remember what you said." --Mark Twain


*=$50,000
Reply With Quote
  #7  
Old 07-02-2010, 08:17 AM
Registered User
 
Join Date: May 2010
Posts: 11
Rewards: 295
Happy to help when I can. However, I would say that my programming knowledge is extremely Excel specialized since I use Excel on a daily basis. I have a pretty good knowledge of VBA and have created quite a few Excel workbooks that extract data from files or the internet. As long as I have the full URL to the data source, I think that I should be able to get the data I want from it. These days a lot of web sites use java scripts where the underlying url is not obvious, unless (I assume it would help) you know java script (which I don't) and then I run into trouble.

I have used both web queries where I import the data to Excel before saving it as well as extracting the data from the html source code and storing it to arrays and then saving it e.g. to csv files. Depends on the data layout which method is preferable.

So if you have questions regarding Excel and VBA I think I would be able to help. As an example, I have been planning to use Excel/VBA to extract historical odds (and scores) from sites like BetExplorer. I know it is very possible but haven't gotten around to actually programming it.
Reply With Quote
  #8  
Old 07-02-2010, 08:25 AM
Hall of Fame
 
Join Date: Oct 2005
Location: Salem, VA
Posts: 22,450
Rewards: 1,800
what kind of data do you want

bet explorer is merely a linking affiliate to oddsportal
__________________
"Nobody goes there anymore, its too crowded." --Yogi Berra

"Always tell the truth, that way you won't have to remember what you said." --Mark Twain


*=$50,000

Last edited by uva3021; 07-02-2010 at 08:29 AM.
Reply With Quote
  #9  
Old 07-02-2010, 01:45 PM
Registered User
 
Join Date: May 2010
Posts: 11
Rewards: 295
I was/am going to get the historical odds and scores for some of the larger soccer leagues to start off with. I frequently use OddsPortal to compare odds manually but BetExplorer will show you historical odds for what appears to be all bookies without having to be logged in (OddsPortal requires you to be logged in to see all bookies), which helps when downloading/scraping the odds data.
Reply With Quote
  #10  
Old 07-02-2010, 07:23 PM
Registered User
 
Join Date: Oct 2008
Posts: 70
Rewards: 164
This is great! Thank you UVA!

:che:
Reply With Quote
Reply

Bookmarks

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT -5. The time now is 05:47 PM.


Powered by vBulletin® Version 3.8.7
Copyright ©2000 - 2012, vBulletin Solutions, Inc.