Posted on 08/30/2012 at 12:45am
As a programmer and a fantasy football addict, I am embarrassed by the lengths we must go to just to get data in a machine-readable form. This lack of open source software cripples the community with sub-standard tools and, most importantly, detracts from some really cool and fun things that could be done with easily available statistics. Many tools are either outdated or broken; those that do work are closed source and often cost money.
Yesterday I started work on a new library package that I hope will start to improve this sorry state of affairs.
Since game statistics never change after a game has been played, the JSON data is automatically cached and saved to disk if the game is no longer being played. The next time statistics for that game are queried, the data will be read from disk. (nflgame comes preloaded with data from every game in the pre- and regular season since 2009.)
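The caching idea can be sketched in a few lines. This is only an illustration under stated assumptions, not nflgame's actual code: `load_game`, the `fetch` callable, the "finished" key and the one-file-per-game layout are all hypothetical names chosen for the example.

```python
import json
import os


def load_game(game_id, fetch, cache_dir):
    """Return game data, preferring a copy cached on disk.

    `fetch` is a hypothetical callable that downloads the JSON data for
    `game_id` and returns it as a dict with a boolean "finished" key.
    """
    path = os.path.join(cache_dir, "%s.json" % game_id)
    if os.path.exists(path):
        # This function only ever caches finished games, so a cached
        # file can be trusted without asking the network again.
        with open(path) as f:
            return json.load(f)

    data = fetch(game_id)
    if data.get("finished"):
        # Finished games never change, so it is safe to save them.
        with open(path, "w") as f:
            json.dump(data, f)
    return data
```

A game still in progress is fetched again on every call, while a finished game is fetched once and then read from disk.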
The API for nflgame is small and hopefully easy to use—even for those without much or any experience programming.
Let’s start off with a quick teaser to showcase some of nflgame’s power.
Who led the league in rushing between weeks 10 and 14 of the 2010 regular season?
I won’t make you hunt down the answer. Here are five lines of code that list the top ten rushers in weeks 10-14 of the 2010 season:
>>> import nflgame
>>> games = nflgame.games(2010, week=[10, 11, 12, 13, 14])
>>> players = nflgame.combine(games)
>>> for p in players.rushing().sort("rushing_yds").limit(10):
...     print p, p.rushing_yds
...
M.Jones-Drew 632
M.Turner 480
A.Foster 466
F.Jackson 462
K.Moreno 462
J.Charles 458
P.Hillis 426
C.Johnson 416
S.Jackson 405
B.Green-Ellis 401
Back to basics
If you are a beginning programmer (or don’t have any experience programming), I strongly urge you to read my Tutorial for non programmers: Installation and examples. What follows is a condensed version of the tutorial that might be a bit too confusing for those without programming experience.
nflgame is designed around three core concepts: games, players and lists of players (which are implemented as Python generators). Games can be selected based on season, week and team. Players in each game can then be accessed by name, by statistical category (e.g., passing, rushing or defense), or even by statistical value, such as finding all players in a list with at least one receiving touchdown.
Games can be selected one at a time:
>>> import nflgame
>>> buf_at_ne = nflgame.one(2011, 17, "NE", "BUF")
Or in bulk (every game in the 2009 season):
>>> import nflgame
>>> season09 = nflgame.games(2009)
Each game comes with its own players attribute that holds player statistics for every player that participated in the game. Additionally, games have attributes such as clock that report meta-information about the game itself.
So to get every player with at least one passing statistic in a game:
>>> import nflgame
>>> game = nflgame.one(2011, 17, "NE", "BUF")
>>> print game.players.passing()
[B.Hoyer, T.Brady, R.Fitzpatrick]
The same thing can be done with rushing, receiving, defense, kicking, etc., by simply replacing passing with one of the aforementioned.
Each player comes with his own grouping of statistics. To extend upon the previous example, consider printing out some passing statistics associated with each passer in the game:
>>> for p in game.players.passing():
...     print p, p.passing_cmp, p.passing_att, p.passing_yds
...
B.Hoyer 1 1 22
T.Brady 23 35 338
R.Fitzpatrick 29 46 307
Filtering, sorting and combining—oh my!
No data API would be complete without a means of filtering the data according to its values.
To find all players on the home team of the current game:
>>> print game.players.filter(home=True)
[B.Hoyer, T.Brady, B.Green-Ellis, A.Hernandez, J.Edelman, S.Ridley, D.Woodhead, R.Gronkowski, W.Welker, S.Gostkowski, Z.Mesko, M.Slater, K.Arrington, M.Anderson, J.Mayo, A.Molden, N.Jones, P. Chung, B.Deaderick, D.Fletcher, D.McCourty, V.Wilfork, N.Koutouvides, R.Ninkovich, K.Love, L.Polite, B.Spikes, S.Moore]
In this case, New England is the home team, so only players on the Patriots are returned.
A more advanced use of filter is to use predicates to determine whether a particular stat should be filtered or not. For example, here we look at every player in the game on the home team with at least one interception:
>>> print game.players.defense().filter(home=True, defense_int=lambda x: x >= 1)
[A.Molden, D.McCourty, S.Moore]
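To show how keyword filters that accept either a plain value or a predicate might work, here is a small standalone sketch. It is not nflgame's implementation: players are modeled as plain dicts, `matches` and `filter_players` are hypothetical helpers, and treating a missing stat as 0 is an assumption made for the example.

```python
def matches(player, **criteria):
    """Return True if `player` (a plain dict here) meets every criterion.

    A criterion is either a plain value, compared with ==, or a
    callable predicate that is given the player's value for that field.
    """
    for field, wanted in criteria.items():
        value = player.get(field, 0)  # assumption: a missing stat counts as 0
        if callable(wanted):
            if not wanted(value):
                return False
        elif value != wanted:
            return False
    return True


def filter_players(players, **criteria):
    """Lazily yield only the players that satisfy all criteria."""
    return (p for p in players if matches(p, **criteria))


# Made-up sample rows, not real game data.
players = [
    {"name": "A", "home": True, "defense_int": 1},
    {"name": "B", "home": True, "defense_int": 0},
    {"name": "C", "home": False, "defense_int": 2},
]
picked = filter_players(players, home=True, defense_int=lambda x: x >= 1)
print([p["name"] for p in picked])  # prints ['A']
```

Returning a generator keeps the filter lazy, in the same spirit as the generator-based player lists described earlier.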
Any player list can be sorted according to a statistical field. For example, we might want to see a list of rushing leaders in the game by yards:
>>> for p in game.players.rushing().sort("rushing_yds"):
...     print p, p.rushing_att, p.rushing_yds
...
S.Ridley 15 81
C.Spiller 13 60
R.Fitzpatrick 5 36
A.Hernandez 2 26
B.Green-Ellis 7 22
J.Edelman 1 6
G.Wilson 1 6
D.Woodhead 1 5
T.Choice 1 4
B.Hoyer 3 -2
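Sorting by a field and keeping only the top few entries can be sketched with the standard library alone. Again, this is an illustration rather than nflgame's code: `sort_by` and `limit` are hypothetical stand-ins, and players are plain dicts.

```python
from itertools import islice


def sort_by(players, field):
    """Return players ordered by `field`, largest value first."""
    return sorted(players, key=lambda p: p.get(field, 0), reverse=True)


def limit(players, n):
    """Lazily yield at most the first `n` players."""
    return islice(players, n)


# Three rows copied from the rushing output above.
rushers = [
    {"name": "R.Fitzpatrick", "rushing_yds": 36},
    {"name": "S.Ridley", "rushing_yds": 81},
    {"name": "C.Spiller", "rushing_yds": 60},
]
for p in limit(sort_by(rushers, "rushing_yds"), 2):
    print(p["name"], p["rushing_yds"])
```

With these three rows, the loop prints S.Ridley and then C.Spiller, matching the order of the full listing.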
Player statistics for the same player from different games can be combined to represent statistics from multiple games. Additionally, player generators can be concatenated. This combination allows one to construct searchable player generators of any makeup: from only games in a certain week, to all games in a season (or multiple seasons!).
For example, to find the top ten rushing leaders from week 2 of the 2009 season, we simply select all games from that week, combine the games into a single player list, and use our familiar searching methods exemplified above to get our answer:
>>> week2 = nflgame.games(2009, 2)
>>> players = nflgame.combine(week2)
>>> for p in players.rushing().sort("rushing_yds").limit(10):
...     print p, p.rushing_att, p.rushing_yds, p.rushing_tds
...
F.Gore 16 207 2
C.Johnson 16 197 2
F.Jackson 28 163 0
C.Benson 29 141 0
R.Brown 24 136 2
M.Barber 18 124 1
M.Turner 28 105 1
S.Jackson 17 104 0
F.Jones 7 96 1
A.Peterson 15 92 1
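Conceptually, combining games means adding up each player's numbers field by field. The sketch below shows one way that could look; it is an assumption-laden illustration (each game is a dict mapping a player name to a dict of stats, and this `combine` is a hypothetical helper, not the library's function of the same name).

```python
from collections import Counter


def combine(games):
    """Sum each player's statistics across several games.

    Each game is modeled as a dict mapping a player name to a dict of
    that player's stats for the game (an assumption for this sketch).
    """
    totals = {}
    for game in games:
        for name, stats in game.items():
            # Counter.update adds the values field by field.
            totals.setdefault(name, Counter()).update(stats)
    return totals


# Made-up numbers for two games.
game1 = {"RB One": {"rushing_att": 10, "rushing_yds": 50},
         "RB Two": {"rushing_att": 8, "rushing_yds": 30}}
game2 = {"RB One": {"rushing_att": 12, "rushing_yds": 70, "rushing_tds": 1}}

totals = combine([game1, game2])
print(dict(totals["RB One"]))
```

A player who appears in only one game keeps his single-game totals, and a stat that shows up in only some games (like the touchdown here) is simply added in when present.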
What if you wanted to see who passed for the most touchdowns in the first five weeks of the 2011 season?
>>> games1_5 = nflgame.games(2011, week=[1, 2, 3, 4, 5])
>>> players = nflgame.combine(games1_5)
>>> for p in players.passing().sort("passing_tds").limit(10):
...     print p, p.passing_tds
...
T.Brady 13
A.Rodgers 12
M.Stafford 11
D.Brees 10
R.Fitzpatrick 9
M.Hasselbeck 8
E.Manning 8
K.Orton 8
J.Flacco 7
M.Schaub 7
Or how about the receiving leaders for the entire 2009 season?
>>> season2009 = nflgame.games(2009)
>>> players = nflgame.combine(season2009)
>>> for p in players.receiving().sort("receiving_yds").limit(15):
...     print p, p.receiving_yds, p.receiving_rec, p.receiving_tds
...
A.Johnson 1504 95 9
W.Welker 1336 122 4
S.Holmes 1243 78 4
R.Wayne 1243 95 10
M.Austin 1230 74 11
S.Rice 1200 78 6
R.Moss 1189 78 13
S.Smith 1163 97 7
A.Gates 1145 78 7
D.Jackson 1120 60 9
H.Ward 1106 87 6
V.Jackson 1097 63 9
G.Jennings 1091 66 4
R.White 1087 79 10
B.Marshall 1081 93 10
Finally, with any of the above examples, you can export the statistics to a CSV file that can be read by Excel. For example, to export the entire 2011 season in just a single line:
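To give a feel for what such an export involves, here is a rough standalone sketch using Python's csv module. It is not nflgame's own export code: `write_csv` and its parameters are hypothetical names, players are modeled as plain dicts, and filling missing statistics with 0 is an assumption.

```python
import csv


def write_csv(path, players, fields):
    """Write one row per player, with a header row of field names."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for p in players:
            # Fill in 0 for any statistic a player does not have.
            writer.writerow({name: p.get(name, 0) for name in fields})


# Two rows copied from the 2009 receiving list above.
rows = [
    {"name": "A.Johnson", "receiving_yds": 1504, "receiving_rec": 95, "receiving_tds": 9},
    {"name": "W.Welker", "receiving_yds": 1336, "receiving_rec": 122, "receiving_tds": 4},
]
fields = ["name", "receiving_yds", "receiving_rec", "receiving_tds"]
```

Calling `write_csv("receiving.csv", rows, fields)` would then produce a file with a header line and one line per player, which Excel can open directly.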
In the short term, I’d really like to come up with an easy and elegant way of providing alerts (emails and texts) for your fantasy football team. For example, you could be notified whenever a player on your team (or your opponent’s team) scores points with a touchdown or a field goal. I did something like this last year using a cobbled-together hack job that I’m ashamed of, and it was a lot of fun. (I did screen scraping on the ESPN fantasy web site for all the statistics.)
The real problem here is keeping your roster up to date. It’s pretty easy if you’re using Yahoo because of their Fantasy Sports API, but I don’t think any other league web site offers such amenities.
In the long term, nflgame could certainly be the statistical backbone for fantasy football league software. But I don’t think that’s on my personal radar any time soon.
I am aware of phpFFL, which purports to be open source fantasy football league software—but development seems to have stalled. (It also looks like they are screen scraping CBS Sports for statistics, which I really want to avoid.) Plus, I vowed a long time ago never to take up another serious project in PHP again. I value my mental health too highly.
- Tutorial for non programmers: Installation and examples
- PyPI package page
- github project page
- Archlinux user repository package page
Posted on 10/05/2012 at 12:49am
This is really beautiful.
Such a clean API!
I'm moving some of the data into pandas to play with (preferring pandas to SQL). In any case, a huge thanks for such a great python package.
Posted on 09/20/2012 at 7:17pm
I'm a fantasy junkie as well and have some theories about how to find breakout players automatically and also assist in picking players to start/sit based on differing sets of criteria (injuries, past performance, home/away, last time played and other statistical data)..
Ah, I haven't gone down that road yet. I'm just trying to make it easier to track what's going on, since I'm in four leagues this year. What I've come up with so far is to have a live stream of all plays, and have plays involving one of my players highlighted. It's quite addicting to watch on Sunday…
I have written some code that pulls my rosters and matchup information from Yahoo and ESPN leagues. It uses YQL for Yahoo and (unfortunately) BeautifulSoup for ESPN. But it works!
Do you have twitter?
Posted on 09/20/2012 at 4:14pm
Thanks for the heads up on loading the data to a database. I'll definitely use nflgame.live to load the data into the DB and as suggested have the web app reading from that db to display the data. Gonna start ultra simple and extend it out from there.
I'm a fantasy junkie as well and have some theories about how to find breakout players automatically and also assist in picking players to start/sit based on differing sets of criteria (injuries, past performance, home/away, last time played and other statistical data).. Do you have twitter?
Posted on 09/20/2012 at 3:41pm
I'll see if I can't extend this and give us live visualization using Django.
That'd be awesome! I've done a little work on visualizing my fantasy football leagues (that I'll share soon) using Bottle (I'm really loving its simplicity). In my experience so far, I strongly urge you to load the data into a relational database and have the web-frontend pull data from there rather than nflgame explicitly. Using nflgame with more than a dozen games or so is just going to be too slow.
So basically, I'd see this having two components: 1) A background process that uses nflgame.live (or not) to load data into a relational database and 2) a web app that reads from the database and displays the data.
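As a rough sketch of that split, using SQLite from the standard library: the table layout, `load_stats` and `top_rushers` are hypothetical names for illustration, and the numbers are made up rather than pulled from nflgame.

```python
import sqlite3


def load_stats(conn, rows):
    """Background-loader side: store (player, week, yds) rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rushing ("
        "player TEXT, week INTEGER, yds INTEGER, "
        "PRIMARY KEY (player, week))"
    )
    # INSERT OR REPLACE lets the loader re-run as live numbers change.
    conn.executemany("INSERT OR REPLACE INTO rushing VALUES (?, ?, ?)", rows)
    conn.commit()


def top_rushers(conn, n):
    """Web-app side: read season totals straight from the database."""
    cur = conn.execute(
        "SELECT player, SUM(yds) AS total FROM rushing "
        "GROUP BY player ORDER BY total DESC LIMIT ?",
        (n,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
# Made-up numbers; a real loader would get them from nflgame.
load_stats(conn, [("RB One", 1, 80), ("RB Two", 1, 95), ("RB One", 2, 60)])
print(top_rushers(conn, 2))  # prints [('RB One', 140), ('RB Two', 95)]
```

The web app never touches nflgame in this arrangement; it only issues cheap queries against whatever the loader last wrote.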
It might also be worth checking out NoSQL. It might be a better approach to storing statistics.
Posted on 09/20/2012 at 3:35pm
WOW. This is about as BOSS as humanly possible. Followed you on Github and starred the project. I'll see if I can't extend this and give us live visualization using Django. Excellent Job! Looking forward to contributing.
Posted on 09/07/2012 at 4:07pm
Andrew, that's perfect. Very cool. I'll keep you posted how it works.
Posted on 09/07/2012 at 1:35pm
Also, if you post your questions on github, you should get email notifications when I respond. But you'll probably need a github account (and feel free to continue using this blog entry if you don't want to register for a github account).
Posted on 09/07/2012 at 1:32pm
One question for you: Is it possible to generate player/team statistics by filtering on individual plays?
Yes! I will warn you, I've added this functionality recently, so it may be a little raw in places. But if you're using the latest version (1.0.7), then you should be OK.
For example, if I want to find Brady's pass attempts and completions ONLY on third down plays. That would be really interesting.
Here ya go:
>>> import nflgame
>>> g = nflgame.one(2011, 17, "NE", "BUF")
>>> plays = g.drives.plays().filter(third_down_att=1)
>>> brady = plays.players().name('T.Brady')
>>> brady.formatted_stats()
'passing_att: 8, passing_incmp: 3, passing_incmp_air_yds: 35, passing_sk: 2, passing_sk_yds: -10, passing_yds: 58, passing_cmp: 5, passing_cmp_air_yds: 29, passing_int: 1, passing_tds: 1'
So on third down, Brady went 5⁄8 for 58 yards. Was sacked twice for 10 yards. Threw one pick and one TD. :-)
You can do this for an entire season too (or multiple games), but the API doesn't directly support this. I plan on adding it though. (Right now, you'd have to get all plays for each game manually, then simply add them together. Kind of how nflgame.combine works.)
Posted on 09/07/2012 at 8:36am
Well done, this is great. One question for you: Is it possible to generate player/team statistics by filtering on individual plays? For example, if I want to find Brady's pass attempts and completions ONLY on third down plays. That would be really interesting.
I'm playing around with this module not for fantasy football, but for a team stats model. Definitely better than screen scraping.