Kevin Nelson Marshall

Today's quick Ruby hack is a script I threw together REAL quick (in about five minutes) to help a friend.

Basically, my friend needed a pipe-delimited list of NFL players from sportsline.com. I think he is doing something on his local machine with his fantasy roster, but honestly, I don't know why he needed it.

In any case, what he wanted was the sportsline.com playerid, the player's name, the player's position, and the NFL team they are on. As luck would have it, pages like this one have just that easily available.

So it's really a simple case of looping through each letter listing on Sportsline.com and breaking out the data we want. Again, this is a one-off script that I'm running locally and that doesn't really need to be all that fast.

What this means is that I can take a bunch of shortcuts, I can hard-code against my data scenarios, and I don't really have to worry about security...

So without further ado, here's the quick hack in full:

require 'net/http'
require 'uri'

players = File.new("players.txt", "w+")
('A' .. 'Z').each do |letter|
  begin
    response = Net::HTTP.get_response(URI.parse("http://www.sportsline.com/nfl/playerindex/" + letter))
    data = response.body
    playerfound = false
    while !playerfound do
      # grab player details if we can; example - <a href="/nfl/players/playerpage/187381"> Abraham, John</a> DE, Atlanta Falcons
      data =~ /"\/nfl\/players\/playerpage\/(\d+)/i
      playerid = $1
      data =~ /#{playerid}"> ([a-zA-Z' .\-,]+)/i
      playername = $1
      data =~ /#{playername}<\/a> ([a-zA-Z]+),/i
      position = $1
      # anchor the team lookup to this player's name so we don't pick up
      # the previous player's team when both play the same position
      data =~ /#{playername}<\/a> #{position}, ([A-Za-z. ]+)/i
      nflteam = $1
      line = "#{playerid}|#{playername}|#{position}|#{nflteam}"
      if playerid != nil
        players.puts line
        puts line
      end
      # chop off everything up to (and a little past) this player's ID
      chopspot = data.index(playerid) + 20
      datasize = data.length - chopspot
      data = data[chopspot, datasize]
      # stop once there are no more player links left on this page
      if data !~ /"\/nfl\/players\/playerpage\/(\d+)/
        playerfound = true
      end
      playerid, playername, position, nflteam = "","","",""
    end
  rescue
    # skip this letter if anything goes wrong and move on to the next one
  end
end
players.close


As you can see, it's very light on comments (and logic really)...so just to break it down a little bit:

1. Sportsline.com lists players whose last names start with each letter, so we start by looping through the alphabet.

2. We wrap our process in a begin/rescue/end block so that if we hit a problem on a given page, our program will continue to grab the data for the other letters.

3. We use a simple Net::HTTP call to grab the data for each page.

4. We use a handful of regular expressions to get the data we want out...I could have done these calls/assignments all in one regular expression, but I found it easier to build it up in bits and so I just kept it that way. (In something I was going to spend more time on, I would have pared these down into one regex call.)

5. I only write the data out to the file (and screen) if there was a player ID found...this way we ignore any junk lines or false matches.
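For the curious, here's a rough sketch of what the "one regex" version mentioned in step 4 might look like. It's not what I actually ran; the sample markup, the PLAYER_PATTERN constant, and the extract_players method are all made up for illustration, and the markup is only modeled on the example in the script's comment, so the real pages may need a tweaked pattern.

```ruby
# Hypothetical sample markup, modeled on the example in the script's comment.
SAMPLE_HTML = <<~HTML
  <a href="/nfl/players/playerpage/405208"> Abiamiri, Victor</a> DE, Philadelphia Eagles<br>
  <a href="/nfl/players/playerpage/187381"> Abraham, John</a> DE, Atlanta Falcons<br>
HTML

# One pattern with four capture groups: player ID, name, position, team.
# (An assumption about the page layout, not a tested match for the live site.)
PLAYER_PATTERN = %r{"/nfl/players/playerpage/(\d+)">\s*([a-zA-Z' .,\-]+)</a>\s*([a-zA-Z]+),\s*([A-Za-z. ]+)}

# Returns one "id|name|position|team" string per player link found in html.
def extract_players(html)
  html.scan(PLAYER_PATTERN).map do |id, name, position, team|
    [id, name.strip, position, team.strip].join("|")
  end
end

puts extract_players(SAMPLE_HTML)
```

Because String#scan walks the whole page and hands back every match, there's no need to chop the data down after each player, and each team is captured right next to the name it belongs to.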

And it's basically that simple. Here are the first few lines of the generated file:

405198|Abdullah, Husain|DB|Minnesota Vikings
405208|Abiamiri, Victor|DE|Philadelphia Eagles
187381|Abraham, John|DE|Atlanta Falcons
395911|Adams, Anthony|NT|Chicago Bears
1614642|Adams, Chester|G|Chicago Bears
12175|Adams, Flozell|T|Dallas Cowboys
405275|Adams, Gaines|DE|Tampa Bay Buccaneers
517269|Adams, Jamar|DB|Seattle Seahawks
1222573|Adams, Michael|DB|Seattle Seahawks
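
If you want to read that file back in later, something like the following should do it. This is just a minimal sketch: the Player struct and the parse_player_line method are names I made up for the example, and it assumes every line has exactly the four pipe-separated fields shown above.

```ruby
# Field names are my own labels for the four pipe-separated columns.
Player = Struct.new(:id, :name, :position, :team)

# Parses one "id|name|position|team" line into a Player.
def parse_player_line(line)
  id, name, position, team = line.chomp.split("|", 4)
  Player.new(id, name, position, team)
end

sample = "12175|Adams, Flozell|T|Dallas Cowboys\n"
player = parse_player_line(sample)
puts "#{player.name} (#{player.position}) plays for the #{player.team}"
```

To process the whole file, you could run each line of File.foreach("players.txt") through parse_player_line.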

posted by Kevin Marshall on 2008-10-06 00:00:00+00
