Today I threw together the start of a simple Perl script to parse RSS 2.0 feeds...and since I'm not in the mood to do a lot of typing tonight, I'm just going to share that script with you:
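```perl
use strict;
use warnings;
use Net::HTTP;
use XML::Simple;

# Fetch a page over plain HTTP and return its body as a string.
# Arguments: the host name, then an optional path (defaults to "/").
sub getHTML {
    my $host = shift;
    my $page = shift || "/";
    my $s = Net::HTTP->new(Host => $host) || die $@;
    $s->write_request(GET => $page, 'User-Agent' => "Mozilla/5.0");
    my ($code, $mess, %h) = $s->read_response_headers;
    die "HTTP request failed: $code $mess\n" unless $code == 200;

    # Read the body in 1 KB chunks until read_entity_body returns 0 (EOF).
    my $data = "";
    while (1) {
        my $buf;
        my $n = $s->read_entity_body($buf, 1024);
        die "read failed: $!" unless defined $n;
        last unless $n;
        $data .= $buf;
    }
    return $data;
}

# Print "title:link:pubDate" for every <item> in an RSS 2.0 document.
# (Items can also carry a description; every item element is optional.)
sub parseRSS {
    # ForceArray => ['item'] keeps the items in an array even when the feed
    # has a single <item>; KeyAttr => [] turns off XML::Simple's array folding.
    my $config = XMLin(shift, ForceArray => ['item'], KeyAttr => []);
    my $items = $config->{channel}->{item} || [];
    foreach my $item (@$items) {
        # Missing elements come back undef; print them as empty strings.
        my @fields = map { defined $_ ? $_ : "" } @{$item}{qw(title link pubDate)};
        print join(":", @fields), "\n";
    }
}

# get the data from the URL
my $htmldata = getHTML("feeds.feedburner.com", "/BotFu");
# parse the RSS
parseRSS($htmldata);
```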
The above script doesn't actually do anything all that interesting. It just prints out the title, link, and pubDate of each item in the feed you point it at...but it shows the core of getting at that data. What you do with it from there really depends on your needs, but isn't that usually the case?
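If you want to see what XML::Simple hands back before pointing the script at a live feed, here's a minimal sketch that parses a made-up feed from a string, so no network is needed. `item_lines` is a hypothetical helper, not part of the script above; `ForceArray` and `KeyAttr` are real XML::Simple options. The reason for `ForceArray => ['item']`: without it, a feed with exactly one `<item>` comes back as a single hash instead of an array of hashes, and `@$items` dies with a "Not an ARRAY reference" error.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical helper: parse an RSS 2.0 string and return one
# "title:link:pubDate" string per <item>.
sub item_lines {
    my ($xml) = @_;
    # ForceArray keeps <item> in an array even if there is only one;
    # KeyAttr => [] stops XML::Simple from folding arrays into hashes.
    my $rss   = XMLin($xml, ForceArray => ['item'], KeyAttr => []);
    my $items = $rss->{channel}{item} || [];    # no items -> empty list

    my @lines;
    for my $item (@$items) {
        # Absent elements (e.g. no pubDate) are undef; use "" instead.
        my @fields = map { defined $_ ? $_ : '' } @{$item}{qw(title link pubDate)};
        push @lines, join(':', @fields);
    }
    return @lines;
}

# A made-up one-item feed, parsed straight from a string.
my $sample = <<'RSS';
<rss version="2.0">
  <channel>
    <title>Example</title>
    <link>http://example.com/</link>
    <description>Sample feed</description>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <pubDate>Mon, 01 Jan 2007 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
RSS

print "$_\n" for item_lines($sample);
```

Run as-is, it prints a single line built from the sample item's title, link, and pubDate; swap in the string returned by `getHTML` to try it on a real feed.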