Kevin Nelson Marshall
Other entries:
« Train all you want...

OK - my weekend of fun was, well, fun! But my friends have all gone home now and it's time to get back to work.

Tops on my plate right now (or at least tops on my brain) is cleaning up the last few important parts to have a real beta version of fubnub.com.

As I saw it, to achieve this goal I had to get over one big road block: The ability to schedule your feeds for automatic fubnub'ing.

Actually, I've had this challenge on my to-do list from the start of the project, but I kept putting it off because my brain was thinking it was going to be a fairly annoying thing to implement. When I say annoying, I mean in the sense that it was going to take some solid design and serious thought before I sat at my keyboard and starting hacking. That is, if I wanted to create a solution that had the potential to scale over time.

Here's what I came up with:

1. Ultimately a scheduled task will run once an hour to fubnub any feeds that should be. Since I'm doing the backend of the rest of this project in Perl, I decided that this scheduled task should also be a Perl program.

The biggest challenge with this step is actually providing information to the users about how the system is works in the backend (which I haven't done yet).

The trick is that users can only have feeds published by the hour, the day, the week, or the month...so the first time you publish, you might say to publish every hour or month but it may publish the first feed almost immediately (because you have never published and you just set it up before the program started to run).

After the first time the feed gets processed, it should then be on the proper schedule and so I'm hoping it's not a big deal. But it may create some confusion for people as they get started. Knowing that up front allows me to prepare for handling all the confusion.

2. To actually accomplish scheduled posting, I created one table that stores the details about what feed you want to have automatically posted. This table includes information about what the interval for posting should be as well as when the last time the feed was published.

The user populates this table via the web site when they set up the schedule, but it's really the Perl program in the first step that uses this table.

Here's a brief rundown of the table structure:

scheduled_id - int
scheduled_title - varchar
scheduled_frequency - varchar
scheduled_time - int
account_id - int
scheduled_lastpublished - datetime
delivery_id - int


The trick here is that, to make things ready to scale, I really wanted to limit the number of database calls my Perl program has to make.

The easy solution of course would be to first grab all the records, pulling out the interval and determining when the next time a feed should be published is. Then doing a second query pulling out just the details of those records.

That's an easy solution, but it really doesn't scale for a couple of reasons.

First, you have to deal with every single record in the database - many of which won't be ready for processing this time through the program (and imagine you have a million plus users!).

Second, you've got to make at least two database calls. And the more database calls you have to make, the harder it is to scale your system.

So I wanted to be able to come up with a query that pulls up the details of which feeds need published (and only those feeds) all in one query.

And since I control the format and details of what data actually gets into the table (by limiting the options the user has on the front end), I can use that inside knowledge to code against certain expected values.

In the end, it took a little bit of brain power and use of some slightly more complex T-SQL than your day-to-day stuff, but here's the query I came up with:

select
  scheduled_id,
  scheduled_title,
  scheduled.account_id,
  scheduled_lastpublished,
  command
from
  scheduled,
  delivery,
  commands
where
  case lower(scheduled_frequency)
    when 'hours' then dateadd(hour, -scheduled_time, getdate())
    when 'days' then dateadd(day, -scheduled_time, getdate())
    when 'weeks' then dateadd(week, -scheduled_time, getdate())
    when 'months' then dateadd(month, -scheduled_time, getdate())
  end > isnull(scheduled_lastpublished, dateadd(month, -50, getdate())) and
  delivery.delivery_id = scheduled.delivery_id and
  delivery.feedtype = commands.feedtype


If you break down the query, you see that it really was a simple matter of a CASE statement, the DATEADD statement, and a ISNULL statement. With a few joins thrown in to actually get at all the data I needed (details of that are omitted from this post because they aren't relevant to this topic right now)

As I mentioned above, I know the small set of intervals a user can pick (hours, days, weeks, months). This means I can hard code against that.

I use the user defined time range with the DATEADD feature to move back in time to the appropriate spot. Once I'm there, I just need to see if it's still more recent than the last published date.

Next, since the first time you publish there is no last published date, I use the ISNULL statement to move to a time that will always be less than what a user could pick (in this case 50 months ago).

This way, a feed will always get published the first time the program sees the feed is scheduled.

The rest of the Perl program I mentioned in step one simply kicks off the fubnub'ing of the feeds and updates the scheduled table as needed (eventually using threads to publish, and wrapping the update calls into as few queries as possible).

Once again it's really that simple - much easier than my brain thought it was going to be! You just need to have that inside knowledge of your system, your data, and think at least a little!

posted by Kevin Marshall on 2008-05-26 00:00:00+00

Subscribe »

BotFu feed with RSS reader

BotFu feed by Email


Search All Posts »

Blog Details »

This blog now includes 286 wonderfully exciting posts from 1 unique and very special writer!


Kevin Marshall - Who's That?

I'm just your basic programmer. I can't spell to save my life, I'm not the greatest story teller, and I often ramble on about nothing. This blog showcases all of that!

If you're bored drop me an email at info at falicon.com or view my outdated resume.


Stalk me on »

Twitter (@falicon) »
Delicious »
Digg »
Disqus »
Facebook »
Flickr »
FriendFeed »
Last.fm »
LinkedIn »
StumbleUpon »

Archives by Category »

(24) Code »
(5) ColdFusion »
(11) Database »
(7) Factor »
(286) General »
(9) JavaScript »
(15) Perl »
(13) PHP »
(17) Ruby »

Archives by Month »

(1) February 2010 »
(5) January 2010 »
(2) October 2009 »
(6) August 2009 »
(11) July 2009 »
(2) May 2009 »
(3) April 2009 »
(2) March 2009 »
(7) February 2009 »
(9) January 2009 »
(14) December 2008 »
(5) November 2008 »
(12) October 2008 »
(13) September 2008 »
(16) August 2008 »
(23) July 2008 »
(20) June 2008 »
(24) May 2008 »
(23) April 2008 »
(27) March 2008 »
(28) February 2008 »
(26) January 2008 »
(7) December 2007 »

Published Works »

Beginning Amazon's SimpleDB (Apress in dev.)
Pro Active Record (Apress 2007)
Web Services with Rails (O'Reilly 2006).

Contributed To »

Ruby Cookbook (O'Reilly 2006)
SQL Cookbook (O'Reilly 2005)
Various Reviews published in Computing Reviews

Free Code I've Created »

SimpleDB library in Python 3.0

Fantasy focused domains »

draftwizard.com
fantasy-football-draft.com
fantasyfootballkit.com
fantasyfootballquiz.com
hockeynotes.com
pegg.it
rosterhelp.com
sportsxml.com
statsfeed.com
supermug.com

Tech. focused domains »

factorcode.com
perlquiz.com
simpledb.info

Social Tool focused domains »

conversationlist.com
friendstat.us
fuzzypop.com
gawk.it
grou.pe
halfbite.com
jivegas.com
pu.ly
tagli.st
timelylinks.com
tym.ly
wow.ly

Utility focused domains »

fubnub.com

Other domains »

betaread.com
botfu.com
falicon.com
storyrank.com

Not yet live domains »

bar.ackoba.ma
basketballnotes.com
buddydirt.com
budrank.com
cakntoba.com
coachwizard.com
cointhief.com
ezbcs.com
falconsrule.com
fantasydeke.com
fantasyfootballrank.com
ffkit.com
footballnotes.com
footballpublishing.com
giggletweet.com
greentile.com
herobrawl.com
kacode.com
kickasscode.com
knowabout.it
leaguewizard.com
nfldraftnews.com
pa.ly
rorbe.com
slidepitch.com
startfail.com
survivorhub.com
tagli.st
thedfl.com
thescoutsreport.com
toptenify.com
tripacation.com
tweetwiki.com
umock.com

* Yes I realize I have a bit of an addiction to domain names, but I really do have specific ideas for each of the above.



This blog is powered by KickAssCode.