I'm building out a few status-like reports for various projects right now (you might even call them TPS reports). There's nothing too special about these type of pages. Really they are mostly just a collection of record counts from various tables.
When I do this type of work, I notice that I often end up doing a lot of re-factoring as the page explodes to include/display more and more data. The problem is often due to my lack of designing what I really want the page to display before I start coding it (I'm especially bad at 'design a plan first' for my own projects and usually dive right into code).
So I often start out building one or two SQL queries to grab the initial data I want. Then I think of something new I want to add, and so I build a new query for that. I repeat that process over an over until suddenly my page is one huge mess of code with tons of little queries and I think to myself 'why am I doing it this way? My page is getting too slow and unmanageable!'
That's when I usually sit down and re-factor my code. Often times I find that I can take many of my little queries and wrap them all up into one or two larger queries. This gives me the benefit of less code and a lot less round trips to the database server (a VERY good thing).
This is especially true when I have a page of a lot of SQL count type statements. But don't take my word for it, use the examples I outline below to test out why re-factoring like I do really is better!
And just because I've been spending a lot of time in both Ruby and PHP as of late, I'm going to give you examples for both languages. (Two for the price of none! Awesome.)
To be complete and thorough let's start our example Ruby code with some the usual database set up stuff. Here I'm referencing the Active Record gem, defining the table classes I'll be working with, and making a connection to my SQL Server database. Both of our Ruby examples below reference this set up code (ie. make sure you include it at the top of your ruby file if you really are trying to run these Ruby examples).
require 'rubygems'
require_gem 'activerecord'
class Tablea < ActiveRecord::Base
set_table_name "tablea"
end
class Tableb < ActiveRecord::Base
set_table_name "tableb"
end
ActiveRecord::Base.establish_connection(:adapter => "sqlserver", :mode => "odbc", :dsn => "myDSN", :database => "myDatabase", :username => "username", :password => "password", :port => "1433")
Now that we've got our basic database stuff defined, we can move on to focus on the real meat of our examples.
In our first version, we are going to going to use the ActiveRecord::Calculations count method. The method is a convenience method that basically builds and executes a SQL count statement in the background. The code for this approach follows:
t1 = Time.now
temp = Tablea.count
temp2 = Tableb.count
t2 = Time.now
puts "#{temp} records in tablea"
puts "#{temp2} records in tableb"
puts "Took: #{t2 - t1}"
If you're a newer programmer (or not a programmer at all) you'll probably love this approach. And I'll admit that it's a short amount of code that is pretty easy to figure out without knowing anything outside of the ActiveRecord basics. Heck, it even executes pretty fast.
The problem is that, weather you know it or not, convenience methods like the count method are generating a lot of extra overhead that you really don't need. In small apps with average use, it's not a big deal and you'll probably never really notice it.
But in larger applications, or pages that require a lot of database action like the status report ones I've been working on, you'll start to notice a bit of lag.
Besides, all it's really doing is generating a SQL count statement, and well those are pretty easy to write on our own anyway. Even better, if we do it on our own, we can use a UNION statement to roll all our counts up into just one query saving
A LOT of over head from the above example.
Here's the re-factored Ruby version:
t1 = Time.now
temp = Tablea.find_by_sql("select count(0) mycount, 'tablea' as type from tablea UNION select count(0) mycount, 'tableb' as type from tableb")
t2 = Time.now
puts "#{temp[0].mycount} records in tablea"
puts "#{temp[1].mycount} records in tableb"
puts "Took: #{t2 - t1}"
While it probably feels a bit strange to an Active Record newbie, it shouldn't be that hard to really follow. The important things to note with the above example are:
1. In this example we didn't need to define the Tableb class since we just used the find_by_sql method on the Tablea class (actually we could have used it on either table in this example). Since we can execute any SQL select statment via the find_by_sql method and we are simply going to deal with the results as a basic Array, this works perfect for our needs.
2. Because of the UNION statement we should get two records back. These get rolled up as elements of the result Array thanks to the find_by_sql method; so in this example all we need to know is which order we executed the statements so that we know which element in the array refers to which count. (we could also have tested against the temp[0].type value to determine which is which, but there is no real need in this example)
OK so moving on to our PHP version of the same concept. I think this first example below really shines a light on how quickly doing this the wrong way can get messy requiring lots and lots of extra code.
<?
$conn = odbc_connect('myDSN','username','password');
$t1 = microtime();
$sql = "select count(0) mycount from tablea";
$rs = odbc_exec($conn, $sql);
$tableacount = 0;
while(odbc_fetch_row($rs)) {
$tableacount = odbc_result($rs, "mycount");
}
$sql = "select count(0) mycount from tableb";
$rs = odbc_exec($conn, $sql);
$tablebcount = 0;
while(odbc_fetch_row($rs)) {
$tablebcount = odbc_result($rs, "mycount");
}
$t2 = microtime();
echo $tableacount . " records in tablean";
echo $tablebcount . " records in tablebn";
echo "Took: " . $t2 - $t1 . "n";
?>
Again, it works fine, but it's a lot of code (and code duplication) to just produce a few simple counts. Plus it requires a lot of trips back and forth to the database engine.
Finally let's take a look at the re-factored PHP version:
<?
$conn = odbc_connect('myDSN','username','password');
$t1 = microtime();
$sql = "select count(0) mycount, 'tableacount' as type from tablea UNION select count(0) mycount, 'tablebcount' as type from tableb";
$rs = odbc_exec($conn, $sql);
$tableacount = 0;
$tablebcount = 0;
while(odbc_fetch_row($rs)) {
$val = odbc_result($rs, "mycount");
$type = odbc_result($rs, "type");
switch(true) {
case($type == "tableacount"):
$tableacount = $val;
break;
case($type == "tablebcount"):
$tablebcount = $val;
break;
}
}
$t2 = microtime();
echo $tableacount . " records in tablean";
echo $tablebcount . " records in tablebn";
echo "Took: " . $t2 - $t1 . "n";
?>
The trick here is basically the switch case statement. As we walk through the results of the query, we determine which value we want to set based on the value of the 'type' of record.
As a side note, we could have done this with a counter much like we accessed the array's in the Ruby version; but that relies on knowing the order the results are returned in. Since, in this case, that approach would not really have been any less code I think it's a better choice to specifically test which row we are dealing with.
Alright, so what's my point after all this? Well basically that I'm an idiot who wastes a lot of time building things the wrong way before I re-factor it to a 'better' way.
That, or 'learn SQL' it's really not so bad (or hard) and it really can make your programs 'better'.
posted by Kevin Marshall on 2008-02-15 00:00:00+00