Along the lines of yesterday's post I thought I would share a few more of my SQL tips & tricks with you. Technically most of these are T-SQL specific since all my personal projects use a MS SQL server and I therefore use T-SQL the most.
Note: I do also work a decent amount with Oracle and even MySQL from time to time, but I'll save some of my tips & tricks for them for another day. If you're really interested in solid SQL tricks for various database engines, you should check out the SQL Cookbook writen by my friend (and former co-worker) Anthony Molinaro [he even based one or two recipies in there on some of my old hack-job queries].
Anyway, let's get on to the quick tips (presented in no special order):
Tip 1 Comparing the date of a record to the current date
It's a pretty common situation to have to get all the records out of a table for a given date (or that have been added since a given date). Even more specific, I often need to get just the records that were added today.
You can use a code based solution to generate today's date, and then bake that into your SQL query but really that just adds un-needed processing to your code. Plus there's a strong chance that your code based dates will not easily match the T-SQL format for dates (so you'll still need to do some date conversion or formatting).
The following T-SQL handles it much nicer (IMHO):
SELECT * FROM tablea WHERE CONVERT(varchar, date1, 1) = CONVERT(varchar, GETDATE(), 1)
The basic trick in the above SQL is to use the T-SQL getdate() function to get the current date/time (as the SQL engines has it) and then use the T-SQL convert() function to make sure that you are comparing apples to apples. The third param of the convert function determines what format your date is returned in this example, and if you play around with other values for it, you can also use the same basic code to narrow by month, year, day, time, or various other versions of dates and times.
Tip 2 Randomly selecting a row from a table
Every so often I need to pick a random record from a table. For example, when I want to suggest a random fight command in the BotFu fight game.
A code based approach would require me to first get a list of the existing ids within the table, randomly select one, and then query the database a second time for that associated record.
This T-SQL works much better:
SELECT TOP 1 * FROM tablea ORDER BY NEWID()
The trick this time was the use of the newid() function in the order by clause which will basically sort our records in a random order. We then just use the 'top' function to return only the first record since we only wanted one random row (we could change the top value to something else if we needed more than one random record).
Tip 3 Getting a range of records from a table (pagination)
Once your tables grow beyond a simple size, you're most likely going to need to work with displaying only subsets of records.
Often times, even limiting your results with a 'where' clause is going to return many more records than you actually want to display or deal with on a given page. In those cases, you are going to want to be able to page through the results (ie. treat a range of records as a page and be able to move from one range to another).
Again you could tackle this problem in your code with a simple counter. Just start displaying records after the counter gets to the right spot, and stop once it goes beyond the end spot. However that approach still requires you to grab the full data set from the database and loop over all the results (or at least a larger set than you really need to). All of which means using more resources than you should.
Here's a much better T-SQL based solution:
SELECT * FROM (SELECT TOP 5 * FROM (SELECT TOP 10 * FROM tablea ORDER BY id) AS tmp ORDER BY id DESC) AS tmp1 ORDER BY id
In the above example we are getting 5 records at a time (the first number you see is the number of results we'll get back), and we are starting with the second 'page' of five (the second number is the number of results we get back [5] times the 'page' we want [2]).
It takes a little bit of getting used to, but the nested select sequence here is really pretty cool. Working from the inside out, all it's really doing is grabbing the top 10 records ordered by our id field, then from that result, taking the top 5 records ordered by our id field descending (ie. flipping the 10 and taking the top 5 of that), and finally taking all of those results and re-ordering them by our id field (since that's how we really want them ordered).
In real-life, you'll want to manage the number bits of the statement with code so that you can actually 'flip' through the various pages.
Tip 4 Sorting null values properly (moving them to the bottom of a list)
I almost always have to sort my results in some fashion, and often it's based on a user-input, non-required text field like address. Since the field is not 'required', many users have not provided the information and so the field has a NULL value.
The problem is that, when sorting by that type of field, I generally want to give the least amount of importance to people who have left the value blank (ie. Nulls) but T-SQL treats the NULL value as less than NON-NULL values. So in a normal sort, the people who didn't provide data end up being listed first - YUCK.
The following T-SQL helps to correct the problem so that NULL values are listed last:
SELECT * FROM tablea ORDER BY CASE WHEN address IS NULL THEN 'zzzz' END
The trick in the above example is the use of the CASE WHEN/THEN END statement. Here we basically use the statement to 'change' the NULL value to a value of 'zzzz' which we assume to sort last in our results (you should make sure to choose a value that you can be sure will properly sort to the end of your results).
NOTE: That CASE statement is actually VERY useful in a number of other situations as well (but I'll leave it for you to explore those on your own).
Tip 5 Limiting results based to a subset of records within the same table.
One of the trickier problems that I often run into (thanks to my own table design idioms - saved for future post) is the need to limit the results of my query to a sub set of records that have something in common with each other. I think a plain English example helps to explain this better:
Let's say that I have a table where I store usernames and game_ids (so I know who's played what games). Now let's say that I want to know what games 'Kevin' has played that 'Keith' has also played.
Again, you could solve this problem via code and two queries. For example putting both result sets into arrays and then comparing the arrays for similarities.
Still that's a lot of extra processing and work when you can just use this type of T-SQL solution:
SELECT * FROM tablea WHERE username = 'Kevin' AND game_id IN (SELECT game_id FROM tablea WHERE username = 'Keith')
Basically it's a simple subquery. We first get all the game_ids for the games that 'Keith' has played, and then we get all the game_ids for the games that 'Kevin' has played AND that are in the results of our first query.
Back to my point
None of the tips I showed above are that ground breaking or amazing, and I hope that they actually come across as pretty simple and straight forward (so maybe you'll use them too).
In the end, I think it's all about taking advantage of the things each part of your environment provides. Letting each handle the things it does best (ie. most efficient) and figuring out alternative solutions to those problems that it doesn't handle well.
T-SQL (and SQL in general) really does provide you with a lot of hidden power. Learning even just a handful of simple tips & tricks can really help make your life as a developer a lot easier!



