Tuesday, 17 March 2009

Dodgy Joins

When do you use the ON statement in a join and when do you use the WHERE clause?
Many newbies fall over on this and scratch their heads. I have fallen foul of it many times but have now learnt what I've done wrong.
Take the following tables:
itemprices
Price: decimal (19,5)
QuotationReference: varchar(128) (foreign key references quotes)
ItemName: varchar(128) (foreign key references items)

items
ItemName: varchar(128)
DefaultPrice: decimal(19,5)

It has any calculated prices that have been applied for particular quotations. The first table has 0 to count(*) from items rows per quotation. In other words, for quotation X, the first table might have between 0 and lets say 100 rows (if there are 100 rows in the items table).
Now, I want to get the price for each of the 100 items in the second table for a particular quote but I want to use the Price from the first table (if it exists) in preference over the DefaultPrice in the second.
The SQL might be:
SELECT items.ItemName, coalesce(Price,DefaultPrice) as Price
FROM items LEFT OUTER JOIN itemprices ON items.ItemName = itemprices.ItemName
WHERE QuotationReference = @QuoteRef

It looks correct doesn't it? You use LEFT OUTER JOIN since there might not be an entry for each item in the prices table and then you filter out all the rows that do not belong to your quotation. You will find that it does not work. Can you see what might happen?
The join is incomplete. Simply joining on the itemname would be fine if your quotation had an entry in itemprices but if it didn't then it would join rows from any other quotation. You would then apply the WHERE clause and lose all of the rows. Interestingly if you had a single entry for your quotation OR multiple entries including one for your own quotation, the join would work it is ONLY in the case where your quotation has no entry and at least one other quotation does. In other words you might not notice straight away.
The reason the join is incorrect is that logically you do not want to relate rows that simply have the same name, they also need to have the same quotation reference. A join should only link rows that are related so that if you had no WHERE clause, the data would still be logically correct and consistent. In our case, the join should be:
SELECT items.ItemName, coalesce(Price,DefaultPrice) as Price
FROM items LEFT OUTER JOIN itemprices ON items.ItemName = itemprices.ItemName
AND itemprices.QuotationReference = @QuoteRef
WHERE QuotationReference = @QuoteRef

If you were joining two tables that both had quotation references, you might want the ON statement to use the column name rather than the parameter but either way works. Now what happens if you have no entry for your quotation in itemprices but you DO have entries for other quotations, the join will fail. You will get NULL for the itemprices columns and the select will return the DefaultPrice instead. You would have to tailor it slightly if you were writing a view since you would have no parameters to join on. In this case you would have to start from the quotes table so that you can return info for all quotes:
SELECT quotes.Reference,items.ItemName,coalesce(Price,DefaultPrice) as Price 
FROM quotes
CROSS JOIN items ---Ensure you have all items for all quotes
LEFT OUTER JOIN itemprices ON items.ItemName = itemprices.ItemName
AND itemprices.QuotationReference = quotes.Reference
Post a Comment