Thursday, 12 March 2009

INNER JOINS, OUTER JOINS and CROSS JOINS

Sql Joins, they seem easy to begin with but then as soon as you think you understand, something happens leaving you scratching your head. This is a brief but hopefully very useful introduction to the 4 main joins.
We will begin by assuming we have a database with two tables that is used to track people borrowing books from the library. We will ignore all the irrelevant columns that we might actually have and keep it simple:


Table 1 (Members)
Name: varchar(250)
ID: int (identity, autoinc, primary key)


Table 2 (Bookings)
Book: int (foreign key to books table)
Member: int (foreign key to members table)
DateOut: DateTime


So when a user takes a book out, its ID number, the member's ID number and the date are put into the Bookings table. We now want to use the data in the tables. Suppose we want to know who has what books checked out, we need information from more than one table (we will forget the book name for now and assume we only want the date, book id number and the person's name). We need to join them together.
What are we actually doing? We are relating information from the rows of one table to the rows of another. In some cases there will be 1 row in each that matches but in most cases, as here, there can be many. For instance a single person can borrow e.g. 5 books so 1 row in members might match 5 rows in Bookings. It is worth noting for later that a row in one table might match zero rows in the other table (somebody with no books on loan) which we will look at later.

INNER JOIN


An inner join simply says that the information we want to return from our select statement will have at least one entry in both tables. For each row we can join using the criteria (the columns that we will match), a row will be returned in the results. If we cannot match a row from one to the other, the row will not be returned.
select Name, Book, DateOut from Members INNER JOIN Bookings on Members.ID = Bookings.Member

If John has a book taken out (and therefore e.g. an entry for book 11 on 20th Jan), the previous select would include: John, 11, 20th Jan. If however there was a member called Luke who had no books on loan, his name would NOT show in the results because his row in members would not JOIN any rows in the bookings table.

It is very common that actually we want to know everybody in the system and if they have any books on loan or not. We simply use...

LEFT OUTER JOIN


A left outer join says that we will return items from the left-hand table even if there are no entries in the right hand table. If we run exactly the same select as before but with a LEFT OUTER JOIN,
select Name, Book, DateOut from Members LEFT OUTER JOIN Bookings on Members.ID = Bookings.Member

We would get "Luke, NULL, NULL" as well as "John, 11, 20th Jan" in the results. The NULLS mean there is nothing there, not zero, not blank, nothing i.e. NULL. We can use the coalesce function if we would rather return something useful instead of NULL:
select Name, coalesce(Book, ''), coalesce(DateOut, '') from Members etc..

RIGHT OUTER JOIN


Don't bother. It is the same as the LEFT flavour but the other way round and to be honest, it is usually better to rewrite your select so it effectively reads from left to right and only uses left outer joins.

FULL OUTER JOIN


A full outer join returns all items from both sides whether or not there are matching rows but if they match, they will be returned together. Not generally considered good practice since are the tables even related if something can exist on either side and not the other?

CROSS JOIN


Sometimes, it is not possible to join a table based on any relationship. For instance, suppose we want to work out the prices to replace certain types of books for a particular person. There is a table of 10 book categories with base prices but this has to be multiplied by a certain value that exists in the Members table (perhaps how rich they are!!). You have nothing to join on.
Table 1 (Members)
Name: varchar(250)
ID: int (identity, autoinc, primary key)
PriceFudgeFactor: decimal(9,5)


Table 2 (BookReplacements)
BookType: int (foreign key to book categories table)
BasePrice: Decimal(19,5)


What do you do? Well a CROSS JOIN matches each item in the left hand table to each item on the right with no linked columns. What would result in our case would be a list where the number of rows equals the number of users times 10 (the number of categories) What you would probably then do, using a sub-query is do the calculation using the base price from one side and the factor from the other:

SELECT PriceFudgeFactor * BasePrice as TotalPrice FROM
(
SELECT Name, PriceFudgeFactor, BasePrice FROM Members CROSS JOIN BookReplacements
)

Note there is no "ON" clause after the join since it will match all rows with all other rows. I have used INNER and LEFT OUTER joins frequently, CROSS JOINS occassionally and never RIGHT or FULL OUTER joins so that should give you some help knowing which to use.
Post a Comment