Wednesday, 29 December 2010

Making your queries faster

If you run SQL Server management studio, you can include the execution plan with your query (Ctrl-M), it displays a network of how the tables and views join and more importantly, what types of scans are used to get the data. It also shows lines that are proportional to the number of rows being operated on. If you can see any big lines, then possibly your join condition is too generous (and you are using the WHERE clause incorrectly to fix the join) or maybe you do actually want to select tons of data in which case big lines might be acceptable.
As far as table and index scans are concerned, there are really two levels of understanding, one of which will help massively and the other is perhaps when you want to squeeze the last drops of performance out of the query once you have made the major changes.
Let's first consider the easier deductions from the execution plan. If you see any "table scan" elements on your diagram, that is generally bad and unless you have a good reason, you need to create an index on the table that can be used (even if your table is small - it can decrease join performance noticeably). What a table scan means is that the query has to scan all columns in the whole table to find and match rows. Virtually every table should contain a primary key column(s) and this creates an implicit clustered index so at very least you should be scanning the clustered index. If you do not have a primary key (suspicious!), you can still create indexes and it is these that will remove table scans as well as removing clustered index scans which are slightly better than table scans but include all columns still and can be slow also. By creating an index you should be able to include a few key columns which will dictate the sort order (which can help with join performance) but also (SQL Server 2005) you can include additional columns which the query can use but which will not unduly slow the creation and update of the index since they do not dictate ordering. If you have an index on TABLE with, e.g. columns A and B and your query says something like select A from TABLE where B = 2 will use the index directly. You will get an index seek ideally (a subset of rows from the full index, less than about 15%), which is very fast, or perhaps an index scan which has to read all rows but is preferable to table scans since the amount of data being used is a subset of the table data.
The second level of performance comes when your indexes are not quite right and you get strange items like "Bookmark Lookup" or "RID Lookup" which are both similar and caused by the same basic problem. Imagine in your example above you did select A, C from TABLE where B = 2. Although the index can be used to find the matching rows which is fast, in order to return column C, the query needs to find the primary key in the index (which is included by default) and then using this it needs to scan the clustered index or the base table if there is no clustered index in order to find the data for column C. This is a bookmark lookup (or an RID lookup if the base table has no clustered index - it is a 'heap'). These are also slow but possibly faster than a clustered index scan or a table scan. You might find sometimes that the query optimiser decides to not bother using the index and assumes you might as well go straight to the base table. To fix this issue, you would need to include column C into your index or create a new index with A, B and C in it.
Bear in mind that there is always a trade-off between the number of indexes and the insert/update/delete performance of the table as well as the amount of memory required depending on the size of the index which is related to the size of the underlying table. Generally, it is better to add a column into an existing index rather than create a whole new index which is slightly different. If one index has the same columns as another (even if the other index has a few more) then it is redundant e.g. Index1 = A, B, C; Index2 = A, B means that Index2 is redundant since all of the information is already available in Index1.
Also, the actual performance gain depends on the indexed columns relative to the base table columns. If your index had 5 of 6 available columns from the table then there will be little difference between an index scan and a table scan. Also, if you are selecting a text field from a table but using a WHERE clause which is NOT based on the text field, you might find it quicker to split the index into one without the text field (which reduces the size of the index and speeds it up) and then create a second index with just the text field in which the query should be able to use to get the resulting text field for each matching row.
Post a Comment