Sunday 21 July 2019

Clustered and Non Clustered Index: Benefits and Side Effects of Index





Benefits and Side Effects of Index


A table without a clustered-index is called a “heap table”.
A heap table has not its data sorted.
The SQL server has to scan the entire table in order to locate the data, in a process called a “scan”.
In the case of a clustered index, the data are sorted on the key values (columns) of the index.
The SQL server is now able to locate the data by navigating down from the root node, to the branch and finally to the leaf nodes of the B-tree structure of the index. This process is called a “seek”.
The later approach is much faster, when you want to filter or sort the data you want to retrieve.
A non-clustered index, on the other hand, is a completely different object in the table.
It contains only a subset of the columns.
It also contains a row locator looking back to the table’s rows, or to the clustered index’s key.
Because of its smaller size (subset of columns), a non-clustered index can fit more rows in an index page, therefore resulting to an improved I/O performance.
Furthermore a non-clustered index can be allocated to a different File Group, which can utilize a different physical storage in order to improve performance even more.
The side effects of indexes are related to the cost of INSERT, UPDATE, MERGE and DELETE statements.
Such statements can take longer to execute, in the presence of indexes, as it alters the data in the table, thus to the indexes too.
Imagine the situation of an INSERT statement.
It has to add new rows in a table with a clustered index.
In such case the table rows may need to be repositioned


Remember…? The clustered index needs to order the data pages themselves! This will cause overhead.

So, it is crucial to take into account the overhead of
INSERT, UPDATE and DELETE statements before designing your indexing strategy.
Although there is an overhead to the above statements, you have to take into
account that many times, an UPDATE or DELETE statement will execute in a subset of data.
This subset can be defined by a WHERE clause, where indexing may outweigh the additional cost
of index updates, because the SQL server will have to find the data before updating them.
As explained above, a non-clustered index includes the clustered index’s key as its row locator,
in the presence of a clustered index in the table.

This comes with a cost and a benefit:

The cost has to do with the non-clustered index bookmark lookup.
What if a query has to return more columns that the ones hosted in the index itself?
In the case of a HEAP table, the SQL server would have to check the RID of the non-clustered index,
in order to navigate directly to the row, where the rest of the columns belong
In the case of a clustered index, the SQL server would have to check the row locator of the non-clustered index, in order to do an additional navigation to the B-tree structure of the clustered index,
to retrieve the desired row. You see, the row locator does not contain the RID,
but the clustered-index key.
On the other hand, there is a benefit. It has to do with the clustered index updates.
Imagine the following situation: Two new rows with index key values of A2 and A3 have to be added in the clustered index below.
Because this is a clustered index page, its physical structure has to be reallocated in order to fit A2 and A3 between A1 and A4.
It has to maintain index’s order. Since there is no free space in the index page to accommodate these changes, a page split will occur. Now, there is enough space to fit A2 and A3 between A1 and A4.
The goal achieved and the order maintained within the index.
But imagine what would have happened if the non-clustered index was looking at the RID,
instead of the clustered index’s key? It would have to change its row locators to reflect
the changes. This could have been a huge performance hit!
Especially, in the case of large clustered indexes.
Instead of the RID, the row locators now point at the clustered index key.
Meaning, that there is no longer needed to change its values.
This is quite a benefit if you think of the large clustered indexes, that are usually maintained in many tables.

    No comments:

    Post a Comment