SQL Server Indexing
Indexing on SQL Server 2008
Contents
Indexes are used to improve query performance by making it quicker to locate data in a table. But indexes add an overhead for INSERT, UPDATE, DELETE and MERGE operations - each time data is changed in a table, each index on the table has to be updated also. Thus a lot of indexes on tables used in OLTP environments is normally a bad idea, but plenty of indexes can be useful of tables used in OLAP processing where the prime activity is reading table data rather than changing it.
B-Tree
Top BottomSQL Server uses B-Tree to create indexes. A B-Tree is constructed from leaf node pages that contain index data for each row in the table, sorted in the specified order. The next level up in the hierarchy is constructed by taking the first entry in each leaf node plus a pointer to the leaf node page. This hierarchical construction continues until a single root node is created.
Each search on the index, begins at the root node, using the pointers to child nodes to traverse the index until the record is found in the leaf node. A failed search occurs when the next highest index value is found before the required value.
A data page in SQL Server is 8,192 bytes and provides 8060 bytes of actual storage. Thus an index on an 8-byte column could index 1007 records in a single page. But the addition of the 1008th record will cause the index to add two more pages to the index. When the 1008th record is added to the table, the data from the original single node becomes a leaf node and half of its data is transferred to a new leaf node. A new root node is added to the hierarchy, containing the first entry in each leaf node. New leaf nodes can be added until the root node has 1007 entries. With this arrangement, the two-level index can index 1,014,049 records in the table with an 8-byte index. When the 1,014,050th record is added, the root node is pushed down to the intermediate level, and half its data is transferred to a new intermediate node at the same level. A new root mode is created with the first entry from each intermediate node. The three-level index (root -> intermediate -> leaf) can now handle over 1 billion (1007*3) records for an 8-byte index.
Clustered Index
Top BottomAn index defined as clustered causes SQL Server 2008 to sort the underlying table in accordance with the clustered index definition. Thus only one clustered key can be defined per table.
A primary key is by default a clustered index - to add a clustered index to a table with a primary key defined, the primary key must be first set to unclustered.
-Drop the primary key (clustered) ALTER TABLE Robots.Robot DROP CONSTRAINT pk_robots GO -Recreate primary key (nonclustered) ALTER TABLE Robots.Robot ADD CONSTRAINT pk_robots PRIMARY KEY NONCLUSTERED (RobotID) GO -Create a clustered index on CreateDate for faster searches CREATE CLUSTERED INDEX icx_createddate ON Robots.Robot (CreatedDate) GO
Covering Indexes
Top BottomAn index that contains enough columns to satisfy a query is called a covering index. Using covering indexes can be useful to optimise query performance, as searches can find all the data they need from the index without having to access the table.
In SQL Server 2008 an index must not contain more than 16 columns, and the combined size of the columns must not exceed 900 bytes. But an index can be defined with an 'included' column. The data from the included column is stored at the leaf node level, but does not count in the 900-byte limit for index size, thus allowing you to add a large field to a covering index.
CREATE TABLE Orders.DeliveryAddress ( AddressID INT IDENTITY(1,1), AddressLine1 VARCHAR(50) NOT NULL, AddressLine2 VARCHAR(50) SPARSE NULL, AddressLine3 VARCHAR(50) SPARSE NULL, Country VARCHAR(50) NOT NULL, CONSTRAINT pk_deliveryaddress PRIMARY KEY (AddressID)) GO CREATE NONCLUSTERED INDEX idx_address_country ON Orders.DeliveryAddress(Country) INCLUDE(AddressLine1) GO
Filtered Indexes
Top BottomThe choice of index to use during a table search is decided by a SQL Server component called the 'Optimiser'. When an index is created, a histogram is generated representing the distribution of data values. The greater the number of unique values: the flatter the histogram. Indexes with higher proportions of unique values are said to be more selective and therefore more useful for searches. However, if an index contains many NULL values then this will skew the histogram for the index and make it less selective. A 'filtered' index can be defined in such circumstances to create an index of the values that are not null.
CREATE NONCLUSTERED INDEX idx_address_country ON Orders.DeliveryAddress(Country) INCLUDE(AddressLine1) WHERE Country IS NOT NULL GO
Relational Index Options
Top BottomThe syntax for CREATE INDEX is documented as:
CREATE [ UNIQUE ] [ CLUSTERED | NONCLUSTERED ] INDEX index_name
ON <object> ( column [ ASC | DESC ] [ ,...n ] )
[ INCLUDE ( column_name [ ,...n ] ) ]
[ WHERE <filter_predicate> ]
[ WITH ( <relational_index_option> [ ,...n ] ) ]
By default, indexes are created offline (WITH ONLINE = OFF). This means that SQL Server locks the table during index creation, ensuring that no updates occur on the table until the index is created. With Enterprise editions of SQL Server, it is possible to specify on online index (WITH ONLINE = ON), which uses row versioning functionality via tempdb to build the index without preventing updates to the table.
A fill factor (WITH FILL FACTOR = xx) can be specified, which tells SQL Server what percentage of space to allocate from each leaf page on creation. The resulting free space can be used to insert values later on, and reducing the amount of page splitting that occurs as the index is updated
Alter Index
Top BottomThe syntax for ALTER INDEX is:
ALTER INDEX index_name ON object [DISABLE | REBUILD | REORGANIZE]
The ALTER INDEX statement allows you to disable an index. A disabled index still exists in the database, but is unuseable. If a clustered index is disabled, then the whole table is unuseable. To re-enable an index use the REBUILD option of ALTER INDEX
The REORGANIZE option is used to defragment an index
