SQL Server Index

An index is an on-disk structure associated with a table or view that speeds retrieval of rows from the table or view. An index contains keys built from one or more columns in the table or view. These keys are stored in a structure (B-tree) that enables SQL Server to find the row or rows associated with the key values quickly and efficiently.

Poorly designed indexes and a lack of indexes are primary sources of database application bottlenecks. Designing efficient indexes is paramount to achieving good database and application performance. This SQL Server index design guide contains information on index architecture, and best practices to help you design effective indexes to meet the needs of your application.

Database Considerations

When you design an index, consider the following database guidelines:

  • Large numbers of indexes on a table affect the performance of INSERT, UPDATE, DELETE, and MERGE statements because all indexes must be adjusted appropriately as data in the table changes.
    • Avoid over-indexing heavily updated tables and keep indexes narrow, that is, with as few columns as possible.
    • Use many indexes to improve query performance on tables with low update requirements, but large volumes of data. Large numbers of indexes can help the performance of queries that do not modify data, such as SELECT statements, because the query optimizer has more indexes to choose from to determine the fastest access method.
  • Indexing small tables may not be optimal because it can take the query optimizer longer to traverse the index searching for data than to perform a simple table scan. Therefore, indexes on small tables might never be used, but must still be maintained as data in the table changes.
  • Indexes on views can provide significant performance gains when the view contains aggregations, table joins, or a combination of aggregations and joins. The view does not have to be explicitly referenced in the query for the query optimizer to use it.
  • Use the Database Engine Tuning Advisor to analyze your database and make index recommendations.

Query Considerations

When you design an index, consider the following query guidelines:

  • Covering indexes can improve query performance because all the data needed to meet the requirements of the query exists within the index itself.
  • Write queries that insert or modify as many rows as possible in a single statement, instead of using multiple queries to update the same rows. By using only one statement, optimized index maintenance could be exploited.
  • Create nonclustered indexes on the columns that are frequently used in predicates and join conditions in queries. These are your Search Argumentable (SARGable) columns. However, you should avoid adding unnecessary columns. Adding too many index columns can adversely affect disk space and index maintenance performance.
  • Evaluate the query type and how columns are used in the query.

Column Considerations

When you design an index consider the following column guidelines:

  • Keep the length of the index key short for clustered indexes.
  • Columns that are of the ntext, text, image, varchar(max), nvarchar(max), and varbinary(max) data types cannot be specified as index key columns.
  • An xml data type can only be a key column only in an XML index.
  • Examine column uniqueness.
  • Examine data distribution in the column.
  • Consider using filtered indexes on columns that have well-defined subsets, a well-designed filtered index can improve query performance, reduce index maintenance costs, and reduce storage costs.
  • Consider the order of the columns if the index will contain multiple columns.
  • Consider indexing computed columns.

Index Characteristics

Index characteristics include the following:

  • Clustered versus nonclustered
  • Unique versus nonunique
  • Single column versus multicolumn
  • Ascending or descending order on the columns in the index
  • Full-table versus filtered for nonclustered indexes
  • Columnstore versus rowstore
  • Hash versus nonclustered for Memory-Optimized tables

The following table lists the types of indexes available in SQL Server:

ClusteredA clustered index sorts and stores the data rows of the table or view in order based on the clustered index key. The clustered index is implemented as a B-tree index structure that supports fast retrieval of the rows, based on their clustered index key values.
NonclusteredA nonclustered index can be defined on a table or view with a clustered index or on a heap. Each index row in the nonclustered index contains the nonclustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value. The rows in the index are stored in the order of the index key values, but the data rows are not guaranteed to be in any particular order unless a clustered index is created on the table.
ColumnstoreAn in-memory columnstore index stores and manages data by using column-based data storage and column-based query processing.

Columnstore indexes work well for data warehousing workloads that primarily perform bulk loads and read-only queries. Use the columnstore index to achieve up to 10x query performance gains over traditional row-oriented storage, and up to 7x data compression over the uncompressed data size.
UniqueA unique index ensures that the index key contains no duplicate values and therefore every row in the table or view is in some way unique.

Uniqueness can be a property of both clustered and nonclustered indexes.
memory-optimized NonclusteredFor memory-optimized nonclustered indexes, memory consumption is a function of the row count and the size of the index key columns
HashWith a hash index, data is accessed through an in-memory hash table. Hash indexes consume a fixed amount of memory, which is a function of the bucket count.
Index with included columnsA nonclustered index that is extended to include nonkey columns in addition to the key columns.
Index on computed columnsAn index on a column that is derived from the value of one or more other columns, or certain deterministic inputs.
FilteredAn optimized nonclustered index, especially suited to cover queries that select from a well-defined subset of data. It uses a filter predicate to index a portion of rows in the table. A well-designed filtered index can improve query performance, reduce index maintenance costs, and reduce index storage costs compared with full-table indexes.
SpatialA spatial index provides the ability to perform certain operations more efficiently on spatial objects (spatial data) in a column of the geometry data type. The spatial index reduces the number of objects on which relatively costly spatial operations need to be applied.
XMLA shredded, and persisted, representation of the XML binary large objects (BLOBs) in the xml data type column.
Full-textA special type of token-based functional index that is built and maintained by the Microsoft Full-Text Engine for SQL Server. It provides efficient support for sophisticated word searches in character string data.

Clustered Index Architecture

In SQL Server, indexes are organized as B-Trees. Each page in an index B-tree is called an index node. The top node of the B-tree is called the root node. The bottom nodes in the index are called the leaf nodes. Any index levels between the root and the leaf nodes are collectively known as intermediate levels. In a clustered index, the leaf nodes contain the data pages of the underlying table. The root and intermediate level nodes contain index pages holding index rows. Each index row contains a key value and a pointer to either an intermediate level page in the B-tree, or a data row in the leaf level of the index. The pages in each level of the index are linked in a doubly-linked list.

This illustration shows the structure of a clustered index in a single partition.

Clustered Index Architecture
Structure of a clustered index

Query Considerations

Consider using a clustered index for queries that do the following:

  • Return a range of values by using operators such as BETWEEN, >, >=, <, and <=.
  • Return large result sets.
  • Use JOIN clauses; typically these are foreign key columns.
  • Use ORDER BY or GROUP BY clauses.

Column Considerations

Consider columns that have one or more of the following attributes:

  • Are unique or contain many distinct values
  • Are accessed sequentially
  • Defined as IDENTITY.
  • Used frequently to sort the data retrieved from a table.

Clustered indexes are not a good choice for the following attributes:
>> Columns that undergo frequent changes
>> Wide keys are a composite of several columns or several large-size columns.

Create Clustered Indexes

You can create clustered indexes on tables by using SQL Server Management Studio or Transact-SQL.

Permission: Requires ALTER permission on the table or view. User must be a member of the sysadmin fixed server role or the db_ddladmin and db_owner fixed database roles.

Using SQL Server Management Studio

To create a clustered index by using Object Explorer.

STEP 1: In Object Explorer, expand the table on which you want to create a clustered index.
STEP 2: Right-click the Indexes folder, point to New Index, and select Clustered Index….
STEP 3: In the New Index dialog box, on the General page, enter the name of the new index in the Index name box.
STEP 4: Under Index key columns, click Add….
STEP 5: In the Select Columns from table_name dialog box, select the check box of the table column to be added to the clustered index.
STEP 6: Click OK.
STEP 7: In the New Index dialog box, click OK.

To create a clustered index by using the Table Designer

STEP 1: In Object Explorer, expand the database on which you want to create a table with a clustered index.
STEP 2: Right-click the Tables folder and click New Table….
STEP 3: Create a new table as you normally would.
STEP 4: Right-click the new table created above and click Design.
STEP 5: On the Table Designer menu, click Indexes/Keys.
STEP 6: In the Indexes/Keys dialog box, click Add.
STEP 7: Select the new index in the Selected Primary/Unique Key or Index text box.
STEP 8: In the grid, select Create as Clustered, and choose Yes from the drop-down list to the right of the property.
STEP 9: Click Close.
STEP 10: On the File menu, click Save table_name.

Using Transact-SQL

STEP 1: To create a clustered index
STEP 2: In Object Explorer, connect to an instance of Database Engine.
STEP 3: On the Standard bar, click New Query.
STEP 4: Copy and paste the following example into the query window and click Execute.

USE AdventureWorks2012;
GO
-- Create a new table with three columns.
CREATE TABLE dbo.TestTable
(TestCol1 int NOT NULL,TestCol2 nchar(10) NULL,TestCol3 nvarchar(50) NULL);
GO
-- Create a clustered index called IX_TestTable_TestCol1
-- on the dbo.TestTable table using the TestCol1 column.
CREATE CLUSTERED INDEX IX_TestTable_TestCol1 ON dbo.TestTable (TestCol1);
GO

Limitations and Restrictions

  • When a clustered index structure is created, disk space for both the old (source) and new (target) structures is required in their respective files and filegroups.
  • If a clustered index is created on a heap with several existing nonclustered indexes, all the nonclustered indexes must be rebuilt so that they contain the clustering key value instead of the row identifier (RID).
  • The preferred way to build indexes on large tables is to start with the clustered index and then build any nonclustered indexes.
  • The index key of a clustered index cannot contain varchar columns that have existing data in the ROW_OVERFLOW_DATA allocation unit. If a clustered index is created on a varchar column and the existing data is in the IN_ROW_DATA allocation unit, subsequent insert or update actions on the column that would push the data off-row will fail.

Create Nonclustered Indexes

A nonclustered index is an index structure separate from the data stored in a table that reorders one or more selected columns. Nonclustered indexes can often help you find data more quickly than searching the underlying table; queries can sometimes be answered entirely by the data in the nonclustered index, or the nonclustered index can point the Database Engine to the rows in the underlying table.

Generally, nonclustered indexes are created to improve the performance of frequently used queries not covered by the clustered index or to locate rows in a table without a clustered index (called a heap). You can create multiple nonclustered indexes on a table or indexed view.

The maximum number of nonclustered indexes that can be created per table is 999.

Nonclustered indexes are implemented in the following ways:

UNIQUE constraints – When you create a UNIQUE constraint, a unique nonclustered index is created to enforce a UNIQUE constraint by default.

Index independent of a constraint – By default, a nonclustered index is created if clustered is not specified.

Nonclustered index on an indexed view – After a unique clustered index has been created on a view, nonclustered indexes can be created.

Permission: Requires ALTER permission on the table or view. User must be a member of the sysadmin fixed server role or the db_ddladmin and db_owner fixed database roles.

Using SQL Server Management Studio

To create a nonclustered index by using the Table Designer.
  • In Object Explorer, expand the database that contains the table on which you want to create a nonclustered index.
  • Expand the Tables folder.
  • Right-click the table on which you want to create a nonclustered index and select Design.
  • On the Table Designer menu, click Indexes/Keys.
  • In the Indexes/Keys dialog box, click Add.
  • Select the new index in the Selected Primary/Unique Key or Index text box.
  • In the grid, select Create as Clustered, and choose No from the drop-down list to the right of the property.
  • Click Close.
  • On the File menu, click Save table_name.
To create a nonclustered index by using Object Explorer
  • In Object Explorer, expand the database that contains the table on which you want to create a nonclustered index.
  • Expand the Tables folder.
  • Expand the table on which you want to create a nonclustered index.
  • Right-click the Indexes folder, point to New Index, and select Non-Clustered Index.
  • In the New Index dialog box, on the General page, enter the name of the new index in the Index name box.
  • Under Index key columns, click Add.
  • In the Select Columns from table_name dialog box, select the check box or check boxes of the table column or columns to be added to the nonclustered index.
  • Click OK.
  • In the New Index dialog box, click OK.

Using Transact-SQL

To create a nonclustered index on a table
  • To create a nonclustered index on a table.
  • In Object Explorer, connect to an instance of Database Engine.
  • On the Standard bar, click New Query.
  • Copy and paste the following example into the query window and click Execute.
USE AdventureWorks2012;
GO
-- Find an existing index named IX_ProductVendor_VendorID and delete it if found.
IF EXISTS (SELECT name FROM sys.indexes
WHERE name = N'IX_ProductVendor_VendorID')
DROP INDEX IX_ProductVendor_VendorID ON Purchasing.ProductVendor;
GO
-- Create a nonclustered index called IX_ProductVendor_VendorID
-- on the Purchasing.ProductVendor table using the BusinessEntityID column.
CREATE NONCLUSTERED INDEX IX_ProductVendor_VendorID
ON Purchasing.ProductVendor (BusinessEntityID);
GO

Columnstore indexes

Columnstore indexes are the standard for storing and querying large data warehousing fact tables. This index uses column-based data storage and query processing to achieve gains up to 10 times the query performance in your data warehouse over traditional row-oriented storage.

You can also achieve gains up to 10 times the data compression over the uncompressed data size. Beginning with SQL Server 2016 (13.x) SP1, columnstore indexes enable operational analytics: the ability to run performant real-time analytics on a transactional workload.

What is a columnstore index?

A columnstore index is a technology for storing, retrieving, and managing data by using a columnar data format, called a columnstore.

Columnstore

A columnstore is data that’s logically organized as a table with rows and columns, and physically stored in a column-wise data format.

Rowstore

A rowstore is data that’s logically organized as a table with rows and columns, and physically stored in a row-wise data format. This format is the traditional way to store relational table data. In SQL Server, rowstore refers to a table where the underlying data storage format is a heap, a clustered index, or a memory-optimized table.

Rowgroup

A rowgroup is a group of rows that are compressed into columnstore format at the same time. A rowgroup usually contains the maximum number of rows per rowgroup, which is 1,048,576 rows.

Clustered columnstore index

A clustered columnstore index is the physical storage for the entire table.

Clustered columnstore index
Clustered columnstore index

To reduce fragmentation of the column segments and improve performance, the columnstore index might store some data temporarily into a clustered index called a deltastore and a btree list of IDs for deleted rows. The deltastore operations are handled behind the scenes. To return the correct query results, the clustered columnstore index combines query results from both the columnstore and the deltastore.

Syntax:

-- Create a clustered columnstore index on disk-based table.
CREATE CLUSTERED COLUMNSTORE INDEX index_name
ON { database_name.schema_name.table_name | schema_name.table_name | table_name }
[ WITH ( < with_option> [ ,…n ] ) ]
[ ON ]
[ ; ]

Example:

CREATE TABLE SimpleTable(
ProductKey [INT] NOT NULL,
OrderDateKey [INT] NOT NULL,
DueDateKey [INT] NOT NULL,
ShipDateKey [INT] NOT NULL);
GO
CREATE CLUSTERED COLUMNSTORE INDEX cci_Simple ON SimpleTable;
GO

Nonclustered columnstore index

A nonclustered columnstore index and a clustered columnstore index function the same. The difference is that a nonclustered index is a secondary index that’s created on a rowstore table, but a clustered columnstore index is the primary storage for the entire table.

The nonclustered index contains a copy of part or all of the rows and columns in the underlying table. The index is defined as one or more columns of the table and has an optional condition that filters the rows.

Syntax:

--Create a nonclustered columnstore index on a disk-based table.
CREATE [NONCLUSTERED] COLUMNSTORE INDEX index_name
ON { database_name.schema_name.table_name | schema_name.table_name | table_name }
( column [ ,…n ] )
[ WHERE [ AND ] ]
[ WITH ( < with_option> [ ,…n ] ) ]
[ ON ]
[ ; ]

Example:

CREATE TABLE SimpleTable
(ProductKey [INT] NOT NULL,OrderDateKey [INT] NOT NULL,
DueDateKey [INT] NOT NULL,ShipDateKey [INT] NOT NULL);
GO
CREATE CLUSTERED INDEX cl_simple ON SimpleTable (ProductKey);
GO
CREATE NONCLUSTERED COLUMNSTORE INDEX csindx_simple
ON SimpleTable (OrderDateKey, DueDateKey, ShipDateKey);
GO

Why should I use a columnstore index?

  • Columns store values from the same domain and commonly have similar values, which result in high compression rates. I/O bottlenecks in your system are minimized or eliminated, and memory footprint is reduced significantly.
  • High compression rates improve query performance by using a smaller in-memory footprint. In turn, query performance can improve because SQL Server can perform more query and data operations in memory.
  • Batch execution improves query performance, typically by two to four times, by processing multiple rows together.
  • Queries often select only a few columns from a table, which reduces total I/O from the physical media.

When should I use a columnstore index?

  • Use a clustered columnstore index to store fact tables and large dimension tables for data warehousing workloads. This method improves query performance and data compression by up to 10 times.
  • Use a nonclustered columnstore index to perform analysis in real time on an OLTP workload.

How do I choose between a rowstore index and a columnstore index?

Rowstore

Rowstore indexes perform best on queries that seek into the data, when searching for a particular value, or for queries on a small range of values. Use rowstore indexes with transactional workloads because they tend to require mostly table seeks instead of table scans.

Columnstore

Columnstore indexes give high performance gains for analytic queries that scan large amounts of data, especially on large tables. Use columnstore indexes on data warehousing and analytics workloads, especially on fact tables, because they tend to require full table scans rather than table seeks.

 

Published by Abdul Samad

Having 17+ years of extensive experience in IT industry, enabled to enhance the team performance and maximize customer satisfaction by strategically managing calls and implementing process improvements. Demonstrated ability to solve problems, meets challenging goals, and expedites delivery. Skilled MSSQL administrator guide team during the crisis situation. Apply Creative thoughts process in re-designing the workflow system to eliminate duplication of effort and increase productivity.

Leave a Reply