SQL Server Collation

Collations in SQL Server provide sorting rules, case, and accent sensitivity properties for your data. Collations that are used with character data types, such as char and varchar, dictate the code page and corresponding characters that can be represented for that data type.

A collation specifies the bit patterns that represent each character in a dataset. Collations also determine the rules that sort and compare data. SQL Server supports storing objects that have different collations in a single database. For non-Unicode columns, the collation setting specifies the code page for the data and which characters can be represented. The data that you move between non-Unicode columns must be converted from the source code page to the destination code page.

For example, the collation Japanese_Bushu_Kakusu_100_CS_AS_KS_WS_UTF8 is case-sensitive, accent-sensitive, kana-sensitive, width-sensitive, and UTF-8 encoded. As another example, the collation Japanese_Bushu_Kakusu_140_CI_AI_KS_WS_VSS is case-insensitive, accent-insensitive, kana-sensitive, width-sensitive, variation-selector-sensitive, and it uses non-Unicode encoding.

sql server collation
Collation Setup during SQL Server Installation

The behavior associated with these various options is described in the following table:

OptionDescription
Case-sensitive (_CS)Distinguishes between uppercase and lowercase letters. If this option is selected, lowercase letters sort ahead of their uppercase versions. If this option isn’t selected, the collation is case-insensitive. That is, SQL Server considers the uppercase and lowercase versions of letters to be identical for sorting purposes. You can explicitly select case insensitivity by specifying _CI.
Accent-sensitive (_AS)Distinguishes between accented and unaccented characters. For example, “a” is not equal to “ấ”. If this option isn’t selected, the collation is accent-insensitive. That is, SQL Server considers the accented and unaccented versions of letters to be identical for sorting purposes. You can explicitly select accent insensitivity by specifying _AI.
Kana-sensitive (_KS)Distinguishes between the two types of Japanese kana characters: Hiragana and Katakana. If this option isn’t selected, the collation is kana-insensitive. That is, SQL Server considers Hiragana and Katakana characters to be equal for sorting purposes. Omitting this option is the only method of specifying kana-insensitivity.
Width-sensitive (_WS)Distinguishes between full-width and half-width characters. If this option isn’t selected, SQL Server considers the full-width and half-width representation of the same character to be identical for sorting purposes. Omitting this option is the only method of specifying width-insensitivity.
Variation-selector-sensitive (_VSS)Distinguishes between various ideographic variation selectors in the Japanese collations Japanese_Bushu_Kakusu_140 and Japanese_XJIS_140, which are introduced in SQL Server 2017 (14.x). A variation sequence consists of a base character plus an additional variation selector. If this _VSS option isn’t selected, the collation is variation-selector-insensitive, and the variation selector isn’t considered in the comparison. That is, SQL Server considers characters built upon the same base character with differing variation selectors to be identical for sorting purposes. For more information, see Unicode Ideographic Variation Database.
Variation-selector-sensitive (_VSS) collations aren’t supported in full-text search indexes. Full-text search indexes support only Accent-Sensitive (_AS), Kana-sensitive (_KS), and Width-sensitive (_WS) options. SQL Server XML and CLR engines don’t support (_VSS) Variation selectors
Binary (_BIN)Sorts and compares data in SQL Server tables based on the bit patterns defined for each character. Binary sort order is case-sensitive and accent-sensitive. Binary is also the fastest sorting order. For more information, see the Binary collations section in this article.
Binary-code point (_BIN2)Sorts and compares data in SQL Server tables based on Unicode code points for Unicode data. For non-Unicode data, Binary-code point uses comparisons that are identical to those for binary sorts.
The advantage of using a Binary-code point sort order is that no data resorting is required in applications that compare sorted SQL Server data. As a result, a Binary-code point sort order provides simpler application development and possible performance increases. For more information, see the Binary collations section in this article.
UTF-8 (_UTF8)Enables UTF-8 encoded data to be stored in SQL Server. If this option isn’t selected, SQL Server uses the default non-Unicode encoding format for the applicable data types. For more information, see the UTF-8 Support section in this article.

NOTE: If Binary or Binary-code point is selected, the Case-sensitive (_CS), Accent-sensitive (_AS), Kana-sensitive (_KS), and Width-sensitive (_WS) options are not available.

Collation sets

SQL Server supports the following collation sets:

Windows collations

Windows collations define rules for storing character data that’s based on an associated Windows system locale.

Binary collations

Binary collations sort data based on the sequence of coded values that are defined by the locale and data type.

SQL Server collations

SQL Server collations (SQL_*) provide sort order compatibility with earlier versions of SQL Server.

Collation levels

Setting collations are supported at the following levels of an instance of SQL Server:

  1. Server-level collations
  2. Database-level collations
  3. Column-level collations
  4. Expression-level collations

Server-level collations

The default server collation is determined during SQL Server setup, and it becomes the default collation of the system databases and all user databases.

Query:
To query the server collation for an instance of SQL Server, use the SERVERPROPERTY function:
SELECT CONVERT(varchar, SERVERPROPERTY('collation'));
To query the server for all available collations, use the following fn_helpcollations() built-in function:
SELECT * FROM sys.fn_helpcollations();

Database-level collations

The database collation is used for all metadata in the database, and the collation is the default for all string columns, temporary objects, variable names, and any other strings used in the database. When you change the collation of a user database, there can be collation conflicts when queries in the database access temporary tables.

Note: You can’t change the collation of system databases unless you change the collation for the server.

When you create or modify a database, you can use the COLLATE clause of the CREATE DATABASE or ALTER DATABASE statement to specify the default database collation. If no collation is specified, the database is assigned the server collation.

Query:
You can retrieve the current collation of a database by using a statement that's similar to the following:
SELECT CONVERT (VARCHAR(50), DATABASEPROPERTYEX('database_name','collation'));
You can change the collation of a user database by using an ALTER DATABASE statement that's similar to the following:
ALTER DATABASE myDB COLLATE Greek_CS_AI;

Column-level collations

When you create or alter a table, you can specify collations for each character-string column by using the COLLATE clause. If you don’t specify a collation, the column is assigned the default collation of the database.

Query:
You can change the collation of a column by using an ALTER TABLE statement that's similar to the following:
ALTER TABLE myTable ALTER COLUMN mycol NVARCHAR(10) COLLATE Greek_CS_AI;

Expression-level collations

Expression-level collations are set when a statement is run, and they affect the way a result set is returned. This enables ORDER BY sort results to be locale-specific.

Query:
To implement expression-level collations, use a COLLATE clause such as the following:
SELECT name FROM customer ORDER BY name COLLATE Latin1_General_CS_AI;

Published by Abdul Samad

Having 17+ years of extensive experience in IT industry, enabled to enhance the team performance and maximize customer satisfaction by strategically managing calls and implementing process improvements. Demonstrated ability to solve problems, meets challenging goals, and expedites delivery. Skilled MSSQL administrator guide team during the crisis situation. Apply Creative thoughts process in re-designing the workflow system to eliminate duplication of effort and increase productivity.

Leave a Reply