Pages and Extents Architecture Guide

The page is the fundamental unit of data storage in SQL Server. An extent is a collection of eight physically contiguous pages. Extents help efficiently manage pages. This guide describes the data structures that are used to manage pages and extents in all versions of SQL Server. Understanding the architecture of pages and extents is important for designing and developing databases that perform efficiently.

Pages

In SQL Server all the data rows are written on pages. In a book, all pages are the same physical size. Similarly, in SQL Server all data pages are the same size – 8 kilobytes. In a book most pages contain the data – the main content of the book – and some pages contain metadata about the content – for example table of contents and index. Most pages contain actual rows of data which were stored by users; these are called Data pages and text/image pages (for special cases).

The following table shows the page types used in the data files of a SQL Server database.

Page typeContents
DataData rows with all data, except text, ntext, image, nvarchar(max), varchar(max), varbinary(max), and xml data, when text in row is set to ON.
IndexIndex entries.
Text/ImageLarge object data types: (text, ntext, image, nvarchar(max), varchar(max), varbinary(max), and xml data)
Variable length columns when the data row exceeds 8 KB: (varchar, nvarchar, varbinary, and sql_variant)
Global Allocation Map, Shared Global Allocation MapInformation about whether extents are allocated.
Page Free Space (PFS)Information about page allocation and free space available on pages.
Index Allocation MapInformation about extents used by a table or index per allocation unit.
Bulk Changed MapInformation about extents modified by bulk operations since the last BACKUP LOG statement per allocation unit.
Differential Changed MapInformation about extents that have changed since the last BACKUP DATABASE statement per allocation unit.

Note: Log files do not contain pages; they contain a series of log records.

sql page

Data rows are put on the page serially, starting immediately after the header. A row offset table starts at the end of the page, and each row offset table contains one entry for each row on the page. Each row offset entry records how far the first byte of the row is from the start of the page. Thus, the function of the row offset table is to help SQL Server locate rows on a page very quickly. The entries in the row offset table are in reverse sequence from the sequence of the rows on the page.

Extents

An extent is eight physically contiguous pages, or 64 KB. This means SQL Server databases have 16 extents per megabyte.

SQL Server has two types of extents:

Uniform extents

Uniform extents are owned by a single object; all eight pages in the extent can only be used by the owning object.

Mixed extents

Mixed extents are shared by up to eight objects. Each of the eight pages in the extent can be owned by a different object.

sql extents

SQL Server does not allocate whole extents to tables with small amounts of data. A new table or index generally allocates pages from mixed extents. When the table or index grows to the point that it has eight pages, it then switches to use uniform extents for subsequent allocations. If you create an index on an existing table that has enough rows to generate eight pages in the index, all allocations to the index are in uniform extents.

Reading Pages

The I/O from an instance of the SQL Server Database Engine includes logical and physical reads. A logical read occurs every time the Database Engine requests a page from the buffer cache. If the page is not currently in the buffer cache, a physical read first copies the page from disk into the cache.

The read requests generated by an instance of the Database Engine are controlled by the relational engine and optimized by the storage engine. The relational engine determines the most effective access method (such as a table scan, an index scan, or a keyed read); the access methods and buffer manager components of the storage engine determine the general pattern of reads to perform, and optimize the reads required to implement the access method. The thread executing the batch schedules the reads.

Read-Ahead

The Database Engine supports a performance optimization mechanism called read-ahead. Read-ahead anticipates the data and index pages needed to fulfill a query execution plan and brings the pages into the buffer cache before they are actually used by the query.

The read-ahead mechanism allows the Database Engine to read up to 64 contiguous pages (512KB) from one file.

Writing Pages

The I/O from an instance of the Database Engine includes logical and physical writes. A logical write occurs when data is modified in a page in the buffer cache. A physical write occurs when the page is written from the buffer cache to disk.

When a page is modified in the buffer cache, it is not immediately written back to disk; instead, the page is marked as dirty. This means that a page can have more than one logical write made before it is physically written to disk. For each logical write, a transaction log record is inserted in the log cache that records the modification. The log records must be written to disk before the associated dirty page is removed from the buffer cache and written to disk. SQL Server uses a technique known as write-ahead logging that prevents writing a dirty page before the associated log record is written to disk. This is essential to the correct working of the recovery manager.

The following illustration shows the process for writing a modified data page:

sql data page

When the buffer manager writes a page, it searches for adjacent dirty pages that can be included in a single gather-write operation. Adjacent pages have consecutive page IDs and are from the same file; the pages do not have to be contiguous in memory.

A dirty page is written to disk in one of three ways:

Lazy writing

The lazy writer is a system process that keeps free buffers available by removing infrequently used pages from the buffer cache. Dirty pages are first written to disk.

Eager writing

The eager write process writes dirty data pages associated with minimally logged operations such as bulk insert and select into. This process allows creating and writing new pages to take place in parallel. That is, the calling operation does not have to wait until the entire operation finishes before writing the pages to disk.

Checkpoint

The checkpoint process periodically scans the buffer cache for buffers with pages from a specified database and writes all dirty pages to disk. Checkpoints save time during a later recovery by creating a point at which all dirty pages are guaranteed to have been written to disk. The user may request a checkpoint operation by using the CHECKPOINT command, or the Database Engine may generate automatic checkpoints based on the amount of log space used and time elapsed since the last checkpoint. In addition, a checkpoint is generated when certain activities occur. For example, when a data or log file is added or removed from a database, or when the instance of SQL Server is stopped.

Published by Abdul Samad

Having 17+ years of extensive experience in IT industry, enabled to enhance the team performance and maximize customer satisfaction by strategically managing calls and implementing process improvements. Demonstrated ability to solve problems, meets challenging goals, and expedites delivery. Skilled MSSQL administrator guide team during the crisis situation. Apply Creative thoughts process in re-designing the workflow system to eliminate duplication of effort and increase productivity.

Leave a Reply