SQL Server Architecture - A basic guide to MSSQL
SQL Server
SQL Server, created by Microsoft, is a relational database management system (RDBMS) designed to store, retrieve,
and manage large volumes of information. It uses Structured Query Language (SQL) to communicate with
databases, making it an effective tool for enterprises with a wide range of data requirements.
SQL Server follows a client-server architecture. The client, often an application or user interface, sends SQL queries
to the server. The server, in turn, processes these queries, executing operations such as data retrieval, insertion,
or modification. The SQL Server engine manages the storage, indexing, and transactional aspects, ensuring data
integrity and performance.
Architecture Diagram
Architecture Explanation
SQL Server has three major components.
1. Protocol Layer
This layer manages the communication between the client and the database engine.
2. Relational Engine
This layer processes the query: it parses the query, optimizes it, and creates an execution plan.
3. Storage Engine
This layer works with the Relational Engine, storing and retrieving the data needed to produce the final result.
Let’s go into the details of each component.
1. Protocol Layer
In SQL Server, the Server Network Interface (SNI) is the component that handles communication between SQL Server and its clients over a network. SQL Server uses the TDS (Tabular Data Stream) protocol to transfer data between the server and client applications. TDS defines the format of requests and responses for queries, data retrieval, and other database operations. TDS messages are carried over network protocols such as TCP/IP and Named Pipes.
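To make "defines the format of requests and responses" concrete, here is a minimal sketch that decodes the fixed 8-byte header every TDS packet starts with (type, status, length, SPID, packet ID, window), following the layout in Microsoft's MS-TDS specification. Only a subset of packet types is listed, and the helper name parse_tds_header is illustrative, not part of any library.

```python
import struct

# Subset of packet type codes defined in the MS-TDS specification.
PACKET_TYPES = {
    0x01: "SQL batch",
    0x03: "RPC",
    0x04: "Tabular result",
    0x06: "Attention",
    0x07: "Bulk load",
    0x0E: "Transaction manager request",
    0x10: "TDS7 login",
    0x12: "Pre-login",
}

# Type, Status, Length, SPID, PacketID, Window: 8 bytes, big-endian.
HEADER = struct.Struct(">BBHHBB")


def parse_tds_header(packet: bytes) -> dict:
    """Decode the fixed 8-byte header at the start of a TDS packet."""
    if len(packet) < HEADER.size:
        raise ValueError("packet is shorter than the 8-byte TDS header")
    ptype, status, length, spid, packet_id, window = HEADER.unpack_from(packet)
    return {
        "type": PACKET_TYPES.get(ptype, f"unknown (0x{ptype:02X})"),
        "end_of_message": bool(status & 0x01),  # EOM bit: last packet of this message
        "length": length,                       # whole packet, header included
        "spid": spid,
        "packet_id": packet_id,
        "window": window,
    }


# Usage: a header followed by a placeholder UTF-16LE payload (a real SQL batch
# message carries additional headers before the query text).
payload = "SELECT 1".encode("utf-16-le")
packet = HEADER.pack(0x01, 0x01, HEADER.size + len(payload), 0, 1, 0) + payload
print(parse_tds_header(packet))
```

The Length field covers the header plus payload, which is how a receiver knows where one packet ends and the next begins on a stream transport such as TCP/IP.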
How it works
SNI supports three protocols:
1. Shared Memory
Client and SQL Server run on the same machine.
2. TCP/IP
Client and SQL Server are remote to each other, that is, they run on different servers.
3. Named Pipe
Client and SQL Server are on the same local area network (LAN).
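A minimal sketch of how the protocol choice shows up on the client side, assuming a default SQL Server instance: SQL Server client libraries accept a protocol prefix on the server name (lpc: for Shared Memory, tcp: for TCP/IP, np: for Named Pipes). The helpers pick_protocol and server_address are hypothetical, and pick_protocol only mirrors the rule of thumb in the list above; in practice TCP/IP is also common on a LAN.

```python
def pick_protocol(same_machine: bool, same_lan: bool) -> str:
    """Hypothetical helper mirroring the rule of thumb in the list above."""
    if same_machine:
        return "shared_memory"
    if same_lan:
        return "named_pipes"
    return "tcp"


def server_address(host: str, protocol: str, port: int = 1433) -> str:
    """Server name with an explicit protocol prefix, for a default instance."""
    if protocol == "shared_memory":
        return f"lpc:{host}"
    if protocol == "tcp":
        return f"tcp:{host},{port}"                # 1433 is the default TCP port
    if protocol == "named_pipes":
        return f"np:\\\\{host}\\pipe\\sql\\query"  # default pipe of a default instance
    raise ValueError(f"unsupported protocol: {protocol}")


print(server_address("db01", pick_protocol(same_machine=False, same_lan=False)))
# tcp:db01,1433
```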
2. Relational Engine
Relational Engine is also known as the Query Processor. It is responsible for executing queries: it requests the data stored in the Storage Engine and processes the results that are returned.
1. CMD Parser
A user request received from the Protocol Layer first reaches the CMD Parser, which verifies the query. It performs three operations: a syntactic check (is the query written correctly?), a semantic check (do the referenced tables and columns exist?), and generation of a query tree that is passed to the optimizer.
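The three parser steps can be sketched on a deliberately tiny grammar (SELECT col, ... FROM table). This is a toy under stated assumptions, not SQL Server's parser: the CATALOG dictionary stands in for the system catalog, and the Node tree shape and operator names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical system catalog: table name -> column names.
CATALOG = {"employees": {"id", "name", "salary"}}

# Tiny grammar: SELECT col [, col ...] FROM table [;]
SELECT_RE = re.compile(
    r"^\s*SELECT\s+(?P<cols>\w+(?:\s*,\s*\w+)*)\s+FROM\s+(?P<table>\w+)\s*;?\s*$",
    re.IGNORECASE,
)


@dataclass
class Node:
    operator: str
    argument: object
    child: Optional["Node"] = None


def parse(sql: str) -> Node:
    # 1. Syntactic check: does the text follow the grammar?
    match = SELECT_RE.match(sql)
    if match is None:
        raise SyntaxError(f"incorrect syntax: {sql!r}")
    table = match.group("table").lower()
    columns = [c.strip().lower() for c in match.group("cols").split(",")]

    # 2. Semantic check: do the referenced table and columns exist?
    if table not in CATALOG:
        raise NameError(f"invalid object name: {table}")
    unknown = [c for c in columns if c not in CATALOG[table]]
    if unknown:
        raise NameError(f"invalid column name(s): {unknown}")

    # 3. Query tree handed to the optimizer: project columns from a table read.
    return Node("Project", columns, Node("Get", table))


print(parse("SELECT id, name FROM employees"))
```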
2. Optimizer
The optimizer is responsible for creating execution plans for the query and finding the cheapest, most cost-effective one.
DDL statements such as CREATE and ALTER are not optimized; instead, they are compiled into an internal form.
Query cost is calculated from factors such as CPU usage, memory usage, and I/O requirements.
The SQL Server optimizer uses built-in exhaustive and heuristic search algorithms; its primary goal is to minimize the estimated cost, and therefore the run time, of the query.
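The cost comparison can be illustrated with a minimal sketch that folds CPU, memory, and I/O estimates into one number and keeps the cheapest candidate. The weights, units, and plan names are hypothetical; SQL Server's actual cost model is internal and far more detailed than a weighted sum.

```python
from dataclasses import dataclass

# Hypothetical weights; SQL Server's real cost model is internal and more detailed.
WEIGHTS = {"cpu": 1.0, "memory": 0.5, "io": 2.0}


@dataclass
class Plan:
    name: str
    cpu: float     # estimated CPU work (arbitrary units)
    memory: float  # estimated memory need (arbitrary units)
    io: float      # estimated page reads/writes (arbitrary units)


def cost(plan: Plan) -> float:
    """Combine the three resource estimates into a single comparable number."""
    return (WEIGHTS["cpu"] * plan.cpu
            + WEIGHTS["memory"] * plan.memory
            + WEIGHTS["io"] * plan.io)


def cheapest(plans: list) -> Plan:
    return min(plans, key=cost)


candidates = [
    Plan("table scan", cpu=10, memory=2, io=50),             # cost 111.0
    Plan("index scan", cpu=8, memory=1, io=20),              # cost 48.5
    Plan("index seek + key lookup", cpu=4, memory=1, io=6),  # cost 16.5
]
print(cheapest(candidates).name)  # index seek + key lookup
```

Weighting I/O more heavily than CPU reflects the common assumption that disk access dominates; changing the weights can change which plan wins.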
In some cases, there is only one practical, workable plan for a query, known as a trivial plan. The optimizer then does not need to spend more time and resources searching for other plans. If no trivial plan is found, Phase 1 starts.
Phase 1 searches for simple and complex plans for the query. To estimate costs, the optimizer uses statistics on the columns and indexes involved in the query. If no simple plan is found, a more complex plan is searched for.
If none of these strategies work, the optimizer looks for parallel processing possibilities. This depends on the machine's processing capabilities and configuration.
If this step is not useful either, the final optimization phase starts, which searches for the best way to execute the query.
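The staged search described above (trivial plan, then simple and complex plans, then parallelism, then final optimization) amounts to "try each phase in order and stop at the first one that produces a plan". A minimal sketch of that control flow, in which each phase is a hypothetical stand-in that decides only from the number of joins and the CPU count; the real optimizer's phase rules are not this simple.

```python
from typing import Callable, Optional

Search = Callable[[str], Optional[str]]


def build_phases(cpu_count: int) -> list:
    """Hypothetical stand-ins for the optimizer phases, keyed on the number of joins."""

    def joins(query: str) -> int:
        return query.upper().count(" JOIN ")

    return [
        ("trivial plan", lambda q: "single obvious plan" if joins(q) == 0 else None),
        ("phase 1: simple", lambda q: "simple plan" if joins(q) == 1 else None),
        ("phase 1: complex", lambda q: "complex plan" if joins(q) <= 3 else None),
        # Parallelism depends on the machine's configuration.
        ("parallel", lambda q: "parallel plan" if cpu_count > 1 else None),
        ("final optimization", lambda q: "fully optimized plan"),
    ]


def optimize(query: str, phases: list) -> tuple:
    """Run the phases in order and stop at the first one that returns a plan."""
    for phase_name, search in phases:
        plan = search(query)
        if plan is not None:
            return phase_name, plan
    raise RuntimeError("no plan found")


big = "SELECT * FROM a JOIN b ON 1=1 JOIN c ON 1=1 JOIN d ON 1=1 JOIN e ON 1=1"
print(optimize("SELECT id FROM t", build_phases(cpu_count=4)))  # ('trivial plan', 'single obvious plan')
print(optimize(big, build_phases(cpu_count=4)))                 # ('parallel', 'parallel plan')
print(optimize(big, build_phases(cpu_count=1)))                 # ('final optimization', 'fully optimized plan')
```

The early exit is the point: cheap phases run first, so simple queries never pay for the expensive search.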
3. Query Executor
The query executor is responsible for executing the plan generated by the query optimizer. After the SQL query has been parsed and an execution plan created, the query executor carries out the actual data retrieval, manipulation, or modification operations specified in the query and sends the final result to the end user.
The query executor calls the Access Method, which refers to the techniques or algorithms the query processor uses to retrieve data from tables or indexes.
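A common way to picture plan execution is the iterator model: each plan operator pulls rows from the operator below it, one row at a time, and the bottom operator reads the table. A minimal sketch using Python generators, where table_scan is a hypothetical stand-in for an access method and the in-memory list of dictionaries stands in for a table.

```python
from typing import Callable, Iterable, Iterator

Row = dict


def table_scan(table: list) -> Iterator[Row]:
    """Access method stand-in: return every row of the table, one at a time."""
    for row in table:
        yield row


def filter_rows(child: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Pass on only the rows that satisfy the WHERE predicate."""
    for row in child:
        if predicate(row):
            yield row


def project(child: Iterable[Row], columns: list) -> Iterator[Row]:
    """Keep only the requested columns."""
    for row in child:
        yield {c: row[c] for c in columns}


employees = [
    {"id": 1, "name": "Ana", "salary": 5000},
    {"id": 2, "name": "Raj", "salary": 7000},
    {"id": 3, "name": "Li", "salary": 9000},
]

# Plan for: SELECT name FROM employees WHERE salary > 6000
plan = project(filter_rows(table_scan(employees), lambda r: r["salary"] > 6000), ["name"])
print(list(plan))  # [{'name': 'Raj'}, {'name': 'Li'}]
```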
3. Storage Engine
The storage engine is a core component responsible for managing how data is stored, retrieved, and modified
on disk.
1. Data pages
Data is physically stored in pages, each 8 KB in size [This is the smallest storage unit in SQL Server]. Pages are mainly of 3 types.
Data pages - These pages store the actual rows of a table.
Index pages - These pages do not contain the actual data but contain the index key values and pointers to
the corresponding rows.
Text/Image pages - These pages are used to store large object data types (LOBs), such as text, ntext, image,
varchar(max), nvarchar(max), varbinary(max), xml, etc.
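The fixed 8 KB page size makes storage easy to estimate. A minimal sketch, assuming fixed-size rows, no fill factor, and the heap-size approach from Microsoft's documentation: 96 bytes of each page go to the page header, leaving 8096 bytes, and every row also needs a 2-byte entry in the page's row offset array. The function names are hypothetical.

```python
import math

PAGE_SIZE = 8192                  # 8 KB page
PAGE_HEADER = 96                  # bytes used by the page header
USABLE = PAGE_SIZE - PAGE_HEADER  # 8096 bytes for rows and the row offset array
SLOT_SIZE = 2                     # each row also needs a 2-byte row offset entry


def rows_per_page(row_size: int) -> int:
    """Rows of a fixed size that fit on one page (rounded down)."""
    return USABLE // (row_size + SLOT_SIZE)


def pages_needed(num_rows: int, row_size: int) -> int:
    """Pages for a heap of num_rows rows (rounded up); ignores fill factor."""
    return math.ceil(num_rows / rows_per_page(row_size))


print(rows_per_page(100))            # 79
print(pages_needed(1_000_000, 100))  # 12659
```

Rows do not span pages (large values move to separate pages instead), which is why the division rounds down.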
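2. Extents
An extent is a group of eight physically contiguous pages (64 KB), and SQL Server allocates space to tables and indexes in extents. A uniform extent belongs to a single object, while a mixed extent can be shared by up to eight objects.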
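Because space is allocated in extents of eight 8 KB pages (64 KB), an object's reserved size is rounded up to whole extents. A minimal sketch of that arithmetic, assuming the object uses only uniform extents (mixed extents are ignored); the function names are hypothetical.

```python
import math

PAGE_SIZE_KB = 8
PAGES_PER_EXTENT = 8
EXTENT_SIZE_KB = PAGE_SIZE_KB * PAGES_PER_EXTENT  # 64 KB


def extents_for(pages: int) -> int:
    """Uniform extents needed to hold `pages` pages of a single object."""
    return math.ceil(pages / PAGES_PER_EXTENT)


def allocated_kb(pages: int) -> int:
    """Space reserved when every extent is allocated in full."""
    return extents_for(pages) * EXTENT_SIZE_KB


print(extents_for(20), allocated_kb(20))  # 3 192
```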
3. Files
Types of files:
1. Primary file
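• Every database contains exactly one primary data file.
• It stores the database's startup information and points to the other files in the database; it also stores objects such as tables, indexes, and triggers unless they are placed in secondary files.
• It is created when the database is created, and its recommended extension is .mdf.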
2. Secondary file
• Secondary data files are optional, user-defined files that store user data.
• The recommended extension is .ndf.
3. Log files
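• The transaction log follows the write-ahead logging (WAL) protocol: a change is recorded in the log before the modified data page is written to disk.
• It is used for transaction management and recovery.
• The recommended extension is .ldf. [Internally, each log file is divided into virtual log files (VLFs). VLFs are logical segments inside the .ldf file, not separate files, and they have no fixed size.]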
[Figure: example database files, including a.ldf (log file) and b.ndf (secondary data file)]
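The write-ahead rule for log files can be sketched with a toy model: a change is first described in an in-memory log buffer and applied to the page in memory, and before a modified page may be written to the data file, the log records up to that page's last change must be flushed to the log file. ToyStorage and its attributes are hypothetical; real SQL Server also flushes the log at commit and tracks LSNs on every page.

```python
class ToyStorage:
    """Hypothetical model of write-ahead logging: log first, data page later."""

    def __init__(self) -> None:
        self.next_lsn = 1
        self.log_buffer: list = []  # (lsn, page_id, new_value) still in memory
        self.log_file: list = []    # hardened log records (.ldf)
        self.dirty: dict = {}       # page_id -> (value, lsn of its last change)
        self.data_file: dict = {}   # pages on disk (.mdf/.ndf)

    def modify(self, page_id: str, value: str) -> int:
        lsn = self.next_lsn
        self.next_lsn += 1
        self.log_buffer.append((lsn, page_id, value))  # describe the change in the log
        self.dirty[page_id] = (value, lsn)             # change the page in memory only
        return lsn

    def flush_log(self, up_to_lsn: int) -> None:
        while self.log_buffer and self.log_buffer[0][0] <= up_to_lsn:
            self.log_file.append(self.log_buffer.pop(0))

    def write_page(self, page_id: str) -> None:
        value, page_lsn = self.dirty.pop(page_id)
        self.flush_log(page_lsn)  # WAL rule: the log records reach disk first
        self.data_file[page_id] = value


storage = ToyStorage()
storage.modify("p1", "a")
storage.modify("p2", "b")
storage.write_page("p1")
print(storage.log_file, storage.data_file)  # [(1, 'p1', 'a')] {'p1': 'a'}
```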
1. Access Method
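• It acts as an interface between the query executor and the Buffer Manager/Transaction Manager.
• The first action of the access method is to determine whether the SQL query is a
o SELECT statement, which is passed to the Buffer Manager, or a
o Non-SELECT statement, which is passed to the Transaction Manager.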
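A minimal sketch of that routing decision, assuming, as this architecture model describes, that SELECT statements go to the Buffer Manager and all other statements to the Transaction Manager. The route function is hypothetical and only inspects the first keyword, so it ignores cases such as WITH ... SELECT or SELECT ... INTO.

```python
def route(sql: str) -> str:
    """Send SELECT statements to the Buffer Manager and all others to the Transaction Manager."""
    words = sql.split()
    first = words[0].upper() if words else ""
    return "Buffer Manager" if first == "SELECT" else "Transaction Manager"


for statement in [
    "SELECT * FROM employees",          # Buffer Manager
    "UPDATE employees SET salary = 0",  # Transaction Manager
    "CREATE TABLE t (id int)",          # Transaction Manager
]:
    print(route(statement))
```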
2. Buffer Manager
It is responsible for managing the in-memory cache of database pages, which significantly improves
database performance by reducing the need for frequent disk I/O operations.
1. Plan cache
If a query has already been executed and its plan is in the plan cache (Soft parsing): when a query comes in for execution, the buffer manager checks whether its execution plan is already available in the plan cache. If it is, that plan is reused.
First-time execution of a query (Hard parsing): if a query is executing for the first time, its execution plan is compiled and stored in the plan cache, so that it is available immediately the next time the same query is executed.
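The soft/hard parsing distinction can be sketched as a dictionary keyed by query text: a hit reuses the stored plan, a miss compiles and stores a new one. Keying on the exact text reflects how SQL Server matches ad hoc queries (a difference in case or spacing means a new plan); PlanCache and the compile_plan callback are hypothetical, and real cache eviction is not modeled.

```python
from typing import Callable


class PlanCache:
    """Hypothetical plan cache keyed by the exact query text."""

    def __init__(self, compile_plan: Callable[[str], str]) -> None:
        self._plans: dict = {}
        self._compile = compile_plan
        self.hard_parses = 0
        self.soft_parses = 0

    def get_plan(self, sql: str) -> str:
        if sql in self._plans:
            self.soft_parses += 1  # plan already cached: reuse it
            return self._plans[sql]
        self.hard_parses += 1      # first execution: compile, then cache
        plan = self._compile(sql)
        self._plans[sql] = plan
        return plan


cache = PlanCache(lambda sql: f"plan<{sql}>")
cache.get_plan("SELECT name FROM employees")
cache.get_plan("SELECT name FROM employees")
cache.get_plan("select name from employees")  # different text: compiled again
print(cache.hard_parses, cache.soft_parses)   # 2 1
```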
2. Buffer pool
The Buffer Pool is like a big storage area in memory where SQL Server keeps a mix of different types
of data it needs to work with, like:
• Table data (the actual rows from tables),
• Indexes (used to speed up searches),
• Metadata (information about the database itself).
3. Data cache
The Data Cache is a smaller part of the Buffer Pool. It specifically stores table data pages—the actual
rows of data in the tables that users are working with.
4. Dirty pages
In SQL Server, a dirty page refers to a page in the buffer pool that has been modified in memory but
has not yet been written (or "flushed") to disk.
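The buffer pool and dirty pages can be sketched together: the first read of a page is a physical read from disk, later reads are served from memory, and modifying a cached page marks it dirty while the copy on disk stays unchanged. BufferPool is a hypothetical toy; page replacement and writing pages back are left out here.

```python
class BufferPool:
    """Hypothetical page cache sitting in front of the data file."""

    def __init__(self, data_file: dict) -> None:
        self.data_file = data_file  # page_id -> contents on disk
        self.cached: dict = {}      # page_id -> contents in memory
        self.dirty: set = set()     # pages changed in memory but not yet on disk
        self.physical_reads = 0

    def read(self, page_id: int) -> str:
        if page_id not in self.cached:  # miss: go to disk once
            self.physical_reads += 1
            self.cached[page_id] = self.data_file[page_id]
        return self.cached[page_id]     # later reads come from memory

    def modify(self, page_id: int, contents: str) -> None:
        self.read(page_id)              # the page must be in memory first
        self.cached[page_id] = contents
        self.dirty.add(page_id)         # now a dirty page


disk = {1: "row data v1", 2: "index data v1"}
pool = BufferPool(disk)
pool.read(1)
pool.read(1)
pool.modify(2, "index data v2")
print(pool.physical_reads, pool.dirty, disk[2])  # 2 {2} index data v1
```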
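3. Transaction Manager
The Transaction Manager is invoked for non-SELECT statements. It uses the Log Manager and the Lock Manager to keep each transaction complete and isolated from other transactions.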
4. Log Manager
The Log Manager is responsible for managing the transaction log, which records all changes made to
the database.
Each log record contains a log sequence number (LSN), the ID of the transaction that made the change, and a record of the data modification.
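A minimal sketch of log records carrying an LSN, a transaction ID, and the modification (before and after values), and of how those records let a transaction be rolled back by undoing its changes in reverse LSN order. LogRecord and LogManager are hypothetical; SQL Server's real log format is far richer and also writes compensation records during rollback.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class LogRecord:
    lsn: int        # log sequence number, increases with every record
    txn_id: int     # transaction that made the change
    key: str        # what was modified (a row, in this toy)
    before: object  # value before the change (used to undo)
    after: object   # value after the change (used to redo)


class LogManager:
    def __init__(self) -> None:
        self.records: list = []
        self._next_lsn = count(1)

    def log_change(self, txn_id: int, key: str, before: object, after: object) -> LogRecord:
        record = LogRecord(next(self._next_lsn), txn_id, key, before, after)
        self.records.append(record)
        return record

    def rollback(self, txn_id: int, data: dict) -> None:
        """Undo a transaction by restoring before-values in reverse LSN order."""
        for record in reversed(self.records):
            if record.txn_id == txn_id:
                data[record.key] = record.before


table = {"emp:1": 5000}
log = LogManager()
log.log_change(7, "emp:1", 5000, 6000)
table["emp:1"] = 6000
log.log_change(7, "emp:1", 6000, 6500)
table["emp:1"] = 6500
log.rollback(7, table)
print(table)  # {'emp:1': 5000}
```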
5. Lock Manager
The Lock Manager is responsible for controlling access to database resources to ensure isolation between concurrent transactions.
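The core of lock management is a compatibility check: shared (S) locks for reading can coexist, while an exclusive (X) lock for writing conflicts with any lock held by another transaction. A minimal sketch with only these two modes; SQL Server also has update, intent, and schema locks, and lock waiting and deadlock detection are not modeled. LockManager here is a hypothetical toy.

```python
# Compatibility of a requested mode with a mode another transaction already holds.
COMPATIBLE = {
    ("S", "S"): True,   # many readers may share a resource
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,  # a writer needs exclusive access
}


class LockManager:
    def __init__(self) -> None:
        self.granted: dict = {}  # resource -> list of (txn_id, mode)

    def request(self, txn_id: int, resource: str, mode: str) -> bool:
        """Grant the lock if compatible with other holders; otherwise the caller must wait."""
        holders = self.granted.setdefault(resource, [])
        for other_txn, held_mode in holders:
            if other_txn != txn_id and not COMPATIBLE[(mode, held_mode)]:
                return False
        holders.append((txn_id, mode))
        return True

    def release_all(self, txn_id: int) -> None:
        """Release every lock the transaction holds (e.g. at commit or rollback)."""
        for resource, holders in self.granted.items():
            self.granted[resource] = [h for h in holders if h[0] != txn_id]


locks = LockManager()
print(locks.request(1, "row:42", "S"))  # True
print(locks.request(2, "row:42", "S"))  # True
print(locks.request(3, "row:42", "X"))  # False: readers still hold S locks
locks.release_all(1)
locks.release_all(2)
print(locks.request(3, "row:42", "X"))  # True
```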
1. Checkpoint
The Checkpoint process is designed to ensure data durability and consistency in SQL Server. It writes all dirty pages (pages modified in memory) to disk, so that the data files catch up with the changes already recorded in the transaction log; this also shortens recovery time after a crash.
A checkpoint runs automatically at regular intervals (by default, often enough to keep recovery time to about 60 seconds) or can be issued manually with the CHECKPOINT command.
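A minimal sketch of the core checkpoint step only: every dirty page is copied from memory to the data file, and the pages stay cached but are now clean. The checkpoint function and its dictionary arguments are hypothetical; the real process also writes checkpoint log records and honors write-ahead logging.

```python
def checkpoint(buffer_pool: dict, dirty: set, data_file: dict) -> int:
    """Write every dirty page to disk; pages stay cached but are now clean."""
    written = 0
    for page_id in sorted(dirty):
        data_file[page_id] = buffer_pool[page_id]
        written += 1
    dirty.clear()
    return written


buffer_pool = {1: "A", 2: "b", 3: "C"}  # pages 1 and 3 were modified in memory
dirty = {1, 3}
data_file = {1: "a", 2: "b", 3: "c"}
print(checkpoint(buffer_pool, dirty, data_file))  # 2
print(data_file)                                  # {1: 'A', 2: 'b', 3: 'C'}
```

Unlike the lazy writer, a checkpoint does not free memory: it only makes the on-disk copy current.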
2. Lazy writer
The Lazy Writer process is responsible for managing memory in SQL Server, specifically the buffer pool. It ensures that SQL Server uses memory efficiently and does not run out of free buffers when the buffer pool becomes full. It frees up space in the buffer pool by writing dirty pages to disk and releasing infrequently used pages, making room for new data that needs to be cached.
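A minimal sketch of the lazy writer's job in a fixed-size buffer pool: when a new page must be read and the pool is full, the least recently used page is evicted, and if it is dirty it is written to disk first. This toy uses plain LRU and runs on demand; SQL Server's real replacement policy and scheduling differ. ToyBufferPool and its methods are hypothetical, and capacity is assumed to be at least 1.

```python
from collections import OrderedDict
from typing import Optional


class ToyBufferPool:
    """Hypothetical buffer pool with a fixed number of page slots."""

    def __init__(self, capacity: int, data_file: dict) -> None:
        self.capacity = capacity
        self.data_file = data_file
        self.pages: OrderedDict = OrderedDict()  # least recently used first
        self.dirty: set = set()

    def access(self, page_id: int, new_value: Optional[str] = None) -> str:
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # now the most recently used
        else:
            self.lazy_write()                # make room before reading from disk
            self.pages[page_id] = self.data_file[page_id]
        if new_value is not None:
            self.pages[page_id] = new_value
            self.dirty.add(page_id)
        return self.pages[page_id]

    def lazy_write(self) -> None:
        """Evict least recently used pages, writing dirty ones to disk first."""
        while len(self.pages) >= self.capacity:
            victim, contents = self.pages.popitem(last=False)
            if victim in self.dirty:
                self.data_file[victim] = contents
                self.dirty.discard(victim)


data_file = {1: "a", 2: "b", 3: "c"}
pool = ToyBufferPool(capacity=2, data_file=data_file)
pool.access(1, "A")  # page 1 is now dirty
pool.access(2)
pool.access(3)       # pool is full: page 1 is written to disk and evicted
print(list(pool.pages), data_file[1])  # [2, 3] A
```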