SQL Server Architecture - A basic guide to MSSQL

SQL Server Architecture and Components

SQL Server

SQL Server, created by Microsoft, is a relational database management system (RDBMS) designed to store, retrieve,
and manage large volumes of information. It uses Structured Query Language (SQL) to communicate with
databases, making it an effective tool for enterprises with a wide range of data requirements.

Working model of SQL Server

SQL Server follows a client-server architecture. The client, often an application or user interface, sends SQL queries
to the server. The server, in turn, processes these queries, executing operations such as data retrieval, insertion,
or modification. The SQL Server engine manages the storage, indexing, and transactional aspects, ensuring data
integrity and performance.

Architecture Diagram

Architecture Explanation
MSSQL has 3 major components.

1. Protocol Layer
The layer that manages the communication between the client and the database engine.
2. Relational Engine
The layer that processes the query, i.e. parsing, optimizing, and creating the execution plan.
3. Storage Engine
The layer that works with the relational engine to read and write the data and return it for the final
result.
Let’s go into the details of each component.

1. Protocol Layer
In SQL Server, the Server Network Interface (SNI) is the component that handles communication between
SQL Server and its clients over a network. SQL Server uses the TDS (Tabular Data Stream) protocol to transfer data
between the server and client applications. TDS defines the format of requests and responses, enabling
communication for queries, data retrieval, and other database operations. TDS packets are carried over
network protocols such as TCP/IP and Named Pipes.

How does it work?

1. The client sends a request through the network to the server.
2. The SNI on the SQL Server side handles the connection and forwards the query to the SQL Server engine.
3. The server processes the query and sends the results back via the same protocol (e.g., TCP/IP).

The protocol layer supports the following three protocols.

1. Shared Memory

The client and the MSSQL server run on the same machine.

2. TCP/IP

The client and the MSSQL server are remote to each other, that is, they run on different machines.

3. Named Pipes

The client and the MSSQL server are typically in the same physical location and are connected via a LAN.
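
As a small illustration of the three protocols, SQL Server client connection strings accept a prefix on the server name that forces a particular protocol (`lpc:` for Shared Memory, `tcp:` for TCP/IP, `np:` for Named Pipes). The sketch below only builds such server strings. The host name `dbhost01` and the helper name `server_target` are hypothetical and used purely for illustration.

```python
# Prefixes that select a protocol in the server part of a connection string.
PROTOCOL_PREFIXES = {
    "shared_memory": "lpc:",
    "tcp": "tcp:",
    "named_pipes": "np:",
}


def server_target(protocol: str, server: str) -> str:
    """Return the server name with the prefix that forces the given protocol."""
    return PROTOCOL_PREFIXES[protocol] + server


# Hypothetical targets: a local instance, a remote host and port, and a pipe path.
print(server_target("shared_memory", "localhost"))
print(server_target("tcp", "dbhost01,1433"))
print(server_target("named_pipes", r"\\dbhost01\pipe\sql\query"))
```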

2. Relational Engine
The Relational Engine is also known as the query processor. It is responsible for executing queries
by requesting the data stored in the Storage Engine and processing the results that are returned.

The Relational Engine has 3 components.

1. CMD Parser

The user request received from the protocol layer first reaches the CMD parser, which verifies the query. It
performs 3 operations.

1. Syntactic check – Checks the syntax of the query.
2. Semantic check – Checks whether the column names and table names exist in the schema.
3. Query tree – Generates the query trees that represent the ways in which the query can be run. All the
different query trees produce the same desired output.
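
The three parser steps can be sketched with a toy checker. This is not how the CMD parser is implemented. It is a minimal model that only understands `SELECT <columns> FROM <table>`, and the `SCHEMA` dictionary, the `check_query` function, and the shape of the returned tree are all assumptions made for illustration.

```python
import re

# Hypothetical schema: table name -> set of column names.
SCHEMA = {"employees": {"id", "name", "salary"}}

# Toy grammar: only "SELECT col[, col...] FROM table" is accepted.
PATTERN = re.compile(r"^\s*SELECT\s+(.+?)\s+FROM\s+(\w+)\s*;?\s*$", re.IGNORECASE)


def check_query(sql: str) -> dict:
    # 1. Syntactic check: does the text match the toy grammar?
    match = PATTERN.match(sql)
    if match is None:
        raise SyntaxError("not a SELECT <columns> FROM <table> statement")
    columns = [c.strip().lower() for c in match.group(1).split(",")]
    table = match.group(2).lower()

    # 2. Semantic check: do the table and columns exist in the schema?
    if table not in SCHEMA:
        raise LookupError(f"unknown table: {table}")
    missing = [c for c in columns if c not in SCHEMA[table]]
    if missing:
        raise LookupError(f"unknown columns: {missing}")

    # 3. Query tree: a nested structure describing the logical steps.
    return {"op": "project", "columns": columns,
            "input": {"op": "scan", "table": table}}


print(check_query("SELECT name, salary FROM employees"))
```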

2. Optimizer

The optimizer is responsible for creating execution plans for the query and choosing the most cost-effective
one.

Optimization is done primarily for DML statements such as SELECT, INSERT, UPDATE, and DELETE.

DDL statements like CREATE and ALTER are not optimized; instead, they are compiled into an internal form.

Query cost is calculated based on factors like CPU usage, memory usage, and I/O needs.
The MSSQL optimizer uses built-in exhaustive/heuristic algorithms; the primary goal is to minimize query
runtime.
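
The idea of cost-based selection can be sketched as follows. The real cost model is internal to SQL Server. The `Plan` class, the weights, and the numbers below are invented for illustration and only show the principle of combining CPU, memory, and I/O estimates and picking the cheapest candidate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    cpu: float      # estimated CPU cost (arbitrary units)
    memory: float   # estimated memory cost (arbitrary units)
    io: float       # estimated I/O cost (arbitrary units)


def estimated_cost(plan: Plan, w_cpu=1.0, w_mem=1.0, w_io=1.0) -> float:
    """Weighted sum of the three cost factors (the weights are assumptions)."""
    return w_cpu * plan.cpu + w_mem * plan.memory + w_io * plan.io


def cheapest(plans):
    """Return the candidate plan with the lowest estimated cost."""
    return min(plans, key=estimated_cost)


candidates = [
    Plan("table scan", cpu=4.0, memory=1.0, io=20.0),
    Plan("index seek", cpu=1.0, memory=0.5, io=2.0),
]
print(cheapest(candidates).name)  # index seek
```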

At a high level, the optimizer sends a query through 3 phases.

1. Phase 0 (Search for Trivial Plan)

This phase is known as the pre-optimization stage.

In some cases, there is only one practical, workable plan for a query, known as a trivial plan.

The optimizer then does not need to spend more time and resources searching for another plan. If no
trivial plan is found, phase 1 starts.

2. Phase 1 (Search for Transaction processing plans)

This phase searches for simple and then complex plans for the query. For its statistical analysis, the optimizer
uses the statistics of the columns and indexes involved in the query. If a simple plan is not found, a more
complex plan is searched for.

3. Phase 2 (Parallel processing and optimization)

If none of the earlier strategies work, the optimizer looks for a parallel processing possibility. This depends
on the configuration of the machine.

If parallel processing is also not useful, a final optimization step is run to find the best way to execute
the query.
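
The staged search can be sketched as an early-exit loop. This is a simplified model under stated assumptions: plans are plain `(name, cost)` tuples, the candidate lists are supplied by the caller, and the `cost_threshold` that decides whether a phase's best plan is "good enough" is a hypothetical parameter, not a documented SQL Server setting.

```python
def optimize(trivial_plan, phase1_plans, phase2_plans, cost_threshold):
    """Return (phase, plan) for the first phase that yields an acceptable plan."""
    # Phase 0: a trivial plan ends the search immediately.
    if trivial_plan is not None:
        return "phase 0", trivial_plan

    best_phase, best_plan = None, None
    for phase, plans in (("phase 1", phase1_plans), ("phase 2", phase2_plans)):
        for plan in plans:
            if best_plan is None or plan[1] < best_plan[1]:
                best_phase, best_plan = phase, plan
        # Stop early when the best plan so far is cheap enough.
        if best_plan is not None and best_plan[1] <= cost_threshold:
            break

    if best_plan is None:
        raise ValueError("no candidate plans supplied")
    return best_phase, best_plan


print(optimize(("constant scan", 0.1), [], [], cost_threshold=5.0))
print(optimize(None, [("nested loops", 30.0)], [("parallel hash join", 8.0)], 5.0))
```

In the second call the phase 1 plan is too expensive, so the search continues and the cheaper phase 2 plan is returned.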

3. Query Executor

The query executor is responsible for executing the query plan generated by the query optimizer. After the
SQL query is parsed and a query execution plan is created, the query executor carries out the actual data
retrieval, manipulation, or modification operations specified in the query and sends the final result to the
end user.

Query executor calls Access Method, which refers to the techniques or algorithms used by the SQL Server
query processor to retrieve data from tables or indexes.

3. Storage Engine
The storage engine is a core component responsible for managing how data is stored, retrieved, and modified
on disk.

The storage structure of SQL Server.

1. Data pages

Data is physically stored in the form of pages, with each page having a size of 8 KB [this is the smallest
storage unit in SQL Server]. Pages are mainly of 3 types.

Data pages – Store the actual rows of user data.

Index pages – These pages do not contain the actual data but contain the index key values and pointers to
the corresponding rows.

Text/Image pages – These pages are used to store large object data types (LOBs), such as text, ntext, image,
varchar(max), nvarchar(max), varbinary(max), xml, etc.
2. Extents

A group of 8 physically contiguous pages is known as an extent.

With 8 KB pages, an extent is 64 KB in size.

Extents can be of two types.

1. Uniform extent – All 8 pages in the extent belong to the same object.
2. Mixed extent – The pages in the extent are shared by different objects.
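
The page and extent sizes translate into simple arithmetic. The short sketch below works out how many pages and extents fit in a given amount of space. The function name `pages_and_extents` is just an illustrative helper.

```python
PAGE_SIZE_KB = 8        # size of one page
PAGES_PER_EXTENT = 8    # pages grouped into one extent
EXTENT_SIZE_KB = PAGE_SIZE_KB * PAGES_PER_EXTENT  # 64 KB


def pages_and_extents(size_mb: int):
    """Return (number of pages, number of extents) that fit in size_mb megabytes."""
    size_kb = size_mb * 1024
    return size_kb // PAGE_SIZE_KB, size_kb // EXTENT_SIZE_KB


print(EXTENT_SIZE_KB)          # 64
print(pages_and_extents(1))    # (128, 16)
```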

3. Files

Extents are stored in database files.

Types of files:

1. Primary file
• Every database contains one primary file.
• It stores information related to tables, indexes, triggers, and so on.
• This file is created when the database is created and has the .mdf extension.

2. Secondary file
• Secondary files are optional and can be created to store user data.
• Extension is .ndf

3. Log files
• Log files implement write-ahead logging (WAL).
• They are used for transaction management and recovery purposes.
• Extension is .ldf [Internally, the log file is divided into segments called virtual log files (VLFs),
and no fixed size is set for VLFs]

[Figure: a database in which File group 1 contains the primary file (a.mdf) and the secondary files (b.ndf, c.ndf), alongside the log file (a.ldf).]


Components of Storage Engine

1. Access Method
• It acts as an interface between the query executor and the Buffer Manager/Transaction logs.
• The first action of the access method is to determine whether the SQL query is a
o SELECT statement
o Non-SELECT statement

[Figure: the Access Method sends non-SELECT statements to the Transaction Manager and SELECT statements to the Buffer Manager.]
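
The routing decision made by the access method can be sketched in a few lines. This is a toy classifier that looks only at the first keyword of the statement. The function name `route` and the returned labels are assumptions for illustration.

```python
def route(statement: str) -> str:
    """Return the component a statement is handed to, based on its first keyword."""
    words = statement.split()
    if not words:
        raise ValueError("empty statement")
    if words[0].upper() == "SELECT":
        return "buffer manager"       # read path
    return "transaction manager"      # write path (non-SELECT)


print(route("SELECT name FROM employees"))            # buffer manager
print(route("UPDATE employees SET salary = 1000"))    # transaction manager
```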

2. Buffer Manager

It is responsible for managing the in-memory cache of database pages, which significantly improves
database performance by reducing the need for frequent disk I/O operations.

The components of Buffer Manager are as follows.

1. Plan cache

A place where execution plans are stored.

If a query has already been executed and its plan is in the plan cache (soft parsing): When a query comes
for execution, the buffer manager checks whether an execution plan is already available in the plan cache.
If yes, that plan is used for execution.

First-time execution of a query (hard parsing): If a query is executing for the first time, its execution
plan is generated and stored in the plan cache. This ensures faster availability when the same query comes
for execution next time.
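
The difference between hard and soft parsing can be sketched with a dictionary-backed cache. This is a minimal model, not SQL Server's implementation: the `PlanCache` class, its counters, and the `compile_plan` callback are hypothetical, and the cache is keyed on the exact query text.

```python
class PlanCache:
    """Toy plan cache keyed on the exact query text."""

    def __init__(self):
        self._plans = {}
        self.hard_parses = 0   # plan had to be compiled
        self.soft_parses = 0   # plan was reused from the cache

    def get_plan(self, sql, compile_plan):
        if sql in self._plans:
            self.soft_parses += 1
        else:
            self.hard_parses += 1
            self._plans[sql] = compile_plan(sql)   # store for next time
        return self._plans[sql]


cache = PlanCache()
query = "SELECT name FROM employees"
first = cache.get_plan(query, lambda s: {"plan_for": s})    # hard parse
second = cache.get_plan(query, lambda s: {"plan_for": s})   # soft parse
print(cache.hard_parses, cache.soft_parses)  # 1 1
```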

2. Buffer pool

The Buffer Pool is like a big storage area in memory where SQL Server keeps a mix of different types
of data it needs to work with, like:
• Table data (the actual rows from tables),
• Indexes (used to speed up searches),
• Metadata (information about the database itself).

3. Data cache

The Data Cache is a smaller part of the Buffer Pool. It specifically stores table data pages—the actual
rows of data in the tables that users are working with.
4. Dirty pages

In SQL Server, a dirty page refers to a page in the buffer pool that has been modified in memory but
has not yet been written (or "flushed") to disk.

3. Transaction Manager

The Transaction Manager is responsible for controlling the flow of transactions.

4. Log Manager

The Log Manager is responsible for managing the transaction log, which records all changes made to
the database.
Each log record carries a log sequence number (LSN), the transaction ID, and the data modification record.
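
The structure of a log record can be sketched as an append-only list. This is a simplification under stated assumptions: real LSNs are not plain integers, and the `LogManager` class and its field names are hypothetical.

```python
class LogManager:
    """Toy append-only transaction log with increasing sequence numbers."""

    def __init__(self):
        self.records = []
        self._next_lsn = 1   # simplified LSN: a plain counter

    def append(self, txn_id, change):
        """Record one modification and return the LSN assigned to it."""
        lsn = self._next_lsn
        self.records.append({"lsn": lsn, "txn_id": txn_id, "change": change})
        self._next_lsn += 1
        return lsn


log = LogManager()
log.append("T1", "UPDATE employees SET salary = 1000 WHERE id = 7")
log.append("T1", "COMMIT")
print([r["lsn"] for r in log.records])  # [1, 2]
```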

5. Lock Manager

The Lock Manager is responsible for controlling access to database resources to ensure isolation
between concurrent transactions.
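
How locks keep concurrent transactions isolated can be sketched with just two lock modes, shared (S) and exclusive (X). SQL Server has more lock modes than this. The `LockManager` class below is a toy model with hypothetical method names, and a denied request simply returns `False` instead of waiting.

```python
# Only two shared locks are compatible with each other in this toy model.
COMPATIBLE = {("S", "S")}


class LockManager:
    def __init__(self):
        self.held = {}   # resource -> list of (transaction, mode)

    def acquire(self, txn, resource, mode):
        """Grant the lock if it is compatible with locks held by other transactions."""
        holders = self.held.setdefault(resource, [])
        for other_txn, other_mode in holders:
            if other_txn != txn and (other_mode, mode) not in COMPATIBLE:
                return False
        holders.append((txn, mode))
        return True

    def release_all(self, txn):
        """Drop every lock held by the transaction (e.g., at commit)."""
        for holders in self.held.values():
            holders[:] = [h for h in holders if h[0] != txn]


locks = LockManager()
print(locks.acquire("T1", "row 7", "S"))   # True
print(locks.acquire("T2", "row 7", "S"))   # True: readers can share
print(locks.acquire("T3", "row 7", "X"))   # False: a writer must wait
```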

The following background processes coordinate these operations:

1. Checkpoint

The Checkpoint process is designed to ensure data durability and consistency in SQL Server. It writes
all dirty pages (modified pages in memory) to disk in order to synchronize the transaction log with the
data files.

It runs automatically at regular intervals (e.g., every 60 seconds) or manually with the CHECKPOINT
command.
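
The effect of a checkpoint on dirty pages can be sketched with a toy buffer pool. This is an illustrative model only: the `BufferPool` class, its `disk` dictionary, and the page identifiers are assumptions, and a real checkpoint also coordinates with the transaction log.

```python
class BufferPool:
    """Toy buffer pool that tracks which in-memory pages are dirty."""

    def __init__(self):
        self.pages = {}      # page_id -> contents held in memory
        self.dirty = set()   # page_ids modified since the last flush
        self.disk = {}       # stand-in for the data files

    def write(self, page_id, data):
        """Modify a page in memory only; it becomes dirty."""
        self.pages[page_id] = data
        self.dirty.add(page_id)

    def checkpoint(self):
        """Flush every dirty page to disk; pages stay cached in memory."""
        flushed = sorted(self.dirty)
        for page_id in flushed:
            self.disk[page_id] = self.pages[page_id]
        self.dirty.clear()
        return flushed


pool = BufferPool()
pool.write(10, "row data A")
pool.write(11, "row data B")
print(pool.checkpoint())   # [10, 11]
print(pool.dirty)          # set()
```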

2. Lazy writer

The Lazy Writer process is responsible for managing memory in SQL Server, specifically the buffer pool.
It ensures that SQL Server maintains efficient memory usage and doesn't run out of memory when
the buffer pool becomes full. It works to free up space in the buffer pool by writing dirty pages to disk
in order to make room for new data that needs to be cached.

It runs in the background and becomes active when memory is under pressure.
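
The lazy writer's job of freeing buffer space can be sketched as an eviction loop. This is a minimal model under stated assumptions: the `TinyBufferPool` class and its fixed `capacity` are hypothetical, and the oldest-first eviction order is a simplification rather than SQL Server's actual page replacement policy.

```python
from collections import OrderedDict


class TinyBufferPool:
    """Toy buffer pool with a fixed capacity and oldest-first eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # page_id -> (data, is_dirty), oldest first
        self.disk = {}               # stand-in for the data files

    def put(self, page_id, data, dirty=False):
        """Cache a page and mark it as the most recently used."""
        self.pages[page_id] = (data, dirty)
        self.pages.move_to_end(page_id)

    def lazy_write(self):
        """Evict the oldest pages until the pool fits, flushing dirty ones first."""
        evicted = []
        while len(self.pages) > self.capacity:
            page_id, (data, dirty) = self.pages.popitem(last=False)
            if dirty:
                self.disk[page_id] = data   # write before discarding
            evicted.append(page_id)
        return evicted


pool = TinyBufferPool(capacity=2)
pool.put(1, "a", dirty=True)
pool.put(2, "b")
pool.put(3, "c")
print(pool.lazy_write())   # [1]
print(pool.disk)           # {1: 'a'}
```

Unlike a checkpoint, this loop removes pages from memory; a clean page is simply dropped, while a dirty page is written to disk before it is discarded.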
