Tech DBMS

XI.
Technologies for DBMSs
Buffer Management
Reliability Control
Physical Access Structures
Query Optimization
Physical Database Design
DB Technology -- 1
Technologies for D ˚ BMSs

Ideally, we’d like to have a DBMS support the
following:
Query optimizer - selects a data access
strategy;
Access Methods Manager - executes the
strategy using an
9 Relational Storage System, or
9 Object Manager.
Buffer Manager - manages page accesses;
Reliability Manager - manages failures;
Concurrency Control Manager - manages
interferences due to multi-user data access.
DB Technology -- 2
1
Buffer Management
A buffer is a section of main memory pre-allocated
to a DBMS for disk I/O.
The buffer is organized in pages, of a size equal or
multiple of the disk input/output blocks used by the
operating system;
The size of a page can range from a few Kbytes to
about a hundred Kbytes.
Access times for main memory are generally six
orders of magnitude faster than access times for
secondary memory
Sometimes a buffer can store the entire database
(main memory database)
DB Technology -- 3
Buffer Manager Principles

The buffer manager deals with I/O of pages between
main mamory and disk storage.
Manager supports primitive operations such as fix,
use, unfix, flush and force.
The principles and policies of buffer management
are similar to those of main memory management for
operating systems, for example:
9 Data locality: currently referenced data has a
greater probability of being referenced in the
immediate future;
9 80-20 Rule: On average, 80% of the applications
access 20% of available data.
DB Technology -- 4
2
Buffer Manager Organization
DB Technology -- 5
Primitive Buffer Operations

fix -- loads a page into the buffer, requires a read from
disk only when the page is not already in the buffer; after
the operation, the page is loaded and valid, i.e.,
allocated to an active transaction; a pointer to the page is
returned to the transaction.
use -- used by a transaction to gain access to a page
previously loaded in memory, confirming its presence in
the buffer and its status as a valid page
unfix -- signals the end of use of a page by a transaction;
force -- synchronously transfers a page from to disk.
flush -- transfers invalid (unused) pages to secondary
memory asynchronously and independently of active
transactions
DB Technology -- 6
3
fix
Searches the buffer for requested page, if found
and unused, its address is returned; else the page
is read from disk and its address is returned.
If the requested page is in the disk, a buffer page is
selected for loading; if the buffer page is free, just
load; else buffer page is sent to disk (victim) by
invoking flush, then reads reads the requested
page from disk and returns the page address.
DB Technology -- 7
Buffer Management Policies

The steal policy allows the buffer manager to select
an active page allocated to another transaction as a
victim. No-steal policy disallows this possibility.
The force policy requires that all active pages of a
transaction are written to disk before committing. The
no-force policy delegates the writing of pages of a
transaction to the buffer manager.
The no-steal/no-force policies are generally
preferred by the DBMSs.
There is also the possibility of ‘anticipating’ the
loading and unloading times of the pages, by means
of pre-fetching and pre-flushing policies.
DB Technology -- 8
4
Buffer Managers and File Systems
The file system is responsible for managing the data
structures stored on disk, also the current state of disk
use. It must identify which disk blocks are free and which
are allocated to files
A DBMS uses a file system for the following functions:
9 The creation (create) and removal (delete) of a file;
9 The opening (open) and closing (close) of a file;
9 read(fileid,block,buffer) for the direct access
to a block of a file which is copied to the buffer page;
9 read_seq(fileid,f-block,count,f-buffer)
for sequential access to a fixed number (count) of
blocks of a file;
9 The dual primitives write and write_seq.
DB Technology -- 9
Reliability Control System

Responsible for executing the transactional
commands begin transaction (B), commit
work (C), rollback work (A, for abort), also the
primitives for recovery after malfunctions warm
restart and cold restart.
Ensures atomicity and durability.
Uses as main data structure the log; this is a
sequential file written in a stable memory. which
registers the operations carried out by the DBMS.
DB Technology -- 10
5
Reliability
Control
System
Architecture
DB Technology -- 11
Stable Memory
Stable memory is a memory system that is failure-
resistant.
There is no such thing in practice; however,
replication and robust writing protocols can bring the
probability close to zero.
A failure of stable memory is assumed as
catastrophic and impossible, at least in this context.
Stable memories usually come in different forms:
9 a tape unit -- very reliable, but slow;
9 a pair of devices of different kind (e.g., a tape and
a disk) -- better alternative;
9 two mirrored disk units.
DB Technology -- 12
6
Log Organization
The actions carried out by the various transactions are
recorded in the log in chronological order (written
sequentially to the top block).
There are two types of log records:
9 Transaction records -- keep track of the activities
of the activities of a transaction by recording all
operations B(T), I(T,O,AS), D(T,O,BS),
U(T,O,BS,AS), C(T), A(T);
9 System records -- keep track of system-wide
operations dump (rare) and checkpoint (more
frequent).
DB Technology -- 13
Log Content
DB Technology -- 14
7
Undo and Redo
Undo of an action on an object O:
9 update, delete: copy BS into the object O;
9 insert: delete the object O;
Redo of an action on an object O:
9 insert, update: copy AS into the object O;
9 delete: re-insert the object O.
Idempotence of undo and redo: an arbitrary
number of undos and redos of the same action is
equivalent to the carrying out of such actions only
once:
9undo(undo(A)) = undo(A)
9redo(redo(A)) = redo(A)
DB Technology -- 15
Checkpoint
A checkpoint is carried out periodically, recording
active transactions and updating secondary memory
relative to all completed transactions
9 After having initiated a checkpoint, no commit
operations are accepted by the active transactions
9 The checkpoint ends by synchronously writing
(forcing) a checkpoint record CK(T1,T2,..Tn),
which contains the identifiers of all active
transactions;
9 In this way, the effects of all committed transactions
are permanently recorded in the database.
DB Technology -- 16
8
Dump
A dump is a complete copy of the database, which is
normally created when the system is not operative.
The copy is stored in the stable memory, typically on
tape, and is called backup.
A dump record DUMP in the log signals the presence
of a backup made at a given time and identifies the
file or device where the dump took place.
DB Technology -- 17
Transactional Rules
The reliability control system must follow two rules:
WAL rule (write-ahead log): before-state parts of the
log records must be written in the log before carrying
out the corresponding operation on the database.
Commit-Precedence rule: after-state parts of the
log records must be written in the log before carrying
out the commit.
DB Technology -- 18
9
Transaction Outcome
The atomic outcome of a transaction is established
at the time when it writes the commit record in the
log synchronously, using the force primitive
9 Before this event, a failure is followed by the undo
of the actions, thereby reconstructing the original
state of the database;
9 After this event, a failure is followed by the redo of
the actions carried out to reconstruct the final
state of the transaction;
abort records can be simply written asynchronously
into the top block of the log.
DB Technology -- 19
Protocol for Writing Log (I)
The transaction first records B(T), then carries out its

update actions by writing first the log records and then the
pages of the database, changing BS values to AS values.
This way, at commit all the pages of the database
modified by the transaction are already written on disk.
This scheme does not require redo operations.
DB Technology -- 20
10
Protocols for Writing Log (II)
The writing of log records precedes that of the

actions on the database.
This scheme does not requires undo operations
either.
DB Technology -- 21
Protocols for Writing Log (III)
Commonly used, the writing in the database once

protected by the appropriate writing on the log, can
happen at any time with regard to the writing of the
commit record in the log.
This scheme allows the buffer manager to optimize
the execution of flush operations.
However, it requires both undo and redo.
DB Technology -- 22
11
Failures in Database Systems
A system failure is caused by a software bug, (e.g.,
of the operating system), or an interruption of the
functioning of hardware (e.g., due to loss of power.)
/ Causes loss of the contents of main memory (and
thus of all buffers); but no loss of the contents of
secondary memory ( ).-
A device failure is a failure of a secondary memory
device (e.g., a disk head crash).
/ Causes loss of disk contents, but no loss of stable
storage (i.e., the log)
There are two types of restart protocols:
9 Warm restart used after system crashes;
9 Cold restart used after device crashes.
DB Technology -- 23
Fail-Stop Failure Model
DB Technology -- 24
12
Restart Process
When the system restarts after a failure, it has to decide
what to do with the transactions that were executing at the
time of the failure.
Transactions are classified as:
9
Completed -- when their actions were recorded in
stable storage before the failure;
9
Committed but possibly not completed -- whose
actions must be redone, e.g., because of data lost in
the buffer;
9
Not committed -- whose actions have to be undone
because the transaction did not commit.
DB Technology -- 25
Warm Restart
Consists of four sequential steps:
Trace back the log until the most recent checkpoint
record.
Construct the UNDO set (i.e.,transactions to be
undone) and the REDO set (transactions to be
redone).
Trace back the log until the first action of the ‘oldest’
transaction in the two sets, UNDO and REDO, is found,
and undo all the actions of the transactions in the
UNDO set.
Trace forward the log and redo all the actions of the
transactions in the REDO set.
DB Technology -- 26
13
Warm Restart Protocol
The protocol guarantees:
Atomicity in the sense that all the transactions in
progress at the time of failure leave the database
either in the initial state or in the final one
Durability in the sense that all pages of transactions
in progress are written to secondary memory.
DB Technology -- 27
Cold Restart
Now parts of the database may have been damaged.
Consists of three steps:
During the first phase, the du mp operation is
executed and damaged parts are selectively copied
from the database. The most recent du mp record in
the log is then accessed.
The log is traced forward. The actions on the
database and the commit or abort actions are
applied as appropriate to the damaged parts of the
database. The situation preceding the failure is thus
restored.
Finally, a warm restart is conducted.
DB Technology -- 28
14
Physical Access Structures
Used for efficient storage and manipulation of data.
For each access structure there are access
methods, i.e. software modules providing data
access and manipulation primitives for each physical
access structure.
Each DBMS has a limited number of access methods
available.
Sometimes, access methods can be used directly
from an application without going through a DBMS.
We will consider sequential, hash-based, and index-
based access structures and methods.
DB Technology -- 29
Architecture
of the
Access
Manager
DB Technology -- 30
15
Tuples within Pages
Each access method has its own page organization.
Each page includes:
9 An initial part and a final part with control
information for the file system and the access
method;
9 A page dictionary, containing pointers to
elementary data contained in the page;
9 A useful part, which contains data; page dictionary
and useful data grow as opposing stacks within the
page;
9 A checksum, to verify that page data are valid.
Tree structures have a different page organization.
DB Technology -- 31
Tuples within Pages
DB Technology -- 32
16
Page Manager Primitives
Insert or update a tuple -- may require a reorganization
of the page if there is sufficient space to manage the
extra bytes introduced).
Delete a tuple -- often carried out by marking the tuple
as ‘invalid’.
Access a tuple field -- first identify tuple by means of
its key or its offset, then locate field by its offset and
length.
Note that some page managers do not allow the
separation of a tuple in different pages.
Also, when all the tuples have the same size, the page
dictionary is simplified.
DB Technology -- 33
Sequential Access Structures

Characterized by a sequential arrangement of tuples
in the secondary memory.
In an entry-sequenced organization, the sequence
of the tuples is dictated by their order of entry.
In an array organization, the tuples are arranged as
in an array, and their positions depend on the values
of an index (or indexes.)
In a sequentially ordered organization, the
sequence of the tuples depends on the value
assumed by a field in each tuple that controls the
ordering, known as a key field.
DB Technology -- 34
17
Entry-Sequenced Sequential
Structure
This is optimal for carrying out sequential reading
and writing operations.
Uses all the blocks available in a file and all spaces
within each block.
Primitives:
9 Access with sequential scan operation;
9 Data loading and insertion happen at the end of
the file and in sequence;
9 Deletes are normally implemented by leaving
space unused;
9 Updates can cause problems because they
increase the file size.
DB Technology -- 35
Array Sequential Structure

This is useful only when the tuples are of fixed length.
An array sequential structure consists of n adjacent
blocks, each block with m available tuple slots.
Each tuple is assigned a numeric index i and is
placed in the i-th position of the array.
Primitives:
9 Accessed via read-ind (at a given index value).
9 Insertions always happen at the end of the file
(indices are obtained simply by increasing a
counter);
9 Deletions create free slots;
9 Updates are done in place.
DB Technology -- 36
18
Ordered Sequential Structure
Each tuple has a position based on its key value.
Historically, ordered sequential structures were used
on sequential devices (tapes) by batch processes;
data were located into the main file, modifications
were collected in differential files, and files were
periodically merged (this practice no longer applies.)
Key problem: insertions or updates that increase
physical space require reordering the whole file.
Options to avoid global reorderings:
9 Leave some free slots at the time of first loading.
This is followed by ‘local reordering’ operations;
9 Use an overflow file, where new tuples are
inserted into blocks linked to form an overflow
chain.
DB Technology -- 37
Hash-Based Structures
Ensure efficient associative access to data, based on
the value of a key field, composed of an arbitrary
number of attributes of a given table.
A hash-based structure has B blocks (often adjacent.)
The access method makes use of a hash algorithm,
which, once applied to the key, returns a value
between zero and B-1.
This value is interpreted as the position of the block in
the file, and used both for reading and writing tuples
to the file
This is the most efficient technique for queries with
equality predicates, but inefficient for queries with
interval predicates. (...why?)
DB Technology -- 38
19
Details on Hash-Based Structures
Primitive interface: hash(fileId,key):BlockId
The implementation of hash consists of two parts
9 folding, transforms the key values into positive
integers, uniformly distributed over a large range;
9 hashing, transforms the positive integer into a
number between 0 and B - 1.
This technique works best if the file is made larger
than necessary. Let:
9 T the number of tuples expected for the file;
9 F the average number of tuples in each page;
then a good choice for B is T/(0.8 x F), using only
80% of the available space in the file.
DB Technology -- 39
Collisions
Occur when the same block number is returned by
the algorithm to several different keys.
Collisions are addressed by adding an overflow chain
to each page. There are extra costs in scanning the
chain.
The following table computes the length of the
overflow chain:
1 2 3 5 10 (F)
.5 0.5 0.177 0.087 0.031 0.005
.6 0.75 0.293 0.158 0.066 0.015
.7 1.167 0.494 0.286 0.136 0.042
.8 2.0 0.903 0.554 0.289 0.110
.9 4.495 2.146 1.377 0.777 0.345
T/(F xB)
DB Technology -- 40
20
Tree Structures (B Trees)
These are the most frequently used physical storage
structures for relational DBMSs.
Tree structures give associative access (based on a
value of a key, consisting of one or more attributes)
without constraints on the physical location of the
tuples.
Note that the primary key of a relation and the key of
a tree structure storing this relation need not be the
same.
DB Technology -- 41
Tree Structure Organization

Each tree has a single root node, a number of
intermediate nodes, and a number of leaf nodes.
The links between nodes are defined by pointers.
Each node coincides with a page or block of the file
system and buffer manager levels.
In general, each node has a large number of
descendants (fan out), and therefore the majority of
pages are leaf nodes.
In a balanced tree, the lengths of the paths from the
root node to the leaf nodes are (almost) equal; this way,
the access times to the information contained in the
tree are almost constant (and optimal.)
DB Technology -- 42
21
Tree Structure Organization
DB Technology -- 43
Node Contents
Each intermediate node contains F keys (in
lexicographic order) and F + 1 pointers.
Each key Kj , 1 ≤ j ≤ F, is followed by a pointer Pj; K1 is
preceded by a pointer P0
Each pointer addresses a sub-tree:
9 P0 addresses the sub-tree with keys less than K1;
9 PF addresses the sub-tree with keys greater than or
equal to KF ;
9 Pj, 0 < j < F, addresses the sub-tree with keys
included in the interval Kj ≤ K < Kj+1 .
The value F + 1 is called the fan-out of the tree.
DB Technology -- 44
22
Search Technique
Assume that we are looking for the tuple with key V.
At each intermediate node, if V < K follow the pointer P ,
1 0
else if V KF follow the pointer PF, otherwise, follow the
pointer Pj such that Kj £ V < Kj+1
The leaf nodes of the tree can be organized in two ways:
9 In key-sequenced trees the tuples are contained in the
leaves of the tree;
9 In indirect trees leaf nodes contain pointers to the
tuples, that can be allocated by means of any other
‘primary’ mechanism (e.g.,, entry-sequenced or hash.)
Sometimes the index structure is sparse; you may then
locate a key value close to the value being sought, then
carry out a sequential search.
DB Technology -- 45
Tuple Insertion Operations

An insertion operation searches up to a leaf page.
When there is no free space on a page during an
insertion, a split operation is necessary, allocating
two leaf nodes in place of one.
A split causes an increment in the number of
pointers on the next (higher) level in the tree; this
may cause further split operations.
DB Technology -- 46
23
Tuple Deletion Operations
A deletion operation can always be carried out.
When the key of a to-be-deleted tuple is in an
intermediate node, it is best to recover the next
key and substitute for it.
When a deletion operation leaves two adjacent
pages underused, they are conbined into a
single page through a merge operation.
A merge causes a decrement in the number of
pointers on the next (higher) level in the tree and
may cause further merges higher up in the tree.
DB Technology -- 47
Split and Merge Operations

The update of the value of a key field is treated as
the deletion of its initial value followed by the
insertion of a new value.
The careful use of split and merge operations makes
it possible to maintain the tree balanced, with an
average occupancy of each node higher than 50%.
DB Technology -- 48
24
Split
and
Merge
Operations
DB Technology -- 49
B and B+ Trees
B+ trees are B trees, but their leaf nodes are linked into
a chain, which connects them in the order imposed by
the key.
B+-trees support interval queries efficiently, and are
widely used to implement relational DBMSs.
On the other hand, B trees don’t support sequential
access method of leaf nodes.
For B-trees, Intermediate nodes use two pointers for
each key value Ki:
9 One points directly to the block that contains the
tuple corresponding to Ki;
9 The other points to a subtree with keys greater
than Ki and less than Ki+1.
DB Technology -- 50
25
Example of B+ Tree
DB Technology -- 51
Example
of B Tree
DB Technology -- 52
26
Query Optimization
A query optimizer is an important component of a
DBMS architecture. An optimizer takes as input an
SQL query and produces an access plan in ‘object’
or ‘internal’ format.
Optimization steps:
9 Lexical, syntactic and semantic analysis, using the
data dictionary;
9 Translation into an internal, algebraic form;
9 Algebraic optimization through transformations;
9 Cost-based optimization;
9 Code generation using the physical data access
methods provided by the DBMS.
DB Technology -- 53
Approaches to Query Compilation

Queries are often used multiple times.
Compile and store -- the query is compiled once and
carried out many times;
9 The internal code is stored in the database,
together with an indication of the dependencies of
the code on the particular versions of tables and
indexes of the database.
9 On changes (e.g., new index), the compilation of
the query is invalidated and needs to be redone.
Compile and go -- involves compilation for immediate
execution, with no storage of the compiled query.
DB Technology -- 54
27
Relation Profiles
A profile contains quantitative information about a table
and is stored in the data dictionary:
9 the cardinality (number of tuples) of each table T;
9 the size in bytes of each tuple of T;
9 the size in bytes of each attribute A in T;
9 the number of distinct values of each attribute A in T;
j
9 the minimum and maximum values of each attribute A

j
j
in T;
A profile is periodically recomputed through appropriate
system primitives (e.g., update statistics).
Profiles are used in cost-based optimization to estimate
the size of intermediate results produced by the query
execution plan.
DB Technology -- 55
Internal Representation of Queries

Queries are internally represented as trees whose:
9 Leaves correspond to physical data structures
(tables, indexes, files)
9 Intermediate nodes represent data access
operations that are supported by the physical
structures
These operations include sequential scans,
orderings, indexed accesses and various types of
join.
DB Technology -- 56
28
Scan Operation
Performs a sequential access of all the tuples in a
table, and performs on each tuple various operations
of an algebraic or other nature:
9 Projection of a set of attributes;
9 Selection on a predicate (e.g., Ai = v);
9 Sort (ordering);
9 Insertions, deletions, and updates of the tuples that
are being accessed during the scan.
Primitives:
open, next, read, modify,insert, delete, close
DB Technology -- 57
Sort operation
Various methods for ordering the data contained in
the main memory, typically represented by means of
a record array.
DBMSs typically cannot load all data in the buffer;
thus, they separately order and then merge data
sets, using available buffer space.
DB Technology -- 58
29
Indexed Access
Indexes are created by the database administrator to
favor queries when they include:
9 simple predicates (of the type Ai = v)
9 interval predicates (of the type v1 ≤ Ai ≤ v2)
These predicates are supported by the index.
For a conjunction of predicates the DBMS chooses
the most selective supported predicate for primary
access, and evaluates the other predicates in main
memory.
For a disjunction of predicates, a scan is needed if
any of them are not supported; if they are all
supported, indexes can be used only with duplicate
elimination.
DB Technology -- 59
Join Methods
Joins are the most costly operation for a DBMS.
There are various methods for join evaluation,
among them we note nested-loop, merge-scan and
hashed join operations.
DB Technology -- 60
30
Nested-Loop Join
Join attribute
DB Technology -- 61
Merge-Scan Join
Both table must be ordered according to the join
attribute
DB Technology -- 62
31
Hashed Join
A hashing function h on the join attributes is used to
store both tables
DB Technology -- 63
Cost-Based Optimization
This is an optimization problem, whose decisions are:
9 The data access operations to execute, e.g., scan
vs index access;
9 The order of operations, e.g., the order of join
operations;
9 Options to be assigned to each operation, e.g.,,
choosing a join method;
9 Use of parallelism and pipelining to improve
performance.
Further options are available in selecting a plan for
distributed database query processing.
DB Technology -- 64
32
Approaches to Query Optimization
The approach often used includes the following:
9 Make use of approximate cost formulas;
9 Construct a decision tree, where each node
corresponds to a choice; each leaf node
corresponds to a specific execution plan ;
9 Assign to each plan a cost:
Ctotal = CI/Ox nI/O + Ccpu x ncpu
9 Choose the one with the lowest cost, based on
operations research (branch and bound.)
The optimizers should obtain good solutions, where
cost is near that of an optimal solution.
DB Technology -- 65
Example
of
Decision
Tree
DB Technology -- 66
33
Physical Database Design
The final phase of the database design process.
This phase takes as input the logical schema of a
database along with predictions for the application
load.
The phase produces as output a physical schema,
made up of the definitions of the relations and of the
physical access structures used, along with related
parameters.
Depends on the features that are supported by the
underlying DBMS.
DB Technology -- 67
Physical Design
for Relational DBMSs
Most relational DBMSs support index and tuple
clustering.
Physical design can be reduced to the activity of
identifying indices for each relation.
The key of a relation is usually involved in selection or
join operations; for this reason, each relation normally
supports a unique index on the primary key.
Other indices are added to support frequent queries.
If the performance is unsatisfactory, we can tune the
system by adding or dropping indices.
It is useful to check how indices are used by queries, by
using the show plan command.
DB Technology -- 68
34
Indices in SQL
Commands for creating or dropping an index are not
part of standard SQL, but their syntax is rather
similar in all DBMSs.
Syntax of the commands for the creation and
dropping of an index:
9 create [unique] index IndexName on
TableName(AttributeList)
9 drop index IndexName
DB Technology -- 69
35

Tech DBMS

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Tech DBMS

Uploaded by

Copyright:

Available Formats

XI.

Technologies for DBMSs

Technologies for D ˚ BMSs

Buffer Manager Principles

Primitive Buffer Operations

Buffer Management Policies

Reliability Control System

Protocol for Writing Log (I)

 The transaction first records B(T), then carries out its

 The writing of log records precedes that of the

Protocols for Writing Log (III)

 Commonly used, the writing in the database once

Fail-Stop Failure Model

Tuples within Pages

Sequential Access Structures

Array Sequential Structure

Tree Structure Organization

Tuple Insertion Operations

Split and Merge Operations

Approaches to Query Compilation

9 the minimum and maximum values of each attribute A

Internal Representation of Queries

You might also like

The transaction first records B(T), then carries out its

The writing of log records precedes that of the

Commonly used, the writing in the database once