Adbms Unit 1
Advanced DBMS
Course code: M21DES212
Agenda
Importance of the subject
Prerequisites
Objectives
Course Content
Course Outcome

Data
Database
DBMS
RDBMS
DBMS - Storage System
IMPORTANCE OF THE COURSE
2. database relationships
3. memory management in OS
OBJECTIVES
UNIT 1:
Overview of Storage and Indexing
Memory hierarchy: RAID, Disk space management, Buffer manager: Files of records; Page
formats and record format, Structured Indexing, Data on external storage; File organizations
and Indexing, Index data structures; Comparison of file organizations; Indexes and
performance tuning. Intuition for tree indexes; Indexed sequential access method; B+ trees,
Hash-Based Indexing.
UNIT 2:
Overview of Query Evaluation, External Sorting and Relational Query Optimizer
The system catalog, Introduction to operator evaluation; Algorithm for relational operations;
Introduction to query optimization; When does a DBMS sort data? A simple two-way merge
sort; External merge sort, Evaluating Relational Operators The Selection operation; General
selection conditions; The Projection operation; The Join operation; The Set operations;
Aggregate operations; The impact of buffering.
COURSE CONTENT CONTD..
UNIT 3: Concurrency Control: Serializability and Transaction processing:
Enforcing Serializability by Locks, Locking Systems With Several Lock Modes,
Architecture for a Locking Scheduler Managing . Transaction processing: Introduction
of transaction processing, Advantages and Disadvantages of transaction processing
system, online transaction processing system, resolving deadlock, Transaction
management in multi-database system, long duration transaction, high-performance
transaction system.
UNIT 4: Parallel and Distributed Databases and XML data
Architectures for parallel databases; Parallel query evaluation; Parallelizing individual
operations; Parallel query optimizations; Introduction to distributed databases;
Distributed DBMS architectures; Storing data in a Distributed DBMS; Information
retrieval and XML data: Colliding Worlds: Databases, IR, and XML, Introduction to
Information Retrieval, Indexing for Text Search, Web Search Engines, Managing Text in
a DBMS, A Data Model for XML, XQuery: Querying XML Data. Mobile databases,
Multimedia databases, geographic databases, temporal databases, biological
databases
COURSE OUTCOME
Text book/s:
1. Raghu Ramakrishnan and Johannes Gehrke: Database Management
Systems, 3rd Edition, McGraw-Hill, 2003.
[Chapters: 8, 9, 10, 11, 12, 13, 14, 22, 23, 27, 29]
References:
1. Michael Rosenblum and Dr. Paul Dorsey: PL/SQL For Dummies, Wiley
Publications, 2006.
2. Elmasri and Navathe: Fundamentals of Database Systems, 5th Edition,
Pearson Education, 2007.
3. Connolly and Begg: Database Systems, 4th Edition, Pearson Education,
2002.
DEFINITION
Data
Database
A DATABASE MANAGEMENT SYSTEM
Drawback of File System
DBMS - STORAGE SYSTEM
Databases are stored in files, which contain records. At the physical level,
the actual data is stored in electromagnetic form on some device. These
storage devices can be broadly categorized into three types:
Primary Storage - The memory storage that is directly accessible to the CPU
comes under this category.
Secondary Storage - Secondary storage devices are used to store data for
future use or as backup.
Tertiary Storage - Tertiary storage is used to store very large volumes of
data, for example on optical disks and magnetic tapes.
DBMS - STORAGE SYSTEM
Data
Database
DBMS
RDBMS
DBMS - Storage System
QUIZ
1-The storage device that uses rigid, permanently installed magnetic disks
to store data is
1. Floppy
2. Permanent disk
3. Optical disk
4. Hard disk
ANSWER: Hard disk

2-Which of the following is the secondary storage device that uses a long
plastic strip coated with a magnetic material as recording medium?
1. Compact disk
2. Hard disk
3. Magnetic tape
4. None of the above
ANSWER: Magnetic tape
LECTURE -2
OBJECTIVE
A disk array is an arrangement of several disks, organized to increase
performance and improve reliability of the resulting storage system.
When incorporating redundancy into a disk array design, we have to make two
choices.
Disk Array
Data striping
Reliability
RAID
Redundancy
RAID 0
RAID 1
QUIZ
OBJECTIVE
Advantages of RAID 2:
• Uses dedicated check drives to store the redundant (Hamming code) bits.
• It uses the Hamming code for error detection and correction.
Disadvantages of RAID 2:
• It requires additional drives for the error-correcting code.
LEVELS OF REDUNDANCY:
Level 3: Bit-Interleaved Parity
RAID 3 consists of byte-level striping with dedicated parity. The parity
information is stored for each disk section and written to a dedicated parity
drive.
In case of drive failure, the parity drive is accessed, and data is reconstructed
from the remaining devices. Once the failed drive is replaced, the missing
data can be restored on the new drive.
Data can be transferred in bulk, so high-speed data transmission is
possible.
Advantages of RAID 3:
• Data is regenerated using the parity drive.
• It provides high data transfer rates.
• Data is accessed in parallel.
Disadvantages of RAID 3:
• It requires an additional drive for parity.
• It performs slowly when operating on small files.
LEVELS OF REDUNDANCY:
Level 4: Block Interleaved Parity
RAID 4 consists of block-level striping with a dedicated parity disk. Instead of
duplicating data, RAID 4 adopts a parity-based approach.
This level allows recovery from at most one disk failure because of the way
parity works. If more than one disk fails, there is no way to recover the
data.
Both level 3 and level 4 require at least three disks.
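The parity-based recovery described above can be illustrated with a small sketch. It assumes that parity is the bytewise XOR of the data blocks in a stripe, which is the usual choice for single-parity schemes; the function names and the sample blocks are hypothetical and only serve the illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def compute_parity(data_blocks):
    """Parity block for one stripe (assumed to be the XOR of its data blocks)."""
    return xor_blocks(data_blocks)

def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_blocks(surviving_blocks + [parity])

# One stripe spread over three data disks plus one parity disk.
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = compute_parity(data)

# Simulate the loss of the second disk and rebuild its block.
recovered = reconstruct([data[0], data[2]], parity)
```

With two missing blocks the same XOR no longer has a unique solution, which is why a single parity block tolerates only one disk failure.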
SUMMARY
RAID-2
Advantages and Disadvantages of RAID 2
RAID-3
Advantages and Disadvantages of RAID 3
RAID-4
QUIZ
OBJECTIVE
RAID-2
Advantages and Disadvantages of RAID 2
RAID-3
Advantages and Disadvantages of RAID 3
RAID-4

RAID-5
Advantages and Disadvantages of RAID 5
RAID-6
Advantages and Disadvantages of RAID 6
RAID-1+0
LEVELS OF REDUNDANCY:
Level 5: Block-Interleaved Distributed Parity
RAID 5 is a slight modification of the RAID 4 system. The only difference is
that in RAID 5, the parity rotates among the drives.
It consists of block-level striping with distributed parity.
As with RAID 4, this level allows recovery from at most one disk failure. If
more than one disk fails, the data cannot be recovered.
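To make the idea of rotating parity concrete, here is a minimal sketch of one possible rotation scheme. Real arrays use several different layouts; the particular formula and the function names below are assumptions chosen for illustration.

```python
def parity_disk(stripe, num_disks):
    """Disk that holds the parity block for a stripe.

    Assumed rotation: parity starts on the last disk and moves one disk
    to the left with each successive stripe, wrapping around.
    """
    return (num_disks - 1 - stripe) % num_disks

def stripe_layout(stripe, num_disks):
    """Return 'P' for the parity disk and 'D' for data disks in a stripe."""
    p = parity_disk(stripe, num_disks)
    return ["P" if d == p else "D" for d in range(num_disks)]

# Print the layout of the first four stripes on a four-disk array.
for s in range(4):
    print(s, stripe_layout(s, 4))
```

Over any run of num_disks consecutive stripes every disk holds parity exactly once, so no single disk becomes the parity bottleneck as in RAID 4.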
Advantages of RAID 5:
• Cost effective and provides high performance.
• Parity is distributed across the disks in the array.
• Distributing parity removes the dedicated parity disk as a bottleneck, which
improves random write performance.
Disadvantages of RAID 5:
• Recovery from a disk failure takes longer, because the missing data has to be
recomputed from all the remaining drives.
• This level cannot survive the concurrent failure of two or more drives.
LEVELS OF REDUNDANCY:
Level 6: P+Q Redundancy
This level is an extension of RAID 5. It uses block-level striping with two
independent parity blocks per stripe.
RAID 6 can survive two concurrent disk failures. With RAID 5 or RAID 1, a
failed disk must be replaced promptly, because if another disk fails at the
same time the data may no longer be recoverable. RAID 6 covers this case: the
array tolerates two concurrent disk failures before data is lost.
LEVELS OF REDUNDANCY:
RAID-5
Advantages and Disadvantages of RAID 5
RAID-6
Advantages and Disadvantages of RAID 6
RAID-1+0
QUIZ
1-The RAID level in which mirroring is done along with striping is
a) RAID 1+0
b) RAID 0
c) RAID 2
d) Both RAID 1+0 and RAID 0
Answer: a (RAID 0 uses striping only, without mirroring)

2-Which one of the following is not a secondary storage?
a) magnetic disks
b) magnetic tapes
c) ram
d) none of the mentioned
Answer: c
LECTURE -5
OBJECTIVE
RAID-5
Advantages and Disadvantages of RAID 5
RAID-6
Advantages and Disadvantages of RAID 6
RAID-1+0

DISK SPACE MANAGEMENT
Keeping Track of Free Blocks
OS File Systems to Manage Disk Space
DISK SPACE MANAGEMENT
The lowest level of software in the DBMS architecture, called the disk
space manager, manages space on disk.
Abstractly, the disk space manager supports the concept of a page
as a unit of data and provides commands to allocate or deallocate a
page and read or write a page.
DISK SPACE MANAGEMENT
KEEPING TRACK OF FREE BLOCKS:
The disk space manager keeps track of which disk blocks are in use.
Although blocks are likely to be allocated sequentially on disk at first,
subsequent allocations and deallocations can in general create 'holes'.
One way to keep track of block usage is to maintain a list of free blocks.
A second way is to maintain a bitmap with one bit for each disk block,
which indicates whether a block is in use or not.
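The bitmap approach can be sketched as follows. This is a minimal in-memory illustration, not how any particular DBMS stores its bitmap; the class name BlockBitmap and its methods are hypothetical.

```python
class BlockBitmap:
    """Tracks block usage with one bit per disk block (1 = in use)."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)  # all blocks start free

    def is_used(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        """Mark the first free block as used and return its number."""
        for block in range(self.num_blocks):
            if not self.is_used(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        raise RuntimeError("no free blocks")

    def deallocate(self, block):
        """Mark a block as free again, possibly leaving a 'hole'."""
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

bitmap = BlockBitmap(10)
first, second, third = bitmap.allocate(), bitmap.allocate(), bitmap.allocate()
bitmap.deallocate(second)      # creates a hole between blocks 0 and 2
reused = bitmap.allocate()     # the hole is found and reused
```

A bitmap also makes it easy to look for runs of contiguous free blocks, which a plain free list does not support as directly.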
DISK SPACE MANAGEMENT
OBJECTIVE
Suppose that the database contains 1 million pages, but only 1000 pages of
main memory are available for holding data.
Because all the data cannot be brought into main memory at one time, the
DBMS must bring pages into main memory as they are needed and, in the
process, decide what existing page in main memory to replace to make space
for the new page.
The policy used to decide which page to replace is called the replacement
policy.
BUFFER MANAGER
The buffer manager is the software layer responsible for bringing pages from
disk to main memory as needed.
The main memory pages in the buffer pool are called frames.
Higher levels of the DBMS code can be written without worrying about
whether data pages are in memory or not; they ask the buffer manager for the
page, and it is brought into a frame in the buffer pool if it is not already there.
The higher-level code that requests a page must also release the page when it
is no longer needed, by informing the buffer manager, so that the frame
containing the page can be reused.
The higher-level code must also inform the buffer manager if it modifies the
requested page; the buffer manager then makes sure that the change is
propagated to the copy of the page on disk.
The buffer manager maintains some bookkeeping information and two
variables for each frame in the pool: pin_count and dirty.
The number of times that the page currently in a given frame has been
requested but not released (that is, the number of current users of the page)
is recorded in the pin_count variable for that frame.
The Boolean variable dirty indicates whether the page has been modified
since it was brought into the buffer pool from disk.
Initially, the pin_count for every frame is set to 0, and the dirty bits are turned
off. When a page is requested the buffer manager does the following:
1. Checks the buffer pool to see if some frame contains the requested page and,
if so, increments the pin_count of that frame. If the page is not in the pool, the
buffer manager brings it in as follows:
(a) Chooses a frame for replacement, using the replacement policy, and
increments its pin_count.
(b) If the dirty bit for the replacement frame is on, writes the page it contains to
disk (that is, the disk copy of the page is overwritten with the contents of the
frame).
(c) Reads the requested page into the replacement frame.
2. Returns the (main memory) address of the frame containing the requested
page to the requestor.
Incrementing pin_count is often called pinning the requested page in its frame.
When the code that requested the page later releases it, the pin_count of the
frame containing the requested page is decremented.
This is called unpinning the page.
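The request and release steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a real DBMS component: the names Frame, BufferManager, pin_page and unpin_page are hypothetical, a dictionary stands in for the disk, and the replacement policy is a placeholder that simply takes an empty frame or the first unpinned one.

```python
class Frame:
    def __init__(self):
        self.page_id = None   # which disk page the frame currently holds
        self.data = None
        self.pin_count = 0    # number of current users of the page
        self.dirty = False    # modified since it was read from disk?

class BufferManager:
    def __init__(self, num_frames, disk):
        self.frames = [Frame() for _ in range(num_frames)]
        self.disk = disk      # assumption: dict page_id -> bytes stands in for disk

    def pin_page(self, page_id):
        for frame in self.frames:             # step 1: already in the pool?
            if frame.page_id == page_id:
                frame.pin_count += 1
                return frame
        frame = self._choose_victim()         # step 1(a)
        if frame.dirty:                       # step 1(b): write old page back
            self.disk[frame.page_id] = frame.data
        frame.page_id = page_id               # step 1(c): read requested page
        frame.data = self.disk[page_id]
        frame.dirty = False
        frame.pin_count = 1
        return frame                          # step 2

    def unpin_page(self, page_id, dirty=False):
        for frame in self.frames:
            if frame.page_id == page_id and frame.pin_count > 0:
                frame.pin_count -= 1
                frame.dirty = frame.dirty or dirty
                return
        raise KeyError(page_id)

    def _choose_victim(self):
        # Placeholder policy: prefer an empty frame, else the first unpinned one.
        candidates = [f for f in self.frames if f.pin_count == 0]
        if not candidates:
            raise RuntimeError("all frames are pinned")
        for frame in candidates:
            if frame.page_id is None:
                return frame
        return candidates[0]

disk = {1: b"one", 2: b"two", 3: b"three"}
pool = BufferManager(2, disk)
f1 = pool.pin_page(1)
f1.data = b"ONE"                  # modify the page while it is pinned
pool.unpin_page(1, dirty=True)    # release it and report the change
pool.pin_page(2)
f3 = pool.pin_page(3)             # evicts page 1, which is written back first
```

A frame is only eligible for replacement while its pin_count is 0, which is why the sketch raises an error when every frame is pinned.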
SUMMARY
Buffer manager
Replacement policy
pin_count and dirty.
Pinning
Unpinning
QUIZ
OBJECTIVE
We now turn from the way pages are stored on disk and brought into main
memory to the way pages are used to store records and organized into logical
collections, or files.
Supported operations on a heap file include create and destroy files, insert a
record, delete a record with a given rid, get a record with a given rid, and scan
all records in the file.
We must keep track of the pages in each heap file to support scans, and we
must keep track of pages that contain free space to implement insertion
efficiently.
FILES OF RECORDS
Implementing Heap Files
There are two alternative ways to maintain this information.
1. Linked List of Pages:
One possibility is to maintain a heap file as a doubly linked list of pages.
The DBMS can remember where the first page is located by maintaining a
table containing pairs of (heap_file_name, page_1_addr) in a known location
on disk. We call the first page of the file the header page.
FILES OF RECORDS
OBJECTIVE
Record Arrangement
fixed length records
fixed length records Problem and
solution
Variable length Records
Variable length Records Problem and
solution
QUIZ
1-Storing a separate copy of the database at multiple locations is ?
A) Data Replication

2-A unit of storage that can store one or more records in a hash file
organization is denoted as
OBJECTIVE
Record Arrangement
fixed length records
fixed length records Problem and solution
Variable length Records
Variable length Records Problem and solution

Index
Primary Index
Secondary Index
Clustering Index
STRUCTURED INDEXING
We know that data is stored in the form of records. Every record has a
key field, which helps it to be recognized uniquely.
1. Primary Index
Primary index is defined on an ordered data file. The data file is
ordered on a key field.
Index
Primary Index
Secondary Index
Clustering Index
QUIZ
OBJECTIVE
Index
Primary Index
Secondary Index
Clustering Index

Data on External Storage
Magnetic tapes
Page
cost of page I/O
DATA ON EXTERNAL STORAGE
POINTS TO REMEMBER
Disks:
• Can retrieve random page at fixed cost
• But reading several consecutive pages is
much cheaper than reading them in random
order.
Tapes (magnetic tapes):
• Can only read pages in sequence
• Cheaper than disks, used for archival storage.
(Archival storage: data that may not be actively
needed)
Page:
The unit of information read from or written to disk is a page.
LECTURE -11
OBJECTIVE
Data on External Storage
Magnetic tapes
Page
cost of page I/O

File
File organization
Index data structures
Search key
Data Reference
Type of Index data structures
FILE ORGANIZATION
A file is a collection of records. Records can be accessed using the primary
key. The type and frequency of access that can be supported efficiently depend
on the file organization used for a given set of records.
File organization describes the way in which records are stored in blocks and
the way the blocks are placed on the storage medium.
One approach to mapping the database to files is to use several files and store
records of only one fixed length in any given file.
Files of fixed-length records are easier to implement than files of
variable-length records.
INDEX DATA STRUCTURES
1. The first column is the search key, which contains a copy of the
primary key or a candidate key of the table. These values are stored in
sorted order so that the corresponding data can be accessed quickly.
Note: The data file itself may or may not be stored in sorted order.
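A small sketch can show why keeping the search key column sorted makes lookups fast. It assumes each index entry pairs a search key with a data reference, here a hypothetical (page number, slot number) pair; the sample entries and the lookup function are illustrative only.

```python
import bisect

# Index entries sorted by search key: (search_key, data_reference).
# Assumption: a data reference is a (page_number, slot_number) pair.
index = [
    (101, (1, 2)),
    (205, (7, 1)),
    (310, (3, 0)),
    (412, (5, 4)),
]

def lookup(index, key):
    """Binary-search the sorted search keys and return the data reference."""
    keys = [k for k, _ in index]
    pos = bisect.bisect_left(keys, key)
    if pos < len(index) and index[pos][0] == key:
        return index[pos][1]
    return None   # key is not present in the index

found = lookup(index, 205)
missing = lookup(index, 999)
```

The data references point to pages in no particular order (1, 7, 3, 5), which matches the note that the data file itself need not be sorted.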
File
File organization
Index data structures
Search key
Data Reference
Type of Index data
structures
QUIZ
OBJECTIVE
File
File organization
Index data structures
Search key
Data Reference
Type of Index data structures

Comparison of file organizations
heap file
Clustered B+ tree
Heap file with an unclustered B+ tree
Heap file with an unclustered hash
COMPARISON OF FILE
ORGANIZATIONS
Comparison of file
organizations
heap file
Clustered B+ tree
Heap file with an
unclustered B+ tree
Heap file with an
unclustered hash
QUIZ
OBJECTIVE
Comparison of file organizations
heap file
Clustered B+ tree

Tree Structured index
ISAM (indexed sequential access method)
• The data entries are arranged in sorted order by search key value
and a hierarchical search data structure is maintained.
TREE-BASED INDEXING:
1. ISAM
2. B+ Trees
Both support efficient range searches.
ISAM
– It is a static index structure that is effective when the file is not
frequently updated.
– This method is not suitable for a file that grows and shrinks a
lot.
B+ Trees
– A dynamic structure that adjusts gracefully to changes in the file.
1-Which of the reasons will force you to use XML data model in SQL Server ?
A) Your data is sparse or you do not know the structure of the data
B) Your data represents containment hierarchy
C) Order is inherent in your data
D) All of the Mentioned
Answer: D

2-Which of the following is not a XML storage option ?
A) Native storage as XML data type
B) Mapping between XML and relational storage
C) Small object storage
D) None of the Mentioned
Answer: C
LECTURE -14
OBJECTIVE
Tree Structured index
ISAM (indexed sequential access method)
B+ Trees

ISAM
ISAM structure
leaf pages
Non leaf pages
ISAM (INDEX SEQUENTIAL
ACCESS METHOD)
The data entries of the ISAM index are in the leaf pages of the tree and in
additional overflow pages chained to some leaf pages.
[Figure: ISAM index structure, showing non-leaf pages above the leaf pages
(primary pages), with overflow pages chained to leaf pages]
When a file is created, all leaf pages are allocated sequentially and sorted on
the search key value.
If a record is inserted into a leaf page that has no free space, an additional
page is needed because the index structure is static. These additional pages
are called overflow pages.
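The overflow behaviour can be sketched with a single primary leaf page. This is a simplified illustration, assuming a fixed page capacity and ignoring the non-leaf levels; the names LeafPage, insert, search and overflow_pages are hypothetical.

```python
class LeafPage:
    """A primary leaf page or an overflow page holding data entries."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []
        self.overflow = None   # next overflow page in the chain, if any

def insert(leaf, key):
    """Insert into the first page of the chain that has free space."""
    page = leaf
    while len(page.entries) >= page.capacity:
        if page.overflow is None:
            page.overflow = LeafPage(page.capacity)  # static index: chain a new page
        page = page.overflow
    page.entries.append(key)

def search(leaf, key):
    """Search the primary page and then every overflow page chained to it."""
    page = leaf
    while page is not None:
        if key in page.entries:
            return True
        page = page.overflow
    return False

def overflow_pages(leaf):
    """Number of overflow pages chained to a primary leaf page."""
    count, page = 0, leaf.overflow
    while page is not None:
        count, page = count + 1, page.overflow
    return count

leaf = LeafPage(capacity=2)
for key in (10, 20, 23, 27, 29):
    insert(leaf, key)
```

Every search for a key in this leaf's range may have to walk the whole chain, so the cost grows with the number of overflow pages.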
SUMMARY
ISAM
ISAM structure
leaf pages
Non leaf pages
QUIZ
OBJECTIVE
ISAM
ISAM structure
leaf pages
Non leaf pages

B+ Tree
Problem with ISAM
Hash-Based Indexing
B+ TREES: DYNAMIC INDEX
STRUCTURE
• A static structure such as the ISAM index suffers from the problem
that long overflow chains can develop as the file grows, leading to
poor performance.
• This problem motivated the development of more flexible, dynamic
structures that adjust gracefully to inserts and deletes.
Insertion and deletion: in both operations the tree is kept balanced.
[Figure: format of an index page: P0 | K1 | P1 | K2 | P2 | ... | Km | Pm]
Hash-based indexing is used to quickly find records that have a given search key value.
B+ Tree
Problem with ISAM
Hash-Based Indexing
QUIZ
1-What are the leaf nodes in a B+ tree?
a) The topmost nodes
b) The bottommost nodes
c) The nodes in between the top and bottom nodes
d) None of the mentioned
Answer: b

2-Dynamic hashing is also called as _________
a) Extended hashing
b) Extendable hashing
c) Static hashing
d) Movable hashing
Answer: b
EXERCISE
A) Software cost
B) Software complexity
C) Slow Response
D) Modular growth
Answer: D
EXERCISE
A) The same DBMS is at each node and each DBMS works independently.
B) The same DBMS is at each node and a central DBMS coordinates database
access.
A transaction manager is ?
Answer: D
EXERCISE
Answer: D
EXERCISE
A) The same DBMS is used at each location and data are not distributed
across all nodes.
B) The same DBMS is used at each location and data are distributed across all
nodes.
C) A different DBMS is used at each location and data are not distributed
across all nodes.
D) A different DBMS is used at each location and data are distributed across
THANK YOU