Lecture 01 - File Storage - Part 1


CSCI 21062 - Advanced Database Management Systems
Lecture 01

Miss. Subodha Rathnayake

Dept. of Software Engineering, Faculty of Computing and Technology,
University of Kelaniya
Course Aim/Intended Learning Outcomes
LO1: Demonstrate theoretical knowledge of advanced database management system design principles and techniques
LO2: Recognize and perform common database administration tasks, such as database monitoring and performance tuning
LO3: Demonstrate how a DBMS guarantees transaction atomicity and recovery from a system crash
LO4: Critically assess new developments in database technology
Course Content
● File Organization
● Indexing structures
● Transaction Processing
● Concurrency Control
● Recovery Techniques
● Query Processing and Optimization
● Physical Design
● Physical Design and Tuning
Assessment Strategy
● Continuous Assessment: 20%
○ Class test: 10%
○ Assignments: 10%
● Final Assessment: 80%
○ Theory: 80%
○ Practical: 0%
● Other (specify): 0%
References/Reading Materials

1. Ramez Elmasri and Shamkant B. Navathe (2017), Fundamentals of Database Systems, Addison-Wesley Longman Publishing Co., 7th edition.

2. Silberschatz A., Korth H.F. and Sudarshan S. (2019), Database System Concepts, McGraw Hill, 7th edition.

3. Raghu Ramakrishnan and Johannes Gehrke (2014), Database Management Systems, McGraw Hill India, 3rd edition.

4. Connolly T. and Begg C. (2015), Database Systems: A Practical Approach to Design, Implementation, and Management, Pearson Education, 6th edition.

5. Rob T. and Coronel C. (2007), Database Systems: Design, Implementation and Management, Thomson Learning, 7th edition.
Lesson 01- File Organization
Part 1

-Describe how a DBMS organizes files of data records on disk

-Describe the parameters of a disk

-Compare heap files and sequential files

-Describe the hashing techniques

-Describe and use Extendible hashing and Linear Hashing (Part 1 and Part 2)

-Compare Extendible hashing and Linear Hashing (Part 1 and Part 2)

Introduction
• The collection of data that makes up a computerized database must be stored
physically on some computer storage medium

• The DBMS software can then retrieve, update, and process this data as needed.

• Computer storage media form a storage hierarchy

• Storage hierarchy: primary storage, secondary storage, tertiary storage
Memory Hierarchies and Storage Devices
● Primary storage:
- Can be operated on directly by the CPU.
- Provides fast access to data but is of limited storage capacity

● Secondary storage and tertiary storage:

○ Include magnetic disks, optical disks and tapes.
○ Hard disk drives are classified as secondary storage, whereas removable media
are considered tertiary storage.
○ Have a larger capacity, cost less, and provide slower access to data than primary
storage
○ Cannot be processed directly by the CPU; the data must first be copied into primary storage.
Hardware Description of Disk Devices
Parameters of Disks
● Most important disk parameter: random access time for accessing a disk block

● Random access time: the time required

○ to locate an arbitrary disk block, given its block address, and then
○ to transfer the block between the disk and a main memory buffer.

● Three time components to consider:

Seek time (s)

Rotational delay (rd)

Block transfer time (btt)

Parameters of Disks
● Seek time (s):
- To transfer a disk block given its address, the disk controller must first
mechanically position the read/write head on the correct track.
- The time required for this is the seek time.
- Disk manufacturers provide an average seek time in milliseconds (10 to 60 msec)

● Rotational delay (rd):

○ Once the read/write head is at the correct track, the rotational delay is the time
taken for the required block to rotate into position under the read/write head.
○ If the speed of disk rotation is p revolutions per minute (rpm), the disk must on
average make half a revolution, so the average rotational delay is
○ rd = (1/2)*(1/p) min = (60*1000)/(2*p) msec
Parameters of Disks

● Block transfer time (btt)

○ The time needed to transfer the data in the block. This depends on the block
size, track size, and the rotational speed.
○ If the transfer rate for the disk is tr bytes/msec and the block size is B bytes,
then
○ btt = B/tr msec
○ Example: if the track size is 50 Kbytes and p is 3600 rpm, then
■ time for one revolution = (60*1000)/p = (60*1000)/3600 msec
■ one full track of data passes under the head in one revolution, so
transfer rate tr = (50*1000)/(60*1000/3600) = 3000 bytes/msec
■ btt = B/3000 msec

Parameters of Disks
● The average time needed to find and transfer a block, given its block address
○ (s + rd+ btt) msec

● To transfer consecutively k noncontiguous blocks that are on the same cylinder

○ (s + k*(rd+ btt)) msec

● To transfer consecutively k contiguous blocks that are on the same track/cylinder

○ (s + rd+ (k*btt)) msec
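
The timing formulas above can be checked with a short calculation. The following is a minimal sketch, not part of the original slides: the function names are illustrative, and the seek time (10 msec) and block size (512 bytes) used in the example are assumed values. The track size (50 Kbytes, taken as 50*1000 bytes) and rotation speed (3600 rpm) come from the slide example.

```python
def rotational_delay_ms(p_rpm):
    """Average rotational delay rd: half a revolution, in msec."""
    return (60 * 1000) / (2 * p_rpm)


def transfer_rate(track_size_bytes, p_rpm):
    """Transfer rate tr in bytes/msec: one track passes per revolution."""
    revolution_ms = (60 * 1000) / p_rpm
    return track_size_bytes / revolution_ms


def block_transfer_time_ms(block_size_bytes, tr):
    """Block transfer time btt = B / tr, in msec."""
    return block_size_bytes / tr


def random_block_ms(s, rd, btt):
    """One block at an arbitrary address: s + rd + btt."""
    return s + rd + btt


def noncontiguous_blocks_ms(s, rd, btt, k):
    """k noncontiguous blocks on the same cylinder: s + k*(rd + btt)."""
    return s + k * (rd + btt)


def contiguous_blocks_ms(s, rd, btt, k):
    """k contiguous blocks on the same track/cylinder: s + rd + k*btt."""
    return s + rd + k * btt


if __name__ == "__main__":
    s = 10.0                                # assumed seek time in msec
    rd = rotational_delay_ms(3600)          # from the slide example
    tr = transfer_rate(50 * 1000, 3600)     # from the slide example
    btt = block_transfer_time_ms(512, tr)   # assumed block size B = 512
    print(random_block_ms(s, rd, btt))
    print(noncontiguous_blocks_ms(s, rd, btt, 20))
    print(contiguous_blocks_ms(s, rd, btt, 20))
```

Comparing the last two results shows why contiguous placement matters: the rotational delay is paid once instead of k times.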
Placing File Records on Disk
● A file: a sequence of records
● If every record in the file has exactly the same size, the file is said to be made up of
fixed-length records.
● If different records in the file have different sizes, the file is said to be made up of
variable-length records.
Placing File Records on Disk (cont’d.)
● Reasons for variable-length records

○ One or more fields have variable length

○ One or more fields are repeating

○ One or more fields are optional

○ File contains records of different types

Record Blocking and Spanned vs. Unspanned
Records
Record blocking: storing the records of a file in disk blocks

● Records of a file must be allocated to disk blocks

Reason: block is the unit of data transfer between disk and memory

● When the block size is larger than the record size, each block will contain numerous
records.

● Some files may have unusually large records that cannot fit in one block.
Record Blocking and Spanned vs. Unspanned
Records (cont’d.)
● Records of a file must be allocated to disk blocks.
● Suppose that the block size is B bytes. For a file of fixed-length records of size R bytes, with
B ≥ R,
○ we can fit bfr = ⌊B/R⌋ records per block; bfr is called the blocking factor,

○ where ⌊x⌋ (floor function) rounds the number x down to an integer.

○ In general, R may not divide B exactly, so the unused space in each block is
B − (bfr * R) bytes
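
As a quick worked example of the blocking factor, here is a minimal sketch. The function names and the sample sizes (B = 512 bytes, R = 100 bytes) are assumptions chosen for illustration only.

```python
def blocking_factor(B, R):
    """bfr = floor(B / R): whole records that fit in one block (unspanned)."""
    if R > B:
        raise ValueError("record is larger than the block; bfr is undefined")
    return B // R


def unused_space(B, R):
    """Bytes left over in each block: B - (bfr * R)."""
    return B - blocking_factor(B, R) * R


if __name__ == "__main__":
    print(blocking_factor(512, 100))  # 5 records per block
    print(unused_space(512, 100))     # 12 bytes unused per block
```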
Record Blocking and Spanned vs. Unspanned
Records (cont’d.)
● To utilize the unused space, store part of a record in one block and the rest in
another.
● If consecutive blocks are not used, a pointer at the end of the first block points to
the block which has the remainder of the record.
● This organization is called spanned.
● If records are not allowed to cross block boundaries, the organization is called
unspanned.
○ Used with fixed-length records having B > R because it makes each record start
at a known location in the block, simplifying record processing.
● The difference: in a spanned organization a record may be split across blocks; in an
unspanned organization every record is stored entirely within one block.
Spanned vs. Unspanned Records (cont’d.)
● Unspanned organization (figure)
● Spanned organization (figure)
Spanned vs. Unspanned Records (cont’d.)

● For variable-length records using spanned organization, each block may store a
different number of records.
● In this case, the blocking factor bfr represents the average number of records per
block for the file.
● bfr is used to calculate the number of blocks b needed for a file of r records:
○ b = ⌈r/bfr⌉ blocks
○ ⌈x⌉ (ceiling function) rounds the value x up to the next integer.
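
The block count can be computed the same way. This is a small sketch with an illustrative function name; the sample values (r = 1000 or 1001 records, bfr = 5) are assumptions, and bfr is taken to be a whole number here.

```python
def blocks_needed(r, bfr):
    """b = ceil(r / bfr), computed with integer arithmetic (integer bfr)."""
    if bfr <= 0:
        raise ValueError("blocking factor must be positive")
    return (r + bfr - 1) // bfr


if __name__ == "__main__":
    print(blocks_needed(1000, 5))  # 200 blocks
    print(blocks_needed(1001, 5))  # 201 blocks: the extra record needs one more
```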
File Headers (file descriptor)

● Contains information about a file that is needed by the system programs that access
the file records
● Includes information
○ To determine the disk address of the file blocks
○ Record format descriptions
Files of Unordered Records
(Heap Files/Pile Files)
● Simplest and the most basic type of organization
● Records are unordered
● New records are inserted at the end of the file
● Can use spanned or unspanned organization
● Inserting a record
○ Very efficient: the last block of the file is copied into the buffer, the new record
is added, and the block is rewritten back to disk.
○ The address of the last block is kept in the file header.
Files of Unordered Records
(Heap Files/Pile Files)
● Searching a record
○ Involves a linear search, expensive procedure
○ Requires searching (b/2) blocks on average.
○ If no records or several records satisfy the search then all b blocks must be
searched.
● Deleting a record
○ Find the block, copy the block into the buffer, delete the record, rewrite the
block to the disk.
○ Leaves wasted storage space
○ Expensive operation
Files of Unordered Records
(Heap Files/Pile Files)
Techniques used in deleting
● Reuse the space of deleted records when inserting new records
○ This requires keeping track of empty locations.

● Have an extra bit/byte, called a deletion marker, stored with each record
○ When a record is deleted, set its deletion marker to a certain value.
○ When searching, consider only valid records
○ Requires periodic reorganization to reclaim unused space.
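
The heap file operations described above (insert at the end, linear search, deletion markers, periodic reorganization) can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not how a real DBMS lays out blocks: each block is modelled as a Python list holding at most bfr slots, and the class and method names are hypothetical.

```python
class HeapFile:
    """Toy heap file: a list of blocks, each with up to bfr record slots."""

    def __init__(self, bfr):
        self.bfr = bfr
        self.blocks = [[]]  # each slot is [record, deleted_marker]

    def insert(self, record):
        # New records always go into the last block of the file.
        last = self.blocks[-1]
        if len(last) == self.bfr:
            last = []
            self.blocks.append(last)
        last.append([record, False])

    def search(self, predicate):
        """Linear search. Returns (record, blocks_read) or (None, b)."""
        for blocks_read, block in enumerate(self.blocks, start=1):
            for record, deleted in block:
                if not deleted and predicate(record):
                    return record, blocks_read
        return None, len(self.blocks)

    def delete(self, predicate):
        """Set the deletion marker on the first matching valid record."""
        for block in self.blocks:
            for slot in block:
                if not slot[1] and predicate(slot[0]):
                    slot[1] = True
                    return True
        return False

    def reorganize(self):
        """Rewrite the file without deleted records to reclaim space."""
        live = [rec for block in self.blocks
                for rec, deleted in block if not deleted]
        self.blocks = [[]]
        for rec in live:
            self.insert(rec)


if __name__ == "__main__":
    hf = HeapFile(bfr=2)
    for value in [1, 2, 3, 4, 5]:
        hf.insert(value)
    print(hf.search(lambda r: r == 3))  # found in the second block
    hf.delete(lambda r: r == 3)
    print(hf.search(lambda r: r == 3))  # not found: all blocks are read
    hf.reorganize()
    print(len(hf.blocks))
```

Note how an unsuccessful search reads every block, which matches the point that all b blocks must be searched when no record satisfies the condition.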
Files of Ordered Records (Sorted Files)
● Records are ordered based on the values of one of the fields, called the ordering field
● The ordering field may be the key field, in which case it is called the ordering key
● Reading the records in order of the ordering field values becomes efficient.
● Finding the next record from the current one is efficient: the next record is in the
same block unless the current record is the last one in the block
Files of Ordered Records (Sorted Files)
● Searching for records on the ordering field is very efficient
○ Can use binary search
● A binary search usually accesses about log2(b) blocks
● Provides no advantage for random or ordered access based on a nonordering
field.
● Inserting records is expensive.
○ Find the correct position
○ Must make space in the file to insert the record.
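
A binary search over the blocks of an ordered file can be sketched as follows. This is an illustration only: the file is modelled as a list of non-empty sorted blocks held in memory, the function name is hypothetical, and each inspected block is counted as one block access.

```python
def binary_search_blocks(blocks, key):
    """Search a sorted file block by block.

    blocks: list of non-empty lists, sorted within and across blocks.
    Returns (found, blocks_read).
    """
    lo, hi = 0, len(blocks) - 1
    blocks_read = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]      # one block access
        blocks_read += 1
        if key < block[0]:
            hi = mid - 1         # key can only be in an earlier block
        elif key > block[-1]:
            lo = mid + 1         # key can only be in a later block
        else:
            return key in block, blocks_read
    return False, blocks_read


if __name__ == "__main__":
    sorted_file = [[1, 3], [5, 7], [9, 11], [13, 15]]
    print(binary_search_blocks(sorted_file, 9))
    print(binary_search_blocks(sorted_file, 4))
```

With b = 4 blocks, both searches above read 2 blocks, in line with the log2(b) estimate.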
Files of Ordered Records (Sorted Files)
Techniques to make insertion more efficient
● Keep unused space in each block
○ Once the space is used up, the original problem resurfaces.

● Create a temporary unordered file

○ This is known as the overflow or transaction file
○ The actual ordered file is known as the main or master file.
○ New records are inserted at the end of the overflow file.
○ Periodically the overflow file is sorted and merged with the master file.

● Insertion becomes efficient, but searching becomes more complex.

○ The overflow file must be searched using a linear search.
○ But for applications that do not require up-to-date information, the
overflow file can be ignored during a search.
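
The master/overflow arrangement can be sketched as below. This is a simplified in-memory illustration, assuming records are plain comparable keys; the class name and the include_overflow flag are hypothetical names introduced for the example.

```python
import bisect


class SortedFileWithOverflow:
    """Toy ordered file: a sorted master file plus an unordered overflow file."""

    def __init__(self, records=()):
        self.master = sorted(records)
        self.overflow = []

    def insert(self, key):
        # Cheap insertion: append to the end of the overflow file.
        self.overflow.append(key)

    def search(self, key, include_overflow=True):
        # Binary search on the master file.
        i = bisect.bisect_left(self.master, key)
        if i < len(self.master) and self.master[i] == key:
            return True
        # Linear search on the overflow file, unless the caller
        # accepts results that may not be up to date.
        return include_overflow and key in self.overflow

    def merge(self):
        # Periodic reorganization: sort the overflow and merge it in.
        self.master = sorted(self.master + self.overflow)
        self.overflow = []


if __name__ == "__main__":
    f = SortedFileWithOverflow([10, 20, 30])
    f.insert(15)
    print(f.search(15))                          # found via the overflow file
    print(f.search(15, include_overflow=False))  # missed when overflow is ignored
    f.merge()
    print(f.master)
```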
Hashing Techniques

● Primary file organization based on hashing.

● This organization is called a hash file.
● Provides fast access to records on certain search conditions.
● The search condition must be an equality condition on a single field. This field is
called the hash field.
● If the hash field is the key field then it is called the hash key.
Hashing Techniques
● Provides a function h, called a hash function (randomizing function), that is applied
to the hash field value of a record and yields the address of the block in which the
record is stored.

● A search for the record within the block is then carried out in a main memory buffer

● For most records, we need only a single block access

● Hashing is also used as an internal search structure.

Hashing Techniques
External Hashing for disk files
● Hashing for disk files is called external hashing.

Static hashing
● The address space is made of buckets, each of which holds multiple records
● A bucket is either one disk block or a cluster of contiguous blocks.
● The hashing function maps a key into a relative bucket number.
● A table maintained in the file header converts the bucket number into the
corresponding disk block address.
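
The two-step mapping (key to bucket number, bucket number to block address) can be sketched as follows. Everything here is an assumption made for illustration: the hash function h(K) = K mod M, the made-up block addresses starting at 1000, and the class name. Bucket capacity and overflow handling are deliberately left out.

```python
class StaticHashFile:
    """Toy static hash file with M buckets, one block per bucket."""

    def __init__(self, num_buckets, first_block_address=1000):
        self.M = num_buckets
        # File header table: relative bucket number -> disk block address.
        # The addresses are hypothetical values used only in this sketch.
        self.header = {i: first_block_address + i for i in range(num_buckets)}
        # Simulated disk: block address -> list of (key, record) pairs.
        self.disk = {address: [] for address in self.header.values()}

    def h(self, key):
        """Assumed hash function: maps a key to a relative bucket number."""
        return key % self.M

    def block_address(self, key):
        return self.header[self.h(key)]

    def insert(self, key, record):
        self.disk[self.block_address(key)].append((key, record))

    def search(self, key):
        # A single block access, then a search inside the memory buffer.
        buffer = self.disk[self.block_address(key)]
        for k, record in buffer:
            if k == key:
                return record
        return None


if __name__ == "__main__":
    f = StaticHashFile(num_buckets=4)
    f.insert(13, "record for key 13")
    print(f.h(13), f.block_address(13))
    print(f.search(13))
```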
Hashing Techniques

Dynamic hashing
● Extendible hashing
○ Uses a type of directory: an array of 2^d bucket addresses.
○ d is known as the global depth.
● In the example, the directory is an array of size 4 (2^2), so d = 2.
● Each element in the array is a pointer to a bucket.
To locate an entry (record), apply the hash function to the search field and take the last
two bits (because d is 2) of the binary representation of the hash value.
Eg:
● Locate the entry with hash value 5.
○ Binary code 101
○ The last two bits are 01, so look at directory entry 01 and follow the pointer.
● Insert a data entry with hash value 13.
○ Binary code 1101
○ The last two bits are 01
○ Go to bucket B
○ The bucket has space, so insert the entry.
● Insert an entry with hash value 20
○ Binary code 10100
○ The last two bits are 00
○ This leads to bucket A
○ Bucket A is full.
○ Split the bucket by allocating a new bucket (its split image, A2) and redistribute
the contents across the old bucket and its split image.
• We need three bits to discriminate between A and A2.
• The directory has only enough slots to store all two-bit patterns.
• So the directory is doubled.
Binary codes
20 - 10100
4 - 100
12 - 1100
32 - 100000
16 - 10000
● Does splitting a bucket always necessitate doubling the directory?
○ Not always
○ Eg:
○ Insert an entry with hash value 9 (binary 1001, last three bits 001)
○ It belongs to bucket B
○ Bucket B is full, so split it and use directory elements 001 and 101 to point to B
and its split image; no doubling is needed.
○ But if A or A2 becomes full and an insertion then forces a bucket split, the
directory must be doubled again.
Binary codes
1 - 001
9 - 1001
5 - 101
21 - 10101
13 - 1101
● To determine whether a directory doubling is needed, we maintain a local depth for
each bucket.
● If a bucket whose local depth is equal to the global depth is split, the directory must
be doubled.
● Initially all local depths are equal to the global depth
● Increase the global depth by 1 each time the directory is doubled.
● Increase the local depth of the split bucket by 1 and assign this same local depth to
its split image.
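
The rules for global and local depth can be traced with a small in-memory sketch. It is an illustration under stated assumptions, not a reference implementation: the bucket capacity of 4 is assumed (the slides do not state it), the class and method names are hypothetical, entries are the hash values themselves, and more entries with one identical hash value than a bucket can hold are not handled.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.values = []


class ExtendibleHash:
    """Toy extendible hash index keyed on the last d bits of the hash value."""

    def __init__(self, global_depth=2, capacity=4):
        self.global_depth = global_depth
        self.capacity = capacity  # assumed bucket capacity
        # Initially one bucket per slot, local depth == global depth.
        self.directory = [Bucket(global_depth)
                          for _ in range(2 ** global_depth)]

    def _slot(self, h):
        # Directory index = last global_depth bits of the hash value.
        return h & ((1 << self.global_depth) - 1)

    def search(self, h):
        return h in self.directory[self._slot(h)].values

    def insert(self, h):
        bucket = self.directory[self._slot(h)]
        if len(bucket.values) < self.capacity:
            bucket.values.append(h)
            return
        # Full bucket: double the directory only if local == global depth.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split: the bucket and its split image get the new local depth.
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = image
        # Redistribute the old contents, then retry the insertion.
        old_values, bucket.values = bucket.values, []
        for v in old_values:
            self.directory[self._slot(v)].values.append(v)
        self.insert(h)


if __name__ == "__main__":
    index = ExtendibleHash(global_depth=2, capacity=4)
    for h in [4, 12, 32, 16, 1, 5, 21]:
        index.insert(h)
    index.insert(13)   # bucket 01 still has space
    index.insert(20)   # bucket 00 is full: split and double the directory
    print(index.global_depth, index.directory[0].values,
          index.directory[4].values)
    index.insert(9)    # bucket 001 is full: split without doubling
    print(index.global_depth, index.directory[1].values,
          index.directory[5].values)
```

Doubling by concatenating the directory with itself works because, when the last bits are used as the index, slot i and slot i + 2^d must initially point to the same bucket.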
Exercise 01
Consider the following extendible hashing index
diagram. You may assume that the entries in the
index are hash values.

(i) What is the value of the global depth?

(ii) What are the values of P and Q
(iii) What is the use of the local depth?
(iv) Is it possible to give the last entry that was
inserted into the index? Give reasons
(v) Is it possible to say that a split has occurred
due to an insertion of any entry? If so what could
be the possible entry that was inserted.
Exercise 01
(vi) List the steps to search the hash value 20.
Thank You!
