
Lecture 01 - File Storage - Part 1

Uploaded by

chamikanimnajith
Copyright: © All Rights Reserved

CSCI 21062 - Advanced Database Management Systems
Lecture 01

Miss. Subodha Rathnayake


Dept. of Software Engineering, Faculty of Computing and Technology,
University of Kelaniya
Course Aim/Intended Learning Outcomes
LO1 Demonstrate theoretical knowledge of advanced database management system design principles and techniques

LO2 Recognize and perform common database administration tasks, such as database monitoring and performance tuning

LO3 Demonstrate how a DBMS guarantees transaction atomicity and recovery from a system crash

LO4 Critically assess new developments in database technology
Course Content
● File Organization
● Indexing structures
● Transaction Processing
● Concurrency Control
● Recovery Techniques
● Query Processing and Optimization
● Physical Design
● Physical Design and Tuning
Assessment Strategy

Continuous Assessment - 20%
    Class test - 10%
    Assignments - 10%

Final Assessment - 80%
    Theory - 80%
    Practical - 0%
    Other (specify) - 0%
References/Reading Materials

1. Ramez Elmasri and Shamkant B. Navathe (2017), Fundamentals of Database Systems, Addison-Wesley Longman Publishing Co., 7th edition.

2. Silberschatz A., Korth H.F. and Sudarshan S. (2019), Database System Concepts, McGraw Hill, 7th edition.

3. Raghu Ramakrishnan and Johannes Gehrke (2014), Database Management Systems, McGraw Hill India, 3rd edition.

4. Connolly T. and Begg C. (2015), Database Systems: A Practical Approach to Design, Implementation, and Management, Pearson Education, 6th edition.

5. Rob T. and Coronel C. (2007), Database Systems: Design, Implementation and Management, Thomson Learning, 7th edition.
Lesson 01- File Organization
Part 1

- Describe how a DBMS organizes files of data records on disk

- Describe the parameters of a disk

- Compare heap files and sequential files

- Describe the hashing techniques

- Describe and use extendible hashing and linear hashing (Part 1 and Part 2)

- Compare extendible hashing and linear hashing (Part 1 and Part 2)


Introduction
• The collection of data that makes up a computerized database must be stored
physically on some computer storage medium

• The DBMS software can then retrieve, update, and process this data as needed.

• Computer storage media form a storage hierarchy

• Storage hierarchy:
    Primary storage
    Secondary storage
    Tertiary storage
Memory Hierarchies and Storage Devices
● Primary storage:
- Can be operated directly by the CPU.
- Provides fast access to data but is of limited storage capacity

● Secondary storage and tertiary storage:
○ Include magnetic disks, optical disks and tapes.
○ Hard disk drives are classified as secondary storage, whereas removable media
are considered tertiary storage.
○ Have larger capacity, cost less, and provide slower access to data than primary
storage.
○ Cannot be processed directly by the CPU; the data must first be copied into primary storage.
Hardware Description of Disk Devices
Parameters of Disks
● Most important disk parameter: random access time for accessing a disk block

● Random access time: the time required
○ to locate an arbitrary disk block, given its block address, and then
○ to transfer the block between the disk and a main memory buffer.

● Three time components to consider:

Seek time (s)

Rotational delay (rd)

Block transfer time (btt)


Parameters of Disks
● Seek time (s):
- To transfer a disk block given its address, the disk controller must first
mechanically position the read/write head on the correct track.
- The time required for this is the seek time.
- Disk manufacturers provide an average seek time in milliseconds (10 to 60 msec)

● Rotational delay (rd):
○ Once the read/write head is at the correct track, this is the time taken for the
required block to rotate into position under the read/write head.
○ If the disk rotates at p revolutions per minute (rpm), the average rotational
delay rd is the time for half a revolution:
○ rd = (1/2)*(1/p) min = (60*1000)/(2*p) msec
Parameters of Disks

● Block transfer time (btt)


○ The time needed to transfer the data in the block. This depends on the block
size, track size, and rotational speed.
○ If the transfer rate for the disk is tr bytes/msec and the block size is B bytes,
then
○ btt = B/tr msec
○ Example: if the track size is 50 Kbytes and p is 3600 rpm, then
■ time for one revolution = total time / number of revolutions = (60*1000)/3600 msec
■ one revolution transfers one track of data, so tr = (50*1000)/(60*1000/3600) = 3000 bytes/msec
■ btt = B/3000 msec
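The rotational delay and block transfer time formulas above can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the 512-byte block size is an assumed value that does not come from the slides.

```python
def rotational_delay_ms(rpm: float) -> float:
    """Average rotational delay rd: the time for half a revolution, in msec."""
    return (60 * 1000) / (2 * rpm)


def transfer_rate_bytes_per_ms(track_size_bytes: float, rpm: float) -> float:
    """Transfer rate tr: one full track passes under the head per revolution."""
    revolution_ms = (60 * 1000) / rpm
    return track_size_bytes / revolution_ms


def block_transfer_time_ms(block_size_bytes: float, tr: float) -> float:
    """Block transfer time btt = B / tr, in msec."""
    return block_size_bytes / tr


# Slide example: track size 50 Kbytes (taken as 50*1000 bytes), p = 3600 rpm.
rd = rotational_delay_ms(3600)
tr = transfer_rate_bytes_per_ms(50 * 1000, 3600)
# Assumed block size B = 512 bytes (not given in the slides).
btt = block_transfer_time_ms(512, tr)

print(f"rd  = {rd:.2f} msec")
print(f"tr  = {tr:.0f} bytes/msec")
print(f"btt = {btt:.3f} msec")
```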


Parameters of Disks
● The average time needed to find and transfer a block, given its block address
○ (s + rd+ btt) msec

● To transfer, one after another, k noncontiguous blocks that are on the same cylinder
○ (s + k*(rd + btt)) msec

● To transfer, one after another, k contiguous blocks that are on the same track/cylinder
○ (s + rd + (k*btt)) msec
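To see why contiguous placement matters, the three access-time formulas can be compared side by side. The values of s, rd, btt and k below are assumptions chosen for illustration, and the function names are hypothetical.

```python
def single_block_ms(s: float, rd: float, btt: float) -> float:
    """Average time to find and transfer one block: s + rd + btt."""
    return s + rd + btt


def noncontiguous_ms(s: float, rd: float, btt: float, k: int) -> float:
    """k noncontiguous blocks on the same cylinder: s + k*(rd + btt)."""
    return s + k * (rd + btt)


def contiguous_ms(s: float, rd: float, btt: float, k: int) -> float:
    """k contiguous blocks on the same track/cylinder: s + rd + k*btt."""
    return s + rd + k * btt


# Assumed values: s = 10 msec, rd = 8.33 msec, btt = 0.17 msec, k = 5 blocks.
s, rd, btt, k = 10.0, 8.33, 0.17, 5

print(f"one block:       {single_block_ms(s, rd, btt):.2f} msec")
print(f"k noncontiguous: {noncontiguous_ms(s, rd, btt, k):.2f} msec")
print(f"k contiguous:    {contiguous_ms(s, rd, btt, k):.2f} msec")
```

With these assumed numbers the rotational delay dominates, so paying it once (contiguous blocks) instead of k times is the main saving.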
Placing File Records on Disk
● A file: a sequence of records
● If every record in the file has exactly the same size, the file is said to be made up of
fixed-length records.
● If different records in the file have different sizes, the file is said to be made up of
variable-length records.
Placing File Records on Disk (cont’d.)
● Reasons for variable-length records

○ One or more fields have variable length

○ One or more fields are repeating

○ One or more fields are optional

○ File contains records of different types


Record Blocking and Spanned vs. Unspanned Records
● Record blocking: storing the records in blocks

● Records of a file must be allocated to disk blocks
○ Reason: the block is the unit of data transfer between disk and memory

● When the block size is larger than the record size, each block will contain numerous
records.

● Some files may have unusually large records that cannot fit in one block.
Record Blocking and Spanned vs. Unspanned Records (cont’d.)
● Suppose that the block size is B bytes. For a file of fixed-length records of size R bytes, with
B ≥ R,
○ we can fit bfr = ⌊B/R⌋ records per block, where bfr is called the blocking factor

○ ⌊x⌋ (floor function) rounds the number x down to an integer

○ In general, R may not divide B exactly, so the unused space in each block is
B − (bfr * R) bytes
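A short calculation makes the blocking factor concrete. The block size B = 512 bytes and record size R = 100 bytes are assumed example values, and the function names are hypothetical.

```python
def blocking_factor(block_size: int, record_size: int) -> int:
    """bfr = floor(B / R): whole records that fit in one block."""
    return block_size // record_size


def unused_bytes_per_block(block_size: int, record_size: int) -> int:
    """Unused space per block in unspanned organization: B - bfr * R."""
    return block_size - blocking_factor(block_size, record_size) * record_size


# Assumed example values: B = 512 bytes, R = 100 bytes.
B, R = 512, 100
bfr = blocking_factor(B, R)
wasted = unused_bytes_per_block(B, R)

print(f"bfr = {bfr} records per block")
print(f"unused space = {wasted} bytes per block")
```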
Record Blocking and Spanned vs. Unspanned Records (cont’d.)
● To utilize the unused space, store part of a record in one block and the rest in
another.
● If consecutive blocks are not used, a pointer at the end of the first block points to
the block that holds the remainder of the record.
● This organization is called spanned.
● If records are not allowed to cross block boundaries, the organization is called
unspanned.
○ Used with fixed-length records having B > R because it makes each record start
at a known location in the block, simplifying record processing.
● The difference: in spanned organization a record may be split across two blocks; in
unspanned organization every record lies entirely within one block.
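The space trade-off between the two organizations can be sketched numerically. This is a simplified model under stated assumptions: fixed-length records, and the spanned case ignores the space taken by the pointer at the end of a block. The values r = 1000, R = 100 and B = 512 are made up for illustration.

```python
def unspanned_blocks(r: int, record_size: int, block_size: int) -> int:
    """Blocks needed when records may not cross block boundaries."""
    bfr = block_size // record_size      # whole records per block
    return -(-r // bfr)                  # integer ceiling of r / bfr


def spanned_blocks(r: int, record_size: int, block_size: int) -> int:
    """Blocks needed when records may be split across blocks.

    Simplification: the pointer to the continuation block is ignored.
    """
    total_bytes = r * record_size
    return -(-total_bytes // block_size)  # integer ceiling


# Assumed example: 1000 records of 100 bytes each, 512-byte blocks.
print(unspanned_blocks(1000, 100, 512))
print(spanned_blocks(1000, 100, 512))
```

The spanned layout uses fewer blocks here because the 12 unused bytes per block are reclaimed, at the cost of records that need two block reads.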
Spanned vs. Unspanned Records (cont’d.)
● [Figure: unspanned organization]
● [Figure: spanned organization]
Spanned vs. Unspanned Records (cont’d.)

● For variable-length records using spanned organization, each block may store a
different number of records.
● In this case, the blocking factor bfr represents the average number of records per
block for the file.
● bfr is used to calculate the number of blocks b needed for a file of r records:
○ b = ⌈r/bfr⌉ blocks
○ ⌈x⌉ (ceiling function) rounds the value x up to the next integer.
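The block-count formula can be evaluated directly. The record counts and blocking factors below are assumed example values; `blocks_needed` is a hypothetical helper name.

```python
import math


def blocks_needed(r: int, bfr: float) -> int:
    """b = ceil(r / bfr); bfr may be an average for variable-length records."""
    return math.ceil(r / bfr)


# Assumed examples.
print(blocks_needed(30000, 5))    # r divides evenly by bfr
print(blocks_needed(30001, 5))    # one extra record needs one extra block
print(blocks_needed(100, 4.5))    # average blocking factor (variable-length records)
```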
File Headers (file descriptor)

● Contains information about a file that is needed by the system programs that access
the file records
● Includes:
○ Information to determine the disk addresses of the file blocks
○ Record format descriptions
Files of Unordered Records
(Heap Files/Pile Files)
● Simplest and the most basic type of organization
● Records are unordered
● New records are inserted at the end of the file
● Can use spanned or unspanned organization
● Inserting a record
○ Very efficient: the last block of the file is copied into a buffer, the new record
is added, and the block is then rewritten back to disk.
○ The address of the last block is kept in the file header.
Files of Unordered Records
(Heap Files/Pile Files)
● Searching a record
○ Involves a linear search, which is an expensive procedure.
○ Requires searching (b/2) blocks on average when exactly one record matches.
○ If no records or several records satisfy the search condition, all b blocks must be
searched.
● Deleting a record
○ Find the block, copy the block into the buffer, delete the record, rewrite the
block to the disk.
○ Leaves wasted storage space
○ Expensive operation
Files of Unordered Records
(Heap Files/Pile Files)
Techniques used in deleting
● Reuse the space of deleted records when inserting new records
○ This requires keeping track of empty locations.

● Have an extra bit/byte, called a deletion marker, stored with each record
○ When a record is deleted, set its deletion marker to a certain value.
○ When searching, consider only valid records.
○ Requires periodic reorganization to reclaim unused space.
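The heap file operations described above (append at the end, linear search, deletion markers, periodic reorganization) can be put together in a small in-memory model. This is a sketch under assumptions: blocks are modelled as Python lists, a record is just a key, and the class and method names are hypothetical.

```python
class HeapFile:
    """In-memory model of a heap (pile) file with deletion markers."""

    def __init__(self, bfr: int):
        self.bfr = bfr          # records per block (blocking factor)
        self.blocks = [[]]      # each block holds [key, deleted] pairs

    def insert(self, key):
        """Append to the last block; start a new block if it is full."""
        if len(self.blocks[-1]) == self.bfr:
            self.blocks.append([])
        self.blocks[-1].append([key, False])

    def search(self, key):
        """Linear search. Returns (block_no, slot, blocks_read)."""
        for block_no, block in enumerate(self.blocks):
            for slot, (k, deleted) in enumerate(block):
                if k == key and not deleted:
                    return block_no, slot, block_no + 1
        return None, None, len(self.blocks)

    def delete(self, key) -> bool:
        """Set the deletion marker; the space is not reclaimed yet."""
        block_no, slot, _ = self.search(key)
        if block_no is None:
            return False
        self.blocks[block_no][slot][1] = True
        return True

    def reorganize(self):
        """Rebuild the file from live records to reclaim wasted space."""
        live = [k for block in self.blocks for k, deleted in block if not deleted]
        self.blocks = [[]]
        for k in live:
            self.insert(k)


heap = HeapFile(bfr=2)
for key in [1, 2, 3, 4, 5]:
    heap.insert(key)
found_before = heap.search(3)
heap.delete(3)
found_after = heap.search(3)
blocks_before = len(heap.blocks)
heap.reorganize()
blocks_after = len(heap.blocks)
```

An unsuccessful search reads every block, which is why `found_after` reports all blocks as read.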
Files of Ordered Records (Sorted Files)
● Records are ordered based on the values of one of the fields, called the ordering field
● The ordering field may be the key field
● Reading the records in the order of the ordering key value becomes efficient.
● Finding the next record from the current one is efficient. The next record is in the
same block unless the current record is the last one in the block
Files of Ordered Records (Sorted Files)
● Searching for records on the ordering field is very efficient
○ Can use binary search
● Usually accesses log2(b) blocks
● Does not provide any advantage for random or ordered access based on a
nonordering field.
● Inserting records is expensive.
○ Find the correct position
○ Must make space in the file to insert the record.
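The binary search on an ordered file works on whole blocks rather than individual records. The sketch below models the file as a list of non-empty sorted blocks and counts block reads; the function name and the sample keys are assumptions made for illustration.

```python
def binary_search_blocks(blocks, key):
    """Binary search over the blocks of an ordered file.

    blocks: list of non-empty sorted lists, ordered across blocks as well.
    Returns (block_no or None, blocks_read).
    """
    lo, hi = 0, len(blocks) - 1
    reads = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]          # one block access
        reads += 1
        if key < block[0]:
            hi = mid - 1             # key lies in an earlier block
        elif key > block[-1]:
            lo = mid + 1             # key lies in a later block
        else:
            # Key falls within this block's range: search in the buffer.
            return (mid if key in block else None), reads
    return None, reads


# Assumed example: b = 4 blocks, 2 records per block.
ordered_file = [[1, 3], [5, 7], [9, 11], [13, 15]]
print(binary_search_blocks(ordered_file, 9))
print(binary_search_blocks(ordered_file, 4))
```

With b = 4 blocks the search reads about log2(b) = 2 blocks, compared with b/2 on average for a linear search of a heap file.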
Files of Ordered Records (Sorted Files)
Techniques for handling insertions
● Keep unused space in each block
○ Once the space is used, the original problem resurfaces.

● Create a temporary unordered file
○ This is known as the overflow or transaction file.
○ The actual ordered file is known as the main or master file.
○ New records are inserted at the end of the overflow file.
○ Periodically, the overflow file is sorted and merged with the master file.

● Insertion becomes efficient, but searching becomes more complex.
○ The overflow file must be searched using a linear search.
○ But for applications that do not require up-to-date information, the overflow
file can be ignored during a search.
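The master/overflow arrangement can be sketched as follows. This is an in-memory illustration only: records are plain keys, the class and method names are hypothetical, and the `include_overflow` flag is an assumed way to model applications that tolerate stale data.

```python
import bisect


class OrderedFileWithOverflow:
    """Sorted master file plus an unordered overflow (transaction) file."""

    def __init__(self, records=()):
        self.master = sorted(records)   # ordered main file
        self.overflow = []              # unordered overflow file

    def insert(self, key):
        """Cheap insert: append to the end of the overflow file."""
        self.overflow.append(key)

    def search(self, key, include_overflow=True) -> bool:
        """Binary search on the master file, then linear search on overflow."""
        i = bisect.bisect_left(self.master, key)
        if i < len(self.master) and self.master[i] == key:
            return True
        return include_overflow and key in self.overflow

    def merge(self):
        """Periodic reorganization: sort and merge overflow into master."""
        self.master = sorted(self.master + self.overflow)
        self.overflow = []


f = OrderedFileWithOverflow([10, 20, 30])
f.insert(25)
in_overflow = f.search(25)
stale_view = f.search(25, include_overflow=False)
f.merge()
after_merge = f.search(25, include_overflow=False)
```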
Hashing Techniques

● Primary file organization based on hashing.


● This organization is called a hash file.
● Provides fast access to records on certain search conditions.
● The search condition must be an equality condition on a single field. This field is
called the hash field.
● If the hash field is the key field then it is called the hash key.
Hashing Techniques
● Provides a function h, called a hash function (randomizing function), that is applied
to the hash field value of a record and yields the address of the block in which the
record is stored.

● The search for the record within the block is then done in the main memory buffer

● For most records, we need only a single block access

● Hashing is also used as an internal search structure.


Hashing Techniques
External Hashing for disk files
● Hashing for disk files is called external hashing.

Static hashing
● The address space is made of buckets, each of which holds multiple records.
● A bucket is either one disk block or a cluster of contiguous blocks.
● The hashing function maps a key into a relative bucket number.
● Using a table in the file header, the bucket number is converted into the corresponding disk
block address.
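The two-step mapping in static hashing (key to relative bucket number, then bucket number to disk block address) might look like this. Everything concrete here is an assumption: the slides do not fix a hash function, so h(K) = K mod M is used as a common textbook choice, and the block addresses in the header table are invented.

```python
# Number of buckets M (assumed; fixed for the lifetime of a static hash file).
M = 4

# Hypothetical file-header table: relative bucket number -> disk block address.
bucket_to_block_address = {0: 1200, 1: 1340, 2: 1010, 3: 1785}


def h(key: int) -> int:
    """Assumed hash function: maps a key to a relative bucket number 0..M-1."""
    return key % M


def block_address(key: int) -> int:
    """Convert the bucket number to a disk block address via the file header."""
    return bucket_to_block_address[h(key)]


for key in (13, 20, 7):
    print(key, "-> bucket", h(key), "-> block address", block_address(key))
```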
Hashing Techniques

Dynamic hashing
● Extendible hashing
○ Uses a type of directory: an array of 2^d bucket addresses.
○ d is known as the global depth.
● In the example, the directory is an array of size 4 (2^2).
● Each element in the array is a pointer to a bucket.
● To locate an entry (record), apply the hash function to the search field and take the last
two bits (because d is 2) of its binary representation.
Example:
● Locate the entry with hash value 5.
○ Binary code: 101
○ Look at directory entry 01 and follow the pointer.
● Insert a data entry with hash value 13.
○ Binary code: 1101
○ Last two bits: 01
○ Go to bucket B.
○ The bucket has space, so insert the entry.
● Insert an entry with hash value 20.
○ Binary code: 10100
○ Last two bits: 00
○ This leads to bucket A.
○ Bucket A is full.
○ Split the bucket by allocating a new bucket (A2, the split image) and redistribute
the contents across the old bucket and its split image.
○ We need three bits to discriminate between A and A2.
○ The directory has only enough slots to store all two-bit patterns.
○ Double the directory.
Binary codes
20 - 10100
4 - 100
12 - 1100
32 - 100000
16 - 10000
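The "take the last d bits" step can be reproduced with a bit mask. The helper names below are hypothetical; the hash values are the ones listed in the example.

```python
def directory_index(hash_value: int, depth: int) -> int:
    """Directory slot = the last `depth` bits of the hash value."""
    return hash_value & ((1 << depth) - 1)


def last_bits(hash_value: int, depth: int) -> str:
    """The same slot, shown as a zero-padded bit string."""
    return format(directory_index(hash_value, depth), f"0{depth}b")


# Global depth d = 2: lookups from the example.
for value in (5, 13, 20):
    print(value, "->", last_bits(value, 2))

# After doubling (d = 3), the third bit separates bucket A from its split image A2.
for value in (20, 4, 12, 32, 16):
    print(value, "->", last_bits(value, 3))
```

With three bits, 32 and 16 end in 000 and stay in bucket A, while 4, 12 and 20 end in 100 and move to A2.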
● Does splitting a bucket always necessitate a directory doubling?
○ Not always.
○ Example:
○ Insert an entry with hash value 9 (binary 1001, last three bits 001).
○ It belongs to bucket B.
○ Bucket B is full, so split it and use directory elements 001 and 101 to point to
B and its split image.
○ But if either A or A2 becomes full and an insertion forces a bucket split, the directory is
doubled again.
Binary codes
1 - 001
9 - 1001
5 - 101
21 - 10101
13 - 1101
● To determine whether a directory doubling is needed, we maintain a local depth for
each bucket.
● If a bucket whose local depth is equal to the global depth is split, the directory must
be doubled.
● Initially, all local depths are equal to the global depth.
● Increase the global depth by 1 each time the directory is doubled.
● Increase by 1 the local depth of the split bucket and assign this same local depth to
its split image.
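The rules for global and local depth can be combined into a compact in-memory sketch of extendible hashing. Several things here are assumptions rather than content from the slides: the bucket capacity of 4, the class and method names, and the filler entries 10, 15, 7 and 19 used to populate the other two buckets. The sketch also assumes no hash value is inserted more times than a bucket can hold.

```python
class Bucket:
    def __init__(self, local_depth: int):
        self.local_depth = local_depth
        self.entries = []


class ExtendibleHash:
    def __init__(self, global_depth: int = 2, capacity: int = 4):
        self.global_depth = global_depth
        self.capacity = capacity
        # Initially one bucket per slot; local depth equals global depth.
        self.directory = [Bucket(global_depth) for _ in range(2 ** global_depth)]

    def _index(self, h: int) -> int:
        return h & ((1 << self.global_depth) - 1)   # last global_depth bits

    def search(self, h: int) -> bool:
        return h in self.directory[self._index(h)].entries

    def insert(self, h: int):
        bucket = self.directory[self._index(h)]
        if len(bucket.entries) < self.capacity:
            bucket.entries.append(h)
            return
        # Bucket is full. Double the directory only if local == global depth.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split: both the bucket and its split image get local depth + 1.
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = image
        # Redistribute the old contents, then retry the insert.
        old_entries, bucket.entries = bucket.entries, []
        for e in old_entries:
            self.directory[self._index(e)].entries.append(e)
        self.insert(h)


table = ExtendibleHash(global_depth=2, capacity=4)
for value in [4, 12, 32, 16, 1, 5, 21, 10, 15, 7, 19]:
    table.insert(value)
table.insert(13)   # bucket B (01) has space
table.insert(20)   # bucket A (00) is full, local == global: directory doubles
table.insert(9)    # bucket B is full, local < global: split without doubling

print("global depth:", table.global_depth)
for i, b in enumerate(table.directory):
    print(format(i, "03b"), b.local_depth, b.entries)
```

Doubling copies the directory so that slots i and i + 2^d share a bucket until that bucket is split, which is why only the split bucket's slots need to be redirected.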
Exercise 01
Consider the following extendible hashing index
diagram. You may assume that the entries in the
index are hash values.

(i) What is the value of the global depth?


(ii) What are the values of P and Q?
(iii) What is the use of the local depth?
(iv) Is it possible to identify the last entry that was
inserted into the index? Give reasons.
(v) Is it possible to say that a split has occurred
due to the insertion of an entry? If so, what could
the inserted entry have been?
Exercise 01
(vi) List the steps to search the hash value 20.
Thank You!
