Disk

Components of a DBMS,
Disks and Files

Last Lecture…
 Advanced SQL
 Any questions?
This Lecture…
 Components of a DBMS…
 Data Storage: Disks and Files…

Structure of a DBMS
Structure of a DBMS…
(contd.)
 DBMSs accept SQL statements
 The queries are parsed and presented to the optimizer (for query optimization)
 An execution plan is produced after optimization
 Execution plan consists of a set of relational operators and extra information
(contd.)
 All data is stored in logical storage called files
 Files and Access Methods provide an interface to manipulate files
 Files are considered to be a set of pages
 Buffer manager brings pages from disk to main memory
(contd.)
 Disk Space Manager deals with the management of space on disk.
 DBMS supports concurrency control and crash recovery
 This requires carefully scheduling user transactions and maintaining a log of all changes to the database
(contd.)
 DBMS components for transaction processing include
 Transaction Manager: ensures that transactions request and release locks according to an appropriate locking protocol
 Lock Manager: keeps track of locks for each database object
 Recovery Manager: maintains the log and restores the system after a crash
Data Storage: Disks and Files
 DBMS stores information on (“hard”) disks.
 This has major implications for DBMS design!
 READ: transfer data from disk to main memory (RAM).
 WRITE: transfer data from RAM to disk.
 Both are high-cost operations relative to in-memory operations, so they must be planned carefully!
Why Not Store Everything in Main
Memory?
 Costs too much. Rs. 9000 will buy you either 256MB
of RAM or 40GB of disk.
 Main memory is volatile. We want data to be saved
between runs. (Obviously!)
 Typical storage hierarchy:
 Main memory (RAM) for currently used data
(primary storage).
 Disk for the main database (secondary storage).
 Tapes for archiving older versions of the data
(tertiary storage).
Memory Hierarchy
Characteristics of storage media (cost and speed both increase toward the top of the hierarchy and decrease toward the bottom):
 Primary storage (main memory, cache, etc.)
 Secondary storage (magnetic disks, optical disks, etc.)
 Tertiary storage (tapes)
Disks
 Secondary storage device of choice.
 Main advantage over tapes: random access vs.
sequential.
 Data is stored and retrieved in units called disk
blocks or pages.
 Unlike RAM, time to retrieve a disk page varies
depending upon location on disk.
 Therefore, relative placement of pages on disk has
major impact on DBMS performance!
Components of a Disk
(Figure: disk assembly showing the spindle, platters, tracks, sectors, disk heads, arm assembly, and the direction of arm movement.)
 The platters spin (say, 90rps).
 Track: concentric rings on platters
 Cylinder: the set of all tracks with the same diameter
 Sector: each track is divided into fixed-sized arcs, called sectors
 Block: the unit in which data is written to and read from disk; block size is a multiple of sector size.
 Disk heads: move as a unit; only one head reads/writes at any one time.
 Arm assembly: moves in or out to position a disk head on a desired track.
Components of a Disk
 A disk controller is an interface which controls the disk drive.
 The controller carries out commands to read/write sectors by moving the arm assembly and transferring data to/from the disk surfaces
 A well known interface for PCs is SCSI (Small Computer System Interface)
Accessing a Disk Page
 Reading/writing a disk block is called an I/O operation. Its time has three components:
 seek time (moving arms to position disk head on track): 1-20 ms
 rotational delay (waiting for block to rotate under head): 0-10 ms, usually less than seek time
 transfer time (actually moving data to/from disk surface): 1 ms per 4KB block
Disk Performance
 Seek time and rotational delay dominate.
 Seek time varies from about 1 to 20msec
 Rotational delay varies from 0 to 10msec
 Transfer time is about 1msec per 4KB page
 Key to lower I/O cost: reduce seek/rotation delays! Hardware vs. software solutions?
Disk Performance… (contd.)
 ‘Next’ block concept:
 blocks on same track, followed by
 blocks on same cylinder, followed by
 blocks on adjacent cylinder
 Blocks in a file should be arranged sequentially on disk (by ‘next’), to minimize seek and rotational delay.
 For a sequential scan, pre-fetching several pages at a time is a big win!
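A rough sketch of why sequential placement matters, using the timing figures from the slides. The specific values (10 ms seek and 5 ms rotational delay as mid-range picks, 1 ms transfer per block) and the function names are assumptions for illustration. The sequential case is idealized: it pays one seek and one rotational delay and ignores track and cylinder switches.

```python
# Assumed mid-range values taken from the ranges quoted in the slides.
SEEK_MS = 10.0       # seek time, within the 1-20 ms range
ROTATION_MS = 5.0    # rotational delay, within the 0-10 ms range
TRANSFER_MS = 1.0    # transfer time per 4KB block


def random_read_ms(n_blocks):
    """Every block pays its own seek, rotational delay, and transfer."""
    return n_blocks * (SEEK_MS + ROTATION_MS + TRANSFER_MS)


def sequential_read_ms(n_blocks):
    """Idealized: one seek and one rotational delay, then pure transfer."""
    return SEEK_MS + ROTATION_MS + n_blocks * TRANSFER_MS


print(random_read_ms(100))      # 1600.0
print(sequential_read_ms(100))  # 115.0
```

Under these assumptions, reading 100 blocks costs about 1.6 seconds when they are scattered and about 0.115 seconds when they are laid out by 'next'.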
Questions
 Consider a disk with a sector size of 512 bytes, 2000 tracks per
surface, 50 sectors per track, five double-sided platters, and
average seek time of 10 msec.
1. What is the capacity of a track in bytes? What is the capacity of
each surface? What is the capacity of the disk?
2. How many cylinders does the disk have?
3. Give examples of valid block sizes. Is 256 bytes a valid block size?
2048? 51200?
4. If the disk platters rotate at 5400 rpm (revolutions per minutes)
what is the maximum rotational delay?
5. If one track of data can be transferred per revolution, what is the
transfer rate?
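The arithmetic behind these questions can be checked with a short script. This is a sketch, not an official answer key. The variable names are made up for illustration. The block-size rule used here (a multiple of the sector size that does not exceed one track) is an assumed convention.

```python
SECTOR_BYTES = 512
TRACKS_PER_SURFACE = 2000
SECTORS_PER_TRACK = 50
SURFACES = 5 * 2          # five double-sided platters
RPM = 5400

# Question 1: capacities in bytes
track_bytes = SECTOR_BYTES * SECTORS_PER_TRACK        # 25,600
surface_bytes = track_bytes * TRACKS_PER_SURFACE      # 51,200,000
disk_bytes = surface_bytes * SURFACES                 # 512,000,000

# Question 2: one cylinder per track position
cylinders = TRACKS_PER_SURFACE                        # 2000


# Question 3: assumed rule, a block is a whole number of sectors
# and does not span more than one track.
def is_valid_block_size(size):
    return 0 < size <= track_bytes and size % SECTOR_BYTES == 0


# Question 4: worst case is one full revolution
max_rotational_delay_ms = 60_000 / RPM                # about 11.1 ms

# Question 5: one track per revolution, in bytes per second
transfer_rate = track_bytes * RPM / 60                # 2,304,000

print(track_bytes, surface_bytes, disk_bytes, cylinders)
print([is_valid_block_size(s) for s in (256, 2048, 51200)])
print(round(max_rotational_delay_ms, 1), transfer_rate)
```

With the assumed rule, 256 is rejected (smaller than a sector), 2048 is accepted, and 51200 is rejected (larger than a track).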
RAID: Redundant Array of Independent Disks
 It is a setup consisting of multiple disks for data storage.
 They are linked together to prevent data loss and/or speed up performance.
 Having multiple disks allows the employment of various techniques like disk striping, disk mirroring and parity.
RAID… (contd.)
 Disk Array: Arrangement of several disks that gives abstraction of a single, large disk.
 Goals: Increase performance and reliability.
 Two main techniques:
 Data striping: Data is partitioned; size of a partition is called the striping unit. Partitions are distributed over several disks.
 Redundancy: More disks -> more failures. Redundant information allows reconstruction of data if a disk fails.
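One common way to distribute partitions is round-robin. The sketch below assumes a striping unit of one block and a hypothetical helper named locate; it is an illustration of the idea, not a description of any particular RAID controller.

```python
def locate(block_no, num_disks):
    """Map a logical block to (disk index, block offset on that disk),
    assuming round-robin striping with a striping unit of one block."""
    return block_no % num_disks, block_no // num_disks


# Logical blocks 0..5 spread over 4 disks
for b in range(6):
    print(b, locate(b, 4))
```

Consecutive logical blocks land on different disks, so a large sequential read can keep all disks busy at once.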

RAID… (contd.)
 Data Striping Example…
 With D disks, D blocks can be transferred simultaneously
 Transfer rate = D times that of a single disk
 An array of disks reduces reliability
 Example
 1 disk: MTTF ~ 50,000 hours
 100 disks: MTTF ~ 50,000 hours / 100 = 500 hours ~ 21 days
RAID… (contd.)
 In RAID, redundant information is stored to increase reliability
 Example: parity scheme
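A minimal sketch of a parity scheme, assuming byte-wise XOR parity over equal-length blocks (the function name and sample data are illustrative). The check block is the XOR of all data blocks, so any single lost block equals the XOR of the surviving blocks and the check block.

```python
from functools import reduce


def parity(blocks):
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data disks, each holding one 2-byte block (made-up contents)
data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
check = parity(data)            # contents of the check disk

# Suppose disk 1 fails: rebuild its block from the survivors plus the check block
recovered = parity([data[0], data[2], check])
print(recovered == data[1])     # True
```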

RAID Levels
 Level 0: No redundancy
 Uses data striping
 Reliability an issue
 Best Write Performance (No writing of redundant data)
RAID Levels… (contd.)
 Level 1: Mirrored (two identical copies)
 No data striping
 Most expensive solution
 Each disk has a mirror image (check disk)
 Parallel reads, a write involves two disks.
 Maximum transfer rate = transfer rate of one disk (this level does not stripe the data).
 Space utilization = 50%
 Level 0+1: Striping and Mirroring
 Parallel reads, a write involves two disks.
 Maximum transfer rate = aggregate bandwidth
 Space utilization = 50%
 Level 2: Error-Correcting Codes
 Striping unit = 1 bit
 Redundancy scheme = Hamming Code
 Level 3: Bit-Interleaved Parity
 Striping Unit: One bit. One check disk.
 Each read and write request involves all disks; disk array can process one request at a time.
 Level 4: Block-Interleaved Parity
 Striping Unit: One disk block. One check disk.
 Parallel reads possible for small requests, large requests can utilize full bandwidth
 Writes involve modified block and check disk
 Level 5: Block-Interleaved Distributed Parity
 Similar to RAID Level 4, but parity blocks are distributed over all disks
 Level 6: P+Q Redundancy
 Similar to RAID Level 5
 Two check disks (uses Reed-Solomon codes)
 Able to recover from 2 failures
Disk Space Management
 Lowest layer of DBMS software manages space on disk.
 Higher levels call upon this layer to:
 allocate/de-allocate a page
 read/write a page
 Page = 1 disk block
 1 page request = 1 disk block
 Request for a sequence of pages must be satisfied by allocating the pages sequentially on disk!
Disk Space Management… (contd.)
 Higher levels don’t need to know how this is done, or how free space is managed.
 Two ways of keeping track of free blocks:
 Keeping a list of free blocks & a pointer to the first free block
 Maintaining a bitmap with one bit per block, indicating whether the block is in use
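A small sketch of the bitmap approach. The class and method names are hypothetical, and a Python list of booleans stands in for the on-disk bitmap. It also shows how a request for a sequence of pages can be served by looking for a run of contiguous free blocks.

```python
class BitmapSpaceManager:
    """Tracks free disk blocks with one flag per block (True = in use)."""

    def __init__(self, num_blocks):
        self.used = [False] * num_blocks

    def allocate(self):
        """Allocate the first free block and return its number."""
        for block_no, in_use in enumerate(self.used):
            if not in_use:
                self.used[block_no] = True
                return block_no
        raise RuntimeError("no free block")

    def allocate_run(self, n):
        """Allocate n contiguous blocks; return the first block number."""
        run = 0
        for block_no, in_use in enumerate(self.used):
            run = 0 if in_use else run + 1
            if run == n:
                start = block_no - n + 1
                for b in range(start, block_no + 1):
                    self.used[b] = True
                return start
        raise RuntimeError("no contiguous run of that length")

    def deallocate(self, block_no):
        self.used[block_no] = False


mgr = BitmapSpaceManager(8)
first = [mgr.allocate() for _ in range(3)]   # blocks 0, 1, 2
mgr.deallocate(1)                            # leaves a one-block hole
run_start = mgr.allocate_run(3)              # hole is too small, so the run starts at 3
```

A free list finds a single free block quickly, while a bitmap makes it easier to spot contiguous free areas like the run above.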
Buffer Management in a DBMS
(Figure: page requests from higher levels arrive at the BUFFER POOL in MAIN MEMORY, whose frames either hold a disk page or are free; pages move between the buffer pool and the DB on DISK; the choice of frame is dictated by the replacement policy.)
 Data must be in RAM for DBMS to operate on it!
 Table of <frame#, pageid> pairs is maintained.
 For each frame#, two variables are kept: pin count (number of current users) and dirty bit (set if the page has been modified)
When a Page is Requested ...
 The buffer manager checks the buffer pool to see if some frame contains the requested page.
 If so, increment the pin count (pin count ++) for that frame
 If requested page is not in pool:
 Choose a frame for replacement
 If frame is dirty, write it to disk
 Read requested page into chosen frame
 Pin the page and return its address.
 If requests can be predicted (e.g., sequential scans), pages can be pre-fetched several pages at a time!
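The request-handling steps can be sketched as a toy buffer manager. Everything here is an assumption made for illustration: the class and method names, the use of a dict as the "disk", and the choice of the oldest unpinned frame as the replacement victim.

```python
from collections import OrderedDict


class Frame:
    def __init__(self):
        self.page_id = None
        self.data = None
        self.pin_count = 0
        self.dirty = False


class BufferManager:
    def __init__(self, disk, num_frames):
        self.disk = disk                    # stand-in disk: dict page_id -> data
        self.frames = [Frame() for _ in range(num_frames)]
        self.page_table = {}                # page_id -> frame number
        # Frames with pin_count 0, oldest first; all frames start out free.
        self.candidates = OrderedDict((i, None) for i in range(num_frames))

    def pin(self, page_id):
        if page_id in self.page_table:      # page already in the pool
            idx = self.page_table[page_id]
            self.candidates.pop(idx, None)  # no longer replaceable
        else:
            if not self.candidates:
                raise RuntimeError("all frames are pinned")
            idx, _ = self.candidates.popitem(last=False)  # choose a frame
            frame = self.frames[idx]
            if frame.page_id is not None:
                if frame.dirty:             # write back before reuse
                    self.disk[frame.page_id] = frame.data
                del self.page_table[frame.page_id]
            frame.page_id = page_id
            frame.data = self.disk[page_id] # read requested page
            frame.dirty = False
            self.page_table[page_id] = idx
        frame = self.frames[idx]
        frame.pin_count += 1
        return frame

    def unpin(self, page_id, dirty):
        idx = self.page_table[page_id]
        frame = self.frames[idx]
        frame.pin_count -= 1
        frame.dirty = frame.dirty or dirty
        if frame.pin_count == 0:            # becomes a replacement candidate
            self.candidates[idx] = None


disk = {1: "a", 2: "b", 3: "c"}
bm = BufferManager(disk, num_frames=2)
f = bm.pin(1)
f.data = "A"                 # modify the page in memory
bm.unpin(1, dirty=True)
bm.pin(2)
bm.unpin(2, dirty=False)
bm.pin(3)                    # evicts page 1 and writes "A" back to disk
```

Raising an error when every frame is pinned is a simplification; a real system might instead make the requester wait.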
More on Buffer Management
 If a requested page is not in the buffer pool and a free frame is not available:
 A frame with pin_count = 0 is chosen for replacement according to the buffer replacement policy.
 If no page in the pool has pin_count 0, the buffer manager has to wait until some page is released, or the requesting transaction may simply be aborted.
More on Buffer Management
 Requestor of page must unpin it, and indicate whether page has been modified:
 dirty bit is used for this.
 If a page is requested by several different transactions
 Each transaction should obtain a lock on the page before it reads or modifies the page (locking protocol)
Buffer Replacement Policy
 Least recently used (LRU)
 A queue of pointers to frames with pin_count 0
 A frame is added to the end of the queue when it becomes a candidate for replacement.
 The frame at the head of queue is chosen for replacement.
 Clock
 First in first out (FIFO)
 Most recently used (MRU)
Buffer Replacement Policy
 Frame is chosen for replacement by a replacement policy:
 Least-recently-used (LRU), Clock, MRU, random etc.
 Policy can have big impact on # of I/O’s; depends on the access pattern.
 Sequential flooding: Nasty situation caused by LRU + repeated sequential scans.
 # buffer frames < # pages in file means each page request causes an I/O. MRU much better in this situation (but not in all situations, of course).
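Sequential flooding is easy to reproduce in a small simulation. The sketch below assumes every page is unpinned immediately after use; the function name and parameters are made up for illustration. It counts I/Os for repeated scans of a file that is one page larger than the pool.

```python
def count_ios(policy, num_frames, num_pages, num_scans):
    """Count page reads for repeated sequential scans under LRU or MRU."""
    pool = []   # page ids, ordered from least to most recently used
    ios = 0
    for _ in range(num_scans):
        for page in range(num_pages):
            if page in pool:
                pool.remove(page)            # hit: refreshed below
            else:
                ios += 1                     # miss: page must be read
                if len(pool) == num_frames:
                    # LRU evicts the oldest entry, MRU the newest
                    pool.pop(0 if policy == "LRU" else -1)
            pool.append(page)
    return ios


print(count_ios("LRU", num_frames=3, num_pages=4, num_scans=3))  # 12
print(count_ios("MRU", num_frames=3, num_pages=4, num_scans=3))  # 6
```

With 3 frames and a 4-page file, LRU misses on all 12 requests, because each page is evicted just before it is needed again. MRU keeps most of the file resident and needs only 6 reads.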
DBMS vs. OS File System
OS does disk space & buffer mgmt: why not let OS manage these tasks?
 Differences in OS support: portability issues
 Some limitations, e.g., files can’t span disks.
 Buffer management in DBMS requires ability to:
 pin a page in buffer pool, force a page to disk (important for implementing concurrency control & recovery),
 adjust replacement policy, and pre-fetch pages based on access patterns in typical DB operations.

Page Formats
 DBMS sees data as a collection of records
 Hence, for fixed-length records, pages can be considered as a collection of slots
 Each slot contains a record
 Each record is identified by a record id (rid)

Page Formats: Fixed Length Records
(Figure: two alternatives. Alternative 1, PACKED: records occupy slots 1 to N contiguously, followed by free space; the page stores N, the number of records. Alternative 2, UNPACKED, BITMAP: slots 1 to M may each be full or empty; the page stores M, the number of slots, and a bitmap with one bit per slot.)
 Record id = <page id, slot #>. In first alternative, moving records for free space management changes rid; may not be acceptable.
Page Formats: Variable Length Records
 Fixed length slots not practical
 Each slot would have to be as large as the largest record
 This wastes space for smaller records
 In variable-length records, having contiguous free space is highly desirable (avoid free blocks in the middle of pages)
Page Formats: Variable Length Records
(Figure: page i holds records with Rid = (i,1), (i,2), ..., (i,N). The SLOT DIRECTORY has one entry per slot, N ... 2 1 (the example shows entries 20, 16, 24), followed by the number of slots N and a pointer to the start of free space.)
 The slot directory maintains a <record offset, record length> pair per slot. Alternatively, keep only the offset in the slot directory and store the length in the first few bytes of the record, or in the catalog for fixed-length records.
 Can move records on page without changing rid; so, attractive for fixed-length records too.
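A simplified sketch of a page with a slot directory. The names are hypothetical, slot numbers are 0-based, and the directory is kept as a Python list rather than inside the page bytes. It shows that records can be moved during compaction while their slot numbers, and hence their rids, stay the same.

```python
class SlottedPage:
    def __init__(self, size=64):
        self.data = bytearray(size)
        self.free_start = 0     # pointer to start of free space
        self.slots = []         # slot directory: [offset, length] or None

    def insert(self, record):
        """Store a record in the free area and return its slot number."""
        if self.free_start + len(record) > len(self.data):
            raise ValueError("page full")
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append([offset, len(record)])
        return len(self.slots) - 1

    def get(self, slot_no):
        offset, length = self.slots[slot_no]
        return bytes(self.data[offset:offset + length])

    def delete(self, slot_no):
        self.slots[slot_no] = None      # slot number is never reused here

    def compact(self):
        """Slide live records together; only directory offsets change."""
        packed = bytearray(len(self.data))
        pos = 0
        for entry in self.slots:
            if entry is None:
                continue
            offset, length = entry
            packed[pos:pos + length] = self.data[offset:offset + length]
            entry[0] = pos
            pos += length
        self.data, self.free_start = packed, pos


page = SlottedPage()
a = page.insert(b"alpha")
b = page.insert(b"be")
c = page.insert(b"gamma")
page.delete(b)
page.compact()          # "gamma" moves, but slot c still finds it
```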
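Record Formats: Fixed Length
(Figure: a record with fields F1 F2 F3 F4 of lengths L1 L2 L3 L4, starting at base address B; the address of F3 is B+L1+L2.)
 Information about field types same for all records in a file; stored in system catalogs.
 Finding i’th field does not require scan of record: its address is computed from the field lengths.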
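The address rule from the figure (Address = B+L1+L2 for the third field) generalizes to any field. A minimal sketch, with a made-up function name and example lengths:

```python
def field_address(base, field_lengths, i):
    """Address of the i-th field (1-based): B + L1 + ... + L(i-1)."""
    return base + sum(field_lengths[:i - 1])


lengths = [4, 8, 2, 16]                   # assumed L1..L4 from a catalog
print(field_address(1000, lengths, 3))    # 1012, i.e. B + L1 + L2
```

Because the lengths come from the catalog, the address is computed arithmetically without reading the record itself.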
Record Formats: Variable Length
 Two alternative formats (# fields is fixed):
 Fields delimited by special symbols: a field count (4 in the figure) followed by F1 $ F2 $ F3 $ F4 $
 Array of field offsets: a directory of offsets at the start of the record, followed by F1 F2 F3 F4
 Second offers direct access to i’th field, efficient storage of nulls (special don’t know value); small directory overhead.
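A sketch of the array-of-field-offsets format. The layout details are assumptions for illustration: 2-byte little-endian offsets, one extra offset marking the end of the last field, and a null encoded as a field whose start and end offsets coincide (so an empty value is not distinguishable from null in this toy version).

```python
import struct


def encode(fields):
    """Build a record from a list of bytes values (None means null)."""
    n = len(fields)
    pos = 2 * (n + 1)                 # data starts after the offset array
    offsets = []
    for f in fields:
        offsets.append(pos)
        pos += len(f) if f is not None else 0
    offsets.append(pos)               # end of the last field
    body = b"".join(f for f in fields if f is not None)
    return struct.pack(f"<{n + 1}H", *offsets) + body


def get_field(record, i):
    """Return field i (0-based) by reading just two offsets; None if null."""
    start, end = struct.unpack_from("<2H", record, 2 * i)
    return record[start:end] if end > start else None


rec = encode([b"Ann", None, b"NYC", b"42"])
print(get_field(rec, 2))   # b'NYC'
print(get_field(rec, 1))   # None
```

Reaching field i takes two offset reads regardless of how long the earlier fields are, and a null costs no data bytes.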
Summary
 Components of a DBMS
 Disks
 RAID
 Disk Space Management
 Buffer Management
 Page Formats
 Record Formats
