Disk


Components of a DBMS,

Disks and Files


Last Lecture…
 Advanced SQL

 Any questions?
This Lecture…
 Components of a DBMS…

 Data Storage: Disks and Files…


Structure of a DBMS
Structure of a DBMS…
(contd.)
 DBMSs accept SQL statements
 The queries are parsed and presented to the optimizer (for query optimization)
 An execution plan is produced after optimization
 Execution plan consists of a set of relational operators and extra information
Structure of a DBMS…
(contd.)
 All data is stored in logical storage called files
 Files and Access Methods provide an interface to manipulate files
 Files are considered to be a set of pages
 Buffer manager brings pages from disk to main memory
Structure of a DBMS…
(contd.)
 Disk Space Manager deals with the management of space on disk.
 DBMS supports concurrency control and crash recovery
 This requires carefully scheduling user transactions and maintaining a log of all changes to the database
Structure of a DBMS…
(contd.)
 DBMS components for transactions
processing include
 Transaction Manager: ensures that transactions request and release locks according to an appropriate locking protocol
 Lock Manager: Keeps track of locks for each
database object
 Recovery Manager: Maintains the log and
restores the system after a crash
Data Storage: Disks and Files

 DBMS stores information on (“hard”) disks.


 This has major implications for DBMS design!
 READ: transfer data from disk to main memory
(RAM).
 WRITE: transfer data from RAM to disk.
 Both are high-cost operations, relative to in-
memory operations, so must be planned carefully!
Why Not Store Everything in Main
Memory?
 Costs too much. Rs. 9000 will buy you either 256MB
of RAM or 40GB of disk.
 Main memory is volatile. We want data to be saved
between runs. (Obviously!)
 Typical storage hierarchy:
 Main memory (RAM) for currently used data (primary storage).
 Disk for the main database (secondary storage).
 Tapes for archiving older versions of the data (tertiary storage).
Memory Hierarchy
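Characteristics of Storage Medium
 Primary storage (main memory, cache, etc.): highest cost, fastest
 Secondary storage (magnetic disks, optical disks, etc.)
 Tertiary storage (tapes): lowest cost, slowest
 Cost and speed both increase toward primary storage and decrease toward tertiary storage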
Disks
 Secondary storage device of choice.
 Main advantage over tapes: random access vs.
sequential.
 Data is stored and retrieved in units called disk
blocks or pages.
 Unlike RAM, time to retrieve a disk page varies
depending upon location on disk.
 Therefore, relative placement of pages on disk has
major impact on DBMS performance!
Components of a Disk
[Figure: disk anatomy, labeling the spindle, platters, tracks, sectors, disk head, arm assembly and arm movement.]
 The platters spin (say, 90 rps).
 Track: concentric rings on platters
 Cylinder: the set of all tracks with the same diameter
 Sector: each track is divided into fixed-size arcs, called sectors
 Block: the unit in which data is written to and read from disk; block size is a multiple of sector size.
 Disk heads: move as a unit; only one head reads or writes at any one time.
 Arm assembly: moves in or out to position a disk head on a desired track.
Components of a Disk
 A disk controller is an interface which controls the disk drive.
 This interface executes commands to read/write sectors by moving the arm assembly and transferring data to/from the disk surfaces
 A well known interface for PCs is SCSI (Small Computer System Interface)
Accessing a Disk Page
 Reading or writing a disk block is called an I/O operation. Its time has three components:
 seek time (moving arms to position disk head on track): 1-20 ms
 rotational delay (waiting for block to rotate under head): 0-10 ms, usually less than seek time
 transfer time (actually moving data to/from disk surface): about 1 ms per 4KB block
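The three components simply add up. A minimal sketch of that arithmetic follows; the function name `io_time_ms` and the mid-range values (10 ms seek, 5 ms rotational delay) are illustrative assumptions picked from the ranges above, not measurements.

```python
def io_time_ms(seek_ms, rotational_delay_ms, transfer_ms_per_block, blocks=1):
    """Time for one I/O request that reads `blocks` consecutive blocks."""
    return seek_ms + rotational_delay_ms + transfer_ms_per_block * blocks


# Assumed mid-range values: 10 ms seek, 5 ms rotational delay, 1 ms per 4KB block.
single_block = io_time_ms(seek_ms=10, rotational_delay_ms=5, transfer_ms_per_block=1)
print(single_block)  # 16
```

With these numbers only 1 ms of the 16 ms is spent moving data; the rest is positioning.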
Disk Performance
 Seek time and rotational delay dominate.
 Seek time varies from about 1 to 20 msec
 Rotational delay varies from 0 to 10 msec
 Transfer time is about 1 msec per 4KB page
 Key to lower I/O cost: reduce seek/rotation delays! Hardware vs. software solutions?
Disk Performance… (contd.)
 ‘Next’ block concept:
 blocks on same track, followed by
 blocks on same cylinder, followed by
 blocks on adjacent cylinder
 Blocks in a file should be arranged sequentially on disk (by ‘next’), to minimize seek and rotational delay.
 For a sequential scan, pre-fetching several
pages at a time is a big win!
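To see why sequential placement matters, here is a rough cost comparison. It is a sketch under simplifying assumptions: the constants are illustrative values from the ranges quoted earlier, every randomly placed page pays a full seek and rotational delay, and a sequential run pays them only once (track and cylinder switches are ignored).

```python
SEEK_MS = 10.0       # assumed average seek time
ROTATION_MS = 5.0    # assumed average rotational delay
TRANSFER_MS = 1.0    # about 1 ms per 4KB page


def random_read_ms(n_pages):
    """Each page is somewhere else on disk: pay seek + rotation every time."""
    return n_pages * (SEEK_MS + ROTATION_MS + TRANSFER_MS)


def sequential_read_ms(n_pages):
    """Pages are laid out in 'next' order: position once, then just transfer."""
    return SEEK_MS + ROTATION_MS + n_pages * TRANSFER_MS


print(random_read_ms(100))      # 1600.0
print(sequential_read_ms(100))  # 115.0
```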
Questions
 Consider a disk with a sector size of 512 bytes, 2000 tracks per
surface, 50 sectors per track, five double-sided platters, and
average seek time of 10 msec.
1. What is the capacity of a track in bytes? What is the capacity of
each surface? What is the capacity of the disk?
2. How many cylinders does the disk have?
3. Give examples of valid block sizes. Is 256 bytes a valid block size?
2048? 51200?
4. If the disk platters rotate at 5400 rpm (revolutions per minute)
what is the maximum rotational delay?
5. If one track of data can be transferred per revolution, what is the
transfer rate?
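One way to check your answers is to write the arithmetic out. The sketch below uses only the numbers given in the question; the rule that a block must be a whole number of sectors and must not exceed one track is an assumption about what counts as "valid" here.

```python
SECTOR_BYTES = 512
SECTORS_PER_TRACK = 50
TRACKS_PER_SURFACE = 2000
SURFACES = 5 * 2          # five double-sided platters
RPM = 5400

track_bytes = SECTOR_BYTES * SECTORS_PER_TRACK       # 25,600
surface_bytes = track_bytes * TRACKS_PER_SURFACE     # 51,200,000
disk_bytes = surface_bytes * SURFACES                # 512,000,000
cylinders = TRACKS_PER_SURFACE                       # one cylinder per track position


def valid_block_size(size):
    # Assumed rule: multiple of the sector size, no larger than one track.
    return size % SECTOR_BYTES == 0 and size <= track_bytes


max_rotational_delay_ms = 60_000 / RPM               # one full revolution, about 11.1 ms
transfer_rate_bytes_per_s = track_bytes * RPM / 60   # one track per revolution

print(track_bytes, surface_bytes, disk_bytes, cylinders)
print([valid_block_size(s) for s in (256, 2048, 51200)])
print(round(max_rotational_delay_ms, 2), transfer_rate_bytes_per_s)
```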
RAID-Redundant Array of
Independent Disks
 It is a setup consisting of multiple disks for data storage.
 They are linked together to prevent data loss and/or speed up performance.
 Having multiple disks allows the employment of various techniques like disk striping, disk mirroring and parity.
RAID… (contd.)
 Disk Array: Arrangement of several disks that gives the abstraction of a single, large disk.
 Goals: Increase performance and reliability.
 Two main techniques:
 Data striping: Data is partitioned; size of a partition is called the striping unit. Partitions are distributed over several disks.
 Redundancy: More disks -> more failures. Redundant information allows reconstruction of data if a disk fails.
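A minimal sketch of data striping with a striping unit of one block: the round-robin mapping below is a common textbook choice and is assumed here, and the function name `locate` is hypothetical.

```python
def locate(logical_block, num_disks):
    """Map a logical block number to (disk number, block number on that disk)."""
    return logical_block % num_disks, logical_block // num_disks


# Six consecutive logical blocks spread over 4 disks.
print([locate(b, 4) for b in range(6)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
```

Consecutive blocks land on different disks, so a large request can be served by all disks in parallel.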


RAID… (contd.)
 Data Striping Example…
 With D disks, D blocks can be transferred simultaneously
 Transfer rate is up to D times that of a single disk
 An array of disks without redundancy reduces reliability
 Example
 1 disk: MTTF ~ 50,000 hours
 100 disks: MTTF ~ 50,000 hours / 100 = 500 hours ~ 21 days
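The reliability estimate can be reproduced directly. This sketch assumes disks fail independently at a constant rate, so the mean time to failure (MTTF) of the array is the single-disk MTTF divided by the number of disks.

```python
def array_mttf_hours(disk_mttf_hours, num_disks):
    """MTTF of an array with no redundancy, assuming independent failures."""
    return disk_mttf_hours / num_disks


hours = array_mttf_hours(50_000, 100)
print(hours)                 # 500.0
print(round(hours / 24, 1))  # 20.8 (days)
```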
RAID… (contd.)
 In RAID, redundant information is
stored to increase reliability

 Example: parity scheme
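A small sketch of a parity scheme: the check block is the bytewise XOR of the data blocks, and XOR-ing the surviving blocks with the parity rebuilds a lost block. The block contents are made up for illustration.

```python
from functools import reduce


def parity(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]   # three data disks
check = parity(data)                              # stored on the check disk

# Suppose the disk holding data[1] fails: rebuild it from the others plus parity.
recovered = parity([data[0], data[2], check])
print(recovered == data[1])  # True
```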


RAID Levels

 Level 0: No redundancy
 Uses data striping
 Reliability an issue
 Best Write Performance (No writing of redundant
data)
RAID Levels… (contd.)
 Level 1: Mirrored (two identical copies)
 No data striping
 Most expensive solution
 Each disk has a mirror image (check disk)
 Parallel reads, a write involves two disks.
 Maximum transfer rate = transfer rate of one disk (this level does not stripe the data).
 Space utilization = 50%
RAID Levels… (contd.)

 Level 0+1: Striping and Mirroring
 Parallel reads, a write involves two disks.
 Maximum transfer rate = aggregate bandwidth
 Space utilization = 50%
RAID Levels… (contd.)
 Level 2: Error-Correcting Codes
 Striping unit = 1 bit
 Redundancy scheme = Hamming Code
RAID Levels… (contd.)

 Level 3: Bit-Interleaved Parity


 Striping Unit: One bit. One check disk.
 Each read and write request involves all disks;
disk array can process one request at a time.
RAID Levels… (contd.)
 Level 4: Block-Interleaved Parity
 Striping Unit: One disk block. One check
disk.
 Parallel reads possible for small requests,
large requests can utilize full bandwidth
 Writes involve modified block and check
disk
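Why a small write touches only the modified block and the check disk: the new parity can be derived from the old parity, the old data and the new data, without reading the other data disks. The sketch below assumes XOR parity and uses made-up block contents.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


blocks = [b"\x01\x02", b"\x10\x20", b"\x0a\x0b"]
old_parity = xor_bytes(xor_bytes(blocks[0], blocks[1]), blocks[2])

# Overwrite block 1: read old data and old parity, write new data and new parity.
new_block = b"\xff\x00"
new_parity = xor_bytes(xor_bytes(old_parity, blocks[1]), new_block)
blocks[1] = new_block

# Same result as recomputing parity over every data block.
full_parity = xor_bytes(xor_bytes(blocks[0], blocks[1]), blocks[2])
print(new_parity == full_parity)  # True
```

Because every small write goes through the single check disk, that disk can become a bottleneck.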
RAID Levels… (contd.)
 Level 5: Block-Interleaved Distributed
Parity
 Similar to RAID Level 4, but parity blocks
are distributed over all disks
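One possible way to distribute parity is to rotate the parity block across the disks, stripe by stripe. Real arrays use several different layouts; the rotation below is only an assumed example, and `parity_disk` is a hypothetical helper name.

```python
def parity_disk(stripe, num_disks):
    """Disk that holds the parity block of a given stripe (rotating layout)."""
    return (num_disks - 1 - stripe) % num_disks


print([parity_disk(s, 4) for s in range(5)])  # [3, 2, 1, 0, 3]
```

Since parity for different stripes lives on different disks, small writes to different stripes no longer all queue on one check disk.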
RAID Levels… (contd.)
 Level 6: P+Q Redundancy
 Similar to RAID Level 5
 Two check disks (uses Reed-Solomon
codes)
 Able to recover from 2 failures
Disk Space Management
 Lowest layer of DBMS software manages space on
disk.
 Higher levels call upon this layer to:
 allocate/de-allocate a page

 read/write a page

 Page = 1 disk block


 1 page request = 1 disk block
 Request for a sequence of pages must be satisfied by
allocating the pages sequentially on disk!
Disk Space Management…
(contd.)
 Higher levels don’t need to know how this
is done, or how free space is managed.

 Maintaining free blocks…


 Keeping a list of free blocks & a pointer to
the first free block

 Maintaining a Bitmap
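A minimal sketch of the bitmap approach, with one flag per disk block. The class and method names are hypothetical; `allocate` looks for a contiguous run so that a request for a sequence of pages can be placed sequentially.

```python
class BitmapSpaceManager:
    """Tracks free disk blocks with one boolean per block (True = free)."""

    def __init__(self, num_blocks):
        self.free = [True] * num_blocks

    def allocate(self, count=1):
        """Allocate `count` contiguous blocks and return the first block number."""
        run = 0
        for i, is_free in enumerate(self.free):
            run = run + 1 if is_free else 0
            if run == count:
                start = i - count + 1
                for block in range(start, i + 1):
                    self.free[block] = False
                return start
        raise RuntimeError("no contiguous run of free blocks")

    def deallocate(self, start, count=1):
        """Mark `count` blocks starting at `start` as free again."""
        for block in range(start, start + count):
            self.free[block] = True


manager = BitmapSpaceManager(8)
first = manager.allocate(3)    # blocks 0-2
second = manager.allocate(2)   # blocks 3-4
manager.deallocate(first, 3)
print(first, second, manager.allocate(2))  # 0 3 0
```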
Buffer Management in a DBMS
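[Figure: page requests from higher levels arrive at the BUFFER POOL in MAIN MEMORY; each frame holds a disk page or is a free frame; pages move between the buffer pool and the DB on DISK; the choice of frame is dictated by the replacement policy.]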
 Data must be in RAM for DBMS to operate on it!
 Table of <frame#, pageid> pairs is maintained.
 For each frame, two variables are kept: pin count (number of current users of the page) and dirty bit (set if the page has been modified)
When a Page is Requested ...
 The buffer manager checks the buffer pool to see if some frame contains
the requested page.
 If so, (pin count ++) for that frame
 If requested page is not in pool:
 Choose a frame for replacement

 If frame is dirty, write it to disk

 Read requested page into chosen frame


 Pin the page and return its address.
 If requests can be predicted (e.g., sequential scans), pages can be pre-fetched several at a time!
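The request steps above can be sketched as follows. This is a toy model, not a real DBMS interface: the disk is a Python dict, the class and method names are hypothetical, and the victim is simply the first unpinned frame (a real buffer manager would consult its replacement policy).

```python
class Frame:
    def __init__(self):
        self.page_id = None
        self.data = None
        self.pin_count = 0
        self.dirty = False


class BufferManager:
    def __init__(self, num_frames, disk):
        self.frames = [Frame() for _ in range(num_frames)]
        self.page_table = {}   # page_id -> frame holding that page
        self.disk = disk       # assumed stand-in for the disk: page_id -> data

    def pin(self, page_id):
        frame = self.page_table.get(page_id)
        if frame is None:                      # page is not in the pool
            frame = self._choose_frame()
            if frame.dirty:                    # write back before reuse
                self.disk[frame.page_id] = frame.data
            if frame.page_id is not None:
                del self.page_table[frame.page_id]
            frame.page_id, frame.data = page_id, self.disk[page_id]
            frame.dirty = False
            self.page_table[page_id] = frame
        frame.pin_count += 1
        return frame

    def unpin(self, page_id, dirty=False):
        frame = self.page_table[page_id]
        frame.pin_count -= 1
        frame.dirty = frame.dirty or dirty

    def _choose_frame(self):
        for frame in self.frames:              # prefer a free frame
            if frame.page_id is None:
                return frame
        for frame in self.frames:              # placeholder replacement policy
            if frame.pin_count == 0:
                return frame
        raise RuntimeError("all frames are pinned")


disk = {"A": "a0", "B": "b0", "C": "c0"}
pool = BufferManager(2, disk)
frame_a = pool.pin("A")
frame_a.data = "a1"            # modify the page in memory
pool.unpin("A", dirty=True)
pool.pin("B")
pool.pin("C")                  # evicts A; the dirty page is written back first
print(disk["A"])               # a1
```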
More on Buffer Management

 If a requested page is not in the buffer pool and a free frame is not available:
 A frame with pin_count = 0 is chosen for replacement according to the buffer replacement policy.
 If no page in the pool has pin_count 0, the buffer manager has to wait until some page is released; alternatively, the requesting transaction may simply be aborted.
More on Buffer Management

 Requestor of page must unpin it, and indicate whether page has been modified:
 dirty bit is used for this.
 If a page is requested by several different transactions:
 Each transaction should obtain a lock on the page before it reads or modifies the page (locking protocol)
Buffer Replacement Policy

 Least recently used (LRU)
 A queue of pointers to frames with pin_count 0
 A frame is added to the end of the queue when it becomes a candidate for replacement (that is, when its pin_count drops to 0).
 The frame at the head of the queue is chosen for replacement.
 Clock
 First in first out (FIFO)
 Most recently used (MRU)
Buffer Replacement Policy

 Frame is chosen for replacement by a replacement policy:
 Least-recently-used (LRU), Clock, MRU, random etc.
 Policy can have big impact on # of I/O’s; depends on the access pattern.
 Sequential flooding: Nasty situation caused by LRU + repeated sequential scans.
 # buffer frames < # pages in file means each page request causes an I/O. MRU much better in this situation (but not in all situations, of course).
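Sequential flooding is easy to reproduce in a small simulation. The sketch below assumes every page is unpinned immediately after use, so any page in the pool may be replaced; the function name and the workload (a 4-page file, 3 frames, 3 scans) are illustrative choices.

```python
def count_ios(policy, num_frames, num_pages, num_scans):
    """Count disk reads for repeated sequential scans of a file."""
    pool = []   # pages in the pool, ordered from least to most recently used
    ios = 0
    for _ in range(num_scans):
        for page in range(num_pages):
            if page in pool:
                pool.remove(page)              # hit: just refresh recency
            else:
                ios += 1                       # miss: read from disk
                if len(pool) == num_frames:
                    pool.pop(0 if policy == "LRU" else -1)
            pool.append(page)
    return ios


print(count_ios("LRU", num_frames=3, num_pages=4, num_scans=3))  # 12
print(count_ios("MRU", num_frames=3, num_pages=4, num_scans=3))  # 6
```

Under LRU every one of the 12 requests misses, because each page is evicted just before it is needed again; MRU keeps most of the file resident.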
DBMS vs. OS File System

OS does disk space & buffer mgmt: why not let OS manage these tasks?
 Differences in OS support: portability issues
 Some limitations, e.g., files can’t span disks.
 Buffer management in DBMS requires ability to:
 pin a page in buffer pool, force a page to disk (important for implementing concurrency control & recovery),
 adjust replacement policy, and pre-fetch pages based on access patterns in typical DB operations.


Page Formats
 A DBMS sees data as a collection of records
 Hence, for fixed-length records, pages can be considered as a collection of slots
 Each slot contains a record
 Each record is identified by a record id (rid)
Page Formats: Fixed Length
Records
[Figure: two page layouts. Alternative 1 (PACKED): slots 1 to N are stored contiguously, followed by free space; the page records N, the number of records. Alternative 2 (UNPACKED, BITMAP): slots 1 to M, with a bitmap of M bits marking which slots are occupied; the page records M, the number of slots.]
 Record id = <page id, slot #>. In first alternative, moving records for free space management changes rid; may not be acceptable.
Page Formats: Variable Length
Records
 Fixed-length slots are not practical:
 Each slot would have to be as large as the largest record
 This wastes space for smaller records
 For variable-length records, keeping free space contiguous is highly desirable (avoid free blocks in the middle of pages)
Page Formats: Variable Length
Records
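[Figure: page i holding records with Rid = (i,1), (i,2), ..., (i,N). The SLOT DIRECTORY at the end of the page has one entry per slot (example entries 20, 16, 24 for slots N, ..., 2, 1), the number of slots N, and a pointer to the start of free space.]
 The slot directory maintains a <record offset, record length> pair per slot. Alternatively, the offset is kept in the slot directory and the length in the first few bytes of the record, or in the catalog for fixed-length records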
 Can move records on page without changing rid; so, attractive
for fixed-length records too.
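A simplified sketch of a page with a slot directory follows. For brevity the directory is kept in a Python list rather than at the end of the page's byte array, and its space is not counted; the class and method names are hypothetical.

```python
class SlottedPage:
    def __init__(self, size=64):
        self.data = bytearray(size)
        self.slots = []       # slot directory: [offset, length], or None if deleted
        self.free_start = 0   # pointer to start of free space

    def insert(self, record):
        if self.free_start + len(record) > len(self.data):
            raise ValueError("not enough contiguous free space")
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append([offset, len(record)])
        return len(self.slots) - 1   # slot number; rid = (page id, slot number)

    def get(self, slot):
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

    def delete(self, slot):
        self.slots[slot] = None      # slot number is never reused here

    def compact(self):
        """Slide live records together; only directory offsets change."""
        new_data = bytearray(len(self.data))
        position = 0
        for entry in self.slots:
            if entry is None:
                continue
            offset, length = entry
            new_data[position:position + length] = self.data[offset:offset + length]
            entry[0] = position
            position += length
        self.data = new_data
        self.free_start = position


page = SlottedPage()
s0 = page.insert(b"alice")
s1 = page.insert(b"bob")
s2 = page.insert(b"carol")
page.delete(s1)
page.compact()                        # "carol" moves, but its slot number does not
print(s2, page.get(s2), page.slots[s2])  # 2 b'carol' [5, 5]
```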
Record Formats: Fixed Length

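[Figure: a record with fields F1, F2, F3, F4 of lengths L1, L2, L3, L4, starting at base address B; the address of F3 is B+L1+L2.]
 Information about field types same for all records in a file; stored in system catalogs.
 Finding i’th field does not require scan of record; its address is computed from the base address and the field lengths.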
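The address arithmetic from the figure (Address = B+L1+L2) can be sketched as below. The field lengths are made-up values standing in for what the system catalog would supply.

```python
from itertools import accumulate


def field_offsets(field_lengths):
    """Offset of each field from the record's base address."""
    return [0] + list(accumulate(field_lengths))[:-1]


def get_field(record, field_lengths, i):
    """Extract field i (0-based) using only the lengths from the catalog."""
    start = field_offsets(field_lengths)[i]
    return record[start:start + field_lengths[i]]


lengths = [4, 20, 8, 2]                       # assumed L1..L4
print(field_offsets(lengths))                 # [0, 4, 24, 32]; F3 starts at B + L1 + L2
record = b"0001" + b"x" * 20 + b"20240101" + b"OK"
print(get_field(record, lengths, 2))          # b'20240101'
```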
Record Formats: Variable
Length
 Two alternative formats (# fields is fixed):
[Figure: Format 1, fields delimited by special symbols: a field count (4) followed by F1 $ F2 $ F3 $ F4 $. Format 2, array of field offsets: a directory of offsets at the start of the record, followed by F1 F2 F3 F4.]
 Second offers direct access to i’th field, efficient storage of nulls (special don’t know value); small directory overhead.
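A sketch of the second format (array of field offsets). The layout details are assumptions made for illustration: 2-byte little-endian offsets, one extra offset marking the end of the record, and a null encoded as a field whose start and end offsets coincide (which, in this simplified version, cannot be told apart from an empty value).

```python
import struct


def encode(fields):
    """Build a record from a list of fields, each either bytes or None (null)."""
    header_size = 2 * (len(fields) + 1)
    offsets, body, position = [], b"", header_size
    for field in fields:
        offsets.append(position)
        if field is not None:
            body += field
            position += len(field)
    offsets.append(position)                   # end of the last field
    return struct.pack(f"<{len(offsets)}H", *offsets) + body


def get_field(record, i):
    """Direct access to field i: read two offsets, then slice."""
    start, end = struct.unpack_from("<2H", record, 2 * i)
    return None if start == end else record[start:end]


record = encode([b"Ann", None, b"NYC", b"42"])
print(get_field(record, 2))   # b'NYC'
print(get_field(record, 1))   # None
```

A null costs no space beyond its directory entry, and no field requires scanning past the ones before it.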
Summary
 Components of a DBMS
 Disks
 RAID
 Disk Space Management
 Buffer Management
 Page Formats
 Record Formats
