Chapter 6 - File_and_Storage
File & Storage
COMP3278 Introduction to
Database Management Systems
Storage hierarchy
File organization
Buffer
Section 1
Storage Media
Answer:
64KB L1 cache per core.
256KB L2 cache per core.
8MB L3 cache.
CPU Cache
Magnetic Disk
A disk has many platters.
Each platter has two surfaces covered with magnetic material; information is recorded on the surfaces.
Each platter is divided into circular tracks.
There are about 50,000 to 100,000 tracks per platter (very dense).
Magnetic Disk
Each track is further divided into sectors.
A sector is the smallest unit of data that can be read or written.
Rotational Latency
The time required to rotate the platter until the required sector is under the disk head.
One rotation takes around 4 to 11 milliseconds (from 15,000 revolutions per minute (rpm) down to 5,400 rpm), so the rotational latency is at most one rotation time.
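The 4 to 11 ms range follows directly from the rotation speed: one revolution takes 60,000 / rpm milliseconds. A minimal Python sketch; the "average latency = half a rotation" function assumes the requested sector is equally likely to be anywhere on the track, which the slide does not state.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full revolution of the platter, in milliseconds."""
    return 60_000 / rpm  # 60,000 ms per minute divided by revolutions per minute


def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait, assuming the target sector is uniformly placed on the track."""
    return rotation_time_ms(rpm) / 2


for rpm in (15_000, 5_400):
    print(f"{rpm} rpm: {rotation_time_ms(rpm):.1f} ms per rotation, "
          f"{avg_rotational_latency_ms(rpm):.1f} ms average latency")
```

At 15,000 rpm this gives 4.0 ms per rotation and at 5,400 rpm about 11.1 ms, matching the range on the slide.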
Magnetic Disk
Data Transfer Rate: the rate at which data is retrieved from or stored to the disk.
Typical value: 25 to 300 megabytes per second (MBps).
[Figure: system bus, disk controller and disk tracks (inner and outer sectors). Step 1, search data, costs seek time + rotational delay.]
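Putting the pieces together, reading one block costs roughly seek time + rotational delay + transfer time. The sketch below is a hedged estimate: the 4 ms seek time, 15,000 rpm, 4 KB block and 100 MBps rate are hypothetical inputs, average rotational delay is taken as half a rotation, and 1 MB is assumed to be 10^6 bytes.

```python
def block_access_time_ms(seek_ms: float, rpm: float,
                         block_bytes: int, transfer_mb_per_s: float) -> float:
    """Estimated time to read one block: seek + average rotational delay + transfer."""
    rotational_ms = (60_000 / rpm) / 2                        # half a rotation on average
    bytes_per_ms = transfer_mb_per_s * 1_000_000 / 1000       # MBps -> bytes per millisecond
    transfer_ms = block_bytes / bytes_per_ms
    return seek_ms + rotational_ms + transfer_ms


# Hypothetical disk: 4 ms average seek, 15,000 rpm, 4,096-byte block, 100 MBps.
print(round(block_access_time_ms(4.0, 15_000, 4096, 100), 3))
```

With these numbers the transfer itself takes only about 0.04 ms; almost all of the cost is the mechanical search, which is why the DBMS tries to minimize the number of block transfers.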
Storage Hierarchy
Primary Storage: Main Memory
Reliability & Efficiency
Error-correction schemes.
Disk Failure
MTTF (mean time to failure): the average time the disk is expected to run continuously without failure.
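When a system uses many disks, the chance that some disk fails grows with the number of disks. Under the common textbook assumption that failures are independent and memoryless (an assumption, not something the slide states), the MTTF of "some one of n disks fails" is the single-disk MTTF divided by n. A minimal Python sketch with hypothetical numbers:

```python
def mttf_any_disk_hours(mttf_single_hours: float, n_disks: int) -> float:
    """Expected time until the first failure among n disks.

    Assumes independent, exponentially distributed failures, so the
    failure rates of the n disks add up.
    """
    return mttf_single_hours / n_disks


# Hypothetical: 100 disks, each with an MTTF of 100,000 hours.
print(mttf_any_disk_hours(100_000, 100))
```

Roughly 1,000 hours (about six weeks) between failures somewhere in the array is one motivation for the error-correction schemes that follow.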
Parity check

Exclusive or (XOR):
A  B  A XOR B
0  0  0
0  1  1
1  0  1
1  1  0

b1 b2 b3 b4 | P
 1  0  0  1 | 0
 0  0  0  1 | 1
 1  1  1  0 | 1
 1  0  0  1 | 0
 0  0  0  0 | 0
 1  1  1  1 | 0
 1  0  1  1 | 1

Observation: the number of “1”s among the bit values in each row (including P) must be even, i.e., P = b1 XOR b2 XOR b3 XOR b4.
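The parity bit is just the XOR of the data bits, and the same XOR can rebuild any single lost bit when it is known which bit was lost (for example, because its disk failed). A minimal Python sketch; the function names are hypothetical.

```python
from functools import reduce
from operator import xor


def parity_bit(bits: list[int]) -> int:
    """Even parity: P = b1 XOR b2 XOR ..., so each row (with P) has an even count of 1s."""
    return reduce(xor, bits, 0)


def recover_lost_bit(surviving_bits: list[int], p: int) -> int:
    """If exactly one known bit is lost, XOR of the survivors and P reproduces it."""
    return reduce(xor, surviving_bits, p)


row = [1, 0, 1, 1]          # b1..b4 from the last row of the table
p = parity_bit(row)         # three 1s, so P = 1
# Suppose b3 is lost; rebuild it from b1, b2, b4 and P.
print(p, recover_lost_bit([row[0], row[1], row[3]], p))
```

Every row of the table above satisfies parity_bit(b1..b4) == P.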
File Organization
Each file is logically partitioned into fixed-length
storage units called blocks, which are the units of
both storage and data transfer.
Answer:
Record access is simple, but records may cross blocks. Retrieving one record that spans two blocks requires two I/Os, which doubles the disk access time (if no buffer is used).
Modification: do not allow records to cross blocks; leave the remaining space at the end of each block unused. (Why?)
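Once records may not cross blocks, each block holds a whole number of records and a record's location is simple arithmetic, so one record always costs one block I/O. A minimal Python sketch; the 4,096-byte block and 40-byte record sizes are hypothetical.

```python
def records_per_block(block_size: int, record_size: int) -> int:
    """Whole records that fit in one block; records never span two blocks."""
    return block_size // record_size


def locate(record_no: int, block_size: int, record_size: int) -> tuple[int, int]:
    """(block number, byte offset inside the block) of the record_no-th record, 0-based."""
    n = records_per_block(block_size, record_size)
    return record_no // n, (record_no % n) * record_size


# Hypothetical sizes: 4,096-byte blocks and 40-byte records.
n = records_per_block(4096, 40)
print(n, 4096 - n * 40)        # records per block, unused bytes per block
print(locate(102, 4096, 40))   # first record of the second block
```

The price is a few unused bytes per block (16 here), traded for never needing two I/Os to read one record.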
Free list
Store the address of the first deleted record in the file header.
Use the first deleted record to store the address of the second deleted record, and so on.
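The free list reuses the space of deleted records to hold the chain of pointers, so no extra storage is needed. A minimal Python sketch under stated assumptions: the class name is hypothetical, a Python list stands in for the file, and a deleted slot stores the next pointer as a tuple rather than inside the record's bytes.

```python
from typing import Optional


class FixedLengthFile:
    """Toy fixed-length record file with a free list of deleted slots (hypothetical)."""

    def __init__(self) -> None:
        self.slots: list = []                  # ("REC", record) or ("FREE", next_free)
        self.free_head: Optional[int] = None   # file header: first deleted record

    def insert(self, record) -> int:
        if self.free_head is not None:         # reuse a deleted slot first
            slot = self.free_head
            self.free_head = self.slots[slot][1]   # follow the chain to the next deleted slot
        else:                                  # no deleted slot: append at the end
            slot = len(self.slots)
            self.slots.append(None)
        self.slots[slot] = ("REC", record)
        return slot

    def delete(self, slot: int) -> None:
        # The deleted slot now holds the address of the previous first deleted slot.
        self.slots[slot] = ("FREE", self.free_head)
        self.free_head = slot


f = FixedLengthFile()
for name in ["Kit", "Ben", "John", "Peter"]:
    f.insert(name)
f.delete(1)                  # header -> 1
f.delete(3)                  # header -> 3 -> 1
print(f.free_head)
print(f.insert("Betty"), f.insert("Billy"), f.insert("Ken"))
```

New records fill slots 3 and then 1 before the file grows, so deleted space is reused without scanning the file.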
1b. Variable-length records
Variable-length records arise in database systems in
several ways:
Storage of multiple record types in the same block (e.g., some tuples of the Instructor table and some tuples of the Department table stored together in one block).
Record types that allow variable lengths for one or more fields (e.g., VARCHAR(250), TEXT, etc.).
1b. Variable-length records
Slotted-page structure is commonly used for
organizing variable-length records within a block.
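[Figure: one block in slotted-page form. The block header stores the number of entries (2), the size of each record (15 and 13 bytes) and the end of free space in the block. The records 1004 Kit CS 15000 (13 bytes) and 1012 Jacky CS 15000 (15 bytes) are packed at the end of the block.]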
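A slotted page keeps a header of (offset, size) entries at the front and packs records from the back of the block toward the front, with free space in between. A minimal Python sketch under stated assumptions: the class name is hypothetical, the header is kept as Python values rather than bytes, and the 8-byte fixed header and 4-byte entry size are assumed only for the free-space check.

```python
class SlottedPage:
    """Toy slotted page: header entries grow from the front, records from the back."""

    FIXED_HEADER_BYTES = 8   # assumed: #entries + end-of-free-space pointer
    ENTRY_BYTES = 4          # assumed size of one (offset, size) header entry

    def __init__(self, block_size: int) -> None:
        self.data = bytearray(block_size)
        self.entries: list[tuple[int, int]] = []   # (offset, size) per record
        self.free_end = block_size                 # end of free space

    def free_space(self) -> int:
        header = self.FIXED_HEADER_BYTES + self.ENTRY_BYTES * len(self.entries)
        return self.free_end - header

    def insert(self, record: bytes) -> int:
        if len(record) + self.ENTRY_BYTES > self.free_space():
            raise ValueError("block full")
        self.free_end -= len(record)               # records grow toward the front
        self.data[self.free_end:self.free_end + len(record)] = record
        self.entries.append((self.free_end, len(record)))
        return len(self.entries) - 1               # slot number used to find the record

    def get(self, slot: int) -> bytes:
        offset, size = self.entries[slot]
        return bytes(self.data[offset:offset + size])


page = SlottedPage(64)
s0 = page.insert(b"1012 Jacky CS 15000")
s1 = page.insert(b"1004 Kit CS 15000")
print(page.entries, page.free_space())
```

Outside the block, a record is referred to by its slot number, so records can later be moved within the block by updating only their header entry.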
2. Organizing records in files
A file contains a number of blocks.
Each block stores a number of records.
[Figure: the file header leads to a list of blocks with free space (B1, B4, B2) and a list of full blocks (B3, B7, B6).]
2b. Sequential file
Store records in sequential order, based on the
value of the search key of each record.
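Insertion: keep the records in order of the search key. If there is free space in the block where the new record belongs, insert the new record there. Otherwise, insert the new record in an overflow block.
[Figure: records 1012 Ben CS 18000, 1059 John History 20000, 1066 Peter CS 24000, 1084 Billy Civil 21000 and 1095 Betty CS 16000 are stored in ID order across blocks; the new record 1011 Ken CS 19000 is placed in an overflow block.]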
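The insertion rule can be sketched in Python as follows. This is a hedged illustration: the function name is hypothetical, each block is a sorted list of tuples with the search key first, blocks are assumed non-empty, and the split of the slide's records into a 3-record and a 2-record block (capacity 3) is an assumption.

```python
import bisect


def insert_sequential(blocks, overflow, record, capacity):
    """Insert record into a file kept in search-key order, or into the overflow block."""
    key = record[0]
    firsts = [b[0][0] for b in blocks]              # first key of each (non-empty) block
    # Last block whose first key is <= key; smaller keys go to the first block.
    i = max(bisect.bisect_right(firsts, key) - 1, 0)
    target = blocks[i]
    if len(target) < capacity:                      # free space: insert in key order
        keys = [r[0] for r in target]
        target.insert(bisect.bisect_left(keys, key), record)
    else:                                           # block full: use the overflow block
        overflow.append(record)


blocks = [
    [(1012, "Ben", "CS", 18000), (1059, "John", "History", 20000), (1066, "Peter", "CS", 24000)],
    [(1084, "Billy", "Civil", 21000), (1095, "Betty", "CS", 16000)],
]
overflow = []
insert_sequential(blocks, overflow, (1011, "Ken", "CS", 19000), capacity=3)    # block full
insert_sequential(blocks, overflow, (1090, "Amy", "Civil", 23000), capacity=3)  # hypothetical record
print(overflow)
print([r[0] for r in blocks[1]])
```

Ken's record lands in the overflow block as in the figure, while the hypothetical record 1090 fits between 1084 and 1095.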
2c. Hashing
A hash function is computed on some attribute of
each record.
The result of the hash function specifies in which
block of the file the record should be placed.
[Figure: the hash function ID mod 10 places record 1001 Kit CS 15000 in Block 1 and record 1012 Ben CS 18000 in Block 2; further blocks (Block 3, …) follow.]
Hashing will be elaborated in the next chapter.
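Using the slide's hash function (ID mod 10), both insertion and lookup compute the block number directly instead of searching the file. A minimal Python sketch; a dict of lists stands in for the blocks, and the n_blocks parameter is a hypothetical generalization of the constant 10.

```python
def block_for(record_id: int, n_blocks: int = 10) -> int:
    """The slide's hash function: ID mod 10 gives the block number."""
    return record_id % n_blocks


blocks = {}  # block number -> records stored in that block
for rec in [(1001, "Kit", "CS", 15000), (1012, "Ben", "CS", 18000)]:
    blocks.setdefault(block_for(rec[0]), []).append(rec)

# Lookup by ID touches only one block: recompute the hash, then search that block.
wanted = 1012
print([r for r in blocks.get(block_for(wanted), []) if r[0] == wanted])
```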
2d. Multitable clustering
We may put ≥ 2 related relations in the same file to achieve faster joins.
Logical level:

Department
dptID  name   budget
1      CS     300000
2      Civil  200000

Lecturer
lecturerID  name   dptID  salary
1001        Ben    1      20000
1008        Jacky  2      30000
1016        John   1      25000
1005        Betty  1      22000
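One common clustered layout stores each department record immediately followed by the lecturer records with the same dptID, so the join becomes a single sequential pass. A minimal Python sketch; the function names and the ("department", …)/("lecturer", …) tagging are hypothetical, and the physical layout is an assumption since the slide shows only the logical level.

```python
departments = [(1, "CS", 300000), (2, "Civil", 200000)]
lecturers = [(1001, "Ben", 1, 20000), (1008, "Jacky", 2, 30000),
             (1016, "John", 1, 25000), (1005, "Betty", 1, 22000)]


def cluster(departments, lecturers):
    """Build one file: each department record followed by its lecturers."""
    file = []
    for dpt in departments:
        file.append(("department", dpt))
        file.extend(("lecturer", lec) for lec in lecturers if lec[2] == dpt[0])
    return file


def join_by_scan(file):
    """Join department and lecturer with one pass over the clustered file."""
    result, current = [], None
    for kind, rec in file:
        if kind == "department":
            current = rec                         # lecturers that follow belong here
        else:
            result.append((rec[1], current[1]))   # (lecturer name, department name)
    return result


print(join_by_scan(cluster(departments, lecturers)))
```

The trade-off is that scanning only the lecturers now means skipping over department records mixed into the same blocks.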
Buffer
Slides prepared by - Dr. Chui Chun Kit, for students in COMP3278
For other uses, please email : ckchui@cs.hku.hk
Buffer manager
The DBMS seeks to minimize the number of block transfers between disk and memory.
[Figure: the buffer in memory holds blocks 2, 4 and 7; the disk holds blocks 1, 2, 3, 4, 5, 6, 7, 8, 9, …]
1. Buffer miss
Step 1. The database application reads data in block 1.
Step 2. Block 1 is not in the buffer, so it is retrieved from disk (1 disk I/O). Buffer: 2 4 7 1.
Step 3. The required data in block 1 is sent to the application.
2. Buffer hit
Step 1. The database application reads data in block 2.
Step 2. The required data is already in the buffer, so it is returned to the application without disk access (no I/O). Buffer: 2 4 7 1.
3. Write operation
Step 1. The database application updates a record in block 2. The update is made to the copy of block 2 in the buffer (buffer: 2 4 7 1), so the copy on disk is out of date until block 2 is written back.
4. Buffer full
Step 1. The database application reads data in block 5. Since block 5 is not in the buffer (2 4 7 1), we need to fetch it from disk. But there is no free buffer space…
Step 2. Suppose we free the buffer space used by block 7: block 5 simply overwrites that buffer slot (buffer: 2 4 5 1). No write is needed because block 7 was not modified.
If we instead free the buffer space used by block 2, we need an extra step to write the updated data of block 2 from memory to disk first:
Step 2. Write the data in block 2 to disk.
Step 3. Read block 5 from disk into the freed slot (buffer: 5 4 7 1).
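The four cases can be replayed with a small I/O counter. A minimal Python sketch under stated assumptions: the class and method names are hypothetical, each buffered block carries a dirty flag, and the caller names the victim block when the buffer is full, so no replacement policy is built in.

```python
class BufferPool:
    """Toy buffer that counts disk I/Os for the miss, hit, write and full cases."""

    def __init__(self, capacity: int, blocks=()) -> None:
        self.capacity = capacity
        self.frames = {b: False for b in blocks}   # block id -> dirty flag
        self.io = 0                                # disk I/Os performed so far

    def _fetch(self, block: int, victim=None) -> None:
        if block in self.frames:                   # buffer hit: no I/O
            return
        if len(self.frames) >= self.capacity:      # buffer full: free a slot first
            if victim not in self.frames:
                raise ValueError("buffer full: choose a victim block that is in the buffer")
            if self.frames.pop(victim):            # dirty victim: write it back first
                self.io += 1
        self.frames[block] = False                 # read the requested block from disk
        self.io += 1

    def read(self, block: int, victim=None) -> None:
        self._fetch(block, victim)

    def update(self, block: int, victim=None) -> None:
        self._fetch(block, victim)
        self.frames[block] = True                  # changed in memory only; disk copy is stale


pool = BufferPool(4, [2, 4, 7])
pool.read(1)                # 1. buffer miss: 1 I/O
pool.read(2)                # 2. buffer hit: no I/O
pool.update(2)              # 3. write: block 2 becomes dirty, still no I/O
print(pool.io)              # 1

evict_clean = BufferPool(4, [2, 4, 7, 1])
evict_clean.update(2)
evict_clean.read(5, victim=7)   # 4. buffer full, clean victim: read block 5 only
evict_dirty = BufferPool(4, [2, 4, 7, 1])
evict_dirty.update(2)
evict_dirty.read(5, victim=2)   # 4. buffer full, dirty victim: write block 2, then read block 5
print(evict_clean.io, evict_dirty.io)  # 1 2
```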
Buffer replacement policy
Most operating systems replace the block that was least recently used (LRU strategy).
The intuition behind LRU: if a block has not been used for a long time, it is unlikely to be accessed again very soon. LRU uses past block-access patterns as a predictor of future accesses.
Other replacement policies: LFU (least frequently used), etc.
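LRU can be simulated with Python's collections.OrderedDict, which keeps keys in order and provides move_to_end and popitem(last=False). A minimal sketch; the function name and the access sequence are hypothetical.

```python
from collections import OrderedDict


def count_misses_lru(accesses, capacity: int):
    """Simulate an LRU buffer; return (number of disk reads, final buffer contents)."""
    buffer = OrderedDict()              # least recently used first, most recent last
    misses = 0
    for block in accesses:
        if block in buffer:
            buffer.move_to_end(block)   # hit: mark as most recently used
            continue
        misses += 1                     # miss: the block must be read from disk
        if len(buffer) >= capacity:
            buffer.popitem(last=False)  # evict the least recently used block
        buffer[block] = None
    return misses, list(buffer)


print(count_misses_lru([1, 2, 3, 1, 4, 2, 5], capacity=3))
```

Re-accessing block 1 saves it from eviction when block 4 arrives; block 2 is evicted instead because it is now the least recently used.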
Data dictionary
Data dictionary (also called system catalog) stores metadata, i.e., data about data, e.g.:
Information about relations (names of relations, names and types of attributes, integrity constraints, views).
Statistical data (e.g., number of tuples in each relation).
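As one concrete example of a system catalog, SQLite exposes relation names through its sqlite_master table and attribute names and types through PRAGMA table_info. A minimal Python sketch using the standard sqlite3 module; the instructor schema is hypothetical, and other DBMSs expose their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor ("
             "ID INTEGER PRIMARY KEY, name TEXT, dept_name TEXT, salary INTEGER)")

# Names of relations, read from SQLite's catalog table.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Names and declared types of attributes; each row is
# (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(instructor)")]

print(tables)
print(columns)
```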
Chapter 6.
END