Chapter 6. File & Storage
COMP3278 Introduction to
Database Management Systems

Department of Computer Science, The University of Hong Kong


Slides prepared by - Dr. Chui Chun Kit, for students in COMP3278
For other uses, please email : ckchui@cs.hku.hk
In this chapter…
Outcome 1. Information Modeling
Able to understand the modeling of real life information in a database
system.

Outcome 2. Query Languages


Able to understand and use the languages designed for data access.

Outcome 3. System Design


Able to understand the design of an efficient and reliable database
system.

Outcome 4. Application Development


Able to implement a practical application on a real database.
Content
Storage media

Storage hierarchy

Reliability and efficiency

File organization

Buffer

Section 1

Storage Media

CPU Cache

Intel core i7 CPU

Cache memory is extremely fast memory that is built into a computer’s central processing unit (CPU).
Volatile storage.
Question: Do you know the cache size (L3 cache) of an Intel Core i7 CPU?

Answer:
64KB L1 cache per core.
256KB L2 cache per core.
8MB L3 cache.
CPU Cache

Intel core i7 CPU

Its use is managed by the computer system hardware.
We shall not be concerned about managing CPU cache storage in the database system.
However, it is worth noting that database implementers do pay attention to cache effects when designing query processing data structures and algorithms.
Main Memory
Volatile storage.
Fast access (in nanoseconds: 10^-9 seconds).
Generally too small (or too expensive) to store the entire database (of an enterprise).
Capacities of a few gigabytes.
Capacities have gone up and cost‐per‐byte has decreased steadily and rapidly.
Question: What is the normal RAM size that can be bought nowadays?
Answer: 32GB of DDR5 RAM is available for around HKD 2,100.
Magnetic Disk

(Figures: an internal hard disk and an external hard disk.)

Primary medium for the long‐term storage of data; typically stores the entire database.
Non‐volatile storage.
Access time: much slower than main memory.
Data are loaded into memory (a buffer) before being accessed by the DBMS.
Magnetic Disk
Read‐Write Heads.
Positioned very closely to the platter surface.
Reads or writes magnetically encoded information.
Magnetic Disk
A disk has many platters.
Each platter has two surfaces covered with magnetic material; information is recorded on the surfaces.
Each platter is divided into circular tracks.
There are about 50,000 to 100,000 tracks per platter (very dense).
Magnetic Disk
Each track is further divided into sectors.
A sector is the smallest unit of data that can be read/written.
Sector size is typically 512 bytes.
Typical sectors per track:
500-1000 (inner tracks)
1000-2000 (outer tracks).
Magnetic Disk
To read/write data:
1. [Seek] Position the head on the right track by moving the disk arms.
2. [Rotation] Spin the disk so that the start of the data is under the head.
3. [Transfer data] Continue spinning and transfer the data.
Magnetic Disk
Access Time – the time between the request and the start of data transfer. This consists of:
Seek time
The time required to reposition the arm over the correct track.
Around 2 to 30 milliseconds on typical disks, depending on the physical location of the data.
Rotational Latency
The time required to rotate the platter until the required sector is under the disk head.
Around 4 to 11 milliseconds per rotation (15,000 revolutions per minute (rpm) to 5,400 rpm).
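The 4 to 11 ms range follows directly from the spindle speed: one rotation takes 60,000 / rpm milliseconds. The sketch below is a minimal illustration; treating the average rotational latency as half a rotation is a common modelling assumption that the slide does not state.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm  # 60,000 ms per minute / revolutions per minute

def avg_rotational_latency_ms(rpm: float) -> float:
    """Assumption: on average the wanted sector is half a rotation away."""
    return rotation_time_ms(rpm) / 2

for rpm in (15_000, 5_400):
    print(rpm, "rpm:", round(rotation_time_ms(rpm), 1), "ms per rotation,",
          round(avg_rotational_latency_ms(rpm), 1), "ms average latency")
```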
Magnetic Disk
Data Transfer Rate – the rate at which data is retrieved from or stored to disks:
Typical value: 25 to 300 megabytes per second (MBps).
As multiple disks may share the same controller, we have to be aware of the controller’s processing speed.
(Figure: Disk 1, Disk 2, Disk 3, … all attached to one disk controller, which is connected to the system bus.)


Data block
Data must first be transferred to main memory (buffer) before the DBMS can operate on them.
The data transfer unit between disk and memory is called a data block.
Usually 4KB to 16KB in size (spans multiple sectors).
When a single item is needed (e.g., an attribute value of a specific tuple), the whole block that contains the item is transferred.
Reading / writing of a disk block is called an I/O operation.
I/O Operation
(Figure: one I/O operation.
1. Search data on disk (seek time + rotational delay). Tracks run from the inner to the outer part of the platter; disk sectors are 512 bytes each.
2. Transfer data (in block units) to main memory. A data block (e.g., Block number 1) of 4KB = 8 × 512 bytes occupies consecutive disk sectors.
Main memory then transfers data to the CPU (very fast).)
I/O Operation
The time required to read/write a block depends on the block’s location on disk.
Time for one I/O operation = seek time + rotational delay + transfer time
Efficiency issue: the time to move data from/to disk usually dominates the cost of processing a query (CPU actions take nanoseconds, while a block access takes milliseconds!).
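Plugging numbers into the formula shows why positioning, not transfer, dominates. This is a sketch with a hypothetical disk (4 ms seek, 7,200 rpm, 100 MBps, 4KB block); the half-rotation average delay and 1 MB = 10^6 bytes are assumptions made for illustration.

```python
def io_time_ms(seek_ms: float, rpm: float, block_bytes: int,
               transfer_mb_per_s: float) -> float:
    """seek time + rotational delay + transfer time, all in milliseconds."""
    rotational_ms = (60_000.0 / rpm) / 2          # assumed: half a rotation on average
    transfer_ms = block_bytes / (transfer_mb_per_s * 1e6) * 1000
    return seek_ms + rotational_ms + transfer_ms

# Hypothetical disk parameters (not from the slides).
t = io_time_ms(seek_ms=4.0, rpm=7_200, block_bytes=4096, transfer_mb_per_s=100)
print(f"{t:.3f} ms per block")
```

Here the transfer itself takes about 0.04 ms, well under 1% of the total; almost all the time goes into seeking and waiting for the platter to rotate.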
Magnetic Tape
Used primarily for offline backup (to recover from disk failures) and archival purposes.
Non‐volatile storage.
Cost: very low.
Access speed: slow, and only sequential access.
An Oracle StorageTek tape drive stores up to 18 TB of uncompressed data per cartridge.
The Oracle StorageTek tape library can archive up to 57.6EB of uncompressed tape capacity.
Optical
Non‐volatile storage.
Access Speed:
Read – much slower than magnetic disks (especially on seeks, i.e., random access);
Write – even slower than magnetic disks.
Capacity: DVD (4.7GB to 17GB), BDXL (100GB to 128GB).
Usually write‐once, read‐many (WORM) optical disks are used for archival storage.
Flash Memory
Non‐volatile storage.
Access Speed:
Read/write: in “page” granularity, which corresponds to a disk “sector” (typically 4 KiB).
Erase: a page can only be written after it has been erased. Erasing has high latency, typically in milliseconds.
Each cell has a limited program/erase lifetime (thousands of cycles, for modern devices); cells slowly become less reliable.

SSDs available for consumers as of 2020:
Capacity: up to 8 TB
Sequential read speed: up to 6.795 GB/s
Sequential write speed: up to 4.397 GB/s

Source of information from Windows101Tricks: https://windows101tricks.com/ssd-vs-hdd-which-is-better-for-you/
Source of information from Wikipedia: https://en.wikipedia.org/wiki/Solid-state_drive
Section 2

Storage Hierarchy

Storage Hierarchy
Put all data on disks:
Data must be maintained after power failure.
Storage capacity must be big enough for all data.
Use memory for temporary storage and manipulation of data during queries:
Selected data are transferred to memory for fast processing.
Updates are first performed in memory and later written back to disk.
Back up data on tertiary storage:
Periodically back up the contents of the DB on tapes.
Storage Hierarchy
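(Figure: the storage hierarchy, top to bottom. Speed and cost increase, and size decreases, towards the top.)
CPU
Cache (primary storage)
Main memory (primary storage)
Flash memory and magnetic disks (secondary storage, online storage); data moves between main memory and this level through slow I/O
Magnetic tape and optical storage (tertiary storage, offline storage)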
Section 3

Reliability & Efficiency

Reliability and Efficiency
Reliability - Hard disks may fail, but we don’t want to lose our data.
Efficiency - Disks are slow compared with the speed of the CPU.
Solutions:
Mirroring.
Data striping.
Error-correction schemes.
Disk Failure
MTTF (Mean time to failure) – the average time the disk is expected to run continuously without failure.
Mean time to data loss – depends on the MTTF, and on how the disks are organized.
Question: Suppose a vendor claims that the MTTF of a disk is 100,000 hours (11 years). Does it mean that it is unlikely to encounter disk failure in an enterprise database system?
Answer: Surely not!!! With multiple disks, the mean time to data loss will be shortened! Think about a cluster of 100 disks (Disk 1, …, Disk 100): what is the mean time to data loss?
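Without redundancy, any single disk failure loses data, so the cluster's mean time to data loss is the mean time until the first of its disks fails. A minimal sketch, assuming disk failures are independent and exponentially distributed (so failure rates add up):

```python
def mean_time_to_first_failure(mttf_hours: float, num_disks: int) -> float:
    """Expected time until some disk in the array fails.

    Assumes independent, exponentially distributed failures: the combined
    failure rate is num_disks / mttf_hours.
    """
    return mttf_hours / num_disks

hours = mean_time_to_first_failure(100_000, 100)
print(hours, "hours, about", round(hours / 24, 1), "days")
```

Under these assumptions the 100-disk cluster expects a failure roughly every 1,000 hours (about 42 days), not every 11 years.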
Mirroring
Storing a redundant copy of data on another disk(s).
Mean time to data loss will be much longer.
Efficiency: The rate at which read requests can be handled is doubled, since read requests can be sent to all disks.
Note: The speed of each read (or query) is the same as in a single-disk system.
Question: If I have more than one disk, can I also increase the speed of processing each read request?
(Figure: Disk 1, Disk 2, Disk 3 (mirror), Disk 4 (mirror).)
Data Striping
Data are partitioned across several disks (e.g., first block to disk 1, second block to disk 2, etc.).
Faster reads can be achieved by reading from the disks in parallel.
However, it doesn’t improve reliability.
(Figure: disk blocks b1, b2, b3, b4, b5, … spread over Disk 1 to Disk 4.)
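The "first block to disk 1, second block to disk 2, etc." rule can be written as a round-robin mapping. In this sketch, wrapping b5 back to Disk 1 is our assumption about how the pattern continues; the function name is hypothetical.

```python
def stripe_location(block_no: int, num_disks: int) -> tuple[int, int]:
    """Map logical block b<block_no> (1-based) to (disk number, slot on that disk)."""
    i = block_no - 1
    # Round-robin: b1 -> Disk 1, b2 -> Disk 2, ..., then wrap around.
    return i % num_disks + 1, i // num_disks

for b in range(1, 6):
    disk, slot = stripe_location(b, 4)
    print(f"b{b} -> Disk {disk}, slot {slot}")
```

Blocks b1 to b4 land on four different disks, so reading them can proceed in parallel; but losing any one disk still loses a quarter of the data.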
Parity check
Disk 1  Disk 2  Disk 3  Disk 4
b1      b2      b3      b4
1       0       0       1
0       0       0       1
1       1       1       0
1       0       0       1
0       0       0       0
1       1       1       1
1       0       1       1
Think about it: Can you use one more disk to store some extra information so that the database tolerates the failure of one disk?


Parity check
Exclusive or (XOR):
A  B  A XOR B
0  0  0
0  1  1
1  0  1
1  1  0

Disk 1  Disk 2  Disk 3  Disk 4  Parity disk
b1      b2      b3      b4      P
1       0       0       1       0
0       0       0       1       1
1       1       1       0       1
1       0       0       1       0
0       0       0       0       0
1       1       1       1       0
1       0       1       1       1
Observation: The number of “1”s among the bit values in each row (including P) must be even.
Parity check
Disk 1  Disk 2  Disk 3 (lost)  Disk 4  Parity disk
b1      b2      b3             b4      P
1       0       ?              1       0
0       0       ?              1       1
1       1       ?              0       1
1       0       ?              1       0
0       0       ?              0       0
1       1       ?              1       0
1       0       ?              1       1
Although the data on Disk 3 is lost, we can reconstruct it from the data on Disks 1, 2, 4 and P! Just apply XOR among the bits of the other disks. For example, in the fifth row (0 0 ? 0 0), the missing bit must be 0!
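The parity column and the rebuild of Disk 3 can be reproduced with XOR, using the exact bits from the tables above. This is a sketch of the idea only; a real RAID controller applies the same XOR to whole blocks rather than single bits, and the helper names are our own.

```python
from functools import reduce
from operator import xor

# Bits on Disk 1..Disk 4 (b1..b4), one list per row of the example.
rows = [
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 1, 1],
]

def parity(bits):
    """P = XOR of the data bits, so each row (data + P) has an even number of 1s."""
    return reduce(xor, bits, 0)

def reconstruct(row, lost_index, p):
    """Rebuild the lost bit as the XOR of the surviving data bits and P."""
    survivors = [b for i, b in enumerate(row) if i != lost_index]
    return reduce(xor, survivors, p)

P = [parity(r) for r in rows]
print("P column:", P)

# Simulate losing Disk 3 (index 2) and rebuild its whole column.
rebuilt = [reconstruct(r, 2, p) for r, p in zip(rows, P)]
print("rebuilt Disk 3:", rebuilt)
```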
RAID
Redundant Arrays of Independent Disks.
Mirroring provides high reliability, but it is expensive.
Striping provides a high data-transfer rate, but does not improve reliability.
RAID has various levels, with different combinations of mirroring, striping and error-correction strategies.
RAID 0 – Striping only, no mirroring.
RAID 1 / RAID 10 / RAID 1+0 – Mirroring with striping.
Others: RAID levels 2, 3, 4, 5, 6.
RAID 5
By also striping the parity blocks across the disks, all disks can share the workload of read requests.
(P0 to P4: blocks of parity bits; 0 to 19: disk blocks containing data.)
Disk 1  Disk 2  Disk 3  Disk 4  Disk 5
P0      0       1       2       3
4       P1      5       6       7
8       9       P2      10      11
12      13      14      P3      15
16      17      18      19      P4
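The layout in the table follows a simple rotation. In this sketch, the rule "parity of stripe s goes on disk (s mod number of disks), data blocks fill the remaining disks left to right" is inferred from the figure and is an assumption; real RAID 5 implementations use several rotation variants.

```python
def raid5_layout(num_disks: int, num_stripes: int):
    """Return one row per stripe; each cell is 'P<s>' or a data block number."""
    layout, next_block = [], 0
    for s in range(num_stripes):
        parity_disk = s % num_disks           # parity rotates one disk per stripe
        row = []
        for d in range(num_disks):
            if d == parity_disk:
                row.append(f"P{s}")
            else:
                row.append(str(next_block))   # data blocks are numbered consecutively
                next_block += 1
        layout.append(row)
    return layout

layout = raid5_layout(5, 5)
for row in layout:
    print(" ".join(f"{cell:>3}" for cell in row))
```

Because no single disk holds all the parity, no disk becomes a bottleneck the way a dedicated parity disk would.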


Section 4

File Organization

File Organization
A database stores the records in file(s).
Many large-scale database systems do not rely directly on the underlying operating system for file management.
Instead, one large operating system file is allocated to the database system. The database system stores all relations in this one file, and manages the file itself.
File Organization
Each file is logically partitioned into fixed-length storage units called blocks, which are the units of both storage and data transfer.
A block may contain several records.
Assumption: No record is larger than a block.
This assumption is realistic for most data-processing applications. Large data items (e.g., images) can be stored separately, with a pointer to the data item stored in the record.
1. Records
There are two different ways of storing records in a block:
1a. Fixed-length records.
1b. Variable-length records.
Question: How does the database store the records of tables, e.g.,
Instructor (
ID VARCHAR(5),
name VARCHAR(20),
dept_name VARCHAR(20),
salary INT
)
1a. Fixed-length records
Fixed-length records
The length of every record is fixed.
A block (e.g., 4KB): [1004 Kit CS 15000][1012 Ben CS 18000] … (one record = 49 bytes)
Suppose each INT value takes 4 bytes, and each CHAR takes 1 byte. Each Instructor record then has a maximum length of 5 + 20 + 20 + 4 = 49 bytes! How many records can fit in a block of 4KB size?
Floor((1024 * 4) / 49) = 83 records!
1a. Fixed-length records
A block (e.g., 4KB): [1004 Kit CS 15000][1012 Ben CS 18000] … [unused space (29 bytes)]
Record access is simple, but records may cross blocks.
Modification: Do not allow records to cross blocks; leave those areas as unused area. (Why?)
Answer: Retrieving one record that spans two blocks requires two I/Os, which doubles the disk access time (if no buffer is used).
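The 83-record and 29-byte figures can be checked with a few lines of arithmetic. The sketch also shows why fixed-length records make access simple: the location of record i is pure arithmetic, assuming records are numbered from 0 and never cross a block boundary (the helper name `locate` is hypothetical).

```python
RECORD_SIZE = 5 + 20 + 20 + 4   # ID, name, dept_name at 1 byte per char, plus a 4-byte INT
BLOCK_SIZE = 4 * 1024           # 4KB block

records_per_block = BLOCK_SIZE // RECORD_SIZE   # records never cross blocks
unused_per_block = BLOCK_SIZE - records_per_block * RECORD_SIZE

def locate(record_no: int) -> tuple[int, int]:
    """Return (block number, byte offset within the block) of record record_no."""
    return (record_no // records_per_block,
            (record_no % records_per_block) * RECORD_SIZE)

print(records_per_block, unused_per_block)   # 83 29
print(locate(0), locate(82), locate(83))
```

Record 83 starts at offset 0 of block 1 instead of straddling the boundary, so every record costs exactly one block read.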
Free list
Store the address of the first deleted record in the file header.
Use the first deleted record to store the address of the second deleted record, and so on.
(Figure: 1012 Ben CS 18000 | free space | 1059 John History 20000 | free space | 1095 Betty CS 16000. The free space is caused by tuple deletions.)
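A free list can be sketched with slot numbers standing in for record addresses. Everything here is a hypothetical toy: `FixedLengthFile`, `FreeSlot`, and pushing each new deletion onto the front of the list are choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

@dataclass
class FreeSlot:
    next_free: Optional[int]   # slot of the next deleted record (None = end of list)

class FixedLengthFile:
    def __init__(self):
        self.slots: List[Union[tuple, FreeSlot]] = []
        self.header_free: Optional[int] = None    # file header: first deleted record

    def delete(self, slot: int) -> None:
        # The deleted slot stores the address of the previous first free slot.
        self.slots[slot] = FreeSlot(self.header_free)
        self.header_free = slot

    def insert(self, record: tuple) -> int:
        if self.header_free is not None:           # reuse a deleted slot first
            slot = self.header_free
            self.header_free = self.slots[slot].next_free
            self.slots[slot] = record
        else:                                      # no free slot: append at the end
            slot = len(self.slots)
            self.slots.append(record)
        return slot

f = FixedLengthFile()
for rec in [(1012, "Ben", "CS", 18000), (1004, "Kit", "CS", 15000),
            (1059, "John", "History", 20000), (1066, "Peter", "CS", 24000),
            (1095, "Betty", "CS", 16000)]:
    f.insert(rec)
f.delete(1)                                       # free list: 1
f.delete(3)                                       # free list: 3 -> 1
slot = f.insert((1084, "Billy", "Civil", 21000))  # reuses slot 3
```

Insertion never has to search for a hole: it simply takes the slot named in the header.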
1b. Variable-length records
Variable-length records arise in database systems in several ways:
Storage of multiple record types in the same block (e.g., some tuples of the Instructor table and some tuples of the Department table stored together in one block).
Record types that allow variable lengths for one or more fields (e.g., VARCHAR(250), TEXT, etc.).
1b. Variable-length records
Slotted-page structure is commonly used for organizing variable-length records within a block.
A block consists of a block header, free space, and records (stored at the end of the block).
Block header: #Entries = 2; the size of each entry (15 and 13 bytes); the end of free space in the block.
Records: 1004 Kit CS 15000 (13 bytes) and 1012 Jacky CS 15000 (15 bytes).
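A slotted page can be sketched as a byte array whose records are packed from the end, with a header entry (offset, size) per record. Assumptions: each header entry costs 8 bytes (`ENTRY_BYTES` is hypothetical), the #entries and end-of-free-space fields are kept as Python attributes rather than encoded in the block, and our byte strings are 19 and 17 bytes, not the 15 and 13 bytes of the slide's encoding.

```python
from typing import List, Tuple

ENTRY_BYTES = 8   # assumed size of one (offset, size) header entry

class SlottedPage:
    def __init__(self, block_size: int = 4096):
        self.data = bytearray(block_size)
        self.entries: List[Tuple[int, int]] = []   # (offset, size) per record
        self.end_of_free_space = block_size        # records occupy [end, block_size)

    def insert(self, record: bytes) -> int:
        header_end = ENTRY_BYTES * (len(self.entries) + 1)   # header after one more entry
        if self.end_of_free_space - len(record) < header_end:
            raise ValueError("not enough free space in this block")
        self.end_of_free_space -= len(record)
        start = self.end_of_free_space
        self.data[start:start + len(record)] = record
        self.entries.append((start, len(record)))
        return len(self.entries) - 1   # slot number used by outside pointers

    def read(self, slot: int) -> bytes:
        offset, size = self.entries[slot]
        return bytes(self.data[offset:offset + size])

page = SlottedPage()
jacky = page.insert(b"1012 Jacky CS 15000")
kit = page.insert(b"1004 Kit CS 15000")
```

Since pointers refer to slot numbers rather than byte offsets, a record can be moved inside the block (for example, to compact free space) by updating only its header entry.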
2. Organizing records in files
A file contains a number of blocks.
Each block stores a number of records.
How are the blocks organized in a file?
Which record should be stored in which block?
File organization:
2a. Heap file
2b. Sequential file
2c. Hashing
2d. Multitable clustering
2a. Heap file
No ordering of records; a record can be placed anywhere.
Advantage: Simplicity – stores every record in any empty space in any block.
Disadvantage: New blocks are allocated or destroyed dynamically, i.e., blocks in a file may be scattered over the disk.
(Figure: the file header; blocks with free space: B1, B4, B2; full blocks: B3, B7, B6.)
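A heap file only has to remember which blocks still have room. In this hypothetical sketch, the header's "blocks with free space" list is a Python list and a new record goes into the first block on it; blocks are named B1, B2, … in allocation order.

```python
class HeapFile:
    def __init__(self, records_per_block: int):
        self.capacity = records_per_block
        self.blocks: dict[str, list] = {}
        self.free_blocks: list[str] = []   # header: blocks with free space
        self._next_id = 1

    def _new_block(self) -> str:
        name = f"B{self._next_id}"
        self._next_id += 1
        self.blocks[name] = []
        self.free_blocks.append(name)
        return name

    def insert(self, record) -> str:
        # Any block with free space will do; allocate a new block only if none exists.
        name = self.free_blocks[0] if self.free_blocks else self._new_block()
        self.blocks[name].append(record)
        if len(self.blocks[name]) == self.capacity:   # block is now full
            self.free_blocks.remove(name)
        return name

    def scan(self, predicate):
        """No ordering, so finding a record means scanning every block."""
        return [r for recs in self.blocks.values() for r in recs if predicate(r)]

h = HeapFile(records_per_block=2)
placed = [h.insert(r) for r in [(1004, "Kit", "CS", 15000),
                                (1059, "John", "History", 20000),
                                (1095, "Betty", "CS", 16000)]]
```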
2b. Sequential file
Store records in sequential order, based on the value of the search key of each record.
A sequential file is designed for efficient processing of records in sorted order based on some search key.
(Figure: records sorted by ID and stored in consecutive blocks:
1004 Kit CS 15000
1012 Ben CS 18000
1059 John History 20000
1066 Peter CS 24000
1084 Billy Civil 21000
1095 Betty CS 16000)
Note that this sequential ordering according to the Lecturer IDs can help the processing of the following query, since the scan can stop as soon as it reaches a record with ID ≥ 1020:
SELECT * FROM Lecturer WHERE ID < 1020;
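The early stop can be sketched on an in-memory list standing in for the file's records in key order; the function name is hypothetical, and a real DBMS would read block by block instead of record by record.

```python
def select_id_less_than(records, bound):
    """SELECT * FROM Lecturer WHERE ID < bound, on records sorted by ID."""
    result = []
    for rec in records:            # records are read in search-key order
        if rec[0] >= bound:        # every later record has a larger ID: stop reading
            break
        result.append(rec)
    return result

lecturers = [
    (1004, "Kit", "CS", 15000),
    (1012, "Ben", "CS", 18000),
    (1059, "John", "History", 20000),
    (1066, "Peter", "CS", 24000),
    (1084, "Billy", "Civil", 21000),
    (1095, "Betty", "CS", 16000),
]
print(select_id_less_than(lecturers, 1020))
```

On a heap file the same query would have to read every block, because a qualifying record could be anywhere.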


2b. Sequential file
We have to maintain the order during record insertion:
Locate the record in the file that comes before the record to be inserted in search-key order.
If there is a free slot (maybe after a previous deletion) within the same block as this record, insert the new record there.
Otherwise, insert the new record in an overflow block.
(Figure: the sorted records 1004 Kit, 1012 Ben, 1059 John, 1066 Peter, 1084 Billy, 1095 Betty; the new record 1011 Ken CS 19000 is placed in an overflow block.)
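The insertion rule can be sketched with blocks as Python lists of fixed capacity. Assumptions for illustration: three records per block, one overflow block, and no pointer chain (a real sequential file links overflow records so the file can still be read in key order).

```python
def insert_sequential(blocks, overflow, record, capacity):
    """Insert record (search key = record[0]) into sorted blocks or the overflow block."""
    key = record[0]
    # Find the block holding the record that precedes `key` in search-key order.
    target = 0
    for i, block in enumerate(blocks):
        if block and block[0][0] < key:
            target = i
    block = blocks[target]
    if len(block) < capacity:
        # Free slot in the same block: insert while keeping the block sorted.
        pos = 0
        while pos < len(block) and block[pos][0] < key:
            pos += 1
        block.insert(pos, record)
        return f"block {target}"
    overflow.append(record)   # pointer chaining to keep key order is omitted here
    return "overflow"

def make_file():
    return [[(1004, "Kit", "CS", 15000), (1012, "Ben", "CS", 18000),
             (1059, "John", "History", 20000)],
            [(1066, "Peter", "CS", 24000), (1084, "Billy", "Civil", 21000),
             (1095, "Betty", "CS", 16000)]]

blocks, overflow = make_file(), []
where = insert_sequential(blocks, overflow, (1011, "Ken", "CS", 19000), capacity=3)
print(where, overflow)
```

With a full block, Ken goes to the overflow block as in the figure; with a free slot (capacity 4), he would be placed between Kit and Ben.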
2c. Hashing
A hash function is computed on some attribute of each record.
The result of the hash function specifies in which block of the file the record should be placed.
(Figure: hash function ID mod 10; record 1001 Kit CS 15000 goes to Block 1 and record 1012 Ben CS 18000 goes to Block 2.)
Will be elaborated in the next chapter.
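Using the slide's hash function, ID mod 10, placement and lookup each touch only one block. The helper names are hypothetical, and collisions and overflow are ignored here (the next chapter covers them).

```python
def block_for(record_id: int, num_blocks: int = 10) -> int:
    """The slide's hash function: ID mod 10 selects the block."""
    return record_id % num_blocks

def build_hash_file(records, num_blocks: int = 10):
    blocks = {b: [] for b in range(num_blocks)}
    for rec in records:
        blocks[block_for(rec[0], num_blocks)].append(rec)
    return blocks

def lookup(blocks, record_id, num_blocks: int = 10):
    # Only the block chosen by the hash function has to be read.
    return [r for r in blocks[block_for(record_id, num_blocks)] if r[0] == record_id]

blocks = build_hash_file([(1001, "Kit", "CS", 15000), (1012, "Ben", "CS", 18000)])
print(lookup(blocks, 1012))
```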
2d. Multitable clustering
We may put ≥ 2 related relations in the same file, to achieve faster joins.
Logical level:
Lecturer
lecturerID name dptID salary
1001 Ben 1 20000
1008 Jacky 2 30000
1016 John 1 25000
1005 Betty 1 22000
Department
dptID name budget
1 CS 300000
2 Civil 200000
SELECT * FROM Lecturer L, Department D WHERE L.dptID = D.dptID;
Physical level (a file):
1 CS 300000
1001 Ben 20000
1016 John 25000
1005 Betty 22000
2 Civil 200000
1008 Jacky 30000
Tuples of the Lecturer table are grouped by dptID, and are ordered after the corresponding Department record in the file.
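The physical layout above can be built and joined in one sequential pass. In this sketch the file is a Python list of tagged tuples (the tags and function names are assumptions); as in the slide, lecturer entries drop dptID because the preceding Department record implies it.

```python
departments = [(1, "CS", 300000), (2, "Civil", 200000)]
lecturers = [(1001, "Ben", 1, 20000), (1008, "Jacky", 2, 30000),
             (1016, "John", 1, 25000), (1005, "Betty", 1, 22000)]

def build_clustered_file(departments, lecturers):
    """Each Department record is followed by the Lecturer records of that department."""
    records = []
    for dpt_id, name, budget in departments:
        records.append(("Department", dpt_id, name, budget))
        for lec_id, lname, l_dpt, salary in lecturers:
            if l_dpt == dpt_id:
                records.append(("Lecturer", lec_id, lname, salary))
    return records

def join_by_scan(records):
    """Lecturer JOIN Department: one pass, remembering the current department."""
    result, current_dept = [], None
    for rec in records:
        if rec[0] == "Department":
            current_dept = rec[1:]
        else:
            result.append(rec[1:] + current_dept)
    return result

clustered = build_clustered_file(departments, lecturers)
for row in join_by_scan(clustered):
    print(row)
```

The join reads each block once, but a query touching only Lecturer must skip over the interleaved Department records.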
A comparison
Heap file is the cheapest to maintain.
But we have to scan all data to locate a specific record.
Sequential file helps query evaluation.
But it is difficult to maintain a sequential file.
Clustering file helps joins and finding related records over different relations.
Accessing data on only one relation may suffer.
Variable-size records may be difficult to handle.
Section 5

Buffer
Buffer manager
A DBMS seeks to minimize the number of block transfers between disks and memory.
Buffer – Portion of memory available to store copies of disk blocks.
Buffer Manager – A program responsible for allocating buffer space in main memory and moving blocks between disk and memory (so that disk I/O is minimized).
1. Buffer miss
(Figure: the buffer holds blocks 2, 4, 7; the disk holds blocks 1, 2, 3, 4, 5, 6, 7, 8, 9, ….)
Step 1. The database application reads data in Block 1.
Step 2. Block 1 is not in the buffer, so it is retrieved from disk (1 disk I/O). The buffer now holds blocks 2, 4, 7, 1.
Step 3. The required data in Block 1 is sent to the application.
2. Buffer hit
Step 1. The database application reads data in Block 2.
Step 2. The required data is in the buffer (blocks 2, 4, 7, 1), so it is returned to the user without disk access (no I/O).
3. Write operation
Step 1. The database application updates a record in Block 2 (the buffer holds blocks 2, 4, 7, 1).
Updates are done in memory only. The result will be reflected on disk when the buffer is flushed back to disk (or under other reliability requirements).
4. Buffer full
Step 1. The database application reads data in Block 5. Since Block 5 is not in the buffer, we need to fetch it from disk. But we have no buffer space (the buffer holds blocks 2, 4, 7, 1) …
Step 2. Suppose we free the buffer space that was used by Block 7: we simply overwrite that buffer slot, and the buffer now holds blocks 2, 4, 5, 1.
If we instead free the buffer space that was used by Block 2, we need an extra step to write the updated data of Block 2 from memory to disk first:
Step 2. Write the data in Block 2 to disk.
Step 3. Read Block 5 from disk. The buffer now holds blocks 5, 4, 7, 1.
Buffer replacement policy
Most operating systems replace the block that was least recently used – the LRU strategy.
The intuition behind LRU:
If a block has not been used for a long time, it is not likely to be accessed again very soon.
This uses past patterns of block accesses as a predictor of future accesses.
Other replacement policies:
LFU – Least frequently used, etc.
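The miss, hit, write and buffer-full cases combine into a small LRU buffer manager. This is a hypothetical sketch: a dict stands in for the disk, `OrderedDict` keeps blocks in recency order, and a dirty flag records which buffered blocks must be written back before their slot is reused.

```python
from collections import OrderedDict

class BufferManager:
    def __init__(self, disk: dict, capacity: int):
        self.disk = disk                  # block_no -> data (stand-in for the disk)
        self.capacity = capacity          # number of buffer slots
        self.buffer = OrderedDict()       # block_no -> [data, dirty]; oldest first
        self.reads = self.writes = 0      # disk I/O counters

    def _fetch(self, block_no: int) -> list:
        if block_no in self.buffer:                   # buffer hit: no I/O
            self.buffer.move_to_end(block_no)         # now most recently used
            return self.buffer[block_no]
        if len(self.buffer) >= self.capacity:         # buffer full: evict LRU block
            victim, (data, dirty) = self.buffer.popitem(last=False)
            if dirty:                                 # updated block: write it back first
                self.disk[victim] = data
                self.writes += 1
        self.reads += 1                               # buffer miss: one disk read
        entry = [self.disk[block_no], False]
        self.buffer[block_no] = entry
        return entry

    def read(self, block_no: int):
        return self._fetch(block_no)[0]

    def write(self, block_no: int, data) -> None:
        entry = self._fetch(block_no)
        entry[0], entry[1] = data, True               # update in memory only; mark dirty

disk = {i: f"data{i}" for i in range(1, 10)}
bm = BufferManager(disk, capacity=4)
for b in (2, 4, 7, 1):
    bm.read(b)            # four misses
bm.read(2)                # hit
bm.write(2, "new2")       # in-memory update; the disk still holds "data2"
bm.read(5)                # buffer full: LRU evicts block 4 (clean, no write)
```

Note that LRU evicts block 4 here, not block 7 or block 2 as in the walkthrough above; those slides illustrate the two cases (clean versus updated victim) rather than a particular policy.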
Data dictionary
Data dictionary (also called system catalog) stores metadata, i.e., data about data, e.g.,
Information about relations (names of relations, names & types of attributes, integrity constraints, views).
Statistical data (e.g., number of tuples in each relation).
Physical file organization (sequential, hashing, etc.).
It is frequently accessed by the buffer manager and query optimizer, and therefore stays in memory for fast access.
Chapter 6. END