Disk memory provides large, non-volatile storage. It uses rotating platters coated with magnetic surfaces and read/write heads. Disk access time has four components - seek time to position the head, rotational latency to wait for the desired sector, transfer time to read/write data, and controller overhead time. RAID (redundant array of independent disks) uses multiple disks for performance, reliability, and availability. Popular RAID levels include RAID 1 which uses mirroring for redundancy, and RAID 5 which uses distributed parity blocks.
2. Review: Major Components of a Computer
Processor: Control, Datapath
Memory: Cache, Main Memory, Secondary Memory (Disk)
Devices: Input, Output
3. DISK MEMORY
Purpose:
Long-term, non-volatile storage
Lowest level in the memory hierarchy
- Slow, large, inexpensive
General structure:
A rotating platter coated with a magnetic surface
A moveable read/write head to access the information on the disk
Typical numbers:
1 to 4 platters (each with two recordable surfaces) per disk, 1” to 3.5” in diameter
Rotational speed of 5400 to 15000 RPM
10,000 to 50,000 tracks per surface
Cylinder: all the tracks under the heads at a given point on all surfaces
100 to 500 sectors per track
Sector: the smallest unit that can be read or written
(Figure: a platter surface with one track and one sector labeled.)
4. Magnetic Disk Characteristics
Disk read/write components:
1. Seek time: position the head over the proper track (3 to 13 ms average)
- due to locality of disk references, the actual average seek time may be only 25% to 33% of the advertised number
2. Rotational latency: wait for the desired sector to rotate under the head (on average half a rotation, i.e. 0.5 / RPM converted to ms)
- 0.5 / 5400 RPM = 5.6 ms to 0.5 / 15000 RPM = 2.0 ms
3. Transfer time: transfer a block of bits (one or more sectors) under the head to the disk controller’s cache (70 to 125 MB/s were typical disk transfer rates in 2008)
- the disk controller’s cache takes advantage of spatial locality in disk accesses; cache transfer rates are much faster (e.g., 375 MB/s)
4. Controller time: the overhead the disk controller imposes in performing a disk I/O access (typically < 0.2 ms)
(Figure: a disk drive with its sector, track, cylinder, head, platter, and controller + cache labeled.)
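The four components on this slide simply add up. A minimal sketch of that arithmetic follows; the function names and the example workload (a 512-byte sector, a 4 ms seek, 100 MB/s, 0.2 ms controller overhead) are illustrative assumptions, not figures from the slides.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half a rotation, converted to milliseconds."""
    return 0.5 / rpm * 60_000.0


def transfer_time_ms(num_bytes: int, rate_mb_per_s: float) -> float:
    """Time to move num_bytes at the given rate (1 MB taken as 10**6 bytes)."""
    return num_bytes / (rate_mb_per_s * 1e6) * 1000.0


def disk_access_time_ms(seek_ms: float, rpm: float, num_bytes: int,
                        rate_mb_per_s: float, controller_ms: float) -> float:
    """Sum of seek, rotational latency, transfer, and controller time."""
    return (seek_ms
            + rotational_latency_ms(rpm)
            + transfer_time_ms(num_bytes, rate_mb_per_s)
            + controller_ms)


if __name__ == "__main__":
    # Hypothetical workload: one 512-byte sector on a 15000 RPM disk.
    total = disk_access_time_ms(seek_ms=4.0, rpm=15000, num_bytes=512,
                                rate_mb_per_s=100.0, controller_ms=0.2)
    print(f"rotational latency at 5400 RPM: {rotational_latency_ms(5400):.1f} ms")
    print(f"total access time: {total:.3f} ms")
```

For small transfers the seek and rotational terms dominate, which is why locality of disk references matters so much.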
5. INTERFACE STANDARDS
Higher-level disk interfaces have a microprocessor disk controller that can perform performance optimizations.
ATA (Advanced Technology Attachment): an interface standard for connecting storage devices such as hard disks, solid-state devices, and CD-ROM drives. Parallel ATA has been largely replaced by Serial ATA, which is widely used in PCs.
SCSI (Small Computer System Interface): a set of standards (commands, protocols, and electrical and optical interfaces) for physically connecting and transferring data between computers and peripheral devices; most commonly used for hard disks and tape devices.
In particular, disk controllers have SRAM disk caches, which support fast access to data that was recently read, and often also include prefetch algorithms that try to anticipate demand.
6. FLASH MEMORY
Flash memory:
The first credible challenger to disks. It is semiconductor memory that is non-volatile like disks, but has latency 100 to 1000 times faster than disks and is smaller, more power efficient, and more shock resistant.
In 2008, the price of flash memory was $4 to $10 per GB, about 2 to 10 times higher than disk and 5 to 10 times lower than DRAM.
7. DEPENDABILITY, RELIABILITY AND AVAILABILITY
Reliability: measured by the mean time to failure (MTTF)
Service interruption: measured by the mean time to repair (MTTR)
Availability: a measure of service accomplishment (the percentage of time the system is running)
Availability = MTTF / (MTTF + MTTR)
To increase MTTF, either improve the quality of the components or design the system to continue operating in the presence of faulty components.
Fault avoidance: preventing fault occurrence by construction
Fault tolerance: using redundancy to correct or bypass faulty components
Fault detection versus fault correction
Permanent fault versus transient fault
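The availability formula on this slide can be evaluated directly. The sketch below assumes both times are given in hours; the example MTTF and MTTR values are hypothetical.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is running: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical component: fails on average every 1,000,000 hours,
    # and takes 24 hours to repair.
    print(f"availability: {availability(1_000_000, 24):.6%}")
```

Shortening the repair time raises availability just as lengthening the time to failure does, which is the motivation for fault tolerance.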
8. RAID: DISK ARRAYS
Arrays of small and inexpensive disks.
Increase potential throughput by having many disk drives:
Data is spread over multiple disks
Multiple accesses are made to several disks at a time
Reliability is lower than that of a single disk.
But availability can be improved by adding redundant disks (RAID):
Lost information can be reconstructed from redundant information
MTTR: mean time to repair is on the order of hours
MTTF: mean time to failure of disks is on the order of years
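Why reliability drops in an array can be sketched with a common approximation that is not stated on the slide: if disks fail independently at a constant rate, the expected time until the first failure among N disks is the single-disk MTTF divided by N. The function name and numbers below are illustrative assumptions.

```python
def array_mttf_hours(disk_mttf_hours: float, num_disks: int) -> float:
    """Approximate time to the first disk failure in an array.

    Assumes independent disks with a constant failure rate, so the
    failure rates add and the MTTF divides by the number of disks.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    return disk_mttf_hours / num_disks


if __name__ == "__main__":
    # Hypothetical disks rated at 1,000,000 hours MTTF each.
    for n in (1, 4, 8):
        print(n, "disks:", array_mttf_hours(1_000_000, n), "hours")
```

Without redundancy, every such failure loses data, which is why the redundant disks are added.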
9. RAID LEVEL 0: NO REDUNDANCY, BUT STRIPING
(Figure: four disks holding sec1, sec2, sec3, sec4.)
Multiple smaller disks as opposed to one big disk.
- spreading the sectors over multiple disks (striping) means that multiple blocks can be accessed in parallel, increasing the performance (i.e., throughput)
- this 4-disk system gives four times the throughput of a 1-disk system
Same cost as one big disk, assuming the small disks together cost the same as the big disk.
No redundancy, so what if one disk fails?
1. Failure of one or more disks becomes more likely as the number of disks in the system increases.
2. RAID 0 is still good for performance, because striping distributes the data over the disks.
(Figure: sector 1 striped across four disks as sec1,b0, sec1,b1, sec1,b2, sec1,b3.)
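Striping amounts to a simple address mapping. The sketch below assumes round-robin placement of logical blocks, with 0-based block and disk numbers; real controllers may use a larger stripe unit, and `locate_block` is a hypothetical name.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Return (disk index, block offset on that disk) for round-robin striping."""
    offset, disk = divmod(logical_block, num_disks)
    return disk, offset


if __name__ == "__main__":
    # Four consecutive blocks land on four different disks,
    # so they can be accessed in parallel.
    for block in range(8):
        disk, offset = locate_block(block, num_disks=4)
        print(f"logical block {block} -> disk {disk}, offset {offset}")
```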
10. RAID LEVEL 1: REDUNDANCY VIA MIRRORING
Uses twice as many disks as RAID 0 (8 smaller disks, with the second set of 4 duplicating the first set), so there are always two copies of the data.
# redundant disks = # of data disks, so reliability doubles the cost (reliable, but the most expensive)
- writes have to be made to both sets of disks, so writes would have only 1/2 the performance of a RAID 0 with 8 disks
What if one disk fails?
If a disk fails, the system just goes to the mirror for the data.
(Figure: disks S1 S2 S3 S4 and their mirror copies S1 S2 S3 S4.)
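The mirroring behavior can be sketched with an in-memory model: every write goes to both copies, and a read is served from whichever copy is still working. `MirroredDisk` and its fields are hypothetical names for illustration, not a real driver interface.

```python
class MirroredDisk:
    """Toy model of one disk plus its mirror."""

    def __init__(self, num_blocks: int):
        self.copies = [[None] * num_blocks, [None] * num_blocks]
        self.failed = [False, False]

    def write(self, block: int, data: bytes) -> None:
        # Writes must reach every working copy, which is the write penalty.
        for index, copy in enumerate(self.copies):
            if not self.failed[index]:
                copy[block] = data

    def read(self, block: int) -> bytes:
        # Serve the read from the first copy that has not failed.
        for index, copy in enumerate(self.copies):
            if not self.failed[index]:
                return copy[block]
        raise RuntimeError("both copies have failed")


if __name__ == "__main__":
    disk = MirroredDisk(num_blocks=4)
    disk.write(2, b"payload")
    disk.failed[0] = True          # simulate losing the primary disk
    print(disk.read(2))            # still readable from the mirror
```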
11. RAID: Level 0+1 (Striping with Mirroring)
Combines the best of RAID 0 and RAID 1: data is striped across four disks and mirrored to four more disks
Four times the throughput (due to striping)
# redundant disks = # of data disks, so reliability doubles the cost
- writes have to be made to both sets of disks, so writes would be
only 1/2 the performance of a RAID 0 with 8 disks
What if one disk fails?
If a disk fails, the system just goes to the “mirror” for the data
(Figure: blocks blk1 to blk4 of sec1, striped as sec1,b0 to sec1,b3 across four disks and duplicated on four mirror disks that hold the redundant (check) data.)
12. RAID: Level 2 (Redundancy via ECC)
ECC disks contain the parity of data on a set of distinct
overlapping disks
# redundant disks ~= log (total # of data disks) + 1, so reliability
almost doubles the cost
- writes require computing parity to write to the ECC disks
- reads require reading ECC disks and confirming parity
Can tolerate limited disk failure, since the data can be
reconstructed (similar to memory ECC systems)
(Figure: data disks 3, 5, 6, 7 hold sec1,b0 to sec1,b3; ECC disks 4, 2, 1 hold even parity.)
ECC disk 4 checks disks 4, 5, 6, 7
ECC disk 2 checks disks 2, 3, 6, 7
ECC disk 1 checks disks 1, 3, 5, 7
ECC disks 4 and 2 point to either data disk 6 or 7, but ECC disk 1 says disk 7 is okay, so disk 6 must be in error.
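The reasoning on this slide (which ECC disks disagree identifies the bad disk) can be sketched as a Hamming-style check. The check groups and the use of even parity come from the slide; the one-bit-per-disk model, the function names, and the example data bits are assumptions made for illustration.

```python
# ECC disk number -> disks covered by that check (from the slide).
CHECKS = {4: (4, 5, 6, 7), 2: (2, 3, 6, 7), 1: (1, 3, 5, 7)}


def encode(data_bits: dict) -> dict:
    """Add even-parity ECC bits to data bits stored on disks 3, 5, 6, 7."""
    disks = dict(data_bits)
    for ecc, group in CHECKS.items():
        # Each ECC bit makes the number of 1s in its group even.
        disks[ecc] = sum(disks[d] for d in group if d != ecc) % 2
    return disks


def failing_disk(disks: dict) -> int:
    """Add up the numbers of the ECC disks whose check fails; 0 means no error."""
    return sum(ecc for ecc, group in CHECKS.items()
               if sum(disks[d] for d in group) % 2 == 1)


if __name__ == "__main__":
    stored = encode({3: 1, 5: 0, 6: 1, 7: 0})   # hypothetical data bits
    stored[6] ^= 1                               # corrupt disk 6
    print("disk in error:", failing_disk(stored))
```

With disk 6 corrupted, checks 4 and 2 fail while check 1 passes, and 4 + 2 = 6 names the bad disk, matching the slide's argument.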
14. RAID: Level 3 (Bit-Interleaved Parity)
Cost of higher availability is reduced to 1/N where N is the
number of disks in a protection group
# redundant disks = 1 × # of protection groups
Reads and writes must access all disks
writing new data to the data disk as well as computing and writing
the parity disk, means reading the other disks, so that the parity
disk can be updated
Can tolerate limited (single) disk failure, since the data can
be reconstructed
reads require reading all the operational data disks as well as the
parity disk to calculate the missing data stored on the failed disk
(Figure: four data disks holding sec1,b0 to sec1,b3 with bits 1, 0, 0, 1, plus a bit parity disk holding 1 (odd parity); one data disk fails.)
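Reconstructing a failed disk from parity can be sketched as follows. The sketch uses XOR, which is even parity, whereas the slide's figure uses odd parity; the recovery idea is the same. Disk contents are modeled as short byte strings, and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list) -> bytes:
    """Bytewise XOR of equal-length byte strings (even parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks: list, parity: bytes) -> bytes:
    """Recover the missing block by XOR-ing the survivors with the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x0f\xf0", b"\xaa\x55", b"\x00\xff"]
    parity = xor_blocks(data)
    survivors = data[:2] + data[3:]           # the third disk has failed
    print(reconstruct(survivors, parity) == data[2])
```

Note that recovery reads every surviving data disk and the parity disk, as the slide states.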
15. RAID: Level 4 (Block-Interleaved Parity)
Cost of higher availability still only 1/N but the parity is
stored as blocks associated with sets of data blocks
Four times the throughput (blocks are striped)
# redundant disks = 1 × # of protection groups
Supports “small reads” and “small writes” (reads and writes that go
to just one (or a few) data disk in a protection group, not to all)
- by watching which bits change when writing new information, need only
to change the corresponding bits on the parity disk (read-modify-write)
- the parity disk must be updated on every write, so it is a bottleneck for
back-to-back writes
Can tolerate limited disk failure, since the data can be
reconstructed
(Figure: four data disks holding sec1 to sec4 and a block parity disk.)
16. Small Writes
RAID 3 writes: writing new D1 data takes 3 reads and 2 writes, involving all the disks (D1 D2 D3 D4 P).
RAID 4 small writes: writing new D1 data takes 2 reads and 2 writes, involving just two disks (D1 and P).
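The RAID 4 small write works because the new parity depends only on the old data, the new data, and the old parity. A minimal sketch, assuming XOR (even) parity and byte-string blocks; the function names are hypothetical.

```python
def xor2(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Read-modify-write: flip exactly the parity bits whose data bits changed."""
    return xor2(old_parity, xor2(old_data, new_data))


def full_parity(blocks: list) -> bytes:
    """Parity recomputed from every data block, as a RAID 3 style write would."""
    parity = bytes(len(blocks[0]))
    for block in blocks:
        parity = xor2(parity, block)
    return parity


if __name__ == "__main__":
    blocks = [b"\x11", b"\x22", b"\x33", b"\x44"]    # D1..D4
    old_parity = full_parity(blocks)
    new_d1 = b"\x5a"
    shortcut = small_write_parity(blocks[0], new_d1, old_parity)
    print(shortcut == full_parity([new_d1] + blocks[1:]))
```

Both routes give the same parity, but the shortcut touches only D1 and P.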
17. RAID: Level 5 (Distributed Block-Interleaved Parity)
Cost of higher availability still only 1/N but the parity block
can be located on any of the disks so there is no single
bottleneck for writes
Still four times the throughput (block striping)
# redundant disks = 1 × # of protection groups
Supports “small reads” and “small writes” (reads and writes that
go to just one (or a few) data disk in a protection group, not to all)
Allows multiple simultaneous writes as long as the
accompanying parity blocks are not located on the same disk
Can tolerate limited disk failure, since the data can be
reconstructed
(Figure: a set of disks, one of these assigned as the block parity disk.)
18. Distributing Parity Blocks
By distributing parity blocks to all disks, some small
writes can be performed in parallel
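RAID 4 (all parity blocks on one disk):
1   2   3   4   P0
5   6   7   8   P1
9   10  11  12  P2
13  14  15  16  P3
RAID 5 (parity blocks distributed over all disks):
1   2   3   4   P0
5   6   7   P1  8
9   10  P2  11  12
13  P3  14  15  16
(Figure labels: Time; Can be done in parallel.)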
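The RAID 5 layout on this slide follows a simple rotation: the parity block moves one disk to the left with each stripe. The sketch below reproduces that placement for five disks and tests whether two small writes touch disjoint disks. Disk and stripe indices are 0-based, block numbers are 1-based as on the slide, and the function names are hypothetical.

```python
def parity_disk(stripe: int, num_disks: int) -> int:
    """Disk holding the parity block of a stripe (rotates right to left)."""
    return (num_disks - 1 - stripe) % num_disks


def data_location(block: int, num_disks: int) -> tuple[int, int]:
    """Return (stripe, disk) for a 1-based data block number."""
    stripe, index = divmod(block - 1, num_disks - 1)
    parity = parity_disk(stripe, num_disks)
    # Data blocks fill the stripe left to right, skipping the parity disk.
    disk = index if index < parity else index + 1
    return stripe, disk


def can_write_in_parallel(block_a: int, block_b: int, num_disks: int) -> bool:
    """True when the two small writes use four distinct disks in total."""
    disks = set()
    for block in (block_a, block_b):
        stripe, disk = data_location(block, num_disks)
        disks.update({disk, parity_disk(stripe, num_disks)})
    return len(disks) == 4


if __name__ == "__main__":
    print(can_write_in_parallel(1, 7, num_disks=5))   # disjoint disks
    print(can_write_in_parallel(1, 5, num_disks=5))   # both data blocks on disk 0
```

Under RAID 4 every small write needs the single parity disk, so no two of them could proceed together.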
19. Summary
Four components of disk access time:
Seek time: advertised to be 3 to 13 ms, but lower in real systems
Rotational latency: 5.6 ms at 5400 RPM and 2.0 ms at 15000 RPM
Transfer time: 70 to 125 MB/s
Controller time: typically less than 0.2 ms
RAIDs can be used to improve availability
RAID 1 and RAID 5: widely used in servers; one estimate is that 80% of disks in servers are in RAIDs
RAID 0+1 (striping with mirroring): EMC, HP/Tandem, IBM
RAID 3: Storage Concepts
RAID 4: Network Appliance
RAIDs have enough redundancy to allow continuous operation, but hot swapping (replacement while the system is running) is challenging