STORAGE
MEMORY
Review: Major Components of a Computer
- Processor: Control, Datapath
- Memory: Cache, Main Memory, Secondary Memory (Disk)
- Devices: Input, Output
DISK MEMORY
Purpose:
- Long-term, non-volatile storage
- Lowest level in the memory hierarchy: slow, large, inexpensive
General structure:
- A rotating platter coated with a magnetic surface
- A movable read/write head to access the information on the disk
Typical numbers:
- 1 to 4 platters (each with recordable surfaces) per disk, 1" to 3.5" in diameter
- Rotational speed of 5,400 to 15,000 RPM
- 10,000 to 50,000 tracks per surface
- Cylinder: all the tracks under the heads at a given point on all surfaces
- 100 to 500 sectors per track
- Sector: the smallest unit that can be read or written
[Figure: a platter surface with a track and a sector labeled]
Magnetic Disk Characteristics
Disk read/write components:
1. Seek time: position the head over the proper track (3 to 13 ms average)
   - Due to locality of disk references, the actual average seek time may be only 25% to 33% of the advertised number
2. Rotational latency: wait for the desired sector to rotate under the head (on average half a rotation, i.e. 0.5 / (RPM / 60) seconds, converted to ms)
   - 0.5 / (5400 / 60) s = 5.6 ms down to 0.5 / (15000 / 60) s = 2.0 ms
3. Transfer time: transfer a block of bits (one or more sectors) under the head to the disk controller's cache (70 to 125 MB/s were typical disk transfer rates in 2008)
   - The disk controller's cache takes advantage of spatial locality in disk accesses; cache transfer rates are much faster (e.g. 375 MB/s)
4. Controller time: the overhead the disk controller imposes in performing a disk I/O access (typically < 0.2 ms)
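The four components above add up to an average access time. The following is a minimal sketch of that arithmetic; the function names and the example values (6 ms seek, 4096-byte transfer, 100 MB/s) are illustrative assumptions, not figures from any particular drive.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half a rotation, in milliseconds."""
    rotations_per_second = rpm / 60.0
    return 0.5 / rotations_per_second * 1000.0


def access_time_ms(seek_ms: float, rpm: float, transfer_bytes: int,
                   transfer_mb_per_s: float, controller_ms: float) -> float:
    """Sum of seek, rotational latency, transfer, and controller time (ms)."""
    transfer_ms = transfer_bytes / (transfer_mb_per_s * 1e6) * 1000.0
    return seek_ms + rotational_latency_ms(rpm) + transfer_ms + controller_ms


# Illustrative values only (assumed, not taken from a datasheet).
example = access_time_ms(seek_ms=6.0, rpm=5400, transfer_bytes=4096,
                         transfer_mb_per_s=100.0, controller_ms=0.2)
print(f"rotational latency at 5400 RPM: {rotational_latency_ms(5400):.1f} ms")
print(f"rotational latency at 15000 RPM: {rotational_latency_ms(15000):.1f} ms")
print(f"average access time: {example:.2f} ms")
```

With these assumed values, seek time and rotational latency dominate; the transfer of a single small block contributes only a few hundredths of a millisecond.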
[Figure: disk drive with sector, track, cylinder, head, platter, and controller + cache labeled]
INTERFACE STANDARDS
- Higher-level disk interfaces have a microprocessor-based disk controller that can perform performance optimizations.
- ATA (Advanced Technology Attachment): an interface standard for connecting storage devices such as hard disks, solid-state devices, and CD-ROM drives. Parallel ATA has been largely replaced by Serial ATA, which is widely used in PCs.
- SCSI (Small Computer System Interface): a set of standards (commands, protocols, and electrical and optical interfaces) for physically connecting and transferring data between computers and peripheral devices, most commonly used for hard disks and tape devices.
- In particular, disk controllers have SRAM disk caches that support fast access to recently read data, and they often include prefetch algorithms that try to anticipate demand.
FLASH MEMORY
Flash memory:
- The first credible challenger to disks. It is semiconductor memory that is non-volatile like disks, but has latency 100 to 1000 times lower than disks and is smaller, more power efficient, and more shock resistant.
- In 2008, the price of flash memory was $4 to $10 per GB: about 2 to 10 times higher than disk and 5 to 10 times lower than DRAM.
DEPENDABILITY, RELIABILITY AND AVAILABILITY
- Reliability: measured by the mean time to failure (MTTF)
- Service interruption: measured by the mean time to repair (MTTR)
- Availability: a measure of service accomplishment (the fraction of time the system is running)
Availability = MTTF / (MTTF + MTTR)
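As a quick check of the availability formula, here is a small sketch. The 5-year MTTF and 24-hour MTTR are assumed values chosen only to show the arithmetic, in line with the later note that MTTF is on the order of years and MTTR on the order of hours.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is running: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


HOURS_PER_YEAR = 24 * 365

# Assumed example: a component that fails every 5 years on average
# and takes one day to repair.
a = availability(mttf_hours=5 * HOURS_PER_YEAR, mttr_hours=24)
print(f"availability: {a:.5f}")
```

Shortening the repair time raises availability just as lengthening the time to failure does, which is why redundancy plus fast repair is effective.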
To increase MTTF, either improve the quality of the components or design the system to continue operating in the presence of faulty components.
- Fault avoidance: preventing fault occurrence by construction
- Fault tolerance: using redundancy to correct or bypass faulty components
  - Fault detection versus fault correction
  - Permanent faults versus transient faults
RAID: DISK ARRAYS
- Arrays of small and inexpensive disks
- Increase potential throughput by having many disk drives
  - Data is spread over multiple disks
  - Multiple accesses are made to several disks at a time
- Reliability is lower than that of a single disk
- But availability can be improved by adding redundant disks (RAID)
  - Lost information can be reconstructed from redundant information
  - MTTR: mean time to repair is on the order of hours
  - MTTF: mean time to failure of disks is on the order of years
RAID LEVEL 0: NO REDUNDANCY, BUT STRIPING
- Multiple smaller disks as opposed to one big disk
  - Spreading the sectors over multiple disks (striping) means that multiple blocks can be accessed in parallel, increasing performance (i.e. throughput)
  - This 4-disk system gives four times the throughput of a 1-disk system
- Same cost as one big disk, assuming the small disks together cost the same as one big disk
- No redundancy, so what if one disk fails?
  1. Failure of one or more disks is more likely as the number of disks in the system increases
  2. RAID 0 is still good for performance, because striping distributes the data over the disks
[Figure: sectors sec1 to sec4 striped across four disks, with sector 1 split into sec1,b0 to sec1,b3]
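Striping can be sketched as a mapping from a logical block number to a (disk, offset) pair. The round-robin mapping below is one common choice and is an assumption here; the slides do not specify the exact layout, and `locate_block` is a hypothetical helper name.

```python
from typing import List, Tuple


def locate_block(logical_block: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk)."""
    disk = logical_block % num_disks      # consecutive blocks go to consecutive disks
    offset = logical_block // num_disks   # position within the chosen disk
    return disk, offset


# Eight consecutive logical blocks on a 4-disk array.
layout: List[Tuple[int, int]] = [locate_block(b, 4) for b in range(8)]
for block, (disk, offset) in enumerate(layout):
    print(f"block {block} -> disk {disk}, offset {offset}")
```

Because blocks 0 to 3 land on four different disks, they can be read or written at the same time, which is where the fourfold throughput comes from.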
RAID LEVEL 1: REDUNDANCY VIA MIRRORING
- Uses twice as many disks as RAID 0 (8 smaller disks, with the second set of 4 duplicating the first set), so there are always two copies of the data
  - # redundant disks = # of data disks, so reliability doubles the cost (reliable, but the most expensive)
  - Writes have to be made to both sets of disks, so writes would have only 1/2 the performance of a RAID 0 with 8 disks
- What if one disk fails?
  - If a disk fails, the system just goes to the mirror for the data
[Figure: disks S1 S2 S3 S4 mirrored by a second set S1 S2 S3 S4]
RAID: Level 0+1 (Striping with Mirroring)
- Combines the best of RAID 0 and RAID 1: data is striped across four disks and mirrored to four more disks
  - Four times the throughput (due to striping)
  - # redundant disks = # of data disks, so reliability doubles the cost
    - Writes have to be made to both sets of disks, so writes would have only 1/2 the performance of a RAID 0 with 8 disks
- What if one disk fails?
  - If a disk fails, the system just goes to the "mirror" for the data
[Figure: blk1 blk2 blk3 blk4 on four data disks, duplicated as redundant (check) data on four mirror disks]
RAID: Level 2 (Redundancy via ECC)
- ECC disks contain the parity of data on a set of distinct overlapping disks
  - # redundant disks ≈ log2(total # of data disks) + 1, so reliability almost doubles the cost
    - Writes require computing parity to write to the ECC disks
    - Reads require reading the ECC disks and confirming parity
- Can tolerate limited disk failure, since the data can be reconstructed (similar to memory ECC systems)
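[Figure: seven disks in positions 3 5 6 7 4 2 1; the data bits sec1,b0 to sec1,b3 sit on the data disks, and disks 4, 2 and 1 are ECC disks (even parity)]
- ECC disk 4 checks disks 4, 5, 6, 7
- ECC disk 2 checks disks 2, 3, 6, 7
- ECC disk 1 checks disks 1, 3, 5, 7
- ECC disks 4 and 2 point to either data disk 6 or 7, but ECC disk 1 says disk 7 is okay, so disk 6 must be in error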
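The reasoning in the figure is a Hamming-code syndrome check. The sketch below assumes the standard arrangement in which positions 1, 2 and 4 hold even-parity check bits and positions 3, 5, 6 and 7 hold data bits; the names `CHECK_SETS`, `encode` and `failing_position` are hypothetical, and the data bits are arbitrary example values.

```python
# Each ECC position and the set of positions it checks (from the slide).
CHECK_SETS = {1: (1, 3, 5, 7), 2: (2, 3, 6, 7), 4: (4, 5, 6, 7)}


def encode(d3: int, d5: int, d6: int, d7: int) -> dict:
    """Place data bits at positions 3, 5, 6, 7 and fill in even-parity bits."""
    bits = {3: d3, 5: d5, 6: d6, 7: d7}
    for p, covered in CHECK_SETS.items():
        bits[p] = sum(bits[i] for i in covered if i != p) % 2
    return bits


def failing_position(bits: dict) -> int:
    """Sum the ECC positions whose check fails: 0 means consistent,
    otherwise the result is the position of a single flipped bit."""
    return sum(p for p, covered in CHECK_SETS.items()
               if sum(bits[i] for i in covered) % 2 != 0)


word = encode(1, 0, 0, 1)
clean = failing_position(word)
word[6] ^= 1                      # simulate an error on disk 6
located = failing_position(word)  # checks 4 and 2 fail, check 1 passes
print(clean, located)
```

Checks 4 and 2 both cover disk 6 while check 1 does not, so the failing checks add up to 4 + 2 = 6, the position of the faulty disk.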
RAID: Level 3 (Bit-Interleaved Parity)
- Cost of higher availability is reduced to 1/N, where N is the number of disks in a protection group
  - # redundant disks = 1 × # of protection groups
- Reads and writes must access all disks
  - Writing new data to a data disk, as well as computing and writing the parity disk, means reading the other disks so that the parity disk can be updated
- Can tolerate limited (single) disk failure, since the data can be reconstructed
  - Reads require reading all the operational data disks as well as the parity disk to calculate the missing data stored on the failed disk
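Reconstruction from a single parity disk can be sketched with XOR. This example assumes even parity (plain XOR) over byte strings for simplicity, whereas the slide's figure is marked odd parity, which would simply complement the parity bit; the data values and the helper name `parity` are made up for illustration.

```python
from functools import reduce


def parity(blocks):
    """Bytewise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four data disks (example contents) and their parity disk.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00", b"\x0f\xf0"]
p = parity(data)

# Disk 2 fails: XOR the surviving data disks with the parity disk.
survivors = data[:2] + data[3:] + [p]
rebuilt = parity(survivors)
print(rebuilt == data[2])
```

Every surviving disk has to be read to rebuild the lost one, which matches the slide's point that such reads involve all operational disks plus the parity disk.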
[Figure: data bits sec1,b0 to sec1,b3 on four disks plus one bit parity disk (odd parity); one data disk fails and its bit is recomputed from the others]
RAID: Level 4 (Block-Interleaved Parity)
- Cost of higher availability is still only 1/N, but the parity is stored as blocks associated with sets of data blocks
  - Four times the throughput (blocks are striped)
  - # redundant disks = 1 × # of protection groups
  - Supports "small reads" and "small writes" (reads and writes that go to just one or a few data disks in a protection group, not to all)
    - By watching which bits change when writing new information, only the corresponding bits on the parity disk need to change (read-modify-write)
    - The parity disk must be updated on every write, so it is a bottleneck for back-to-back writes
- Can tolerate limited disk failure, since the data can be reconstructed
[Figure: blocks sec1 sec2 sec3 sec4 on four data disks plus one block parity disk]
Small Writes
- RAID 3 writes: writing new D1 data takes 3 reads (D2, D3, D4) and 2 writes (D1, P), involving all the disks
- RAID 4 small writes: writing new D1 data takes 2 reads (old D1, old P) and 2 writes (D1, P), involving just two disks
[Figure: disks D1 D2 D3 D4 P before and after each write]
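The RAID 4 small write works because the new parity can be derived from the old data, the new data, and the old parity alone. A minimal sketch, assuming XOR (even) parity and made-up one-byte blocks; the function names are hypothetical:

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Read-modify-write: flip exactly the parity bits whose data bits changed."""
    return xor(old_parity, xor(old_data, new_data))


def full_parity(blocks) -> bytes:
    """Recompute parity from every data block (the RAID 3 style update)."""
    result = bytes(len(blocks[0]))
    for block in blocks:
        result = xor(result, block)
    return result


d = [b"\x01", b"\x02", b"\x04", b"\x08"]   # D1..D4 (example contents)
p = full_parity(d)
new_d1 = b"\x31"

p_small = small_write_parity(d[0], new_d1, p)   # touches only D1 and P
p_full = full_parity([new_d1] + d[1:])          # touches D2, D3, D4 as well
print(p_small == p_full)
```

Both routes give the same parity block; the small-write route just reads two disks instead of three.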


RAID: Level 5 (Distributed Block-Interleaved Parity)
- Cost of higher availability is still only 1/N, but the parity block can be located on any of the disks, so there is no single bottleneck for writes
  - Still four times the throughput (block striping)
  - # redundant disks = 1 × # of protection groups
  - Supports "small reads" and "small writes" (reads and writes that go to just one or a few data disks in a protection group, not to all)
  - Allows multiple simultaneous writes as long as the accompanying parity blocks are not located on the same disk
- Can tolerate limited disk failure, since the data can be reconstructed
[Figure: five disks; for each stripe, one of them is assigned as the block parity disk]
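Distributing Parity Blocks
- By distributing parity blocks to all disks, some small writes can be performed in parallel

RAID 4 (parity on one disk):
 1   2   3   4   P0
 5   6   7   8   P1
 9   10  11  12  P2
 13  14  15  16  P3

RAID 5 (parity rotated across the disks):
 1   2   3   4   P0
 5   6   7   P1  8
 9   10  P2  11  12
 13  P3  14  15  16

[Figure annotation: in the RAID 5 layout, writes that use different data and parity disks can be done in parallel]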
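The RAID 5 layout in the table can be generated by rotating the parity position one disk to the left on each stripe. The sketch below reproduces that table and checks whether two small writes can overlap; the rotation rule is inferred from the table, and `raid5_layout` and `disks_for_write` are hypothetical helper names.

```python
def raid5_layout(num_disks: int, num_stripes: int):
    """Return rows of block labels with parity rotating right to left."""
    rows = []
    next_block = 1
    for stripe in range(num_stripes):
        parity_disk = (num_disks - 1 - stripe) % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append(f"P{stripe}")
            else:
                row.append(str(next_block))
                next_block += 1
        rows.append(row)
    return rows


def disks_for_write(layout, block: int) -> set:
    """Disks touched by a small write: the data disk and that stripe's parity disk."""
    for row in layout:
        if str(block) in row:
            parity = next(i for i, cell in enumerate(row) if cell.startswith("P"))
            return {row.index(str(block)), parity}
    raise ValueError(f"block {block} not in layout")


layout = raid5_layout(num_disks=5, num_stripes=4)
for row in layout:
    print(" ".join(row))

# Blocks 1 and 6 use disjoint disk pairs, so their small writes can overlap.
parallel_ok = disks_for_write(layout, 1).isdisjoint(disks_for_write(layout, 6))
print(parallel_ok)
```

In the RAID 4 layout both of these writes would need the single parity disk, so they would have to be done one after the other.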
Summary
- Four components of disk access time:
  - Seek time: advertised to be 3 to 13 ms, but lower in real systems
  - Rotational latency: 5.6 ms at 5400 RPM and 2.0 ms at 15000 RPM
  - Transfer time: 70 to 125 MB/s
  - Controller time: typically less than 0.2 ms
- RAIDs can be used to improve availability
  - RAID 1 and RAID 5: widely used in servers; one estimate is that 80% of disks in servers are RAIDs
  - RAID 0+1 (mirroring): EMC, HP/Tandem, IBM
  - RAID 3: Storage Concepts
  - RAID 4: Network Appliance
- RAIDs have enough redundancy to allow continuous operation, but hot swapping (replacement while the system is running) is challenging
