Ahmed Amer
    As we move towards data centers at the exascale, the reliability challenges of such enormous storage systems are daunting. We demonstrate that such systems will suffer substantial annual data loss if only traditional reliability mechanisms are employed, and argue that exascale storage architectures should incorporate novel mechanisms at or below the object level to address this problem. Reliability efforts focused solely on the device level will not scale, and in this study we analytically evaluate how rapidly the problem manifests.
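    The scaling argument lends itself to a back-of-the-envelope calculation. The sketch below is only an illustration, not the paper's analysis: it applies the standard Markov-model MTTDL approximation for single-parity groups to a petabyte-scale and an exabyte-scale system. The drive capacity, failure, and rebuild figures are assumed values; the point is simply that expected annual loss events grow linearly with the number of groups.

        # Back-of-the-envelope estimate of annual data-loss events, assuming a
        # traditional single-parity (RAID level 5) organization. All parameter
        # values are illustrative assumptions, not figures from the paper.

        MTTF_H = 1_000_000      # assumed per-drive mean time to failure (hours)
        MTTR_H = 168            # assumed rebuild time for a large drive (hours)
        DRIVE_TB = 10           # assumed drive capacity (terabytes)
        GROUP = 10              # disks per group: 9 data + 1 parity
        HOURS_PER_YEAR = 8760

        def mttdl_single_parity(n, mttf, mttr):
            # Standard Markov-model approximation: data is lost when a second
            # failure strikes a group before the first failure is rebuilt.
            return mttf ** 2 / (n * (n - 1) * mttr)

        for capacity_tb in (1_000, 1_000_000):      # 1 PB vs. 1 EB of user data
            data_drives = capacity_tb // DRIVE_TB
            groups = data_drives // (GROUP - 1)
            events = groups * HOURS_PER_YEAR / mttdl_single_parity(GROUP, MTTF_H, MTTR_H)
            print(f"{capacity_tb:>9} TB: {groups:>6} groups, "
                  f"~{events:.4f} expected loss events/year")

    Under these assumed parameters the expected loss rate is negligible at a petabyte but becomes a yearly occurrence at an exabyte, which is the kind of growth the analytical evaluation examines.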
    Two-dimensional RAID arrays maintain separate row and column parities for all their disks. Depending on their organization, they can tolerate between two and three concurrent disk failures without losing any data. We propose to enhance the robustness of these arrays by replacing a small fraction of these drives with storage class memory devices, and demonstrate how such a pairing is …
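    To make the row-and-column parity idea concrete, here is a minimal sketch (mine, not the paper's) of how the contents of a failed disk are rebuilt by XOR-ing the surviving members of its row stripe; the column stripe provides a second, independent recovery path, which is what lets these arrays survive multiple concurrent failures.

        def xor_blocks(blocks):
            """XOR a list of equal-length byte blocks together."""
            out = bytearray(len(blocks[0]))
            for block in blocks:
                for i, byte in enumerate(block):
                    out[i] ^= byte
            return bytes(out)

        # Hypothetical row stripe of three data blocks plus its row parity.
        row = [b"\x11" * 4, b"\x22" * 4, b"\x33" * 4]
        row_parity = xor_blocks(row)

        # If the disk holding row[1] fails, rebuild its block from the survivors.
        rebuilt = xor_blocks([row[0], row[2], row_parity])
        assert rebuilt == row[1]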
    Storage class memories (SCMs) constitute an emerging class of non-volatile storage devices that promise to be significantly faster and more reliable than magnetic disks. We propose to add one of these devices to each group of two or three RAID arrays and store additional parity data on it. We show that the new organization can tolerate all double disk failures, between 75 and 90 percent of all triple disk failures, and between 50 and 70 percent of all failures involving two disks and the SCM device, without incurring any data loss. As a result, the additional parity device increases the mean time to data loss of the arrays in the group it protects by at least 200-fold.
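    The magnitude of such gains follows from how MTTDL scales with the number of overlapping failures a group can survive. The sketch below is an illustration using standard approximations, with assumed parameters and a simplified flat group rather than the paper's organization: it compares a group that tolerates one failure with the same group plus one extra parity device that lets it tolerate two.

        MTTF_H = 1_000_000   # assumed per-device mean time to failure (hours)
        MTTR_H = 24          # assumed repair time (hours)
        N = 8                # disks in the group (assumed)

        def mttdl_one_fault_tolerant(n, mttf, mttr):
            # Data loss requires two overlapping failures.
            return mttf ** 2 / (n * (n - 1) * mttr)

        def mttdl_two_fault_tolerant(n, mttf, mttr):
            # Data loss requires three overlapping failures.
            return mttf ** 3 / (n * (n - 1) * (n - 2) * mttr ** 2)

        base = mttdl_one_fault_tolerant(N, MTTF_H, MTTR_H)
        hardened = mttdl_two_fault_tolerant(N + 1, MTTF_H, MTTR_H)  # group + parity device
        print(f"MTTDL gain from one extra parity device: ~{hardened / base:,.0f}x")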
    We present a two-dimensional RAID architecture that is specifically tailored to the needs of archival storage systems. Our proposal starts with a fairly conventional two-dimensional RAID architecture where each disk belongs to exactly one horizontal and one vertical RAID level 4 stripe. Once the array has been populated, we add a superparity device that contains the exclusive OR of the contents of all horizontal (or vertical) parity disks. The new organization tolerates all triple disk failures and nearly all quadruple and quintuple disk failures. As a result, it provides mean times to data loss (MTTDLs) more than a hundred times better than those of sets of RAID level 6 stripes with equal capacity and similar parity overhead.
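    Failure-tolerance claims of this kind can be checked mechanically by treating every disk as a symbol and every horizontal stripe, vertical stripe, and the superparity device as an XOR equation, then testing each failure combination with an iterative peeling decoder that rebuilds any failed disk which is the only missing member of some equation. The sketch below does this for an assumed 4x4 data grid; the grid size is my assumption, not the paper's configuration, and peeling gives a conservative count of recoverable patterns.

        from itertools import combinations

        # Disks: a 4x4 data grid, one parity disk per row (RP), one per column (CP),
        # and a single superparity disk (SP) holding the XOR of all row parity disks.
        R, C = 4, 4
        data = {(r, c) for r in range(R) for c in range(C)}
        row_parity = {("RP", r) for r in range(R)}
        col_parity = {("CP", c) for c in range(C)}
        disks = data | row_parity | col_parity | {("SP",)}

        equations = []
        for r in range(R):                       # horizontal RAID level 4 stripes
            equations.append({(r, c) for c in range(C)} | {("RP", r)})
        for c in range(C):                       # vertical RAID level 4 stripes
            equations.append({(r, c) for r in range(R)} | {("CP", c)})
        equations.append(set(row_parity) | {("SP",)})   # superparity equation

        def recoverable(failed):
            """Peel: rebuild any failed disk that is the sole failure in some equation."""
            failed = set(failed)
            progress = True
            while failed and progress:
                progress = False
                for eq in equations:
                    missing = eq & failed
                    if len(missing) == 1:
                        failed -= missing
                        progress = True
            return not failed

        for k in (3, 4, 5):
            combos = list(combinations(disks, k))
            ok = sum(recoverable(c) for c in combos)
            print(f"{k} failures: {ok}/{len(combos)} recoverable ({ok / len(combos):.1%})")

    Under this assumed geometry the enumeration reports every triple failure as recoverable and only a small residue of quadruple and quintuple failures (for example, four data disks forming a rectangle) as fatal, which mirrors the qualitative claim above.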
    We propose the use of parity-based redundant data layouts of increasing reliability as a means to progressively harden data archives. We evaluate the reliability of two such layouts and demonstrate how moving to layouts of higher parity degree offers a mechanism to progressively and dramatically increase the reliability of a multi-device data store. Specifically, we propose that a data archive can be migrated to progressively more reliable layouts as the data ages, trading limited (and likely unrealized) increases in update costs for increased reliability. Our parity-based schemes are drawn from SSPiRAL (Survivable Storage using Parity in Redundant Array Layouts) and offer capacity efficiency equivalent to a straightforward mirroring arrangement. Our analysis shows that our proposed schemes would utilize no additional physical resources and would improve mean time to data loss by four to seven orders of magnitude.
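    As a deliberately simplified picture of progressive hardening, the sketch below migrates archive extents to layouts of higher parity degree as they age; the tier names, age thresholds, and the Extent record are illustrative assumptions rather than SSPiRAL's actual configurations or interfaces.

        from dataclasses import dataclass

        # Layout tiers ordered by increasing parity degree (and reliability).
        # Names and age thresholds are assumed values for this sketch.
        TIERS = [
            ("mirrored", 0),           # fresh data: cheap updates
            ("parity-degree-2", 30),   # 30 days without updates
            ("parity-degree-3", 180),  # rarely updated: harden further
        ]

        @dataclass
        class Extent:
            name: str
            days_since_update: int
            layout: str = "mirrored"

        def harden(extent: Extent) -> Extent:
            """Migrate an extent to the most reliable layout its age qualifies for."""
            for layout, min_age in reversed(TIERS):
                if extent.days_since_update >= min_age:
                    extent.layout = layout
                    break
            return extent

        archive = [Extent("logs-2024", 400), Extent("scratch", 2), Extent("photos", 45)]
        for e in map(harden, archive):
            print(f"{e.name}: {e.layout}")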
    Ultimately, the performance and success of a shingled write disk (SWD) will be determined by more than the physical hardware realized; it will also depend on the data layouts employed, the workloads experienced, and the architecture of the overall system, including the level of interface the devices provide to higher levels of system software. While we discuss several alternative layouts for use with SWDs, we also discuss the dramatic implications of observed workloads. Example data access traces demonstrate the surprising stability of written device blocks, with only a small fraction requiring multiple updates (the problematic operation for a shingled-write device). Specifically, we show that for general-purpose workloads more than 93% of device blocks can remain unchanged over a day, and that for more specialized workloads less than 0.5% of a shingled-write disk's capacity would be needed to hold randomly updated blocks. We further demonstrate how different approaches to data layout can either improve or reduce the performance of a shingled-write device relative to a traditional non-shingled device.
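    Figures like these come from trace analysis, and the general recipe is easy to sketch. The snippet below is my sketch, with a hypothetical trace format rather than the traces used in the paper: it takes a day's worth of (timestamp, block, op) records and reports the fraction of device blocks left unchanged plus the staging capacity that multiply-updated blocks would require.

        from collections import Counter

        SECONDS_PER_DAY = 86_400

        def daily_update_stats(records, device_blocks):
            """Given (timestamp_seconds, block_number, op) records for one day,
            report the fraction of device blocks left unchanged and the fraction
            of capacity needed to stage blocks that were updated more than once."""
            writes = Counter()
            for ts, block, op in records:
                if op == "W" and ts < SECONDS_PER_DAY:
                    writes[block] += 1

            unchanged = 1 - len(writes) / device_blocks
            multiply_updated = sum(1 for n in writes.values() if n > 1)
            staging = multiply_updated / device_blocks
            return unchanged, staging

        # Tiny synthetic example (hypothetical trace format: ts, block, "W"/"R").
        trace = [(100, 1, "W"), (200, 2, "W"), (300, 1, "W"), (400, 3, "W")]
        unchanged, staging = daily_update_stats(trace, device_blocks=1_000_000)
        print(f"unchanged blocks: {unchanged:.2%}, staging capacity needed: {staging:.4%}")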
    The gap between CPU speeds and the speed of the technologies providing the data is increasing. As a result, latency and bandwidth to needed data are limited by the performance of the storage devices and the networks that connect them to the CPU. Distributed caching techniques are often used to reduce these penalties; however, such techniques need further development to be truly integrated into the network. This paper describes the preliminary design of an adaptive caching scheme using multiple experts, called ACME. ACME is used to manage the replacement policies within distributed caches to further improve the hit rates over static caching techniques. We propose the use of machine learning algorithms to rate and select the current best policies, or mixtures of policies, via weight updates based on their recent success, allowing each adaptive cache node to tune itself based on the workload it observes. Since no cache databases or synchronization messages are exchanged for adaptivity, the clusters composed of these nodes will be scalable and manageable. We show that static techniques are suboptimal when combined in networks of caches, providing potential for adaptivity to improve performance.
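    A simplified sketch of the experts idea follows, in the spirit of the scheme described here rather than ACME's exact algorithm: each candidate replacement policy maintains its own small virtual cache, and a multiplicative weight update penalizes experts whose virtual cache misses, so a node gradually shifts toward whichever policy suits the workload it actually observes. The policy pair, cache size, and penalty factor are assumed values.

        from collections import OrderedDict

        CACHE_SIZE = 4
        BETA = 0.9  # penalty factor for experts that miss (assumed value)

        class VirtualCache:
            """A small per-expert cache used only to score a replacement policy."""
            def __init__(self, policy):
                self.policy = policy
                self.entries = OrderedDict()

            def access(self, key):
                hit = key in self.entries
                if hit and self.policy == "LRU":
                    self.entries.move_to_end(key)         # refresh recency on hit
                if not hit:
                    if len(self.entries) >= CACHE_SIZE:
                        self.entries.popitem(last=False)  # evict oldest entry
                    self.entries[key] = True
                return hit

        experts = {"LRU": VirtualCache("LRU"), "FIFO": VirtualCache("FIFO")}
        weights = {name: 1.0 for name in experts}

        def handle_request(key):
            """Update expert weights from their virtual caches; return the current best."""
            for name, cache in experts.items():
                if not cache.access(key):
                    weights[name] *= BETA                 # demote experts that missed
            total = sum(weights.values())
            for name in weights:
                weights[name] /= total                    # renormalize
            return max(weights, key=weights.get)

        workload = [1, 2, 3, 1, 4, 1, 5, 1, 2, 6, 1, 2]
        for k in workload:
            best = handle_request(k)
        print("favored policy after workload:", best, weights)

    In a full system the weights would drive the replacement decisions of the real cache (or a mixture of the experts' votes); here the sketch only shows how per-node weight updates let each cache adapt without exchanging synchronization messages.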