DOI: 10.1145/1815695.1815697
research-article

Using machine learning techniques to enhance the performance of an automatic backup and recovery system

Published: 24 May 2010

Abstract

A typical disaster recovery system has mirrored storage at a site that is geographically separate from the main operational site. In many cases, communication between the local site and the backup repository site runs over a network that is inherently slow, such as a WAN, or highly strained, for example due to a whole-site disaster recovery operation.
The goal of this work is to alleviate the performance impact of the network in such a scenario using machine learning techniques. We focus on two main areas: prefetching and read-ahead size determination. In both cases we significantly improve the performance of the system.
Our main contributions are as follows. We introduce a theoretical model of the system and of the problem we are trying to solve, and bound the gain achievable from prefetching techniques. We construct two frequent pattern mining algorithms and use them for prefetching, and present a framework for controlling and combining multiple prefetch algorithms. These algorithms, along with several simple prefetch algorithms, are compared in a simulation environment. We also introduce a novel algorithm for determining the read-ahead amount on such a system, based on intuition from online competitive analysis and on regression techniques. We demonstrate the significant positive impact of this algorithm on IBM's FastBack system.
Many of our improvements were applied with little or no modification of the current implementation's internals. We are therefore confident that the techniques are general and likely to have applications elsewhere.
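
The abstract does not describe the two frequent pattern mining algorithms, so the following is only a generic sketch of the underlying idea, not the authors' method: count which blocks tend to follow a given block in an access trace, and prefetch the most frequent followers. The function names, the `window` and `min_support` parameters, and the sample trace are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def mine_successors(trace, window=2, min_support=2):
    """Map each block to the blocks that frequently follow it.

    Hypothetical helper: `window` is how many later accesses count as
    "following", and `min_support` is the minimum number of observations
    required before a pair is kept as a rule.
    """
    counts = defaultdict(Counter)
    for i, block in enumerate(trace):
        # Look at the next `window` accesses after position i.
        for follower in trace[i + 1 : i + 1 + window]:
            if follower != block:
                counts[block][follower] += 1
    # Keep only followers seen at least `min_support` times,
    # ordered from most to least frequent.
    return {
        block: [f for f, n in followers.most_common() if n >= min_support]
        for block, followers in counts.items()
    }


def prefetch_candidates(rules, block, limit=2):
    """Return up to `limit` blocks to prefetch after `block` is read."""
    return rules.get(block, [])[:limit]


# Made-up block trace, for illustration only.
trace = [1, 7, 3, 1, 7, 9, 1, 7, 3]
rules = mine_successors(trace)
print(prefetch_candidates(rules, 1))  # prints [7, 3]
```

In this toy trace, block 1 is followed by block 7 three times and by block 3 twice, so both pass the support threshold and are returned in that order. A block that never appears in the trace yields an empty candidate list, which means no prefetch is issued for it.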


Cited By

  • (2022) The Performance Optimization for Save and Restore Applications of Switching to New Firmware Using a Clean Install Approach. 2022 International Interdisciplinary Humanitarian Conference for Sustainability (IIHC), pages 299-303. DOI: 10.1109/IIHC55949.2022.10059982. Online publication date: 18-Nov-2022
  • (2014) Scalable data analytics platform for enterprise backup management. 2014 IEEE Network Operations and Management Symposium (NOMS), pages 1-7. DOI: 10.1109/NOMS.2014.6838291. Online publication date: May-2014

Published In

SYSTOR '10: Proceedings of the 3rd Annual Haifa Experimental Systems Conference
May 2010
211 pages
ISBN: 9781605589084
DOI: 10.1145/1815695

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. file and storage systems
  2. machine learning
  3. prefetching
  4. readahead
  5. systems

Qualifiers

  • Research-article

Conference

SYSTOR '10

Acceptance Rates

Overall Acceptance Rate 108 of 323 submissions, 33%
