DOI: 10.1145/2038916.2038933
research-article

PipeCloud: using causality to overcome speed-of-light delays in cloud-based disaster recovery

Published: 26 October 2011

Abstract

Disaster Recovery (DR) is a desirable feature for all enterprises, and a crucial one for many. However, adoption of DR remains limited due to the stark tradeoffs it imposes. To recover an application to the point of crash, one is limited by financial considerations, substantial application overhead, or minimal geographical separation between the primary and recovery sites. In this paper, we argue for cloud-based DR and pipelined synchronous replication as an antidote to these problems. Cloud hosting promises economies of scale and on-demand provisioning that are a perfect fit for the infrequent yet urgent needs of DR. Pipelined synchrony addresses the impact of WAN replication latency on performance, by efficiently overlapping replication with application processing for multi-tier servers. By tracking the consequences of the disk modifications that are persisted to a recovery site all the way to client-directed messages, applications realize forward progress while retaining full consistency guarantees for client-visible state in the event of a disaster. PipeCloud, our prototype, is able to sustain these guarantees for multi-node servers composed of black-box VMs, with no need of application modification, resulting in a perfect fit for the arbitrary nature of VM-based cloud hosting. We demonstrate disaster failover to the Amazon EC2 platform, and show that PipeCloud can increase throughput by an order of magnitude and reduce response times by more than half compared to synchronous replication, all while providing the same zero data loss consistency guarantees.
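
The pipelined-synchrony idea in the abstract can be illustrated with a toy model. This is a minimal single-node sketch under stated assumptions, not PipeCloud's implementation: the class `PipelinedReplicator` and all of its method names are hypothetical, and each reply is conservatively assumed to depend on every write issued before it. Local writes return immediately so processing continues, while client-directed replies are held until the recovery site has acknowledged the writes they depend on.

```python
from collections import deque


class PipelinedReplicator:
    """Hypothetical toy model: replies wait for remote acks, processing does not."""

    def __init__(self):
        self.write_seq = 0      # sequence number of the latest local disk write
        self.acked_seq = 0      # highest write confirmed by the recovery site
        self.pending = deque()  # (required_seq, reply) pairs held back from clients
        self.released = []      # replies that have been sent to clients

    def local_write(self, data):
        """Record a local write and return at once; replication is still in flight."""
        self.write_seq += 1
        return self.write_seq

    def send_reply(self, reply):
        """Queue a client-visible reply behind every write issued so far."""
        self.pending.append((self.write_seq, reply))
        self._flush()

    def remote_ack(self, seq):
        """The recovery site reports that writes up to `seq` are durable."""
        self.acked_seq = max(self.acked_seq, seq)
        self._flush()

    def _flush(self):
        # Release replies in order while the writes they depend on are durable remotely.
        while self.pending and self.pending[0][0] <= self.acked_seq:
            self.released.append(self.pending.popleft()[1])


# Usage: the second write proceeds before the first one is acknowledged.
demo = PipelinedReplicator()
demo.local_write(b"row 1")
demo.send_reply("OK 1")      # held back: write 1 is not yet acknowledged
demo.local_write(b"row 2")   # application processing is not blocked
demo.send_reply("OK 2")
demo.remote_ack(1)           # releases "OK 1" only
demo.remote_ack(2)           # releases "OK 2"
```

Under plain synchronous replication, `local_write` itself would block for a WAN round trip; in this sketch only the externally visible reply waits, which is why no client ever observes state that the recovery site could lose.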

Published In

SOCC '11: Proceedings of the 2nd ACM Symposium on Cloud Computing
October 2011
377 pages
ISBN:9781450309769
DOI:10.1145/2038916
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cloud computing
  2. disaster recovery
  3. virtualization

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%
