DOI: 10.1145/3357390.3361019
research-article

Asynchronous snapshots of actor systems for latency-sensitive applications

Published: 21 October 2019

    Abstract

    The actor model is popular for many types of server applications. Efficient snapshotting of applications is crucial for deploying pre-initialized applications or moving running applications to different machines, e.g., for debugging purposes. A key issue is that snapshotting blocks all other operations. In modern latency-sensitive applications, stopping the application to persist its state needs to be avoided, because users may not tolerate the increased request latency. To minimize the impact of snapshotting on request latency, our approach persists the application’s state asynchronously by capturing partial heaps, completing snapshots step by step. Additionally, our solution is transparent and supports arbitrary object graphs. We prototyped our snapshotting approach on top of the Truffle/Graal platform and evaluated it with the Savina benchmarks and the Acme Air microservice application. When performing a snapshot every thousand Acme Air requests, the number of slow requests (0.007% of all requests) with latency above 100 ms increases by 5.43%. Our Savina microbenchmark results detail how different utilization patterns impact snapshotting cost. To the best of our knowledge, this is the first system that enables asynchronous snapshotting of actor applications, i.e., without stop-the-world synchronization, and thereby minimizes the impact on latency. We thus believe it enables new deployment and debugging options for actor systems.
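
The idea of capturing partial heaps step by step, without a stop-the-world pause, can be illustrated with a toy sketch. This is not the paper's implementation: the `Actor` and `ActorSystem` classes, the version counter, and the round-robin scheduler are all hypothetical names and simplifications chosen for illustration. The sketch assumes that each actor saves only its own state the next time it handles a message after a snapshot was requested, and it ignores in-flight messages and idle actors, which a real system would have to handle.

```python
import copy
from collections import deque


class Actor:
    """Toy actor with private state and a mailbox (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.state = {"count": 0}
        self.mailbox = deque()
        self.snapshot_version = 0  # last snapshot this actor contributed to

    def process_one(self, system):
        """Handle one message; if a snapshot is pending, save this actor's state first."""
        if not self.mailbox:
            return False
        if self.snapshot_version < system.snapshot_version:
            # Capture only this actor's part of the heap; no other actor is paused.
            system.record(self.name, copy.deepcopy(self.state))
            self.snapshot_version = system.snapshot_version
        self.state["count"] += self.mailbox.popleft()
        return True


class ActorSystem:
    """Toy system that assembles a snapshot from per-actor pieces (assumed design)."""

    def __init__(self, names):
        self.actors = {n: Actor(n) for n in names}
        self.snapshot_version = 0
        self.partial = {}    # pieces of the snapshot in progress
        self.completed = []  # finished snapshots

    def send(self, name, value):
        self.actors[name].mailbox.append(value)

    def request_snapshot(self):
        """Start a snapshot by bumping a version number; nothing is stopped."""
        self.snapshot_version += 1
        self.partial = {}

    def record(self, name, state):
        """Store one actor's piece; the snapshot is done once every actor has reported."""
        self.partial[name] = state
        if len(self.partial) == len(self.actors):
            self.completed.append(self.partial)
            self.partial = {}

    def run(self):
        """Round-robin loop standing in for real concurrent scheduling."""
        progressed = True
        while progressed:
            progressed = False
            for actor in self.actors.values():
                progressed = actor.process_one(self) or progressed
```

In this sketch, calling `request_snapshot()` returns immediately, and each actor pays the copying cost only for its own state when it next runs, which is the property that keeps request latency low in the toy model.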


    Cited By

    • (2020) Scalable and serializable networked multi-actor programming. Proceedings of the ACM on Programming Languages 4, OOPSLA (1-30). DOI: 10.1145/3428266. Online publication date: 13-Nov-2020

    Published In

    MPLR 2019: Proceedings of the 16th ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes
    October 2019
    171 pages
    ISBN:9781450369770
    DOI:10.1145/3357390
    General Chair: Antony Hosking
    Program Chair: Irene Finocchi
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 21 October 2019

    Author Tags

    1. Actors
    2. Latency
    3. Micro services
    4. Snapshots

    Qualifiers

    • Research-article

    Conference

    MPLR '19