DOI: 10.1145/2901318.2901319
research-article
Public Access

Flint: batch-interactive data-intensive processing on transient servers

Published: 18 April 2016

    Abstract

    Cloud providers now offer transient servers, which they may revoke at any time, for significantly lower prices than on-demand servers, which they cannot revoke. The low price of transient servers is particularly attractive for executing an emerging class of workloads, which we call Batch-Interactive Data-Intensive (BIDI), that is becoming increasingly important for data analytics. BIDI workloads require large sets of servers to cache massive datasets in memory to enable low-latency operation. In this paper, we illustrate the challenges of executing BIDI workloads on transient servers, where revocations (akin to failures) are the common case. To address these challenges, we design Flint, which is based on Spark and includes automated checkpointing and server selection policies that i) support batch and interactive applications and ii) dynamically adapt to application characteristics. We evaluate a prototype of Flint using EC2 spot instances, and show that it yields cost savings of up to 90% compared to using on-demand servers, while increasing running time by less than 2%.

      Published In

      EuroSys '16: Proceedings of the Eleventh European Conference on Computer Systems
      April 2016
      605 pages
      ISBN:9781450342407
      DOI:10.1145/2901318
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • Research-article

      Conference

      EuroSys '16
      EuroSys '16: Eleventh EuroSys Conference 2016
      April 18 - 21, 2016
      London, United Kingdom

      Acceptance Rates

      EuroSys '16 Paper Acceptance Rate 38 of 180 submissions, 21%;
      Overall Acceptance Rate 241 of 1,308 submissions, 18%
