DOI: 10.1145/2843966.2843972
short-paper

Optimal Resource Provisioning Approach based on Cost Modeling for Spark Applications in Public Clouds

Published: 07 December 2015

Abstract

Efficient resource provisioning is required when running Spark applications in public clouds. However, optimizing resource provisioning to minimize the time and/or monetary cost of a specific application remains an intractable problem, because suitable provisioning may differ from application to application and may even be affected by the amount of input data. Existing resource settings rely heavily on random selection or the deployer's prior experience, which frequently leads to low-quality resource provisioning. An approach to optimal resource provisioning for Spark applications in public clouds is therefore urgently needed. This PhD proposal presents an approach based on time and monetary cost modeling for optimizing cloud resource provisioning under two typical constrained scenarios. The approach systematically drives resource provisioning for a specific Spark application and may save a significant amount of time and money compared to randomly selected settings.
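
The abstract does not spell out the cost models or the two constrained scenarios, so the following is only an illustrative sketch, not the authors' method. It assumes the scenarios are "minimize monetary cost under a deadline" and "minimize time under a budget", and it uses a toy time model with invented coefficients. The names `Setting`, `predicted_hours`, `predicted_cost`, `cheapest_within_deadline`, `fastest_within_budget`, and `CANDIDATES`, as well as the price, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Setting:
    """A hypothetical candidate cluster setting (not taken from the paper)."""
    nodes: int
    price_per_node_hour: float  # made-up price, not a real cloud quote


def predicted_hours(s: Setting, input_gb: float) -> float:
    """Toy time model with invented coefficients:
    fixed startup + work split across nodes + per-node coordination overhead."""
    return 0.05 + 0.02 * input_gb / s.nodes + 0.01 * s.nodes


def predicted_cost(s: Setting, input_gb: float) -> float:
    """Monetary cost = number of nodes * hourly price * predicted running time."""
    return s.nodes * s.price_per_node_hour * predicted_hours(s, input_gb)


def cheapest_within_deadline(
    settings: Sequence[Setting], input_gb: float, deadline_hours: float
) -> Optional[Setting]:
    """Assumed scenario 1: minimize cost among settings that meet the deadline."""
    feasible = [s for s in settings if predicted_hours(s, input_gb) <= deadline_hours]
    return min(feasible, key=lambda s: predicted_cost(s, input_gb), default=None)


def fastest_within_budget(
    settings: Sequence[Setting], input_gb: float, budget: float
) -> Optional[Setting]:
    """Assumed scenario 2: minimize time among settings that fit the budget."""
    feasible = [s for s in settings if predicted_cost(s, input_gb) <= budget]
    return min(feasible, key=lambda s: predicted_hours(s, input_gb), default=None)


# Hypothetical search space: 1 to 20 identical nodes at an invented price.
CANDIDATES = [Setting(nodes=n, price_per_node_hour=0.5) for n in range(1, 21)]

if __name__ == "__main__":
    print(cheapest_within_deadline(CANDIDATES, input_gb=100.0, deadline_hours=0.6))
    print(fastest_within_budget(CANDIDATES, input_gb=100.0, budget=1.5))
```

Both functions return `None` when no candidate satisfies the constraint. An exhaustive scan is used here only because the toy search space is tiny; a real approach would fit the time model to profiling runs of the specific application and could use a proper constrained optimizer instead.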


Published In

Middleware Doct Symposium '15: Proceedings of the Doctoral Symposium of the 16th International Middleware Conference
December 2015
42 pages
ISBN: 9781450337281
DOI: 10.1145/2843966
Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Cost model
  2. Optimization
  3. Public cloud
  4. Resource provisioning
  5. Spark

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

Middleware '15
Sponsor:
  • ACM
  • USENIX Assoc
  • IFIP
Middleware '15: 16th International Middleware Conference
December 7 - 11, 2015
Vancouver, BC, Canada

Acceptance Rates

Overall Acceptance Rate 203 of 948 submissions, 21%

Cited By

  • (2024) Intelligent Pooling: Proactive Resource Provisioning in Large-scale Cloud Service. Proceedings of the VLDB Endowment 17:7 (1618-1627). DOI: 10.14778/3654621.3654629. Online publication date: 1-Mar-2024
  • (2023) Cost-Aware Scheduling and Data Skew Alleviation for Big Data Processing in Heterogeneous Cloud Environment. Journal of Grid Computing 21:3. DOI: 10.1007/s10723-023-09661-2. Online publication date: 22-Jun-2023
  • (2021) Tuning configuration of apache spark on public clouds by combining multi-objective optimization and performance prediction model. Journal of Systems and Software (111028). DOI: 10.1016/j.jss.2021.111028. Online publication date: Jun-2021
  • (2021) Screening hardware and volume factors in distributed machine learning algorithms on spark. Computing. DOI: 10.1007/s00607-021-00965-3. Online publication date: 15-Jun-2021
  • (2020) Efficient performance prediction for Apache Spark. Journal of Parallel and Distributed Computing. DOI: 10.1016/j.jpdc.2020.10.010. Online publication date: Nov-2020
  • (2017) Elastic Resource Provisioning for Batched Stream Processing System in Container Cloud. Web and Big Data (411-426). DOI: 10.1007/978-3-319-63579-8_32. Online publication date: 3-Aug-2017
