DOI: 10.1145/2737182.2737197
research-article

Exploring Performance Models of Hadoop Applications on Cloud Architecture

Published: 04 May 2015

Abstract

Hadoop is an open source implementation of the MapReduce programming model. It provides the runtime infrastructure for the map and reduce functions programmed in individual applications. Commercial clouds such as Amazon Elastic MapReduce provide the Hadoop architecture with IaaS support. In this architecture, the map and reduce functions are major determinants of end-to-end application latency, along with the framework components responsible for data access and exchange. In this paper, we explore modeling methods that capture the performance characteristics and the semantics of a Hadoop architecture. We present early results for modeling the performance of a Hadoop application, given the design of its map and reduce functions, using Layered Queueing Networks (LQNs). We build two different LQN models to represent the data-parallel computation of these functions and calibrate both models using monitored performance data. Both models produce converging results that are within ~10% of observed performance. Drawing on this modeling experience, we further discuss the general issues of modeling a Hadoop architecture with LQNs and describe our future work.
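As a rough illustration of the programming model the abstract refers to, the sketch below runs a word count through map, shuffle, and reduce steps in a single Python process, and adds a helper for the relative error between a predicted and an observed latency. It is not taken from the paper: the function names (`map_fn`, `shuffle`, `reduce_fn`, `run_job`, `relative_error`) and the word-count workload are assumptions chosen for illustration, and a real Hadoop job would distribute these steps across a cluster.

```python
from collections import defaultdict


def map_fn(line):
    """Map step (hypothetical example): emit (word, 1) for each word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Framework step: group intermediate values by key before reduction."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key, values):
    """Reduce step (hypothetical example): sum the counts for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce sequentially in one process."""
    intermediate = [pair for line in lines for pair in map_fn(line)]
    return dict(reduce_fn(key, values) for key, values in shuffle(intermediate).items())


def relative_error(predicted, observed):
    """Relative error of a model prediction against a measured value."""
    return abs(predicted - observed) / observed
```

For instance, `run_job(["a b a", "b c"])` counts the words across both lines, and `relative_error(110.0, 100.0)` corresponds to a prediction that is 10% away from a measurement, which is the kind of accuracy figure the abstract reports.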

Published In

QoSA '15: Proceedings of the 11th International ACM SIGSOFT Conference on Quality of Software Architectures
May 2015
152 pages
ISBN:9781450334709
DOI:10.1145/2737182
  • General Chair: Philippe Kruchten
  • Program Chairs: Ipek Ozkaya, Heiko Koziolek
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. layered queueing network
  2. mapreduce
  3. performance modeling

Qualifiers

  • Research-article

Conference

CompArch '15
Acceptance Rates

QoSA '15 Paper Acceptance Rate 14 of 42 submissions, 33%;
Overall Acceptance Rate 46 of 131 submissions, 35%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 2
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 25 Feb 2025

Cited By

  • (2023) A Large-Scale Empirical Study of Real-Life Performance Issues in Open Source Projects. IEEE Transactions on Software Engineering 49:2 (924-946). DOI: 10.1109/TSE.2022.3167628. Online publication date: 1-Feb-2023
  • (2019) Towards a cloud performance research repository. Proceedings of the 2nd International Workshop on Establishing a Community-Wide Infrastructure for Architecture-Based Software Engineering (22-25). DOI: 10.1109/ECASE.2019.00012. Online publication date: 27-May-2019
  • (2019) Engineering-out hazards: digitising the management working safety in confined spaces. Facilities 37:3/4 (196-215). DOI: 10.1108/F-03-2018-0039. Online publication date: 28-Feb-2019
  • (2019) A Scalable Platform for Monitoring Data Intensive Applications. Journal of Grid Computing. DOI: 10.1007/s10723-019-09483-1. Online publication date: 29-May-2019
  • (2019) Testing MapReduce programs. Journal of Software: Evolution and Process 31:3. DOI: 10.1002/smr.2120. Online publication date: 25-Mar-2019
  • (2018) Stochastic Petri Net Based Modeling for Analyzing Dependability of Big Data Storage System. Emerging Technologies in Data Mining and Information Security (473-484). DOI: 10.1007/978-981-13-1498-8_42. Online publication date: 2-Sep-2018
  • (2017) Cooperation between data modeling and simulation modeling for performance analysis of Hadoop. 2017 International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS) (1-7). DOI: 10.23919/SPECTS.2017.8046769. Online publication date: Jul-2017
  • (2017) Building Information Modeling (BIM) Enabled Facilities Management Using Hadoop Architecture. 2017 Portland International Conference on Management of Engineering and Technology (PICMET) (1-7). DOI: 10.23919/PICMET.2017.8125462. Online publication date: Jul-2017
  • (2017) Augmenting Amdahl's Second Law: A Theoretical Model to Build Cost-Effective Balanced HPC Infrastructure for Data-Driven Science. 2017 IEEE 10th International Conference on Cloud Computing (CLOUD) (147-154). DOI: 10.1109/CLOUD.2017.27. Online publication date: Jun-2017
  • (2017) Autonomic deployment decision making for big data analytics applications in the cloud. Soft Computing - A Fusion of Foundations, Methodologies and Applications 21:16 (4501-4512). DOI: 10.1007/s00500-015-1945-5. Online publication date: 1-Aug-2017
