MapReduce programming and cost-based optimization? Crossing this chasm with Starfish

Published: 01 August 2011

Abstract

MapReduce has emerged as a viable competitor to database systems in big data analytics. MapReduce programs are being written for a wide variety of application domains including business data processing, text analysis, natural language processing, Web graph and social network analysis, and computational science. However, MapReduce systems lack a feature that has been key to the historical success of database systems, namely, cost-based optimization. A major challenge here is that, to the MapReduce system, a program consists of black-box map and reduce functions written in some programming language like C++, Java, Python, or Ruby.
Starfish is a self-tuning system for big data analytics that includes, to our knowledge, the first Cost-based Optimizer for simple to arbitrarily complex MapReduce programs. Starfish also includes a Profiler to collect detailed statistical information from unmodified MapReduce programs, and a What-if Engine for fine-grained cost estimation. This demonstration will present the profiling, what-if analysis, and cost-based optimization of MapReduce programs in Starfish. We will show how (nonexpert) users can employ the Starfish Visualizer to (a) get a deep understanding of a MapReduce program's behavior during execution, (b) ask hypothetical questions on how the program's behavior will change when parameter settings, cluster resources, or input data properties change, and (c) ultimately optimize the program.
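
The interplay between a job profile, a what-if cost estimate, and a cost-based search over parameter settings can be illustrated with a small sketch. This is not Starfish's implementation or API: the names `JobProfile`, `Config`, `estimate_runtime`, and `optimize`, the cost formula, and every constant below are hypothetical and chosen only to make the idea concrete. The search shown is plain random sampling, used here as a stand-in for whatever search strategy a real optimizer would apply.

```python
import math
import random
from dataclasses import dataclass

# All constants are made-up assumptions for illustration only.
TASK_STARTUP_S = 2.0      # fixed overhead per task wave
NET_S_PER_MB = 0.01       # shuffle transfer cost per MB
COMPRESSION_RATIO = 0.4   # shuffle size after compression
COMPRESSION_CPU = 1.1     # extra map CPU when compressing


@dataclass(frozen=True)
class JobProfile:
    """Hypothetical statistics a profiler might collect from one run."""
    input_mb: float
    map_selectivity: float      # map output size / map input size
    map_cpu_s_per_mb: float
    reduce_cpu_s_per_mb: float


@dataclass(frozen=True)
class Config:
    """Hypothetical tunable settings for one MapReduce job."""
    split_mb: int
    num_reducers: int
    compress_map_output: bool


def estimate_runtime(profile, config, map_slots=20, reduce_slots=10):
    """Toy what-if estimate: predict runtime for a setting without running the job."""
    num_maps = math.ceil(profile.input_mb / config.split_mb)
    map_waves = math.ceil(num_maps / map_slots)
    map_time = map_waves * (config.split_mb * profile.map_cpu_s_per_mb + TASK_STARTUP_S)
    shuffle_mb = profile.input_mb * profile.map_selectivity
    if config.compress_map_output:
        shuffle_mb *= COMPRESSION_RATIO
        map_time *= COMPRESSION_CPU
    reduce_waves = math.ceil(config.num_reducers / reduce_slots)
    per_reducer_mb = shuffle_mb / config.num_reducers
    reduce_time = reduce_waves * (
        per_reducer_mb * (profile.reduce_cpu_s_per_mb + NET_S_PER_MB) + TASK_STARTUP_S
    )
    return map_time + reduce_time


def optimize(profile, baseline, trials=200, seed=0):
    """Sample random settings and keep the one with the lowest estimated cost."""
    rng = random.Random(seed)
    best_cfg, best_cost = baseline, estimate_runtime(profile, baseline)
    for _ in range(trials):
        candidate = Config(
            split_mb=rng.choice([32, 64, 128, 256]),
            num_reducers=rng.randint(1, 100),
            compress_map_output=rng.choice([True, False]),
        )
        cost = estimate_runtime(profile, candidate)
        if cost < best_cost:
            best_cfg, best_cost = candidate, cost
    return best_cfg, best_cost


profile = JobProfile(input_mb=10_000.0, map_selectivity=0.5,
                     map_cpu_s_per_mb=0.02, reduce_cpu_s_per_mb=0.05)
baseline = Config(split_mb=64, num_reducers=1, compress_map_output=False)
print("baseline estimate:", estimate_runtime(profile, baseline))
print("best found:", optimize(profile, baseline))
```

Calling `estimate_runtime` with a different `Config`, slot count, or `input_mb` corresponds to asking a hypothetical question about parameter settings, cluster resources, or input data; `optimize` simply asks many such questions and keeps the cheapest answer.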

Published In

Proceedings of the VLDB Endowment, Volume 4, Issue 12
August 2011
303 pages

Publisher

VLDB Endowment
