DOI: 10.1145/2980258.2980461
research-article

AEGEUS: An online partition skew mitigation algorithm for mapreduce

Published: 25 August 2016

Editorial Notes

NOTICE OF CONCERN: ACM has received evidence that casts doubt on the integrity of the peer review process for the ICIA 2016 Conference. As a result, ACM is issuing a Notice of Concern for all papers published and strongly suggests that the papers from this Conference not be cited in the literature until ACM's investigation has concluded and final decisions have been made regarding the integrity of the peer review process for this Conference.

Abstract

This paper investigates the partition skew problem in the reduce phase of MapReduce jobs. Existing studies on Hadoop address this problem in both offline and online manners. The offline approach is heuristics-based: it has to wait for the map tasks to complete and incurs computation overhead to estimate the partition sizes. The other approach distributes overloaded tasks across other nodes, which requires extra split and merge operations. These extra operations, in turn, hamper the performance of the system. In this paper, we propose Aegeus, an online, streaming-based skew mitigation approach for MapReduce jobs that avoids both the long waiting time and the extra operations. Aegeus predicts the partition size of each map task and creates a resource specification based on its requirements even before the map phase completes. Hence, the proposed system can create containers sized to the workload, which can improve the overall job completion time and system performance. We evaluated Aegeus using benchmark datasets and compared its performance with naive Hadoop. Based on our observations, Aegeus outperforms naive Hadoop by 42%, improving the overall performance of both the application and the system.
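To make the idea of sizing reduce containers before the map phase finishes more concrete, here is a minimal sketch in Python. It is not the Aegeus algorithm: the abstract does not describe the predictor, so the linear extrapolation from finished map tasks, the memory-scaling rule, the toy partitioner, and every function and parameter name below are assumptions made purely for illustration.

```python
from collections import Counter


def partition_of(key: str, num_reducers: int) -> int:
    """Toy deterministic partitioner (hypothetical stand-in for a hash partitioner)."""
    return sum(key.encode("utf-8")) % num_reducers


def observe_map_output(records, num_reducers):
    """Return the bytes one finished map task emitted to each reduce partition."""
    sizes = Counter()
    for key, value in records:
        sizes[partition_of(key, num_reducers)] += len(key) + len(value)
    return [sizes[p] for p in range(num_reducers)]


def estimate_partition_sizes(observed, total_map_tasks):
    """Linearly extrapolate final partition sizes from the map tasks seen so far.

    Assumption: unfinished map tasks emit, on average, as much per partition
    as the finished ones. A real predictor may be far more sophisticated.
    """
    if not observed:
        raise ValueError("need at least one finished map task")
    scale = total_map_tasks / len(observed)
    num_reducers = len(observed[0])
    return [scale * sum(task[p] for task in observed) for p in range(num_reducers)]


def container_memory_mb(estimates, base_mb=1024, min_mb=512):
    """Scale each reducer's memory by its partition size relative to the mean.

    base_mb and min_mb are illustrative defaults, not Hadoop settings.
    """
    mean = sum(estimates) / len(estimates)
    if mean == 0:
        return [min_mb] * len(estimates)
    return [max(min_mb, round(base_mb * size / mean)) for size in estimates]


# Two of four map tasks have finished; key "a" dominates, so partition 1 is skewed.
task1 = [("a", "xxxx"), ("a", "xxxx"), ("b", "xx")]
task2 = [("a", "xxxx"), ("c", "x")]
observed = [observe_map_output(t, 2) for t in (task1, task2)]

estimates = estimate_partition_sizes(observed, total_map_tasks=4)
print(estimates)                       # prints [6.0, 34.0]
print(container_memory_mb(estimates))  # prints [512, 1741]
```

In this toy run the skewed partition is given a proportionally larger container while the map phase is still only half complete, which is the general behaviour the abstract describes: no waiting for all map tasks and no later split or merge of reduce work.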

Published In

ACM Other conferences
ICIA-16: Proceedings of the International Conference on Informatics and Analytics
August 2016
868 pages
ISBN:9781450347563
DOI:10.1145/2980258
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. MapReduce
  2. big data processing
  3. cloud computing
  4. online load balancing algorithms
  5. partitioning skew

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Anna Centenary Research Fellowship (ACRF)
  • Microsoft Azure sponsorship Award

Conference

ICIA-16

Cited By

  • (2020) SrSpark: Skew-resilient Spark based on Adaptive Parallel Processing. 2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS), pages 466-475. DOI: 10.1109/ICPADS51040.2020.00067. Online publication date: Dec-2020
  • (2019) Learning automata-based algorithms for MapReduce data skewness handling. The Journal of Supercomputing, 75:10 (6488-6516). DOI: 10.1007/s11227-019-02855-0. Online publication date: 1-Oct-2019
  • (2018) MAPSkew: Metaheuristic Approaches for Partitioning Skew in MapReduce. Algorithms, 12:1 (5). DOI: 10.3390/a12010005. Online publication date: 24-Dec-2018
  • (2017) AEGEUS++: an energy-aware online partition skew mitigation algorithm for mapreduce in cloud. Cluster Computing, 21:2 (1243-1260). DOI: 10.1007/s10586-017-1044-8. Online publication date: 24-Jul-2017
