research-article

SpongeFiles: mitigating data skew in mapreduce using distributed memory

Authors:

Khaled Elmeleegy,

Christopher Olston,

Benjamin Reed

SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data

Pages 551 - 562

https://doi.org/10.1145/2588555.2595634

Published: 18 June 2014

Abstract

Data skew is a major problem for data processing platforms like MapReduce. Skew causes worker tasks to spill to disk what they cannot fit in memory, which slows down the task and the overall job. Moreover, the performance of other jobs sharing the same disk degrades. In many cases, this situation occurs even though the cluster has plenty of spare memory; it is just not used evenly. We introduce SpongeFiles, a novel distributed-memory abstraction tailored to data processing environments like MapReduce. A SpongeFile is a logical byte array composed of large chunks that can be stored in a variety of locations in the cluster. Spilled data goes to SpongeFiles, which route it to the nearest location with sufficient capacity (local memory, remote memory, local disk, or remote disk as a last resort). By enabling memory-sapped nodes to tap into the spare capacity of their neighbors, SpongeFiles minimize expensive disk spilling, thereby improving performance. In our experiments with Hadoop and Pig, SpongeFiles reduce overall job runtimes by up to 55%, and by up to 85% under disk contention.
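
The routing rule in the abstract (send each spilled chunk to the nearest location with sufficient capacity) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Tier`, `Location`, and `SpongeFileSketch`, the 1 MiB default chunk size, and the simple byte-counting capacity model are all assumptions made for illustration. Only the order of the four tiers comes from the abstract.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Tier(IntEnum):
    # Tier order follows the abstract: nearest location first,
    # remote disk as the last resort.
    LOCAL_MEMORY = 0
    REMOTE_MEMORY = 1
    LOCAL_DISK = 2
    REMOTE_DISK = 3


@dataclass
class Location:
    # Hypothetical capacity model: a tier plus a count of free bytes.
    tier: Tier
    free_bytes: int


@dataclass
class SpongeFileSketch:
    """Hypothetical logical byte array whose chunks may live in different tiers."""
    locations: list
    chunk_size: int = 1 << 20  # assumed chunk size (1 MiB), not taken from the paper
    placement: list = field(default_factory=list)

    def _place_chunk(self, size: int) -> Tier:
        # Try tiers from nearest to farthest and take the first one that fits.
        for loc in sorted(self.locations, key=lambda l: l.tier):
            if loc.free_bytes >= size:
                loc.free_bytes -= size
                return loc.tier
        raise RuntimeError("no location has capacity for this chunk")

    def spill(self, data: bytes) -> list:
        # Split the spilled data into chunks and record where each one lands.
        for off in range(0, len(data), self.chunk_size):
            chunk = data[off:off + self.chunk_size]
            self.placement.append(self._place_chunk(len(chunk)))
        return self.placement


if __name__ == "__main__":
    sponge = SpongeFileSketch(
        locations=[
            Location(Tier.LOCAL_MEMORY, 2),
            Location(Tier.REMOTE_MEMORY, 2),
            Location(Tier.LOCAL_DISK, 100),
        ],
        chunk_size=2,
    )
    # With 2 bytes free in each memory tier, the third chunk falls through to local disk.
    print([t.name for t in sponge.spill(b"abcdef")])
```

In this toy model a task only touches disk once both local and remote memory are exhausted, which is the behavior the abstract credits for the reduced spilling cost. A real system would also need chunk reads, failure handling, and capacity discovery, none of which are modeled here.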

Index Terms

SpongeFiles: mitigating data skew in mapreduce using distributed memory
1. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
        Data types and structures


Information & Contributors

Information

Published In

SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data

June 2014

1645 pages

ISBN:9781450323765

DOI:10.1145/2588555

General Chairs: Curtis Dyreson (Utah State University, USA), Feifei Li (University of Utah, USA)

Program Chair: M. Tamer Özsu (University of Waterloo, Canada)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tag

handling data skew

Qualifiers

Research-article

Conference

SIGMOD/PODS'14

Sponsor:

SIGMOD

SIGMOD/PODS'14: International Conference on Management of Data

June 22 - 27, 2014

Snowbird, Utah, USA

Acceptance Rates

SIGMOD '14 Paper Acceptance Rate 107 of 421 submissions, 25%;

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 21
Total Downloads: 736
Downloads (Last 12 months): 11
Downloads (Last 6 weeks): 1

Reflects downloads up to 26 Jul 2024

Citations

Cited By

Ge Y, Tian Y, Yu Z, Zhang W (2023) Memory sharing for handling memory overload on physical machines in cloud data centers. Journal of Cloud Computing 12:1. Online publication date: 28-Feb-2023
https://doi.org/10.1186/s13677-023-00405-x
Wang T, Liu H, Jin H (2023) Efficient Remote Memory Paging for Disaggregated Memory Systems. Algorithms and Architectures for Parallel Processing (1-20). Online publication date: 11-Jan-2023
https://doi.org/10.1007/978-3-031-22677-9_1
Kocharyan A, Ekane B, Teabe B, Tran G, Astsatryan H, Hagimont D (2022) A Remote Memory Sharing System for Virtualized Computing Infrastructures. IEEE Transactions on Cloud Computing 10:3 (1532-1542). Online publication date: 1-Jul-2022
https://doi.org/10.1109/TCC.2020.3018089
Song W, Yang Y, Eo J, Seo J, Kim J, Lee S, Lee G, Um T, Cho H, Chun B (2021) Apache Nemo: A Framework for Optimizing Distributed Data Processing. ACM Transactions on Computer Systems 38:3-4 (1-31). Online publication date: 15-Oct-2021
https://dl.acm.org/doi/10.1145/3468144
DAIKOKU H, KAWASHIMA H, TATEBE O (2019) Skew-Aware Collective Communication for MapReduce Shuffling. IEICE Transactions on Information and Systems E102.D:12 (2389-2399). Online publication date: 1-Dec-2019
https://doi.org/10.1587/transinf.2019PAP0019
Mishra S, Sethi N, Chinmay A (2019) Various Data Skewness Methods in the Hadoop Environment. 2019 International Conference on Recent Advances in Energy-efficient Computing and Communication (ICRAECC) (1-4). Online publication date: Mar-2019
https://doi.org/10.1109/ICRAECC43874.2019.8994979
Bindschaedler L, Malicevic J, Schiper N, Goel A, Zwaenepoel W, Oliveira R, Felber P, Hu Y (2018) Rock you like a hurricane. Proceedings of the Thirteenth EuroSys Conference (1-15). Online publication date: 23-Apr-2018
https://dl.acm.org/doi/10.1145/3190508.3190532
Meena K, Tayal D (2018) Partition Tuning based Bagging technique to Skew Handling. 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT) (244-250). Online publication date: May-2018
https://doi.org/10.1109/ICICCT.2018.8473202
Kim M, Cho H (2018) Popularity-based covering sets for energy proportionality in shared-nothing clusters. The Journal of Supercomputing 74:5 (1885-1910). Online publication date: 1-May-2018
https://dl.acm.org/doi/10.1007/s11227-017-2197-1
Shang F, Chen X, Yan C, Li L, Zhao Y (2018) The bandwidth-aware backup task scheduling strategy using SDN in Hadoop. Cluster Computing. Online publication date: 15-Jan-2018
https://doi.org/10.1007/s10586-018-1736-8
