DOI: 10.1145/1982185.1982220
research-article

Towards improved load balancing for data intensive distributed computing

Published: 21 March 2011

Abstract

Specialized frameworks for highly scalable data processing continue to gain prominence over traditional databases in many environments, including the cloud. Perhaps the best-known such framework is Google MapReduce, which has gained widespread popularity. However, the MapReduce model poses some significant challenges for workload balancing that have not yet been adequately explored. In this paper, we introduce techniques for improving load balancing, particularly multi-stage jobs and dynamic partition assignment, by using a modified programming model that offers greater flexibility but maintains the simplicity, scalability, and fault tolerance of MapReduce. We then explore the effectiveness of our approach using a parallel frequent itemset mining algorithm.
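
The abstract does not spell out how dynamic partition assignment works, so the following is only a minimal sketch of the general idea and not the paper's method. It assumes that "static" means each partition is bound to a worker before the job starts, and that "dynamic" means each partition is handed to whichever worker becomes idle first. The function names and the skewed cost list are hypothetical and chosen purely for illustration.

```python
import heapq

def static_makespan(costs, workers):
    """Assumed static scheme: partition i is bound to worker i % workers up front."""
    loads = [0] * workers
    for i, cost in enumerate(costs):
        loads[i % workers] += cost
    return max(loads)

def dynamic_makespan(costs, workers):
    """Assumed dynamic scheme: each partition goes to the worker that is idle first."""
    finish_times = [0] * workers  # a list of equal values is already a valid min-heap
    for cost in costs:
        earliest = heapq.heappop(finish_times)  # worker that frees up soonest
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)

# Hypothetical skewed workload: one partition costs far more than the others.
costs = [8, 1, 1, 1, 1, 1, 1, 1, 1]

print(static_makespan(costs, 3))   # 10: the worker holding the heavy partition also gets two more
print(dynamic_makespan(costs, 3))  # 8: the other two workers absorb all of the small partitions
```

In this toy workload the job finishes when the slowest worker does, so moving the small partitions away from the worker holding the expensive one shortens the overall run from 10 to 8 time units.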

Cited By

  • (2017) FiDoop-DP. IEEE Transactions on Parallel and Distributed Systems, 28(1):101-114. DOI: 10.1109/TPDS.2016.2560176. Online publication date: 1-Jan-2017
  • (2014) The Optimization of Parallel Frequent Pattern Growth Algorithm Based on Mahout in Cloud Manufacturing Environment. Proceedings of the 2014 Seventh International Symposium on Computational Intelligence and Design - Volume 02, pages 420-423. DOI: 10.1109/ISCID.2014.258. Online publication date: 13-Dec-2014

Published In

SAC '11: Proceedings of the 2011 ACM Symposium on Applied Computing
March 2011
1868 pages
ISBN:9781450301138
DOI:10.1145/1982185

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SAC'11: The 2011 ACM Symposium on Applied Computing
March 21 - 24, 2011
TaiChung, Taiwan

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
