research-article

On the core affinity and file upload performance of Hadoop

Authors:

Joong-Yeon Cho,

Karsten Schwan

DISCS-2013: Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems

Pages 25 - 30

https://doi.org/10.1145/2534645.2534651

Published: 18 November 2013

Abstract

The MapReduce programming model was introduced for big-data processing, in which the data nodes both store data and perform computation. We therefore need to understand the different resource requirements of data-storing and computation tasks and schedule them efficiently over multi-core processors. Core affinity defines the mapping between a set of cores and a given task. It can be decided based on the resource requirements of a task, because the mapping largely affects how efficiently computation, memory, and I/O resources are utilized. In this paper, we analyze the impact of core affinity on the file upload performance of the Hadoop Distributed File System (HDFS). Our study provides insight into process scheduling issues on big-data processing systems. We also suggest a framework for dynamic core affinity based on our observations and show that a preliminary implementation can improve throughput by more than 40% compared with the default Linux system.
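To make the notion of core affinity concrete, the following is a minimal sketch of how a task can be mapped to a set of cores on Linux. It is not the framework described in the paper; it only uses the standard Python wrappers `os.sched_getaffinity` and `os.sched_setaffinity`, and the helper name `pin_to_cores` is a hypothetical name chosen for illustration.

```python
import os


def pin_to_cores(cores, pid=0):
    """Restrict `pid` (0 = the caller) to the given set of core ids.

    Hypothetical helper for illustration. Returns the affinity set in
    effect afterwards, or None where the Linux affinity calls are missing.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. macOS or Windows: no affinity API in `os`
    allowed = os.sched_getaffinity(pid)
    target = set(cores) & allowed  # only request cores we are allowed to use
    if not target:
        raise ValueError("none of the requested cores are available")
    os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        original = os.sched_getaffinity(0)
        print("before:", sorted(original))
        # Pin to a single core, then restore the original mapping.
        print("after: ", sorted(pin_to_cores({min(original)})))
        os.sched_setaffinity(0, original)
```

With the default Linux scheduler the affinity set normally covers every core the process may use; narrowing it, as above, is the kind of static mapping whose effect on HDFS file upload the paper studies.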

Cited By

N. Hanford, V. Ahuja, M. Farrens, B. Tierney, and D. Ghosal (2018). A Survey of End-System Optimizations for High-Speed Networks. ACM Computing Surveys 51:3 (1-36). DOI: 10.1145/3184899. Online publication date: 16-Jul-2018.
https://dl.acm.org/doi/10.1145/3184899
J. Cho, H. Jin, M. Lee, and K. Schwan (2014). Dynamic core affinity for high-performance file upload on Hadoop Distributed File System. Parallel Computing 40:10 (722-737). DOI: 10.1016/j.parco.2014.07.005. Online publication date: 1-Dec-2014.
https://dl.acm.org/doi/10.1016/j.parco.2014.07.005

Index Terms

On the core affinity and file upload performance of Hadoop
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Scheduling

Published In

DISCS-2013: Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems

November 2013

66 pages

ISBN:9781450325066

DOI:10.1145/2534645

General Chair: Xian-He Sun (Illinois Institute of Technology)

Program Chairs: Yong Chen (Texas Tech University), Philip C. Roth (Oak Ridge National Laboratory)

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Research Foundation of Korea

Conference

SC13

Sponsor:

SC13: International Conference for High Performance Computing, Networking, Storage and Analysis

November 18, 2013

Denver, Colorado

Acceptance Rates

DISCS-2013 Paper Acceptance Rate 10 of 19 submissions, 53%;

Overall Acceptance Rate 19 of 34 submissions, 56%

Bibliometrics

Total Citations: 2
Total Downloads: 185
Downloads (Last 12 months): 4
Downloads (Last 6 weeks): 0

Reflects downloads up to 15 Oct 2024
