DOI: 10.1145/1996130.1996166
Poster

Adapting MapReduce for HPC environments

Published: 08 June 2011

    Abstract

    MapReduce is gaining popularity as a programming model for large-scale distributed data processing. It is most widely used through Hadoop, whose implementation relies on the Hadoop Distributed File System (HDFS). This dependence on HDFS, however, prevents the model from being applied directly to HPC environments, which use high-performance distributed file systems instead. In such environments, a MapReduce deployment can rarely use the full set of resources, because local disks may not be available for data placement on every node. This work proposes a MapReduce implementation, together with the design choices behind it, that is directly suitable for such HPC environments.
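    To make the setting concrete, the following is a minimal Python sketch of a word-count job that needs no node-local disk: map tasks write their intermediate output into a directory that every worker can see, and reduce tasks read their partitions directly from it. This is not the authors' implementation. It assumes that all nodes mount the same high-performance distributed file system; a temporary directory stands in for that mount, threads stand in for worker nodes, and the function names (`map_task`, `reduce_task`, `run_job`) and the file-naming scheme are hypothetical.

```python
"""Hypothetical sketch: MapReduce with intermediate data on a shared file system.

Assumption: every worker sees the same directory (e.g. a mount of the
cluster's distributed file system). Here a temporary directory stands in
for it and threads stand in for worker nodes.
"""
import json
import os
import tempfile
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def partition_for(key: str, num_reducers: int) -> int:
    # Deterministic hash, so every mapper routes a given key to the same reducer.
    return crc32(key.encode("utf-8")) % num_reducers


def map_task(task_id: int, text: str, shared_dir: str, num_reducers: int) -> None:
    # Word-count map: count words, bucketed by the reducer that will own them.
    buckets = [defaultdict(int) for _ in range(num_reducers)]
    for word in text.split():
        buckets[partition_for(word, num_reducers)][word] += 1  # combine within the mapper
    # Write one intermediate file per reducer into the shared directory,
    # instead of onto a node-local disk.
    for r, counts in enumerate(buckets):
        path = os.path.join(shared_dir, f"map-{task_id}-part-{r}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(counts, f)


def reduce_task(r: int, shared_dir: str, num_mappers: int) -> dict:
    # Each reducer reads its partition from every mapper's output on shared
    # storage, so the shuffle needs no direct transfer between worker nodes.
    totals = defaultdict(int)
    for m in range(num_mappers):
        path = os.path.join(shared_dir, f"map-{m}-part-{r}.json")
        with open(path, encoding="utf-8") as f:
            for word, n in json.load(f).items():
                totals[word] += n
    return dict(totals)


def run_job(splits: list, num_reducers: int = 2, workers: int = 4) -> dict:
    with tempfile.TemporaryDirectory() as shared_dir:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Map phase: any worker may take any split (no disk locality needed).
            list(pool.map(lambda t: map_task(t[0], t[1], shared_dir, num_reducers),
                          enumerate(splits)))
            # Reduce phase runs only after all map outputs are on shared storage.
            parts = list(pool.map(lambda r: reduce_task(r, shared_dir, len(splits)),
                                  range(num_reducers)))
    result = {}
    for part in parts:
        result.update(part)  # partitions hold disjoint key sets
    return result


if __name__ == "__main__":
    splits = ["the quick brown fox", "the lazy dog", "the fox"]
    print(sorted(run_job(splits).items()))
```

    Running the script prints `[('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]`. Routing the shuffle through shared storage lets any worker take any input split, so nodes without local disks can still run tasks; the trade-off in this sketch is that all intermediate I/O lands on the shared file system.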

    Information

    Published In

    HPDC '11: Proceedings of the 20th international symposium on High performance distributed computing
    June 2011
    296 pages
    ISBN:9781450305525
    DOI:10.1145/1996130

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. hdfs
    2. hpc
    3. mapreduce

    Qualifiers

    • Poster

    Conference

    HPDC '11

    Acceptance Rates

    Overall Acceptance Rate 166 of 966 submissions, 17%

    Bibliometrics

    • Downloads (last 12 months): 2
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 26 Jul 2024

    Cited By

    • (2016) Synthesizing MPI Implementations from Functional Data-Parallel Programs. International Journal of Parallel Programming 44(3):552-573. DOI: 10.1007/s10766-015-0359-4. Online publication date: 1 Jun 2016.
    • (2015) ApproxHadoop. ACM SIGARCH Computer Architecture News 43(1):383-397. DOI: 10.1145/2786763.2694351. Online publication date: 14 Mar 2015.
    • (2015) ApproxHadoop. ACM SIGPLAN Notices 50(4):383-397. DOI: 10.1145/2775054.2694351. Online publication date: 14 Mar 2015.
    • (2015) ApproxHadoop. Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 383-397. DOI: 10.1145/2694344.2694351. Online publication date: 14 Mar 2015.
    • (2015) Mammoth: Gearing Hadoop Towards Memory-Intensive MapReduce Applications. IEEE Transactions on Parallel and Distributed Systems 26(8):2300-2315. DOI: 10.1109/TPDS.2014.2345068. Online publication date: 1 Aug 2015.
    • (2012) A Parameter Dynamic-Tuning Scheduling Algorithm Based on History in Heterogeneous Environments. Proceedings of the 2012 Seventh ChinaGrid Annual Conference, pp. 49-56. DOI: 10.1109/ChinaGrid.2012.24. Online publication date: 20 Sep 2012.
    • (2012) MARLA. Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), pp. 49-56. DOI: 10.1109/CCGrid.2012.135. Online publication date: 13 May 2012.
