Towards co-designed optimizations in parallel frameworks: a MapReduce case study
Pages 172 - 179
Abstract
The explosion of Big Data was followed by the proliferation of numerous complex parallel software stacks whose aim is to tackle the challenges of the data deluge. A drawback of such a multi-layered hierarchical deployment is the inability to maintain and delegate vital semantic information between layers in the stack. Software abstractions increase the semantic distance between an application and its generated code. However, parallel software frameworks contain inherent semantic information that general purpose compilers are not designed to exploit.
This paper presents a case study demonstrating how the specific semantic information of the MapReduce paradigm can be exploited on multicore architectures. MR4J has been implemented in Java and evaluated against hand-optimized C and C++ equivalents. The initial observed results led to the design of a semantically aware optimizer that runs automatically without requiring modification to application code.
The optimizer speeds up the execution of MR4J by up to 2.0x. The introduced optimization not only improves the performance of the generated code during the map phase, but also reduces the pressure on the garbage collector. This demonstrates how semantic information can be harnessed without sacrificing sound software engineering practices when using parallel software frameworks.
Information
Published In

May 2016
487 pages
ISBN:9781450341288
DOI:10.1145/2903150
- General Chairs:
- Gianluca Palermo,
- John Feo,
- Program Chairs:
- Antonino Tumeo,
- Hubertus Franke
Copyright © 2016 ACM.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Sponsors
- Micron Foundation: Micron Technology Foundation, Inc.
- ACM: Association for Computing Machinery
- Politecnico di Milano: Politecnico di Milano
- SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
- IBM: IBM
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
Published: 16 May 2016
Qualifiers
- Research-article
Funding Sources
Conference
Acceptance Rates
CF '16 Paper Acceptance Rate 30 of 94 submissions, 32%;
Overall Acceptance Rate 273 of 785 submissions, 35%