
Mapping parallelism to multi-cores: a machine learning based approach

Published: 14 February 2009

Abstract

The efficient mapping of program parallelism to multi-core processors is highly dependent on the underlying architecture. This paper proposes a portable and automatic compiler-based approach to mapping such parallelism using machine learning. It develops two predictors, a data-sensitive and a data-insensitive predictor, to select the best mapping for parallel programs. They predict the number of threads and the scheduling policy for any given program using a model learnt off-line. By using low-cost profiling runs, they predict the mapping for a new, unseen program across multiple input data sets. We evaluate our approach by selecting parallelism mapping configurations for OpenMP programs on two representative but different multi-core platforms (the Intel Xeon and the Cell processors). The performance of our technique is stable across programs and architectures. On average, it delivers above 96% of the maximum available performance on both platforms. It achieves, on average, a 37% (up to 17.5 times) performance improvement over the OpenMP runtime default scheme on the Cell platform. Compared to two recent prediction models, our predictors achieve better performance at a significantly lower profiling cost.
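The abstract describes an off-line learned model that maps low-cost profiling features of a parallel program to a thread count and an OpenMP scheduling policy. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation: the feature set, the training data, and the choice of scikit-learn's MLPRegressor and SVC (loosely echoing the "artificial neural networks" and "support vector machine" author tags) are all assumptions made for illustration only.

# Hypothetical sketch of an off-line learned parallelism-mapping predictor.
# All feature names, training values, and model choices are assumed, not taken
# from the paper.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.neural_network import MLPRegressor

# Profiling features per training program/data set (values are made up):
# [cycles per instruction, L1 D-cache miss rate, branch miss rate,
#  parallel-region instruction count, loop iteration count]
X_train = np.array([
    [0.8, 0.02, 0.01, 1e6, 1e4],
    [1.5, 0.10, 0.03, 5e7, 2e5],
    [0.6, 0.01, 0.02, 2e6, 8e3],
    [2.1, 0.15, 0.05, 9e7, 5e5],
])

# Off-line training targets: the best thread count and best OpenMP schedule
# found for each training program (invented here; in the paper's setting these
# would come from searching the mapping space on training programs).
best_threads = np.array([4, 8, 2, 8])
best_schedule = np.array(["STATIC", "DYNAMIC", "STATIC", "GUIDED"])

# A regressor predicts the thread number; a classifier predicts the policy.
thread_model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0))
schedule_model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

thread_model.fit(X_train, best_threads)
schedule_model.fit(X_train, best_schedule)

# Deployment: one low-cost profiling run of a new, unseen program yields a
# feature vector; the learned models then predict its mapping.
x_new = np.array([[1.2, 0.07, 0.02, 3e7, 1e5]])
n_threads = int(round(float(thread_model.predict(x_new)[0])))
policy = schedule_model.predict(x_new)[0]
print(f"predicted mapping: {n_threads} threads, schedule({policy})")

The point of the sketch is only the overall flow: train once, off-line, on profiled programs with known best mappings, then predict the mapping for unseen programs from a single cheap profiling run.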



Published In

ACM SIGPLAN Notices, Volume 44, Issue 4 (PPoPP '09), April 2009, 294 pages
ISSN: 0362-1340; EISSN: 1558-1160; DOI: 10.1145/1594835

PPoPP '09: Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, February 2009, 322 pages
ISBN: 9781605583976; DOI: 10.1145/1504176

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. artificial neural networks
      2. compiler optimization
      3. machine learning
      4. performance modeling
      5. support vector machine

      Qualifiers

      • Research-article


