research-article

A machine learning approach to mapping streaming workloads to dynamic multicore processors

Authors: Paul-Jules Micolet, Christophe Dubach

LCTES 2016: Proceedings of the 17th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, Tools, and Theory for Embedded Systems

Pages 113–122

https://doi.org/10.1145/2907950.2907951

Published: 13 June 2016

Abstract

Dataflow programming languages facilitate the design of data-intensive programs such as the streaming applications commonly found in embedded systems. They also expose parallelism that can be exploited using multicore processors, which are now part of the mobile landscape. In recent years a shift has occurred towards heterogeneity (e.g., ARM big.LITTLE) and reconfigurability. Dynamic Multicore Processors (DMPs) bridge the gap between fully reconfigurable processors and homogeneous multicore systems. They can re-allocate their resources at runtime to create larger, more powerful logical processors fine-tuned to the workload. Unfortunately, there exists no accurate method to determine how to partition the cores in a DMP among application threads. Programmers often rely on analyzing the application manually and applying a set of hand-picked heuristics. This leads to sub-optimal performance, reducing the potential of DMPs. What is needed is a way to determine the optimal partitioning and grouping of resources to maximize performance. As a first step, this paper studies the effect of thread partitioning and hardware resource allocation on a set of StreamIt applications. We show that the resulting space is non-trivial and exhibits large performance variation depending on the combination of parameters. We introduce a machine-learning-based methodology to tackle the complexity of this space. Our machine-learning model directly predicts the best combination of parameters using static code features. The predicted set of parameters leads to performance on par with the best performance found in a space of more than 32,000 configurations per application.
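The abstract describes learning a mapping from static code features of a StreamIt program to a DMP configuration (how the program is split into threads and how cores are grouped into logical processors). The Python sketch below only illustrates the general shape of such a predictor; it is not the authors' model. It assumes a hypothetical configuration made of two integers (thread count, cores fused per logical processor) and hypothetical static features, and it swaps in a simple 1-nearest-neighbour classifier over z-score-normalised features. Each training program is labelled with the best configuration found by search, and a leave-one-out loop reports the predicted configuration's speedup as a fraction of that best. All numbers in the usage block are made up.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

# Hypothetical DMP configuration: (threads the stream graph is partitioned into,
# cores fused into each logical processor). Illustrative, not the paper's parameters.
Config = Tuple[int, int]
# A program is (static feature vector, measured speedup for each searched configuration).
Program = Tuple[Sequence[float], Dict[Config, float]]


def best_config(measured: Dict[Config, float]) -> Config:
    """Training label: the configuration with the highest measured speedup."""
    return max(measured, key=measured.get)


def zscore_fit(rows: List[Sequence[float]]) -> List[Tuple[float, float]]:
    """Per-feature mean and population std; a constant feature gets std 1 to avoid division by zero."""
    return [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]


def zscore_apply(row: Sequence[float], stats: List[Tuple[float, float]]) -> List[float]:
    """Normalise one feature vector with previously fitted (mean, std) pairs."""
    return [(x - m) / s for x, (m, s) in zip(row, stats)]


def predict(train_feats: List[Sequence[float]], train_labels: List[Config],
            query: Sequence[float]) -> Config:
    """1-nearest-neighbour: return the label of the closest training program
    in normalised feature space (Euclidean distance)."""
    stats = zscore_fit(train_feats)
    q = zscore_apply(query, stats)
    dists = [math.dist(zscore_apply(f, stats), q) for f in train_feats]
    return train_labels[dists.index(min(dists))]


def leave_one_out(programs: Dict[str, Program]) -> Dict[str, float]:
    """For each program, train on all the others and report the speedup of the
    predicted configuration as a fraction of the best speedup found by search."""
    scores = {}
    for held_out, (feats, measured) in programs.items():
        rest = [p for name, p in programs.items() if name != held_out]
        cfg = predict([f for f, _ in rest], [best_config(m) for _, m in rest], feats)
        scores[held_out] = measured[cfg] / max(measured.values())
    return scores


if __name__ == "__main__":
    # Made-up data. Features: [filter count, split-join count, mean work per filter].
    programs: Dict[str, Program] = {
        "progA": ([40, 6, 120.0], {(2, 4): 1.8, (4, 2): 2.3, (8, 1): 1.5}),
        "progB": ([12, 1, 900.0], {(2, 4): 2.9, (4, 2): 1.7, (8, 1): 0.9}),
        "progC": ([60, 9, 80.0], {(2, 4): 1.2, (4, 2): 2.0, (8, 1): 2.6}),
    }
    feats = [f for f, _ in programs.values()]
    labels = [best_config(m) for _, m in programs.values()]
    print(predict(feats, labels, [15, 2, 850.0]))  # prints (2, 4)
    print(leave_one_out(programs))
```

A nearest-neighbour classifier is used here only because it needs no external library; any classifier or regressor over the same static features could be substituted, and a real study would search a far larger configuration space than the three toy points shown.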

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published In

LCTES 2016: Proceedings of the 17th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, Tools, and Theory for Embedded Systems

June 2016

122 pages

ISBN: 9781450343169

DOI: 10.1145/2907950

General Chair: Tei-Wei Kuo

Program Chair: David B. Whalley

ACM SIGPLAN Notices, Volume 51, Issue 5 (LCTES '16)
May 2016
122 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2980930
Editor: Andy Gill, University of Kansas, Lawrence, KS

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

LCTES '16: SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems 2016

June 13–14, 2016

Santa Barbara, CA, USA

Acceptance Rates

Overall Acceptance Rate 116 of 438 submissions, 26%

Bibliometrics

Total citations: 14

Total downloads: 440

Downloads (last 12 months): 12

Downloads (last 6 weeks): 1

Reflects downloads up to 04 Oct 2024.
