DOI: 10.1145/2907950.2907951
research-article

A machine learning approach to mapping streaming workloads to dynamic multicore processors

Published: 13 June 2016

Abstract

Dataflow programming languages facilitate the design of data-intensive programs such as the streaming applications commonly found in embedded systems. They also expose parallelism that can be exploited by multicore processors, which are now part of the mobile landscape. In recent years a shift has occurred towards heterogeneity (e.g., ARM big.LITTLE) and reconfigurability. Dynamic Multicore Processors (DMPs) bridge the gap between fully reconfigurable processors and homogeneous multicore systems. They can re-allocate their resources at runtime to create larger, more powerful logical processors fine-tuned to the workload. Unfortunately, there is no accurate method for determining how to partition the cores in a DMP among application threads. Programmers often rely on analyzing the application manually and applying a set of hand-picked heuristics. This leads to sub-optimal performance and reduces the potential of DMPs. What is needed is a way to determine the optimal partitioning and grouping of resources to maximize performance. As a first step, this paper studies the effect of thread partitioning and hardware resource allocation on a set of StreamIt applications. We show that the resulting space is not trivial and exhibits a large performance variation depending on the combination of parameters. We introduce a machine-learning-based methodology to tackle the complexity of this space. Our machine-learning model directly predicts the best combination of parameters using static code features. The predicted set of parameters leads to performance on par with the best performance found in a space of more than 32,000 configurations per application.
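
The idea of predicting a configuration directly from static code features can be illustrated with a small sketch. The abstract does not say which learning algorithm or which features the paper uses, so everything below is an assumption made for illustration: the predictor is a plain 1-nearest-neighbour lookup, the feature names (`num_filters`, `avg_work_per_filter`, `pipeline_depth`) and the configuration tuple `(threads, cores_per_thread)` are hypothetical, and the training values are invented toy numbers, not results from the paper.

```python
from math import dist

# Hypothetical training set: each entry pairs a static feature vector with the
# best configuration found for that program by exhaustive search.
# Features: (num_filters, avg_work_per_filter, pipeline_depth)
# Configuration: (threads, cores_per_thread)
# All names and numbers are invented for illustration only.
TRAINING = [
    ((4.0, 120.0, 3.0), (2, 4)),
    ((16.0, 30.0, 8.0), (8, 1)),
    ((8.0, 60.0, 5.0), (4, 2)),
]


def fit_bounds(samples):
    """Return per-feature minimum and maximum over the training samples."""
    columns = list(zip(*(features for features, _ in samples)))
    return tuple(min(c) for c in columns), tuple(max(c) for c in columns)


def normalise(vec, lo, hi):
    """Scale each feature to [0, 1] so no single feature dominates the distance."""
    return tuple(
        (v - l) / (h - l) if h > l else 0.0
        for v, l, h in zip(vec, lo, hi)
    )


def predict(features, samples=TRAINING):
    """Return the configuration of the training program closest to `features`."""
    lo, hi = fit_bounds(samples)
    query = normalise(features, lo, hi)
    nearest = min(samples, key=lambda s: dist(normalise(s[0], lo, hi), query))
    return nearest[1]


if __name__ == "__main__":
    # A new program whose static features resemble the second training entry.
    print(predict((15.0, 32.0, 7.0)))
```

The point of the sketch is the workflow rather than the model: the expensive search over the configuration space is done once, offline, for the training programs, and a new program is then mapped to a configuration from its static features alone, without running it.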

      Published In

      LCTES 2016: Proceedings of the 17th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, Tools, and Theory for Embedded Systems
      June 2016
      122 pages
      ISBN:9781450343169
      DOI:10.1145/2907950
        ACM SIGPLAN Notices  Volume 51, Issue 5
        LCTES '16
        May 2016
        122 pages
        ISSN:0362-1340
        EISSN:1558-1160
        DOI:10.1145/2980930
        Editor: Andy Gill

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Dynamic Multicore Processor
      2. Machine Learning
      3. Streaming Programming Languages

      Qualifiers

      • Research-article

      Conference

      LCTES'16

      Acceptance Rates

      Overall Acceptance Rate 116 of 438 submissions, 26%

      Cited By

      • (2024) A Dynamic Task Mapping Scheme Based on Machine Learning. 2024 Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC), pages 389–397. DOI: 10.1109/IPEC61310.2024.00074. Online publication date: 12-Apr-2024
      • (2021) FOGA: Flag Optimization with Genetic Algorithm. 2021 International Conference on INnovations in Intelligent SysTems and Applications (INISTA), pages 1–6. DOI: 10.1109/INISTA52262.2021.9548573. Online publication date: 25-Aug-2021
      • (2018) Towards Memory-Efficient Allocation of CNNs on Processing-in-Memory Architecture. IEEE Transactions on Parallel and Distributed Systems, 29(6):1428–1441. DOI: 10.1109/TPDS.2018.2791440. Online publication date: 1-Jun-2018
      • (2017) Synthesizing benchmarks for predictive modeling. Proceedings of the 2017 International Symposium on Code Generation and Optimization, pages 86–99. DOI: 10.5555/3049832.3049843. Online publication date: 4-Feb-2017
      • (2017) Adaptive optimization for OpenCL programs on embedded heterogeneous systems. ACM SIGPLAN Notices, 52(5):11–20. DOI: 10.1145/3140582.3081040. Online publication date: 21-Jun-2017
      • (2017) Towards memory-efficient processing-in-memory architecture for convolutional neural networks. ACM SIGPLAN Notices, 52(5):81–90. DOI: 10.1145/3140582.3081032. Online publication date: 21-Jun-2017
      • (2017) A Study of Dynamic Phase Adaptation Using a Dynamic Multicore Processor. ACM Transactions on Embedded Computing Systems, 16(5s):1–19. DOI: 10.1145/3126523. Online publication date: 27-Sep-2017
      • (2017) Adaptive optimization for OpenCL programs on embedded heterogeneous systems. Proceedings of the 18th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, pages 11–20. DOI: 10.1145/3078633.3081040. Online publication date: 21-Jun-2017
      • (2017) Towards memory-efficient processing-in-memory architecture for convolutional neural networks. Proceedings of the 18th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, pages 81–90. DOI: 10.1145/3078633.3081032. Online publication date: 21-Jun-2017
      • (2017) End-to-End Deep Learning of Optimization Heuristics. 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 219–232. DOI: 10.1109/PACT.2017.24. Online publication date: Sep-2017
