DOI: 10.1145/3605731.3605885
Research article

Codelet Pipe: Realization of Dataflow Software Pipelining for Extended Codelet Model

Published: 07 September 2023

    Abstract

    Dataflow Software Pipelining for the Codelet Model is a coarse-grained code-mapping scheme designed to exploit pipelined parallelism across Codelets executing on different cores. The extended operational semantics of the Codelet Model exploit coarse-grained pipelined parallelism across loops by placing single-owner FIFO buffers on the dependencies between Codelets. With these extensions, the Codelet Model has shown promising performance benefits by using FIFO buffers to communicate between producer and consumer Codelets. These gains can be amplified further by implementing the FIFO buffers efficiently, following hardware-software co-design principles, on an architecture that supports explicit access to scratchpad memory close to the compute cores.
    In this work, we introduce Codelet Pipe, an efficient hardware-software co-designed communication channel between producer-consumer Codelets that takes advantage of dataflow software pipelining for the Codelet Model. The current implementation of Codelet Pipe exploits the Shared Local Memory architectural feature of the Intel Iris Pro GPU using OpenCL. Codelet Pipe enables users to construct well-structured Codelet Graphs and improves programmability by relieving the user of the responsibility of handling communication between producer-consumer Codelet pairs. We demonstrate performance gains using a set of micro-benchmarks on a GPU architecture of strategic importance for exascale supercomputers.
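
    The producer-consumer pattern described in the abstract can be illustrated with a minimal host-side sketch. This is only an analogy under stated assumptions, not the Codelet Pipe implementation or its API: the paper's channel lives in GPU Shared Local Memory and is driven from OpenCL, whereas the sketch below uses Python threads and the standard-library queue.Queue as a stand-in for a bounded FIFO. The names producer_codelet, consumer_codelet, run_pipeline, SENTINEL, and the depth parameter are hypothetical and chosen for illustration.

```python
import threading
from queue import Queue

# Hypothetical end-of-stream marker; not part of the paper's design.
SENTINEL = object()


def producer_codelet(n, pipe):
    """First loop: compute one value per iteration and push it into the FIFO."""
    for i in range(n):
        pipe.put(i * i)  # blocks while the bounded FIFO is full
    pipe.put(SENTINEL)


def consumer_codelet(pipe, out):
    """Second loop: starts consuming as soon as the first value is available."""
    while True:
        item = pipe.get()  # blocks while the FIFO is empty
        if item is SENTINEL:
            break
        out.append(item + 1)


def run_pipeline(n, depth=4):
    """Run both loops concurrently, connected by a FIFO of the given depth."""
    pipe = Queue(maxsize=depth)
    out = []
    threads = [
        threading.Thread(target=producer_codelet, args=(n, pipe)),
        threading.Thread(target=consumer_codelet, args=(pipe, out)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


if __name__ == "__main__":
    print(run_pipeline(5))
```

    Because the consumer loop begins work before the producer loop has finished, the two loops overlap in time, which is the coarse-grained pipelining the abstract refers to. The bounded depth plays the role of a fixed-size buffer: a full FIFO stalls the producer and an empty one stalls the consumer.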



      Published In

      ICPP Workshops '23: Proceedings of the 52nd International Conference on Parallel Processing Workshops
      August 2023
      217 pages
      ISBN: 9798400708428
      DOI: 10.1145/3605731
      Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. Codelet Pipe
      2. Dataflow Model
      3. Dataflow Software Pipelining
      4. Extended Codelet Model
      5. Hardware-Software Co-design
      6. Many-core Architecture
      7. Programmability
      8. exa-scale

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ICPP-W 2023

      Acceptance Rates

      Overall Acceptance Rate 91 of 313 submissions, 29%
