DOI: 10.1145/3581784.3607049
research-article
Open access

Itoyori: Reconciling Global Address Space and Global Fork-Join Task Parallelism

Published: 11 November 2023

    Abstract

    This paper introduces Itoyori, a task-parallel runtime system designed to tackle the challenge of scaling task parallelism (more specifically, nested fork-join parallelism) beyond a single node. The partitioned global address space (PGAS) model is often employed in task-parallel systems, but naively combining them can lead to poor performance due to fine-grained and redundant remote memory accesses. Itoyori addresses this issue by automatically caching global memory accesses at runtime, enabling efficient cache sharing among parallel tasks running on the same processor. As a real-world case study, we ported an existing task-parallel implementation of the Fast Multipole Method (FMM) to distributed memory with Itoyori and achieved a 7.5× speedup when scaled from a single node to 12 nodes and up to 6.0× faster performance than without caching. This study demonstrates that global-view fork-join programming can be made practical and scalable, while requiring minimal changes to the shared-memory code.
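
The effect the abstract describes, where fine-grained and redundant remote accesses are replaced by cached block fetches shared among tasks on the same processor, can be illustrated with a toy model. The sketch below is not Itoyori's API or implementation: `GlobalArray`, `SoftwareCache`, `fork_join_sum`, and `compare` are hypothetical names, remote accesses are only counted rather than performed, and the "forked" tasks run serially on one simulated processor.

```python
# Toy model only: all names are hypothetical and this is not Itoyori's API.

class GlobalArray:
    """Stand-in for memory owned by a remote node; counts simulated remote accesses."""

    def __init__(self, values, block_size):
        self.values = list(values)
        self.block_size = block_size
        self.remote_accesses = 0

    def get_element(self, index):
        # Fine-grained access: one simulated remote round trip per element.
        self.remote_accesses += 1
        return self.values[index]

    def get_block(self, block_id):
        # Coarse-grained access: one simulated remote round trip per block.
        self.remote_accesses += 1
        start = block_id * self.block_size
        return self.values[start:start + self.block_size]


class SoftwareCache:
    """Per-processor cache shared by every task that runs on that processor."""

    def __init__(self, array):
        self.array = array
        self.blocks = {}

    def read(self, index):
        block_id, offset = divmod(index, self.array.block_size)
        if block_id not in self.blocks:  # miss: fetch the whole block once
            self.blocks[block_id] = self.array.get_block(block_id)
        return self.blocks[block_id][offset]


def fork_join_sum(read, lo, hi, cutoff=4):
    """Recursive divide-and-conquer sum; each recursive call models a forked task."""
    if hi - lo <= cutoff:
        return sum(read(i) for i in range(lo, hi))
    mid = (lo + hi) // 2
    left = fork_join_sum(read, lo, mid, cutoff)   # "fork"
    right = fork_join_sum(read, mid, hi, cutoff)  # runs serially in this toy model
    return left + right                           # "join"


def compare(n=64, block_size=16):
    """Return (remote accesses without caching, remote accesses with caching)."""
    uncached = GlobalArray(range(n), block_size)
    total_uncached = fork_join_sum(uncached.get_element, 0, n)

    backing = GlobalArray(range(n), block_size)
    cache = SoftwareCache(backing)
    total_cached = fork_join_sum(cache.read, 0, n)

    assert total_uncached == total_cached  # caching must not change the result
    return uncached.remote_accesses, backing.remote_accesses
```

Calling `compare()` returns the two access counts for a 64-element array split into 16-element blocks. The model is read-only, so it leaves out everything a real runtime needs for writes and coherence (write-back, invalidation, and synchronization at fork and join points), as well as work stealing across nodes.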

        Published In

        SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
        November 2023
        1428 pages
        ISBN: 9798400701092
        DOI: 10.1145/3581784
        This work is licensed under a Creative Commons Attribution 4.0 International License.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. PGAS
        2. task parallelism
        3. fork-join
        4. work stealing
        5. cache coherence

        Qualifiers

        • Research-article

        Conference

        SC '23

        Acceptance Rates

        Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
