Grain graphs: OpenMP performance analysis made easy

Published: 27 February 2016 Publication History
  • Abstract

    Average programmers struggle to solve performance problems in OpenMP programs with tasks and parallel for-loops. Existing performance analysis tools visualize OpenMP task performance from the runtime system's perspective, where task execution is interleaved with other tasks in an unpredictable order. Problems with OpenMP parallel for-loops are similarly difficult to resolve, since tools only visualize aggregate thread-level statistics such as load imbalance without zooming in to per-chunk granularity. This runtime-system- and thread-oriented visualization provides poor support for understanding problems with task and chunk execution time, parallelism, and memory hierarchy utilization, forcing average programmers to rely on experts or on tedious trial-and-error tuning. We present grain graphs, a new OpenMP performance analysis method that visualizes grains -- the computation performed by a single task or parallel for-loop chunk instance -- and highlights problems such as low parallelism, work inflation, and poor parallelization benefit at the grain level. We demonstrate that grain graphs can quickly reveal performance problems in standard OpenMP programs that are difficult to detect and characterize in fine detail using existing visualizations, simplifying OpenMP performance analysis. This enables average programmers to make portable optimizations to poorly performing OpenMP programs, reducing pressure on experts and removing the need for tedious trial-and-error tuning.
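    The abstract's grain-level metrics can be made concrete with a small sketch. The data model below (the `Grain` record and both functions are hypothetical illustrations, not the paper's implementation) shows how per-grain timing and work records could yield two of the quantities a grain graph highlights: parallelism over a measured span and work inflation relative to a serial run.

    ```python
    # Hypothetical sketch of grain-level metrics; field names and the
    # recording mechanism are assumptions, not the paper's actual format.
    from dataclasses import dataclass

    @dataclass
    class Grain:
        """One task or parallel for-loop chunk instance."""
        name: str
        start: float   # wall-clock start time (seconds)
        end: float     # wall-clock end time (seconds)
        work: float    # work attributed to the grain, e.g. cycles

        @property
        def duration(self) -> float:
            return self.end - self.start

    def work_inflation(grain: Grain, serial_work: float) -> float:
        """Ratio of work done by the grain in a parallel run to the work
        the same computation needs serially; values > 1 indicate inflation
        (e.g. from poor memory hierarchy utilization)."""
        return grain.work / serial_work

    def parallelism(grains: list[Grain]) -> float:
        """Average number of grains executing concurrently over the span
        from the earliest start to the latest end."""
        total = sum(g.duration for g in grains)
        span = max(g.end for g in grains) - min(g.start for g in grains)
        return total / span
    ```

    A grain-graph-style analysis would then flag grains whose inflation is high or whose subtree parallelism is low, rather than reporting only per-thread aggregates.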


    Cited By

    • (2023) Measuring Thread Timing to Assess the Feasibility of Early-bird Message Delivery. Proceedings of the 52nd International Conference on Parallel Processing Workshops, pp. 119-126. DOI: 10.1145/3605731.3605884.
    • (2023) Traveler: Navigating Task Parallel Traces for Performance Analysis. IEEE Transactions on Visualization and Computer Graphics, 29(1):788-797. DOI: 10.1109/TVCG.2022.3209375.
    • (2021) EasyPAP: A Framework for Learning Parallel Programming. Journal of Parallel and Distributed Computing. DOI: 10.1016/j.jpdc.2021.07.018.
    • (2021) Providing In-depth Performance Analysis for Heterogeneous Task-based Applications with StarVZ. 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 16-25. DOI: 10.1109/IPDPSW52791.2021.00013.
    • (2020) EASYPAP: a Framework for Learning Parallel Programming. 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 276-283. DOI: 10.1109/IPDPSW50202.2020.00059.
    • (2020) AfterOMPT: An OMPT-Based Tool for Fine-Grained Tracing of Tasks and Loops. OpenMP: Portable Multi-Level Parallelism on Modern Systems, pp. 165-180. DOI: 10.1007/978-3-030-58144-2_11.
    • (2020) STHEM: Productive Implementation of High-Performance Embedded Image Processing Applications. Towards Ubiquitous Low-power Image Processing Platforms, pp. 79-91. DOI: 10.1007/978-3-030-53532-2_5.
    • (2020) Extending High-Level Synthesis with High-Performance Computing Performance Visualization. 2020 IEEE International Conference on Cluster Computing (CLUSTER), pp. 371-380. DOI: 10.1109/CLUSTER49012.2020.00047.
    • (2019) Analysis and Optimization of Task Granularity on the Java Virtual Machine. ACM Transactions on Programming Languages and Systems, 41(3):1-47. DOI: 10.1145/3338497.

      Published In

      ACM SIGPLAN Notices, Volume 51, Issue 8 (PPoPP '16), August 2016. 405 pages. ISSN: 0362-1340. EISSN: 1558-1160. DOI: 10.1145/3016078.

      PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, February 2016. 420 pages. ISBN: 9781450340922. DOI: 10.1145/2851141.
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. OpenMP
      2. performance analysis
      3. performance visualization
      4. task-based programs

      Qualifiers

      • Research-article

      Funding Sources

      • ARTEMIS-JU

