The Tau Parallel Performance System

Authors: Sameer S. Shende and Allen D. Malony

International Journal of High Performance Computing Applications, Volume 20, Issue 2

Pages 287 - 311

https://doi.org/10.1177/1094342006064482

Published: 01 May 2006

Abstract

The ability of performance technology to keep pace with the growing complexity of parallel and distributed systems depends on robust performance frameworks that can at once provide system-specific performance capabilities and support high-level performance problem solving. Flexibility and portability in empirical methods and processes are influenced primarily by the strategies available for instrumentation and measurement, and by how effectively they are integrated and composed. This paper presents the TAU (Tuning and Analysis Utilities) parallel performance system and describes how it addresses diverse requirements for performance observation and analysis.

References

[1]

Ahn, D., Kufrin, R., Raghuraman, A., and Seo, J. Perfsuite. http://perfsuite.ncsa.uiuc.edu/.

[2]

Bell, R., Malony, A. D., and Shende, S. 2003. A Portable, Extensible, and Scalable Tool for Parallel Performance Profile Analysis. Proceedings of the Europar 2003 Conference (LNCS 2790), pp. 17-26.

[3]

Bernholdt, D. E., Allan, B. A., Armstrong, R. et al. 2006. A Component Architecture for High-Performance Scientific Computing. Intl. Journal of High-Performance Computing Applications ACTS Collection Special Issue.

[4]

Berrendorf, R., Ziegler, H., and Mohr, B. PCL -- The Performance Counter Library. http://www.fz-juelich.de/zam/PCL/.

[5]

Browne, S., Dongarra, J., Garner, N., Ho, G., and Mucci, P. 2000. A Portable Programming Interface for Performance Evaluation on Modern Processors. International Journal of High Performance Computing Applications 14(3):189-204.

[6]

Brunst, H., Malony, A. D., Shende, S., and Bell, R. 2003. Online Remote Trace Analysis of Parallel Applications on High-Performance Clusters. Proceedings of the ISHPC Conference (LNCS 2858), pp. 440-449. Springer.

[7]

Brunst, H., Nagel, W. E., and Malony, A. D. 2003. A Distributed Performance Analysis Architecture for Clusters. Proceedings of the IEEE International Conference on Cluster Computing (Cluster 2003), pp. 73-83. IEEE Computer Society.

[8]

Buck, B. and Hollingsworth, J. 2000. An API for Runtime Code Patching. Journal of High Performance Computing Applications 14(4):317-329.

[9]

California Institute of Technology. VTF -- Virtual Test Shock Facility. http://www.cacr.caltech.edu/ASAP.

[10]

CCA Forum. The Common Component Architecture Forum. http://www.cca-forum.org.

[11]

DeRose, L. 2001. The Hardware Performance Monitor Toolkit. Proceedings of the European Conference on Parallel Computing (EuroPar 2001, LNCS 2150), pp. 122-131. Springer.

[12]

DeRose, L. and Reed, D. 1998. SvPablo: A Multi-Language Architecture-Independent Performance Analysis System. Proceedings of the International Conference on Parallel Processing, ICPP '99, pp. 311-318.

[13]

DeRose, L. and Wolf, F. 2002. CATCH - A Call-Graph Based Automatic Tool for Capture of Hardware Performance Metrics for MPI and OpenMP Applications. Proceedings of the Europar 2002 Conference.

[14]

Dongarra, J., Malony, A. D., Moore, S., Mucci, P., and Shende, S. 2003. Performance Instrumentation and Measurement for Terascale Systems. Proceedings of the ICCS 2003 Conference (LNCS 2660), pp. 53-62.

[15]

European Center for Parallelism of Barcelona (CEPBA). Paraver -- Parallel Program Visualization and Analysis Tool - reference manual. http://www.cepba.upc.es/paraver.

[16]

MPI Forum. 1994. MPI: A Message Passing Interface Standard. International Journal of Supercomputer Applications (Special Issue on MPI) 8(3/4).

[17]

Graham, S., Kessler, P., and McKusick, M. 1982. gprof: A Call Graph Execution Profiler. SIGPLAN '82 Symposium on Compiler Construction, pp. 120-126.

[18]

Gropp, W. and Lusk, E. User's Guide for MPE: Extensions for MPI Programs. http://www-unix.mcs.anl.gov/mpi/mpich/docs/mpeguide/paper.htm.

[19]

HPC++ Working Group. 1995. HPC++ White Papers. Technical Report TR 95633, Center for Research on Parallel Computation.

[20]

Huck, K., Malony, A., Bell, R., and Morris, A. 2005. Design and Implementation of a Parallel Performance Data Management Framework. Proc. International Conference on Parallel Processing, ICPP-05.

[21]

IBM. IBM DB2 Information Management Software. http://www.ibm.com/software/data.

[22]

Intel Corporation. Intel (R) Trace Analyzer 4.0. http://www.intel.com/software/products/cluster/tanalyzer/.

[23]

Kessler, P. 1990. Fast Breakpoints: Design and Implementation. SIGPLAN Notices 25(6):78-84.

[24]

Kohn, S., Kumfert, G., Painter, J., and Ribbens, C. 2001. Divorcing Language Dependencies from a Scientific Software Library. Proceedings of the 10th SIAM Conference on Parallel Processing.

[25]

Lindlan, K., Cuny, J., Malony, A. D., Shende, S., Mohr, B., Rivenburgh, R., and Rasmussen, C. 2000. A Tool Framework for Static and Dynamic Analysis of Object-Oriented Software with Templates. Proceedings of the SC'2000 Conference.

[26]

Malony, A. D. 1990. Performance Observability. PhD thesis, University of Illinois, Urbana-Champaign.

[27]

Malony, A. and Shende, S. 2000. Performance Technology for Complex Parallel and Distributed Systems. In: Distributed and Parallel Systems: From Concepts to Applications (eds. G. Kotsis and P. Kacsuk), pp. 37-46, Norwell, MA: Kluwer.

[28]

Malony, A., Shende, S., Bell, R., Li, K., Li, L., and Trebon, N. 2003. Advances in the TAU Performance System. In: Performance Analysis and Grid Computing (eds. V. Getov, M. Gerndt, A. Hoisie, A. Malony, B. Miller), pp. 129-144. Norwell, MA: Kluwer.

[29]

Mellor-Crummey, J., Fowler, R., and Marlin, G. 2002. HPCView: A Tool for Top-down Analysis of Node Performance. The Journal of Supercomputing 23:81-104.

[30]

Mohr, B. KOJAK -- Kit for Objective Judgment and Knowledge-based Detection of Bottlenecks. http://www.fz-juelich.de/zam/kojak.

[31]

Mohr, B., Malony, A., Shende, S., and Wolf, F. 2002. Design and Prototype of a Performance Tool Interface for OpenMP. The Journal of Supercomputing 23:105-128.

[32]

Mohr, B. and Wolf, F. 2003. KOJAK - A Tool Set for Automatic Performance Analysis of Parallel Applications. Proceedings of the European Conference on Parallel Computing (EuroPar 2003, LNCS 2790), pp. 1301-1304. Springer.

[33]

Mucci, P. Dynaprof. http://www.cs.utk.edu/mucci/dynaprof.

[34]

MySQL. MySQL: The World's Most Popular Open Source Database. www.mysql.org.

[35]

Nagel, W., Arnold, A., Weber, M., Hoppe, H.-C., and Solchenbach, K. 1996. VAMPIR: Visualization and Analysis of MPI Resources. Supercomputer 12(1):69-80.

[36]

Norris, B., Ray, J., McInnes, L., Bernholdt, D., Elwasif, W., Malony, A., and Shende, S. 2004. Computational quality of service for scientific components. Proceedings of the International Symposium on Component-based Software Engineering (CBSE7). Springer.

[37]

Oracle Corporation. Oracle. http://www.oracle.com.

[38]

PostgreSQL. PostgreSQL: The World's Most Advanced Open Source Database. http://www.postgresql.org.

[39]

Ray, J., Trebon, N., Shende, S., Armstrong, R., and Malony, A. 2004. Performance Measurement and Modeling of Component Applications in a High Performance Computing Environment: A Case Study. Proc. International Parallel and Distributed Processing Symposium (IPDPS'04).

[40]

Sarukkai, S. and Malony, A. D. 1993. Perturbation Analysis of High Level Instrumentation for SPMD Programs. SIGPLAN Notices 28(7).

[41]

Seidl, S. 2003. VTF3 - A Fast Vampir Trace File Low-Level Management Library. Technical Report ZHR-R-0304, Dresden University of Technology, Center for High-Performance Computing.

[42]

Shende, S. 2001. The Role of Instrumentation and Mapping in Performance Measurement. PhD thesis, University of Oregon.

[43]

Shende, S. and Malony, A. D. 2003. Integration and Application of TAU in Parallel Java Environments. Concurrency and Computation: Practice and Experience 15(3-5):501-519.

[44]

Shende, S., Malony, A. D., Cuny, J., Lindlan, K., Beckman, P., and Karmesin, S. 1998. Portable Profiling and Tracing for Parallel Scientific Applications using C++. Proceedings of the SIGMETRICS Symposium on Parallel and Distributed Tools, SPDT'98, pp. 134-145.

[45]

Shende, S., Malony, A. D., Rasmussen, C., and Sottile, M. 2003. A Performance Interface for Component-Based Applications. Proceedings of International Workshop on Performance Modeling, Evaluation and Optimization, International Parallel and Distributed Processing Symposium.

[46]

Song, F., Wolf, F., Bhatia, N., Dongarra, J., and Moore, S. 2004. An Algebra for Cross-Experiment Performance Analysis. Proc. of International Conference on Parallel Processing, ICPP-04.

[47]

Subramanya, R. and Reddy, R. 2000. Sandia DNS code for 3D compressible flows - Final Report. Technical Report PSC-Sandia-FR-3.0, Pittsburgh Supercomputing Center, PA.

[48]

SUN Microsystems Inc. Java Virtual Machine Profiler Interface (JVMPI). http://java.sun.com/j2se/1.5.0/docs/guide/jvmpi/.

[49]

Szyperski, C. 1997. Component Software: Beyond Object-Oriented Programming. Addison-Wesley.

[50]

University of Oregon. TAU Portable Profiling. http://www.cs.uoregon.edu/research/paracomp/tau.

[51]

University of Oregon. Tuning and Analysis Utilities User's Guide. http://www.cs.uoregon.edu/research/paracomp/tau.

[52]

Vetter, J. and Chambreau, C. mpiP: Lightweight, Scalable MPI Profiling. http://www.llnl.gov/CASC/mpip/.

[53]

Viswanathan, D. and Liang, S. 2000. Java Virtual Machine Profiler Interface. IBM Systems Journal 39(1):82-95.

[54]

Wolf, F., Mohr, B., Dongarra, J., and Moore, S. 2004. Efficient Pattern Search in Large Traces through Successive Refinement. Proceedings of the European Conference on Parallel Computing (EuroPar 2004, LNCS 3149), pp. 47-54. Springer.

[55]

Wu, C. E., Bolmarcich, A., Snir, M., Wootton, D., Parpia, F., Chan, A., Lusk, E., and Gropp, W. 2000. From trace generation to visualization: A performance framework for distributed parallel systems. Proc. of SC2000: High Performance Networking and Computing.

Cited By

Miao D, Laguna I, Georgakoudis G, Parasyris K, Rubio-González C (2024). MUPPET. Proceedings of the 15th International Workshop on Programming Models and Applications for Multicores and Manycores, pp. 22-31. DOI: 10.1145/3649169.3649246. Online publication date: 3-Mar-2024.
https://dl.acm.org/doi/10.1145/3649169.3649246
Park J, Huang X, Lee J, Hong T (2024). I/O-signature-based feature analysis and classification of high-performance computing applications. Cluster Computing 27:3, pp. 3219-3231. DOI: 10.1007/s10586-023-04139-y. Online publication date: 1-Jun-2024.
https://dl.acm.org/doi/10.1007/s10586-023-04139-y
Dongarra J, Tourancheau B, Pearce O, Brink S (2023). Finding the forest in the trees. International Journal of High Performance Computing Applications 37:3-4, pp. 434-441. DOI: 10.1177/10943420231175687. Online publication date: 1-Jul-2023.
https://dl.acm.org/doi/10.1177/10943420231175687

Index Terms

1. Computing methodologies
  1. Concurrent computing methodologies
    1. Concurrent programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
        Concurrent programming structures
      2. Language types
        Concurrent programming languages

Recommendations

TAU Performance System
IWOCL '22: Proceedings of the 10th International Workshop on OpenCL

The TAU Performance System is a versatile performance evaluation tool that supports OpenCL, DPC++/SYCL, OpenMP, and other GPU runtimes. It features a performance profiling and tracing module that is widely portable and can access hardware performance ...
Workload characterization using the TAU performance system
PARA'06: Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing

Workload characterization is an important technique that helps us understand the performance of parallel applications and the demands they place on the system. It can be used to describe performance effects due to application parameters, compiler ...
Integrated parallel performance views

The influences of the operating system and system-specific effects on application performance are increasingly important considerations in high performance computing. OS kernel measurement is key to understanding the performance influences and the ...

Information

Published In

International Journal of High Performance Computing Applications Volume 20, Issue 2

May 2006

148 pages

ISSN:1094-3420

Publisher

Sage Publications, Inc.

United States

Qualifiers

Article

Bibliometrics & Citations

Article Metrics

Total Citations: 340

Total Downloads: 0

Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Citations

Sen S, Vanecek S, Schulz M (2023). GPUscout: Locating Data Movement-related Bottlenecks on GPUs. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 1392-1402. DOI: 10.1145/3624062.3624208. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3624062.3624208
Luettgau J, Snyder S, Reddy T, Awtrey N, Harms K, Bez J, Wang R, Latham R, Carns P (2023). Enabling Agile Analysis of I/O Performance Data with PyDarshan. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 1380-1391. DOI: 10.1145/3624062.3624207. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3624062.3624207
Thärigen I, Hermanns M, Geimer M (2023). An Event Model for Trace-Based Performance Analysis of MPI Partitioned Point-to-Point Communication. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 1357-1367. DOI: 10.1145/3624062.3624205. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3624062.3624205
Huck K, Malony A (2023). ZeroSum: User Space Monitoring of Resource Utilization and Contention on Heterogeneous HPC Systems. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 685-695. DOI: 10.1145/3624062.3624145. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3624062.3624145
Wang Y, Li J (2023). PEAK: a Light-Weight Profiler for HPC Systems. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 677-680. DOI: 10.1145/3624062.3624143. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3624062.3624143
Khuvis S (2023). BaRRT: Buildtime and Runtime Reproducibility Tool for Software Development and Testing. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 673-676. DOI: 10.1145/3624062.3624142. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3624062.3624142
Lu C, Milfeld K (2023). REMORA Resource Monitor: Usability, Performance and User Interface Improvements. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 663-672. DOI: 10.1145/3624062.3624141. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3624062.3624141