research-article

Cypress: combining static and dynamic analysis for top-down communication trace compression

Authors: Xiongchao Tang, Xiaosong Ma, and Wenguang Chen

SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

November 2014

Pages 143 - 153

https://doi.org/10.1109/SC.2014.17

Published: 16 November 2014

Abstract

Communication traces are increasingly important, both for performance analysis and optimization of parallel applications and for designing next-generation HPC systems. Meanwhile, problem sizes and execution scales on supercomputers keep growing, producing prohibitive volumes of communication traces. Existing dynamic compression methods reduce trace size but introduce compression overhead that grows large with the job scale. We propose a hybrid static-dynamic method that leverages information acquired from static analysis to make dynamic trace compression more effective and efficient. Our proposed scheme, Cypress, extracts a program communication structure tree at compile time using inter-procedural analysis. This tree naturally captures crucial features of iterative computation, such as the loop structure, allowing subsequent runtime compression to "fill in" event details into the known communication template in a "top-down" manner. Results show that Cypress reduces intra-process and inter-process compression overhead by up to 5× and 9×, respectively, over state-of-the-art dynamic methods, while introducing only very low compilation overhead.
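The top-down idea described in the abstract can be illustrated with a minimal sketch. This is not the Cypress implementation: the class names `CallSite` and `Loop`, the `(site_id, params)` event format, and the run-length encoding of repeated parameters are all assumptions made for illustration. The sketch assumes that static analysis has already produced a template tree of loops and communication call sites, so the runtime only fills event parameters into known slots instead of searching the event stream for repeated patterns.

```python
from dataclasses import dataclass, field


@dataclass
class CallSite:
    """Leaf node: one communication call site assumed known at compile time."""
    site_id: int
    name: str
    records: list = field(default_factory=list)  # [params, repeat_count] pairs

    def fill(self, params):
        # Run-length encode: identical consecutive parameters share one record.
        if self.records and self.records[-1][0] == params:
            self.records[-1][1] += 1
        else:
            self.records.append([params, 1])


@dataclass
class Loop:
    """Inner node: a loop whose body contains communication."""
    body: list


def index_sites(node, table):
    """Map site_id -> CallSite so a runtime event finds its slot directly."""
    if isinstance(node, CallSite):
        table[node.site_id] = node
    else:
        for child in node.body:
            index_sites(child, table)
    return table


def compress(template, events):
    """Fill runtime events (site_id, params) into the static template."""
    table = index_sites(template, {})
    for site_id, params in events:
        table[site_id].fill(params)
    return template


# Hypothetical template for: for (...) { MPI_Send(...); MPI_Recv(...); }
template = Loop(body=[CallSite(0, "MPI_Send"), CallSite(1, "MPI_Recv")])

# Hypothetical trace: 1000 loop iterations with unchanged parameters.
events = [(0, ("dest=1", 1024)), (1, ("src=1", 1024))] * 1000
compress(template, events)

# Number of parameter records actually stored across all call sites.
stored = sum(len(site.records) for site in template.body)
```

In this toy setting the 2000 events collapse into one record per call site, because each event is routed to its slot by `site_id` and merged with the previous record when the parameters match. A purely dynamic method would instead have to discover the loop structure from the event stream itself.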

Published In

SC '14 proceedings, November 2014, 1054 pages

ISBN: 9781479955008

General Chair: Trish Damkroger, Lawrence Livermore National Laboratory, Livermore, California

Program Chair: Jack Dongarra, University of Tennessee, Knoxville, Tennessee

Publisher: IEEE Press

Qualifiers

Research-article

Conference

SC '14: International Conference for High Performance Computing, Networking, Storage and Analysis

November 16 - 21, 2014

New Orleans, Louisiana

Sponsors: SIGHPC, SIGARCH, IEEE-CS

Acceptance Rates

SC '14 Paper Acceptance Rate 83 of 394 submissions, 21%;

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
