DOI: 10.1109/SC.2014.17
research-article

Cypress: combining static and dynamic analysis for top-down communication trace compression

Published: 16 November 2014
    Abstract

    Communication traces are increasingly important, both for analyzing and optimizing the performance of parallel applications and for designing next-generation HPC systems. Meanwhile, problem sizes and execution scales on supercomputers keep growing, producing prohibitive volumes of communication traces. Existing dynamic methods for reducing trace size incur compression overhead that grows large with job scale. We propose a hybrid static-dynamic method that leverages information acquired from static analysis to make dynamic trace compression more effective and efficient. Our proposed scheme, Cypress, extracts a program communication structure tree at compile time using inter-procedural analysis. This tree naturally captures crucial iterative computing features such as the loop structure, allowing subsequent runtime compression to "fill in" event details into the known communication template in a "top-down" manner. Results show that Cypress reduces intra-process and inter-process compression overhead by up to 5× and 9×, respectively, compared with state-of-the-art dynamic methods, while introducing very low compilation overhead.
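    The following Python sketch illustrates the top-down idea described in the abstract under stated assumptions; it is not Cypress's implementation. A hard-coded tree stands in for the compile-time communication structure tree, and at run time each event is routed straight to its known leaf instead of being discovered by pattern search. The node kinds, the call-site keys such as `send@L10`, the run-length merging of identical per-leaf parameter tuples (intra-process compression), the modular relative-peer encoding, and the grouping of ranks with identical filled-in trees (inter-process compression) are all hypothetical choices made for illustration.

    ```python
    from dataclasses import dataclass, field
    from typing import Dict, List, Tuple

    @dataclass
    class Node:
        """Hypothetical CST node: "root", "loop" or "mpi" (a static MPI call site)."""
        kind: str
        name: str
        children: List["Node"] = field(default_factory=list)
        runs: List[list] = field(default_factory=list)  # mpi leaves only: [params, count]

        def add(self, child: "Node") -> "Node":
            self.children.append(child)
            return child

    def build_cst() -> Tuple[Node, Dict[str, Node]]:
        """Stand-in for the compile-time step: a time-step loop with a send and a receive."""
        root = Node("root", "main")
        loop = root.add(Node("loop", "timestep"))
        leaves = {
            "send@L10": loop.add(Node("mpi", "MPI_Send")),  # hypothetical call-site IDs
            "recv@L11": loop.add(Node("mpi", "MPI_Recv")),
        }
        return root, leaves

    def record(leaves, site, rank, peer, size, nprocs):
        """Runtime step: the call-site ID selects the leaf directly (top-down), so no
        pattern search over the event stream is needed. Identical consecutive
        parameter tuples are run-length merged (intra-process compression)."""
        params = ((peer - rank) % nprocs, size)  # modular relative peer (assumption)
        runs = leaves[site].runs
        if runs and runs[-1][0] == params:
            runs[-1][1] += 1
        else:
            runs.append([params, 1])

    def leaf_runs(node: Node, out: list) -> list:
        """Flatten the filled-in leaves in tree order into a hashable signature."""
        if node.kind == "mpi":
            out.append((node.name, tuple((p, c) for p, c in node.runs)))
        for child in node.children:
            leaf_runs(child, out)
        return out

    def merge_ranks(per_rank: Dict[int, Node]) -> Dict[tuple, List[int]]:
        """Inter-process step: ranks with identical filled-in trees are stored once."""
        groups: Dict[tuple, List[int]] = {}
        for rank, root in sorted(per_rank.items()):
            groups.setdefault(tuple(leaf_runs(root, [])), []).append(rank)
        return groups

    # Usage: a 4-rank ring exchange repeated for 100 time steps (800 MPI events in total).
    NPROCS, STEPS = 4, 100
    trees = {}
    for rank in range(NPROCS):
        root, leaves = build_cst()
        for _ in range(STEPS):
            record(leaves, "send@L10", rank, (rank + 1) % NPROCS, 1024, NPROCS)
            record(leaves, "recv@L11", rank, (rank - 1) % NPROCS, 1024, NPROCS)
        trees[rank] = root

    groups = merge_ranks(trees)
    print(len(groups), list(groups.values())[0])  # prints: 1 [0, 1, 2, 3]
    ```

    In this toy run, the 800 events collapse into a single shared tree with one run per leaf. The modular peer offset is what lets rank 3 (which sends to rank 0) share a signature with the other ranks; with absolute peer ranks, every rank would produce a distinct tree and nothing would merge across processes.
    
    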



    Published In

    SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
    November 2014
    1054 pages
    ISBN:9781479955008
    • General Chair: Trish Damkroger
    • Program Chair: Jack Dongarra

    Publisher

    IEEE Press

    Author Tags

    1. high performance computing
    2. message passing
    3. performance analysis
    4. trace compression

    Qualifiers

    • Research-article

    Conference

    SC '14

    Acceptance Rates

    SC '14 Paper Acceptance Rate 83 of 394 submissions, 21%;
    Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

    Article Metrics

    • Downloads (last 12 months): 2
    • Downloads (last 6 weeks): 0

    Cited By

    • (2022) Vapro. Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 150-162. DOI: 10.1145/3503221.3508411. Online publication date: 2-Apr-2022.
    • (2022) PerFlow. Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 177-191. DOI: 10.1145/3503221.3508405. Online publication date: 2-Apr-2022.
    • (2018) vSensor. ACM SIGPLAN Notices, 53(1), pp. 124-136. DOI: 10.1145/3200691.3178497. Online publication date: 10-Feb-2018.
    • (2018) vSensor. Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 124-136. DOI: 10.1145/3178487.3178497. Online publication date: 10-Feb-2018.
    • (2017) Efficient process mapping in geo-distributed cloud data centers. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12. DOI: 10.1145/3126908.3126913. Online publication date: 12-Nov-2017.
    • (2015) Clock delta compression for scalable order-replay of non-deterministic parallel applications. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12. DOI: 10.1145/2807591.2807642. Online publication date: 15-Nov-2015.
    • (2015) Combining Phase Identification and Statistic Modeling for Automated Parallel Benchmark Generation. ACM SIGMETRICS Performance Evaluation Review, 43(1), pp. 309-320. DOI: 10.1145/2796314.2745876. Online publication date: 15-Jun-2015.
    • (2015) Combining Phase Identification and Statistic Modeling for Automated Parallel Benchmark Generation. Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pp. 309-320. DOI: 10.1145/2745844.2745876. Online publication date: 15-Jun-2015.
