DOI: 10.1145/3295500.3356192
research-article
Open access

Preparation and optimization of a diverse workload for a large-scale heterogeneous system

Published: 17 November 2019

    Abstract

    Productivity from day one on supercomputers that leverage new technologies requires significant preparation. An institution that procures a novel system architecture often lacks sufficient institutional knowledge and skills to prepare for it. Thus, the "Center of Excellence" (CoE) concept has emerged to prepare for systems such as Summit and Sierra, currently the top two systems in the Top 500. This paper documents CoE experiences that prepared a workload of diverse applications and math libraries for a heterogeneous system. We describe our approach to this preparation, including our management and execution strategies, and detail our experiences with and reasons for using different programming approaches. Our early science and performance results show that the project enabled significant early seismic science with up to a 14X throughput increase over Cori. In addition to our successes, we discuss our challenges and failures so others may benefit from our experience.


    Cited By

    • (2022) Enabling New Flexibility in the SUNDIALS Suite of Nonlinear and Differential/Algebraic Equation Solvers. ACM Transactions on Mathematical Software 48:3 (1-24). DOI: 10.1145/3539801. Online publication date: 10-Sep-2022
    • (2021) Enabling GPU accelerated computing in the SUNDIALS time integration library. Parallel Computing 108:C. DOI: 10.1016/j.parco.2021.102836. Online publication date: 1-Dec-2021
    • (2020) Porting a 3D seismic modeling code (SW4) to CORAL machines. IBM Journal of Research and Development 64:3/4 (17:1-17:11). DOI: 10.1147/JRD.2019.2960218. Online publication date: 1-May-2020

    Published In

    SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
    November 2019
    1921 pages
    ISBN:9781450362290
    DOI:10.1145/3295500
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    In-Cooperation

    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 November 2019

    Author Tags

    1. GPUs
    2. heterogeneous systems
    3. large-scale applications
    4. performance
    5. programming models
    6. project management

    Qualifiers

    • Research-article

    Conference

    SC '19

    Acceptance Rates

    Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

    Article Metrics

    • Downloads (Last 12 months): 134
    • Downloads (Last 6 weeks): 11
    Reflects downloads up to 10 Aug 2024
