-
Automated, Reliable, and Efficient Continental-Scale Replication of 7.3 Petabytes of Climate Simulation Data: A Case Study
Authors:
Lukasz Lacinski,
Lee Liming,
Steven Turoscy,
Cameron Harr,
Kyle Chard,
Eli Dart,
Paul Durack,
Sasha Ames,
Forrest M. Hoffman,
Ian T. Foster
Abstract:
We report on our experiences replicating 7.3 petabytes (PB) of Earth System Grid Federation (ESGF) climate simulation data from Lawrence Livermore National Laboratory (LLNL) in California to Argonne National Laboratory (ANL) in Illinois and Oak Ridge National Laboratory (ORNL) in Tennessee. This movement of some 29 million files, twice, undertaken in order to establish new ESGF nodes at ANL and OR…
▽ More
We report on our experiences replicating 7.3 petabytes (PB) of Earth System Grid Federation (ESGF) climate simulation data from Lawrence Livermore National Laboratory (LLNL) in California to Argonne National Laboratory (ANL) in Illinois and Oak Ridge National Laboratory (ORNL) in Tennessee. This movement of some 29 million files, twice, undertaken in order to establish new ESGF nodes at ANL and ORNL, was performed largely automatically by a simple replication tool, a script that invoked Globus to transfer large bundles of files while tracking progress in a database. Under the covers, Globus organized transfers to make efficient use of the high-speed Energy Sciences network (ESnet) and the data transfer nodes deployed at participating sites, and also addressed security, integrity checking, and recovery from a variety of transient failures. This success demonstrates the considerable benefits that can accrue from the adoption of performant data replication infrastructure.
△ Less
Submitted 30 April, 2024;
originally announced April 2024.
-
Steering a Fleet: Adaptation for Large-Scale, Workflow-Based Experiments
Authors:
Jim Pruyne,
Valerie Hayot-Sasson,
Weijian Zheng,
Ryan Chard,
Justin M. Wozniak,
Tekin Bicer,
Kyle Chard,
Ian T. Foster
Abstract:
Experimental science is increasingly driven by instruments that produce vast volumes of data and thus a need to manage, compute, describe, and index this data. High performance and distributed computing provide the means of addressing the computing needs; however, in practice, the variety of actions required and the distributed set of resources involved, requires sophisticated "flows" defining the…
▽ More
Experimental science is increasingly driven by instruments that produce vast volumes of data and thus a need to manage, compute, describe, and index this data. High performance and distributed computing provide the means of addressing the computing needs; however, in practice, the variety of actions required and the distributed set of resources involved, requires sophisticated "flows" defining the steps to be performed on data. As each scan or measurement is performed by an instrument, a new instance of the flow is initiated resulting in a "fleet" of concurrently running flows, with the overall goal to process all the data collected during a potentially long-running experiment. During the course of the experiment, each flow may need to adapt its execution due to changes in the environment, such as computational or storage resource availability, or based on the progress of the fleet as a whole such as completion or discovery of an intermediate result leading to a change in subsequent flow's behavior. We introduce a cloud-based decision engine, Braid, which flows consult during execution to query their run-time environment and coordinate with other flows within their fleet. Braid accepts streams of measurements taken from the run-time environment or from within flow runs which can then be statistically aggregated and compared to other streams to determine a strategy to guide flow execution. For example, queue lengths in execution environments can be used to direct a flow to run computations in one environment or another, or experiment progress as measured by individual flows can be aggregated to determine the progress and subsequent direction of the flows within a fleet. We describe Braid, its interface, implementation and performance characteristics. We further show through examples and experience modifying an existing scientific flow how Braid is used to make adaptable flows.
△ Less
Submitted 9 March, 2024;
originally announced March 2024.
-
Rapid detection of rare events from in situ X-ray diffraction data using machine learning
Authors:
Weijian Zheng,
Jun-Sang Park,
Peter Kenesei,
Ahsan Ali,
Zhengchun Liu,
Ian T. Foster,
Nicholas Schwarz,
Rajkumar Kettimuthu,
Antonino Miceli,
Hemant Sharma
Abstract:
High-energy X-ray diffraction methods can non-destructively map the 3D microstructure and associated attributes of metallic polycrystalline engineering materials in their bulk form. These methods are often combined with external stimuli such as thermo-mechanical loading to take snapshots over time of the evolving microstructure and attributes. However, the extreme data volumes and the high costs o…
▽ More
High-energy X-ray diffraction methods can non-destructively map the 3D microstructure and associated attributes of metallic polycrystalline engineering materials in their bulk form. These methods are often combined with external stimuli such as thermo-mechanical loading to take snapshots over time of the evolving microstructure and attributes. However, the extreme data volumes and the high costs of traditional data acquisition and reduction approaches pose a barrier to quickly extracting actionable insights and improving the temporal resolution of these snapshots. Here we present a fully automated technique capable of rapidly detecting the onset of plasticity in high-energy X-ray microscopy data. Our technique is computationally faster by at least 50 times than the traditional approaches and works for data sets that are up to 9 times sparser than a full data set. This new technique leverages self-supervised image representation learning and clustering to transform massive data into compact, semantic-rich representations of visually salient characteristics (e.g., peak shapes). These characteristics can be a rapid indicator of anomalous events such as changes in diffraction peak shapes. We anticipate that this technique will provide just-in-time actionable information to drive smarter experiments that effectively deploy multi-modal X-ray diffraction methods that span many decades of length scales.
△ Less
Submitted 6 December, 2023;
originally announced December 2023.
-
Deep learning at the edge enables real-time streaming ptychographic imaging
Authors:
Anakha V Babu,
Tao Zhou,
Saugat Kandel,
Tekin Bicer,
Zhengchun Liu,
William Judge,
Daniel J. Ching,
Yi Jiang,
Sinisa Veseli,
Steven Henke,
Ryan Chard,
Yudong Yao,
Ekaterina Sirazitdinova,
Geetika Gupta,
Martin V. Holt,
Ian T. Foster,
Antonino Miceli,
Mathew J. Cherukara
Abstract:
Coherent microscopy techniques provide an unparalleled multi-scale view of materials across scientific and technological fields, from structural materials to quantum devices, from integrated circuits to biological cells. Driven by the construction of brighter sources and high-rate detectors, coherent X-ray microscopy methods like ptychography are poised to revolutionize nanoscale materials charact…
▽ More
Coherent microscopy techniques provide an unparalleled multi-scale view of materials across scientific and technological fields, from structural materials to quantum devices, from integrated circuits to biological cells. Driven by the construction of brighter sources and high-rate detectors, coherent X-ray microscopy methods like ptychography are poised to revolutionize nanoscale materials characterization. However, associated significant increases in data and compute needs mean that conventional approaches no longer suffice for recovering sample images in real-time from high-speed coherent imaging experiments. Here, we demonstrate a workflow that leverages artificial intelligence at the edge and high-performance computing to enable real-time inversion on X-ray ptychography data streamed directly from a detector at up to 2 kHz. The proposed AI-enabled workflow eliminates the sampling constraints imposed by traditional ptychography, allowing low dose imaging using orders of magnitude less data than required by traditional methods.
△ Less
Submitted 19 September, 2022;
originally announced September 2022.
-
High-Performance Ptychographic Reconstruction with Federated Facilities
Authors:
Tekin Bicer,
Xiaodong Yu,
Daniel J. Ching,
Ryan Chard,
Mathew J. Cherukara,
Bogdan Nicolae,
Rajkumar Kettimuthu,
Ian T. Foster
Abstract:
Beamlines at synchrotron light source facilities are powerful scientific instruments used to image samples and observe phenomena at high spatial and temporal resolutions. Typically, these facilities are equipped only with modest compute resources for the analysis of generated experimental datasets. However, high data rate experiments can easily generate data in volumes that take days (or even week…
▽ More
Beamlines at synchrotron light source facilities are powerful scientific instruments used to image samples and observe phenomena at high spatial and temporal resolutions. Typically, these facilities are equipped only with modest compute resources for the analysis of generated experimental datasets. However, high data rate experiments can easily generate data in volumes that take days (or even weeks) to process on those local resources. To address this challenge, we present a system that unifies leadership computing and experimental facilities by enabling the automated establishment of data analysis pipelines that extend from edge data acquisition systems at synchrotron beamlines to remote computing facilities; under the covers, our system uses Globus Auth authentication to minimize user interaction, funcX to run user-defined functions on supercomputers, and Globus Flows to define and execute workflows. We describe the application of this system to ptychography, an ultra-high-resolution coherent diffraction imaging technique that can produce 100s of gigabytes to terabytes in a single experiment. When deployed on the DGX A100 ThetaGPU cluster at the Argonne Leadership Computing Facility and a microscopy beamline at the Advanced Photon Source, our system performs analysis as an experiment progresses to provide timely feedback.
△ Less
Submitted 22 November, 2021;
originally announced November 2021.
-
Toward Interlanguage Parallel Scripting for Distributed-Memory Scientific Computing
Authors:
Justin M. Wozniak,
Timothy G. Armstrong,
Ketan C. Maheshwari,
Daniel S. Katz,
Michael Wilde,
Ian T. Foster
Abstract:
Scripting languages such as Python and R have been widely adopted as tools for the productive development of scientific software because of the power and expressiveness of the languages and available libraries. However, deploying scripted applications on large-scale parallel computer systems such as the IBM Blue Gene/Q or Cray XE6 is a challenge because of issues including operating system limitat…
▽ More
Scripting languages such as Python and R have been widely adopted as tools for the productive development of scientific software because of the power and expressiveness of the languages and available libraries. However, deploying scripted applications on large-scale parallel computer systems such as the IBM Blue Gene/Q or Cray XE6 is a challenge because of issues including operating system limitations, interoperability challenges, parallel filesystem overheads due to the small file system accesses common in scripted approaches, and other issues. We present here a new approach to these problems in which the Swift scripting system is used to integrate high-level scripts written in Python, R, and Tcl, with native code developed in C, C++, and Fortran, by linking Swift to the library interfaces to the script interpreters. In this approach, Swift handles data management, movement, and marshaling among distributed-memory processes without direct user manipulation of low-level communication libraries such as MPI. We present a technique to efficiently launch scripted applications on large-scale supercomputers using a hierarchical programming model.
△ Less
Submitted 6 July, 2021;
originally announced July 2021.
-
Advancing Computing's Foundation of US Industry & Society
Authors:
Thomas M. Conte,
Ian T. Foster,
William Gropp,
Mark D. Hill
Abstract:
While past information technology (IT) advances have transformed society, future advances hold even greater promise. For example, we have only just begun to reap the changes from artificial intelligence (AI), especially machine learning (ML). Underlying IT's impact are the dramatic improvements in computer hardware, which deliver performance that unlock new capabilities. For example, recent succes…
▽ More
While past information technology (IT) advances have transformed society, future advances hold even greater promise. For example, we have only just begun to reap the changes from artificial intelligence (AI), especially machine learning (ML). Underlying IT's impact are the dramatic improvements in computer hardware, which deliver performance that unlock new capabilities. For example, recent successes in AI/ML required the synergy of improved algorithms and hardware architectures (e.g., general-purpose graphics processing units). However, unlike in the 20th Century and early 2000s, tomorrow's performance aspirations must be achieved without continued semiconductor scaling formerly provided by Moore's Law and Dennard Scaling. How will one deliver the next 100x improvement in capability at similar or less cost to enable great value? Can we make the next AI leap without 100x better hardware?
This whitepaper argues for a multipronged effort to develop new computing approaches beyond Moore's Law to advance the foundation that computing provides to US industry, education, medicine, science, and government. This impact extends far beyond the IT industry itself, as IT is now central for providing value across society, for example in semi-autonomous vehicles, tele-education, health wearables, viral analysis, and efficient administration. Herein we draw upon considerable visioning work by CRA's Computing Community Consortium (CCC) and the IEEE Rebooting Computing Initiative (IEEE RCI), enabled by thought leader input from industry, academia, and the US government.
△ Less
Submitted 4 January, 2021;
originally announced January 2021.
-
Petascale XCT: 3D Image Reconstruction with Hierarchical Communications on Multi-GPU Nodes
Authors:
Mert Hidayetoglu,
Tekin Bicer,
Simon Garcia de Gonzalo,
Bin Ren,
Vincent De Andrade,
Doga Gursoy,
Raj Kettimuthu,
Ian T. Foster,
Wen-mei W. Hwu
Abstract:
X-ray computed tomography is a commonly used technique for noninvasive imaging at synchrotron facilities. Iterative tomographic reconstruction algorithms are often preferred for recovering high quality 3D volumetric images from 2D X-ray images, however, their use has been limited to small/medium datasets due to their computational requirements. In this paper, we propose a high-performance iterativ…
▽ More
X-ray computed tomography is a commonly used technique for noninvasive imaging at synchrotron facilities. Iterative tomographic reconstruction algorithms are often preferred for recovering high quality 3D volumetric images from 2D X-ray images, however, their use has been limited to small/medium datasets due to their computational requirements. In this paper, we propose a high-performance iterative reconstruction system for terabyte(s)-scale 3D volumes. Our design involves three novel optimizations: (1) optimization of (back)projection operators by extending the 2D memory-centric approach to 3D; (2) performing hierarchical communications by exploiting "fat-node" architecture with many GPUs; (3) utilization of mixed-precision types while preserving convergence rate and quality. We extensively evaluate the proposed optimizations and scaling on the Summit supercomputer. Our largest reconstruction is a mouse brain volume with 9Kx11Kx11K voxels, where the total reconstruction time is under three minutes using 24,576 GPUs, reaching 65 PFLOPS: 34% of Summit's peak performance.
△ Less
Submitted 15 September, 2020;
originally announced September 2020.
-
Convolutional Neural Network Training with Distributed K-FAC
Authors:
J. Gregory Pauloski,
Zhao Zhang,
Lei Huang,
Weijia Xu,
Ian T. Foster
Abstract:
Training neural networks with many processors can reduce time-to-solution; however, it is challenging to maintain convergence and efficiency at large scales. The Kronecker-factored Approximate Curvature (K-FAC) was recently proposed as an approximation of the Fisher Information Matrix that can be used in natural gradient optimizers. We investigate here a scalable K-FAC design and its applicability…
▽ More
Training neural networks with many processors can reduce time-to-solution; however, it is challenging to maintain convergence and efficiency at large scales. The Kronecker-factored Approximate Curvature (K-FAC) was recently proposed as an approximation of the Fisher Information Matrix that can be used in natural gradient optimizers. We investigate here a scalable K-FAC design and its applicability in convolutional neural network (CNN) training at scale. We study optimization techniques such as layer-wise distribution strategies, inverse-free second-order gradient evaluation, and dynamic K-FAC update decoupling to reduce training time while preserving convergence. We use residual neural networks (ResNet) applied to the CIFAR-10 and ImageNet-1k datasets to evaluate the correctness and scalability of our K-FAC gradient preconditioner. With ResNet-50 on the ImageNet-1k dataset, our distributed K-FAC implementation converges to the 75.9% MLPerf baseline in 18-25% less time than does the classic stochastic gradient descent (SGD) optimizer across scales on a GPU cluster.
△ Less
Submitted 1 July, 2020;
originally announced July 2020.