DOI: 10.1145/3673038.3673070
Research article · Open access

Viper: A High-Performance I/O Framework for Transparently Updating, Storing, and Transferring Deep Neural Network Models

Published: 12 August 2024

Abstract

Scientific workflows increasingly need to train a DNN model in real time during an experiment (e.g., using ground truth from a simulation) while simultaneously using it for inference. Instead of sharing the same model instance, the training process (producer) and the inference server (consumer) often use different model replicas that are kept synchronized. Beyond efficient I/O techniques to keep the producer's and consumer's replicas synchronized, there is an important trade-off: frequent model updates improve inference quality but may slow down training, while infrequent updates may yield less accurate inference results. To address these challenges, we introduce Viper, a new I/O framework designed to determine a near-optimal checkpoint schedule and to accelerate the delivery of the latest model updates. Viper builds an inference performance predictor to identify a checkpoint schedule that balances training slowdown against inference quality improvement. It also provides a memory-first model transfer engine that accelerates model delivery through direct memory-to-memory communication. Our experiments show that Viper reduces model update latency by ≈9× using the GPU-to-GPU data transfer engine and ≈3× using DRAM-to-DRAM host data transfer. The checkpoint schedule obtained from Viper's predictor also yields higher cumulative inference accuracy than epoch-based baselines.
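To make the trade-off concrete, the sketch below illustrates, in Python with PyTorch, the kind of decision loop the abstract describes: a producer trains, a predictor estimates whether publishing an update is worth the interruption, and an update is pushed device-to-device when it is. This is not Viper's actual API; all names (should_checkpoint, publish_gpu_to_gpu, the predictor interface) are hypothetical, and the threshold rule is a placeholder for Viper's learned inference performance predictor.

# Illustrative sketch only; Viper's real interfaces are not shown on this page.
# All names below (should_checkpoint, publish_gpu_to_gpu, predictor) are hypothetical.
import torch

def should_checkpoint(predicted_accuracy_gain, predicted_training_slowdown, weight=1.0):
    # Hypothetical rule: checkpoint only when the expected inference-quality
    # gain outweighs the expected training slowdown.
    return predicted_accuracy_gain > weight * predicted_training_slowdown

@torch.no_grad()
def publish_gpu_to_gpu(producer_model, consumer_model):
    # Copy parameters directly between GPU-resident replicas, avoiding a
    # round trip through host memory or the file system.
    for src, dst in zip(producer_model.parameters(), consumer_model.parameters()):
        dst.copy_(src, non_blocking=True)  # device-to-device copy

def train_and_serve(producer, consumer, loader, optimizer, predictor):
    # Hypothetical producer loop: after each step, ask the performance
    # predictor whether publishing a model update is worth the interruption.
    for step, (x, y) in enumerate(loader):
        loss = torch.nn.functional.cross_entropy(producer(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        gain, slowdown = predictor(step, loss.item())  # assumed interface
        if should_checkpoint(gain, slowdown):
            publish_gpu_to_gpu(producer, consumer)

The same publish step could instead target a pinned DRAM buffer for host-to-host delivery; the device-to-device copy here merely stands in for the memory-first transfer path the abstract describes.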


Published In

ICPP '24: Proceedings of the 53rd International Conference on Parallel Processing
August 2024, 1279 pages
ISBN: 9798400717932
DOI: 10.1145/3673038
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. AI Workflows
  2. Adaptive AI Model Checkpointing
  3. Coupled Training and Inferences
  4. Inferences During Partial Training

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP '24

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

