DOI: 10.1145/3609510.3609811

Liquid: Mix-and-Match Multiple Image Formats to Balance DNN Training Pipeline

Published: 24 August 2023

    Abstract

    Today's deep neural network (DNN) training pipeline utilizes hardware resources holistically, including host CPUs and storage devices for preprocessing the input data and accelerators like GPUs for computing gradients. As accelerator performance scales rapidly, the frontend data preparation stages are becoming a new performance bottleneck, yielding suboptimal training throughput. Since the bottleneck in the pipeline may vary depending on hardware configurations, DNN models, and datasets, overprovisioning hardware resources for data preparation, such as CPU cores and disk bandwidth, is not a cost-effective solution. Instead, we make a case for leveraging multiple data formats, possibly with opposing characteristics in resource utilization, to balance the training pipeline. This idea is realized by Liquid, a new system for building an efficient training pipeline with multi-format datasets. Our evaluation on three distinct execution environments demonstrates that Liquid achieves up to 3.05x and 1.54x higher data preparation throughput on the Cityscapes/CityPersons (PNG) and ImageNet (JPEG) datasets, respectively, over the baseline single-format pipeline. This translates into up to 2.02x and 1.25x higher end-to-end geomean training throughput with no accuracy drop.
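    The core idea — storing some samples in a CPU-cheap format and others in an I/O-cheap format so that neither pipeline stage stalls the other — can be sketched with a simple cost model. The function below is a hypothetical illustration, not Liquid's actual algorithm: it solves for the fraction of samples to store in format A so that the pipeline's aggregate CPU decode time equals its aggregate I/O time, given assumed (illustrative) per-sample costs `cpu_a`, `io_a`, `cpu_b`, and `io_b`.

    ```python
    def mix_ratio(cpu_a, io_a, cpu_b, io_b):
        """Fraction x of samples to store in format A so that the
        pipeline's CPU-decode time equals its I/O time.

        Solves x*cpu_a + (1-x)*cpu_b == x*io_a + (1-x)*io_b for x,
        clamped to [0, 1] when no perfectly balanced mix exists.
        """
        denom = (cpu_a - cpu_b) + (io_b - io_a)
        if denom == 0:
            # Both formats impose the same CPU/I/O imbalance; mixing
            # cannot change the balance, so any ratio is equivalent.
            return 0.5
        x = (io_b - cpu_b) / denom
        return min(1.0, max(0.0, x))

    # Example: format A is CPU-heavy but I/O-light (e.g. a heavily
    # compressed format), format B the opposite. Storing 3/7 of the
    # dataset in A equalizes the two stages: both take 19/7 units.
    ratio = mix_ratio(cpu_a=5.0, io_a=1.0, cpu_b=1.0, io_b=4.0)
    print(ratio)  # 3/7 ≈ 0.4286
    ```

    With a single format, throughput is capped by the slower of the two stages; the mix lets the pipeline run at the balanced rate, which is the intuition behind the multi-format speedups reported above.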



      Published In

      APSys '23: Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems
      August 2023
      98 pages
      ISBN:9798400703058
      DOI:10.1145/3609510

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. DNN training
      2. data preparation
      3. image processing

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      APSys '23: 14th ACM SIGOPS Asia-Pacific Workshop on Systems
      August 24 - 25, 2023
      Seoul, Republic of Korea

      Acceptance Rates

      APSys '23 paper acceptance rate: 13 of 32 submissions (41%)
      Overall acceptance rate: 149 of 386 submissions (39%)

