A GPU memory efficient speed-up scheme for training ultra-deep neural networks: poster

Published: 16 February 2019

Abstract

Ultra-deep neural networks (UDNNs) tend to yield higher-quality models, but their training process is often difficult to handle. Scarce GPU DRAM capacity is the primary bottleneck that limits both the depth of the network and the range of trainable minibatch sizes. In this paper, we present a scheme dedicated to making the utmost use of the finite GPU memory resource to speed up the training of UDNNs. First, a performance-model-guided dynamic swap-out/swap-in strategy between GPU and host memory is carefully orchestrated to tackle the out-of-memory problem without introducing a performance penalty. Then, a hyperparameter (minibatch size, learning rate) tuning policy is designed to explore the optimal configuration after applying the swap strategy, considering training time and final accuracy simultaneously. Finally, we verify the effectiveness of our scheme in both single-GPU and distributed-GPU modes.
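To make the core mechanism concrete, the sketch below illustrates the general swap-out/swap-in idea using PyTorch's saved_tensors_hooks API: activations saved for the backward pass are parked in host memory during the forward pass and moved back to the GPU only when backpropagation needs them. This is a minimal illustration, not the authors' implementation; the toy model, the swap_out/swap_in helper names, and the tensor sizes are assumptions, and the paper's key contributions (the performance-model-guided decision of what and when to swap, with transfers overlapped against computation, plus the hyperparameter tuning policy) are not captured here.

```python
# Minimal sketch (assumes PyTorch >= 1.10 and an available CUDA device); NOT the
# authors' implementation. Every tensor saved for the backward pass is swapped out
# to host memory during forward and swapped back in on demand during backward.
import torch
import torch.nn as nn


def swap_out(t: torch.Tensor):
    # Swap out: keep the saved activation in host DRAM instead of GPU DRAM.
    return t.to("cpu")


def swap_in(t: torch.Tensor):
    # Swap in: bring the activation back to the GPU when backward needs it.
    return t.to("cuda")


# A deliberately deep toy network; layer count and widths are illustrative only.
model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(100)]).cuda()
x = torch.randn(64, 1024, device="cuda")

with torch.autograd.graph.saved_tensors_hooks(swap_out, swap_in):
    loss = model(x).sum()   # forward pass: saved activations live on the host
loss.backward()             # backward pass: activations are fetched back lazily
```

A production version would stage the copies through pinned host buffers with non-blocking transfers on a separate CUDA stream so that they overlap with computation, which is exactly the kind of scheduling the paper's performance model is meant to orchestrate.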



      Published In

      PPoPP '19: Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming
      February 2019
      472 pages
      ISBN:9781450362252
      DOI:10.1145/3293883
      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 16 February 2019

      Author Tags

      1. GPU
      2. memory optimization
      3. speed up
      4. ultra-deep neural network

      Qualifiers

      • Poster

      Conference

      PPoPP '19

      Acceptance Rates

      PPoPP '19 paper acceptance rate: 29 of 152 submissions (19%)
      Overall acceptance rate: 230 of 1,014 submissions (23%)


      Cited By

      • RoSGAS: Adaptive Social Bot Detection with Reinforced Self-supervised GNN Architecture Search. ACM Transactions on the Web 17(3), 1-31 (2023). https://doi.org/10.1145/3572403
      • Optimization Techniques for GPU Programming. ACM Computing Surveys 55(11), 1-81 (2023). https://doi.org/10.1145/3570638
      • Hierarchical and Hybrid Organizational Structures in Open-source Software Projects: A Longitudinal Study. ACM Transactions on Software Engineering and Methodology 32(4), 1-29 (2023). https://doi.org/10.1145/3569949
      • AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-Deep Neural Networks. 2019 IEEE 37th International Conference on Computer Design (ICCD), 65-72 (2019). https://doi.org/10.1109/ICCD46524.2019.00017
      • A Survey of Techniques for Optimizing Deep Learning on GPUs. Journal of Systems Architecture, 101635 (2019). https://doi.org/10.1016/j.sysarc.2019.101635
