
Parallel Deep Neural Network Training for Big Data on Blue Gene/Q

Published: 01 June 2017

Abstract

Deep Neural Networks (DNNs) have recently been shown to significantly outperform existing machine learning techniques in several pattern recognition tasks. DNNs are the state-of-the-art models used in image recognition, object detection, classification and tracking, and speech and language processing applications. The biggest drawback to DNNs has been the enormous cost in computation and time taken to train the parameters of the networks, often a tenfold increase relative to conventional technologies. Such training time costs can be mitigated by the application of parallel computing algorithms and architectures. However, these algorithms often run into difficulties because of inter-processor communication bottlenecks. In this paper, we describe how to enable parallel Deep Neural Network training on the IBM Blue Gene/Q (BG/Q) computer system. Specifically, we explore DNN training using the data-parallel Hessian-free second-order optimization algorithm. Such an algorithm is particularly well suited to parallelization across a large set of loosely coupled processors. BG/Q, with its excellent inter-processor communication characteristics, is an ideal match for this type of algorithm. The paper discusses how issues regarding the programming model and data-dependent imbalances are addressed. Results on large-scale speech tasks show that the performance on BG/Q scales linearly up to 4,096 processes with no loss in accuracy. This allows us to train neural networks using billions of training examples in a few hours.

Information

Published In

IEEE Transactions on Parallel and Distributed Systems, Volume 28, Issue 6
June 2017
276 pages

Publisher

IEEE Press

Publication History

Published: 01 June 2017

Qualifiers

  • Research-article


Cited By

  • (2022) Construction of Flipped Classroom for College English Courses Using Big Data MOOCs and Information System. Mobile Information Systems. 10.1155/2022/9680205. Online publication date: 1-Jan-2022
  • (2022) Teaching Mode in the Management of Higher Vocational Colleges in the Era of Big Data. Mobile Information Systems. 10.1155/2022/8100495. Online publication date: 1-Jan-2022
  • (2022) Three-Dimensional Landscape Rendering and Landscape Spatial Distribution of Traditional Villages Based on Big Data Information System. Mobile Information Systems. 10.1155/2022/4945918. Online publication date: 1-Jan-2022
  • (2021) Accurate mining of location data in the communication field based on big data. Journal of High Speed Networks, 27(3), 251-264. 10.3233/JHS-210665. Online publication date: 1-Jan-2021
  • (2021) Image Recommendation Algorithm Combined with Deep Neural Network Designed for Social Networks. Complexity. 10.1155/2021/5196190. Online publication date: 1-Jan-2021
  • (2021) Effective Scheduler for Distributed DNN Training Based on MapReduce and GPU Cluster. Journal of Grid Computing, 19(1). 10.1007/s10723-021-09550-6. Online publication date: 1-Mar-2021
  • (2020) Deep learning parallel computing and evaluation for embedded system clustering architecture processor. Design Automation for Embedded Systems, 24(3), 145-159. 10.1007/s10617-020-09235-5. Online publication date: 1-Sep-2020
  • (2019) Deep reuse. Proceedings of the ACM International Conference on Supercomputing, 438-448. 10.1145/3330345.3330384. Online publication date: 26-Jun-2019
  • (2019) Demystifying Parallel and Distributed Deep Learning. ACM Computing Surveys, 52(4), 1-43. 10.1145/3320060. Online publication date: 30-Aug-2019
  • (2018) Exploring flexible communications for streamlining DNN ensemble training pipelines. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 1-12. 10.5555/3291656.3291742. Online publication date: 11-Nov-2018
