
Model-Parallel Model Selection for Deep Learning Systems

Published: 18 June 2021 · DOI: 10.1145/3448016.3450571

Abstract

As deep learning becomes more expensive in both time and compute, inefficiencies in machine learning training put practical use of state-of-the-art models out of reach for most users. The newest model architectures are simply too large to fit on a single processor. To address this issue, many ML practitioners have turned to model parallelism, which distributes the computational requirements of a model across several devices. Unfortunately, the sequential nature of neural networks causes very low efficiency and device utilization in model-parallel training jobs. We propose a new form of "shard parallelism" that combines task parallelism and model parallelism, and package it into a framework we name Hydra. Hydra recasts the problem of model parallelism in the multi-model context to produce a fine-grained parallel workload of independent model shards, rather than independent models. This new parallel design promises dramatic speedups relative to the traditional model-parallelism paradigm.
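
The abstract's core idea, treating shards of many models rather than whole models as the schedulable unit, can be made concrete with a small scheduling sketch. The following is a minimal, hypothetical illustration and not Hydra's actual implementation or API: all names (ShardTask, schedule, NUM_DEVICES) are invented for the example, and a real system must additionally handle parameter movement between devices, backward passes, and memory limits.

```python
# Minimal sketch of shard-level scheduling across models in a multi-model
# (model-selection) workload. Hypothetical names; not the Hydra API.
# One task = one shard of one model's sequential forward pass.
from collections import deque
from dataclasses import dataclass
from typing import Deque, List

NUM_DEVICES = 2  # assumed device count for this example


@dataclass(frozen=True)
class ShardTask:
    model_id: int  # which model in the multi-model workload
    shard_id: int  # position of the shard in that model's pipeline


def schedule(num_models: int, shards_per_model: int) -> List[List[ShardTask]]:
    """Greedy round-based scheduler. Within one model, shards are strictly
    ordered (the sequential dependency that starves classic model
    parallelism); across models, the head shards of the queues are mutually
    independent, which yields the fine-grained parallel workload the
    abstract describes."""
    # One FIFO queue of shard tasks per model.
    queues: List[Deque[ShardTask]] = [
        deque(ShardTask(m, s) for s in range(shards_per_model))
        for m in range(num_models)
    ]
    rounds: List[List[ShardTask]] = []
    start = 0  # round-robin cursor so no model monopolizes the devices
    while any(queues):
        placed: List[ShardTask] = []
        m, scanned = start, 0
        # Fill every free device with a runnable shard. At most one shard
        # per model per round, since a model's next shard depends on the
        # completion of its previous one.
        while len(placed) < NUM_DEVICES and scanned < num_models:
            if queues[m]:
                placed.append(queues[m].popleft())
            m = (m + 1) % num_models
            scanned += 1
        start = m
        rounds.append(placed)
    return rounds


if __name__ == "__main__":
    for i, r in enumerate(schedule(num_models=3, shards_per_model=2)):
        print(f"round {i}: " + ", ".join(f"M{t.model_id}/S{t.shard_id}" for t in r))
```

On this toy workload (three 2-shard models, two devices), interleaving shards from different models keeps both devices busy in every round, whereas running the models one at a time under plain model parallelism would leave a device idle while each model's shards execute in sequence.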




Published In

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
June 2021
2969 pages
ISBN:9781450383431
DOI:10.1145/3448016
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 June 2021

Author Tags

  1. GPU
  2. database systems
  3. deep learning
  4. efficiency
  5. machine learning
  6. machine learning systems
  7. memory
  8. model parallelism
  9. model training
  10. parallelism
  11. scheduling
  12. systems

Qualifiers

  • Abstract

Conference

SIGMOD/PODS '21

Acceptance Rates

Overall acceptance rate: 785 of 4,003 submissions (20%)


Article Metrics

  • Downloads (last 12 months): 27
  • Downloads (last 6 weeks): 5
Reflects downloads up to 08 Feb 2025

Cited By

  • (2024) D3-GNN: Dynamic Distributed Dataflow for Streaming Graph Neural Networks. Proceedings of the VLDB Endowment 17(11), 2764–2777. DOI: 10.14778/3681954.3681961. Published 1 Jul 2024.
  • (2024) Saturn: An Optimized Data System for Multi-Large-Model Deep Learning Workloads. Proceedings of the VLDB Endowment 17(4), 712–725. DOI: 10.14778/3636218.3636227. Published 5 Mar 2024.
  • (2023) InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models. Proceedings of the 17th ACM Conference on Recommender Systems, 430–442. DOI: 10.1145/3604915.3608778. Published 14 Sep 2023.
  • (2023) A Load-Balancing Strategy Based on Multi-Task Learning in a Distributed Training Environment. 2023 International Conference on Advances in Electrical Engineering and Computer Applications (AEECA), 862–868. DOI: 10.1109/AEECA59734.2023.00158. Published 18 Aug 2023.
  • (2023) 2D-THA-ADMM: communication efficient distributed ADMM algorithm framework based on two-dimensional torus hierarchical AllReduce. International Journal of Machine Learning and Cybernetics 15(2), 207–226. DOI: 10.1007/s13042-023-01903-9. Published 28 Jun 2023.
  • (2022) Online Content Veracity Assessment using Deep Representation Learning. 2022 19th International Bhurban Conference on Applied Sciences and Technology (IBCAST), 325–330. DOI: 10.1109/IBCAST54850.2022.9990148. Published 16 Aug 2022.
  • (2022) Deep Learning in Robotics for Strengthening Industry 4.0: Opportunities, Challenges and Future Directions. Robotics and AI for Cybersecurity and Critical Infrastructure in Smart Cities, 1–19. DOI: 10.1007/978-3-030-96737-6_1. Published 29 Mar 2022.
