DOI: 10.1145/3460866.3461773
Research article

Distributed training for accelerating metalearning algorithms

Published: 20 June 2021

Abstract

The lack of large amounts of training data limits the ability of deep learning to train models with high accuracy. Few-shot learning, i.e., learning from only a few data samples, is realized through meta-learning, a learn-to-learn approach. Most gradient-based meta-learning approaches are hierarchical in nature and computationally expensive. Meta-learning approaches generalize well to new tasks after training on very few tasks, but they require many training iterations, which leads to long training times.
In this paper, we propose a generic approach to accelerating the training of meta-learning algorithms by leveraging a distributed training setup: training is conducted on multiple worker nodes, with each node processing a subset of the tasks. To illustrate the efficacy of our approach, we propose QMAML (Quick MAML), a distributed variant of the MAML (Model-Agnostic Meta-Learning) algorithm. MAML, one of the most popular meta-learning algorithms, estimates initialization parameters for a meta-model that similar new tasks can reuse for faster adaptation; however, its hierarchical structure makes it computationally expensive. In QMAML, the learning tasks are run on multiple workers to accelerate training, and, as in the standard distributed training paradigm, the gradients of the learning tasks are consolidated to update the meta-model. We implement QMAML using Horovod, a lightweight distributed training library. Our experiments show that QMAML reduces the training time of MAML by 50% relative to the open-source library learn2learn on image recognition tasks, the quasi-benchmark tasks in the field of meta-learning.
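To make the approach concrete, below is a minimal, hypothetical sketch (not the authors' QMAML implementation, which builds on learn2learn) of the idea described above: each Horovod worker runs MAML's inner-loop adaptation on its own subset of tasks, and the resulting meta-gradients are averaged across workers with an allreduce before the meta-update, mirroring the data-parallel training paradigm. The toy regression task family, model, hyperparameters, and helper names (sample_task, forward) are illustrative assumptions.

# Minimal QMAML-style sketch (illustrative, not the paper's implementation):
# distribute MAML's meta-batch of tasks across Horovod workers and average
# the meta-gradients with an allreduce before each meta-update.
import torch
import horovod.torch as hvd

hvd.init()
torch.manual_seed(1234 + hvd.rank())      # each worker samples its own tasks

# Toy meta-model: a single linear layer for a 1-D regression task family.
model = torch.nn.Linear(1, 1)
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Start all workers from the same meta-parameters and optimizer state.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(meta_opt, root_rank=0)

inner_lr = 0.01
tasks_per_worker = 4                      # global meta-batch = 4 * hvd.size()
loss_fn = torch.nn.MSELoss()


def sample_task():
    """Placeholder task family: regress y = a*x for a random slope a.
    Omniglot/mini-ImageNet episodes would plug in here instead."""
    a = torch.rand(1) * 4 - 2
    def draw(n):
        x = torch.rand(n, 1)
        return x, a * x
    return draw


def forward(x, weight, bias):
    # Functional forward pass so adapted (non-leaf) parameters can be used.
    return x @ weight.t() + bias


for meta_step in range(100):
    meta_opt.zero_grad()
    meta_loss = 0.0
    for _ in range(tasks_per_worker):
        draw = sample_task()
        x_s, y_s = draw(8)                # support set
        x_q, y_q = draw(8)                # query set

        # Inner loop: one adaptation step on the support set, keeping the
        # graph so the meta-gradient can flow through the adaptation.
        params = (model.weight, model.bias)
        support_loss = loss_fn(forward(x_s, *params), y_s)
        grads = torch.autograd.grad(support_loss, params, create_graph=True)
        adapted = [p - inner_lr * g for p, g in zip(params, grads)]

        # Outer loop: evaluate the adapted parameters on the query set.
        meta_loss = meta_loss + loss_fn(forward(x_q, *adapted), y_q)

    (meta_loss / tasks_per_worker).backward()

    # Consolidate gradients as in data-parallel training: average the
    # meta-gradients computed from each worker's task subset.
    for i, p in enumerate(model.parameters()):
        p.grad = hvd.allreduce(p.grad, name=f"meta_grad_{i}")
    meta_opt.step()

    if meta_step % 20 == 0 and hvd.rank() == 0:
        print(f"meta step {meta_step}: "
              f"mean query loss {meta_loss.item() / tasks_per_worker:.4f}")

A script like this would be launched with, for example, horovodrun -np 4 python qmaml_sketch.py, so that the global meta-batch of tasks is split across four worker processes; the per-worker inner loops run independently, and only the averaged meta-gradients are communicated.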


Cited By

• (2023) G-Meta: Distributed Meta Learning in GPU Clusters for Large-Scale Recommender Systems. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 4365-4369. DOI: 10.1145/3583780.3615208. Online publication date: 21 October 2023.
• (2023) DAFTA: Distributed Architecture for Fusion-Transformer training Acceleration. In Proceedings of the International Workshop on Big Data in Emergent Distributed Environments, 1-9. DOI: 10.1145/3579142.3594294. Online publication date: 18 June 2023.


    Published In

    BiDEDE '21: Proceedings of the International Workshop on Big Data in Emergent Distributed Environments
    June 2021
    65 pages
    ISBN:9781450384650
    DOI:10.1145/3460866


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. acceleration
    2. deep learning
    3. distributed training
    4. metalearning

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS '21

    Acceptance Rates

    BiDEDE '21 Paper Acceptance Rate 8 of 17 submissions, 47%;
    Overall Acceptance Rate 25 of 47 submissions, 53%

