DOI: 10.1145/3460866.3461773
Research article

Distributed training for accelerating metalearning algorithms

Published: 20 June 2021

Abstract

The lack of large amounts of training data limits the ability of deep learning to train models with high accuracy. Few-shot learning, i.e., learning from only a few data samples, is realized through meta-learning, a learn-to-learn approach. Most gradient-based meta-learning approaches are hierarchical in nature and computationally expensive. Meta-learning approaches generalize well to new tasks after training on very few tasks, but they require many training iterations, which leads to long training times.
In this paper, we propose a generic approach to accelerating the training of meta-learning algorithms by leveraging a distributed training setup: training is conducted on multiple worker nodes, with each node processing a subset of the tasks. To illustrate the efficacy of our approach, we propose QMAML (Quick MAML), a distributed variant of the MAML (Model-Agnostic Meta-Learning) algorithm. MAML, one of the most popular meta-learning algorithms, estimates initialization parameters for a meta-model that similar new tasks can reuse for faster adaptation; however, its hierarchical structure makes it computationally expensive. In QMAML, the learning tasks are run on multiple workers to accelerate training, and, as in the standard distributed training paradigm, the gradients of the learning tasks are consolidated to update the meta-model. We implement QMAML using Horovod, a lightweight distributed training library. Our experiments show that QMAML reduces the training time of MAML by 50% relative to the open-source library learn2learn on image recognition tasks, the quasi-benchmark tasks in the field of meta-learning.
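To make the approach concrete, below is a minimal, hypothetical sketch (not the authors' QMAML implementation, which builds on learn2learn) of the idea described above: each Horovod worker runs MAML's inner-loop adaptation on its own subset of tasks, and the resulting meta-gradients are averaged across workers with an allreduce before the meta-update, mirroring the data-parallel training paradigm. The toy regression task family, model, hyperparameters, and helper names (sample_task, forward) are illustrative assumptions.

# Minimal QMAML-style sketch (illustrative, not the paper's implementation):
# distribute MAML's meta-batch of tasks across Horovod workers and average
# the meta-gradients with an allreduce before each meta-update.
import torch
import horovod.torch as hvd

hvd.init()
torch.manual_seed(1234 + hvd.rank())      # each worker samples its own tasks

# Toy meta-model: a single linear layer for a 1-D regression task family.
model = torch.nn.Linear(1, 1)
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Start all workers from the same meta-parameters and optimizer state.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(meta_opt, root_rank=0)

inner_lr = 0.01
tasks_per_worker = 4                      # global meta-batch = 4 * hvd.size()
loss_fn = torch.nn.MSELoss()


def sample_task():
    """Placeholder task family: regress y = a*x for a random slope a.
    Omniglot/mini-ImageNet episodes would plug in here instead."""
    a = torch.rand(1) * 4 - 2
    def draw(n):
        x = torch.rand(n, 1)
        return x, a * x
    return draw


def forward(x, weight, bias):
    # Functional forward pass so adapted (non-leaf) parameters can be used.
    return x @ weight.t() + bias


for meta_step in range(100):
    meta_opt.zero_grad()
    meta_loss = 0.0
    for _ in range(tasks_per_worker):
        draw = sample_task()
        x_s, y_s = draw(8)                # support set
        x_q, y_q = draw(8)                # query set

        # Inner loop: one adaptation step on the support set, keeping the
        # graph so the meta-gradient can flow through the adaptation.
        params = (model.weight, model.bias)
        support_loss = loss_fn(forward(x_s, *params), y_s)
        grads = torch.autograd.grad(support_loss, params, create_graph=True)
        adapted = [p - inner_lr * g for p, g in zip(params, grads)]

        # Outer loop: evaluate the adapted parameters on the query set.
        meta_loss = meta_loss + loss_fn(forward(x_q, *adapted), y_q)

    (meta_loss / tasks_per_worker).backward()

    # Consolidate gradients as in data-parallel training: average the
    # meta-gradients computed from each worker's task subset.
    for i, p in enumerate(model.parameters()):
        p.grad = hvd.allreduce(p.grad, name=f"meta_grad_{i}")
    meta_opt.step()

    if meta_step % 20 == 0 and hvd.rank() == 0:
        print(f"meta step {meta_step}: "
              f"mean query loss {meta_loss.item() / tasks_per_worker:.4f}")

A script like this would be launched with, for example, horovodrun -np 4 python qmaml_sketch.py, so that the global meta-batch of tasks is split across four worker processes; the per-worker inner loops run independently, and only the averaged meta-gradients are communicated.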


Cited By

• (2023) G-Meta: Distributed Meta Learning in GPU Clusters for Large-Scale Recommender Systems. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 4365-4369. DOI: 10.1145/3583780.3615208. Online publication date: 21 October 2023.
• (2023) DAFTA: Distributed Architecture for Fusion-Transformer training Acceleration. In Proceedings of the International Workshop on Big Data in Emergent Distributed Environments, 1-9. DOI: 10.1145/3579142.3594294. Online publication date: 18 June 2023.


    Published In

    BiDEDE '21: Proceedings of the International Workshop on Big Data in Emergent Distributed Environments
    June 2021
    65 pages
    ISBN:9781450384650
    DOI:10.1145/3460866


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. acceleration
    2. deep learning
    3. distributed training
    4. metalearning

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS '21

    Acceptance Rates

    BiDEDE '21 Paper Acceptance Rate 8 of 17 submissions, 47%;
    Overall Acceptance Rate 25 of 47 submissions, 53%

