Acceleration of large transformer model training by sensitivity-based layer dropping
Abstract
Recommendations
Accelerating training of transformer-based language models with progressive layer dropping
NIPS '20: Proceedings of the 34th International Conference on Neural Information Processing Systems
Recently, Transformer-based language models have demonstrated remarkable performance across many NLP domains. However, the unsupervised pre-training step of these models suffers from unbearable overall computational expenses. Current methods for ...
Learn & drop: fast learning of cnns based on layer dropping
This paper proposes a new method to improve the training efficiency of deep convolutional neural networks. During training, the method evaluates scores to measure how much each layer's parameters change and whether the layer will continue learning ...
DAFTA: Distributed Architecture for Fusion-Transformer training Acceleration
BiDEDE '23: Proceedings of the International Workshop on Big Data in Emergent Distributed Environments
Multi-modal data fusion transformer is a deep learning model that integrates information from multiple modalities, such as text, image, audio, etc., to improve performance in various tasks, especially in the remote sensing domain. Recent efforts ...
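The recommended works above all revolve around skipping layers during training to cut compute. As a rough illustration of that general idea only (not the specific algorithm of the titled paper or of any recommendation), the following minimal PyTorch sketch skips whole transformer blocks at random during training; the names `SkippableBlock` and `keep_prob` are illustrative assumptions, and a sensitivity- or schedule-driven method would adjust the keep probability rather than leave it fixed.

```python
# Generic layer-dropping sketch: each block is skipped with probability
# 1 - keep_prob at training time, so fewer layers run per step.
import torch
import torch.nn as nn

class SkippableBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int, keep_prob: float = 0.9):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.keep_prob = keep_prob  # a real method would schedule this per layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip the whole block with probability 1 - keep_prob; the identity
        # path keeps activations flowing, as in stochastic-depth training.
        if self.training and torch.rand(()) > self.keep_prob:
            return x
        return self.block(x)

model = nn.Sequential(*[SkippableBlock(256, 8) for _ in range(12)])
x = torch.randn(4, 16, 256)   # (batch, sequence, d_model)
y = model(x)                  # some blocks are randomly skipped this step
```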
Information
Sponsors
- Association for the Advancement of Artificial Intelligence
Publisher
AAAI Press
Qualifiers
- Research-article
- Research
- Refereed limited