Evolving Attention with Residual Convolutions

Wang, Yujing; Yang, Yaming; Bai, Jiangang; Zhang, Mingliang; Bai, Jing; Yu, Jing; Zhang, Ce; Huang, Gao; Tong, Yunhai

arXiv:2102.12895 (cs)

[Submitted on 20 Feb 2021]

Abstract: The Transformer is a ubiquitous model for natural language processing and has attracted wide attention in computer vision. Attention maps are indispensable for a transformer model to encode the dependencies among input tokens. However, they are learned independently in each layer and sometimes fail to capture precise patterns. In this paper, we propose a novel and generic mechanism based on evolving attention to improve the performance of transformers. On the one hand, the attention maps in different layers share common knowledge, so the maps in preceding layers can guide the attention in succeeding layers through residual connections. On the other hand, low-level and high-level attention maps differ in their level of abstraction, so we adopt convolutional layers to model the evolutionary process of attention maps. The proposed evolving attention mechanism achieves significant performance improvements over various state-of-the-art models on multiple tasks, including image classification, natural language understanding, and machine translation.
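
The two ideas in the abstract (a residual link between the attention maps of consecutive layers, and a convolution applied to those maps) can be illustrated with a small NumPy sketch. It is not the authors' implementation. The abstract gives no formula, so the function names, the mixing weights `alpha` and `beta`, the 3x3 kernel, and the choice to treat attention heads as convolution channels are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def conv2d_same(maps, kernel):
    """Zero-padded 'same' 2D cross-correlation over attention maps.

    maps:   (heads_in, n, n) attention logits, heads act as channels
    kernel: (heads_out, heads_in, k, k) with odd k
    """
    h_out, h_in, k, _ = kernel.shape
    pad = k // 2
    n = maps.shape[-1]
    padded = np.pad(maps, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((h_out, n, n))
    for dy in range(k):
        for dx in range(k):
            window = padded[:, dy:dy + n, dx:dx + n]  # (heads_in, n, n)
            out += np.einsum("oi,iyx->oyx", kernel[:, :, dy, dx], window)
    return out

def evolving_attention_logits(q, k, prev_logits, kernel, alpha=0.5, beta=0.5):
    """Hypothetical evolving-attention step (assumed form, not from the source).

    q, k:        (heads, n, d) queries and keys of the current layer
    prev_logits: (heads, n, n) logits from the preceding layer, or None
    """
    d = q.shape[-1]
    logits = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # scaled dot-product logits
    if prev_logits is None:
        return logits  # first layer: nothing to inherit
    # Residual link: blend the previous layer's map with the current one.
    mixed = alpha * prev_logits + (1 - alpha) * logits
    # Convolution models how the map changes between abstraction levels.
    return beta * conv2d_same(mixed, kernel) + (1 - beta) * mixed

# Usage: pass attention logits through three layers with random inputs.
rng = np.random.default_rng(0)
heads, n, d = 2, 5, 4
kernel = rng.normal(scale=0.1, size=(heads, heads, 3, 3))
logits = None
for _ in range(3):
    q = rng.normal(size=(heads, n, d))
    k = rng.normal(size=(heads, n, d))
    logits = evolving_attention_logits(q, k, logits, kernel)
    attn = softmax(logits)  # each row is a distribution over tokens
```

In this sketch the logits, rather than the normalized maps, are passed between layers, so the softmax is applied only once per layer. That is a design choice of the example; the abstract does not specify it.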

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2102.12895 [cs.LG]
	(or arXiv:2102.12895v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2102.12895

Submission history

From: Yujing Wang
[v1] Sat, 20 Feb 2021 15:24:06 UTC (7,029 KB)
