Memformer: A Memory-Augmented Transformer for Sequence Modeling

Wu, Qingyang; Lan, Zhenzhong; Qian, Kun; Gu, Jing; Geramifard, Alborz; Yu, Zhou


arXiv:2010.06891 (cs)

[Submitted on 14 Oct 2020 (v1), last revised 12 Apr 2022 (this version, v2)]



Abstract: Transformers have achieved remarkable success in sequence modeling. However, these models have efficiency issues because they need to store the token-level representations of the entire history as memory. We present Memformer, an efficient neural network for sequence modeling that uses an external dynamic memory to encode and retrieve past information. Our model achieves linear time complexity and constant memory space complexity when processing long sequences. We also propose a new optimization scheme, memory replay back-propagation (MRBP), which promotes long-range back-propagation through time with a significantly reduced memory requirement. Experimental results show that Memformer achieves performance comparable to the baselines while using 8.1x less memory space and running 3.2x faster at inference. Analysis of the attention patterns shows that the external memory slots can encode and retain important information across timesteps.
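
To make the complexity claim concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a fixed number of memory slots and a fixed segment length, and it uses plain single-head attention for both the memory read and the memory write; causal masking, learned projections, and MRBP are all omitted. The names `attend`, `process_sequence`, `num_slots`, and `segment_len` are hypothetical and chosen only for illustration. The sketch shows that the carried state stays the same size however long the input is, and that the number of attention scores grows linearly with sequence length.

```python
import math
import random


def attend(queries, context):
    """Scaled dot-product attention; context rows serve as both keys and values."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in context]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * row[j] for w, row in zip(weights, context)) / total
                    for j in range(d)])
    return out


def process_sequence(tokens, num_slots=4, segment_len=8, seed=0):
    """Process tokens segment by segment with a fixed-size memory (toy assumption).

    Returns the per-token outputs, the final memory, and the number of
    attention scores computed, used here as a rough proxy for time cost.
    """
    rng = random.Random(seed)
    d = len(tokens[0])
    # The memory is the only state carried across segments: num_slots x d.
    memory = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(num_slots)]
    outputs, score_count = [], 0
    for start in range(0, len(tokens), segment_len):
        segment = tokens[start:start + segment_len]
        # Read: each token attends to the memory slots and its own segment.
        hidden = attend(segment, memory + segment)
        outputs.extend(hidden)
        score_count += len(segment) * (num_slots + len(segment))
        # Write: each slot attends to the old memory and the new hidden states.
        memory = attend(memory, memory + hidden)
        score_count += num_slots * (num_slots + len(hidden))
    return outputs, memory, score_count


if __name__ == "__main__":
    rng = random.Random(1)
    for length in (64, 128):
        seq = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(length)]
        _, mem, cost = process_sequence(seq)
        print(length, len(mem), cost)
```

In this toy, each full segment costs `(segment_len + num_slots) ** 2` attention scores, so doubling the sequence length doubles the total cost, while the memory keeps `num_slots` rows throughout. A standard Transformer attending over the full history would instead compute a number of scores that grows quadratically with length.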

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2010.06891 [cs.CL]
	(or arXiv:2010.06891v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2010.06891

Submission history

From: Qingyang Wu
[v1] Wed, 14 Oct 2020 09:03:36 UTC (1,417 KB)
[v2] Tue, 12 Apr 2022 20:57:54 UTC (1,437 KB)
