Memformer: The Memory-Augmented Transformer

Wu, Qingyang; Lan, Zhenzhong; Gu, Jing; Yu, Zhou

arXiv:2010.06891v1 (cs)

[Submitted on 14 Oct 2020 (this version), latest version 12 Apr 2022 (v2)]

Abstract: Transformer models have achieved remarkable results on a wide range of NLP tasks. However, they are inefficient on long sequences, because the complexity of their self-attention module scales quadratically with the sequence length. To remedy this limitation, we present Memformer, a novel language model that uses a single unified memory to encode and retrieve past information. It includes a new optimization scheme, Memory Replay Back-Propagation, which enables long-range back-propagation through time with a significantly reduced memory requirement. Memformer achieves $\mathcal{O}(n)$ time complexity and $\mathcal{O}(1)$ space complexity with respect to the sequence length $n$ when processing long sequences, meaning that the model can, in principle, handle sequences of unbounded length during inference. Our model is also compatible with other self-supervised tasks, which can further improve its language modeling performance. Experimental results show that Memformer outperforms previous long-range sequence models on WikiText-103, including Transformer-XL and the Compressive Transformer.
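
The linear-time, constant-space claim can be illustrated with a toy sketch. It assumes, for illustration only, that a fixed set of k memory slots is read by each segment's tokens through attention and then rewritten by letting the slots attend over the segment's hidden states. The names `process_segment`, `stream`, `segment_len`, and `num_slots` are hypothetical, and the sketch has no learned weights, layers, or training, so it is not Memformer's actual reader, writer, or memory update. Only the k x d memory is carried from one segment to the next, so the state stays the same size however long the input is, while each segment of length L costs O(L(L+k)) attention scores, making the total work linear in n.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys, values):
    # Scaled dot-product attention: (q, d) attends over (m, d) -> (q, d).
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values


def process_segment(segment, memory):
    # Hypothetical read step: tokens attend over the memory slots plus the
    # segment itself, so the cost depends on L and k, not on the stream length.
    read_ctx = np.concatenate([memory, segment], axis=0)
    hidden = segment + attend(segment, read_ctx, read_ctx)
    # Hypothetical write step: each slot attends over its old value and the
    # new hidden states, giving an updated memory with the same (k, d) shape.
    write_ctx = np.concatenate([memory, hidden], axis=0)
    new_memory = attend(memory, write_ctx, write_ctx)
    return hidden, new_memory


def stream(tokens, segment_len, memory):
    # Process the input segment by segment; only `memory` (fixed size) is
    # carried forward, so the recurrent state is O(1) in the stream length.
    for start in range(0, len(tokens), segment_len):
        segment = tokens[start:start + segment_len]
        hidden, memory = process_segment(segment, memory)
        yield hidden, memory


# Usage: a 1000-token stream with 4 memory slots and 64-token segments.
rng = np.random.default_rng(0)
d_model, num_slots = 16, 4
memory = 0.02 * rng.standard_normal((num_slots, d_model))
tokens = rng.standard_normal((1000, d_model))
outputs = []
for hidden, memory in stream(tokens, segment_len=64, memory=memory):
    outputs.append(hidden)
print(sum(len(h) for h in outputs), memory.shape)  # prints: 1000 (4, 16)
```

Making `stream` a generator reflects the inference setting: a caller can consume and discard each segment's output, so nothing but the fixed-size memory has to persist between segments.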

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2010.06891 [cs.CL]
	(or arXiv:2010.06891v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2010.06891

Submission history

From: Qingyang Wu
[v1] Wed, 14 Oct 2020 09:03:36 UTC (1,417 KB)
[v2] Tue, 12 Apr 2022 20:57:54 UTC (1,437 KB)