DeciMamba: Exploring the Length Extrapolation Potential of Mamba

Ben-Kish, Assaf; Zimerman, Itamar; Abu-Hussein, Shady; Cohen, Nadav; Globerson, Amir; Wolf, Lior; Giryes, Raja


arXiv:2406.14528 (cs)

[Submitted on 20 Jun 2024]


Abstract: Long-range sequence processing poses a significant challenge for Transformers due to their quadratic complexity in input length. A promising alternative is Mamba, which demonstrates high performance and achieves Transformer-level capabilities while requiring substantially fewer computational resources. In this paper we explore the length-generalization capabilities of Mamba, which we find to be relatively limited. Through a series of visualizations and analyses we identify that the limitations arise from a restricted effective receptive field, dictated by the sequence length used during training. To address this constraint, we introduce DeciMamba, a context-extension method specifically designed for Mamba. This mechanism, built on top of a hidden filtering mechanism embedded within the S6 layer, enables the trained model to extrapolate well even without additional training. Empirical experiments over real-world long-range NLP tasks show that DeciMamba can extrapolate to context lengths that are 25x longer than the ones seen during training, and does so without utilizing additional computational resources. We will release our code and models.
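To make the idea of a filtering mechanism inside a selective state-space layer more concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a single-channel scalar recurrence h_t = exp(Δ_t·a)·h_{t-1} + Δ_t·b·x_t, in which a token with step size Δ_t = 0 leaves the state untouched, and it assumes that decimation means keeping the tokens with the largest Δ_t. The names `selective_scan` and `decimate`, the parameters `a`, `b`, and `keep`, and the selection rule are all hypothetical choices made for illustration.

```python
import math


def selective_scan(xs, deltas, a=-1.0, b=1.0):
    """Toy scalar selective recurrence (assumed form, single channel).

    h_t = exp(delta_t * a) * h_{t-1} + delta_t * b * x_t
    With delta_t == 0 the state is carried over unchanged, so the
    token is effectively filtered out.
    """
    h = 0.0
    for x, d in zip(xs, deltas):
        h = math.exp(d * a) * h + d * b * x
    return h


def decimate(xs, deltas, keep):
    """Keep the `keep` tokens with the largest step size, in original order.

    Hypothetical selection rule used only for this illustration.
    """
    top = sorted(range(len(xs)), key=lambda i: deltas[i], reverse=True)[:keep]
    idx = sorted(top)  # restore sequence order
    return [xs[i] for i in idx], [deltas[i] for i in idx]


if __name__ == "__main__":
    xs = [1.0, 5.0, 2.0, 7.0, 3.0]
    deltas = [0.9, 0.0, 0.8, 0.0, 0.7]  # two tokens are fully filtered
    short_xs, short_deltas = decimate(xs, deltas, keep=3)
    print(short_xs)
    print(selective_scan(xs, deltas), selective_scan(short_xs, short_deltas))
```

In this toy setting, dropping the tokens whose step size is zero shortens the sequence without changing the final state. In a trained model the step sizes are rarely exactly zero, so discarding low-Δ tokens would be an approximation rather than an exact equivalence.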

Comments:	Link To Official Implementation: this https URL
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2406.14528 [cs.LG]
	(or arXiv:2406.14528v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.14528

Submission history

From: Assaf Ben-Kish
[v1] Thu, 20 Jun 2024 17:40:18 UTC (4,189 KB)
