Zamba: A Compact 7B SSM Hybrid Model

Glorioso, Paolo; Anthony, Quentin; Tokpanov, Yury; Whittington, James; Pilault, Jonathan; Ibrahim, Adam; Millidge, Beren

arXiv:2405.16712 (cs)

[Submitted on 26 May 2024]

Abstract: In this technical report, we present Zamba, a novel 7B SSM-transformer hybrid model which achieves competitive performance against leading open-weight models at a comparable scale. Zamba is trained on 1T tokens from openly available datasets and is the best non-transformer model at this scale. Zamba pioneers a unique architecture combining a Mamba backbone with a single shared attention module, thus obtaining the benefits of attention at minimal parameter cost. Due to its architecture, Zamba is significantly faster at inference than comparable transformer models and requires substantially less memory for generation of long sequences. Zamba is pretrained in two phases: the first phase is based on existing web datasets, while the second one consists of annealing the model over high-quality instruct and synthetic datasets, and is characterized by a rapid learning rate decay. We open-source the weights and all checkpoints for Zamba, through both phase 1 and annealing phases.

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as: arXiv:2405.16712 [cs.LG] (or arXiv:2405.16712v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2405.16712

Submission history

From: Quentin Anthony
[v1] Sun, 26 May 2024 22:23:02 UTC (995 KB)
