Seal: Advancing Speech Language Models to be Few-Shot Learners

Lei, Shuyu; Liu, Lingen; Yang, Jiaolong; Jiao, Yasen; Yang, Yuxiang; Yang, Yushu; Guo, Xiang

arXiv:2407.14875v1 (cs)

[Submitted on 20 Jul 2024]

Authors: Shuyu Lei, Lingen Liu, Jiaolong Yang, Yasen Jiao, Yuxiang Yang, Yushu Yang, Xiang Guo

Abstract: Existing auto-regressive language models have demonstrated a remarkable capability to perform a new task from just a few examples in the prompt, without any additional training. To extend this capability to a multi-modal setting (i.e., speech and language), this paper introduces the Seal model, an abbreviation for speech language model. It incorporates a novel alignment method in which a Kullback-Leibler divergence loss is used to train a projector that bridges a frozen speech encoder with a frozen language model decoder. The resulting Seal model exhibits robust performance as a few-shot learner on two speech understanding tasks. Additionally, consistency experiments are conducted to validate its robustness across different pre-trained language models.
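
The abstract says only that a Kullback-Leibler divergence loss trains the projector; it does not state which distributions are compared or in which direction. As a rough illustration of the idea, the sketch below assumes the loss compares the decoder's output distribution for a text input against its distribution for the projected speech input. The names `softmax`, `kl_divergence`, and `project`, the linear form of the projector, and all numeric values are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert a list of logits into a categorical probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two categorical distributions given as probability lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def project(features, weight):
    """Hypothetical linear projector: one output value per row of `weight`."""
    return [sum(w * f for w, f in zip(row, features)) for row in weight]


# Toy stand-ins (assumptions): logits the frozen decoder would produce for a
# text input, and a two-dimensional feature from the frozen speech encoder.
text_logits = [2.0, 0.5, -1.0]
speech_features = [1.0, 0.0]

# A projector whose output reproduces the text logits gives zero loss.
weight = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
speech_logits = project(speech_features, weight)
loss = kl_divergence(softmax(text_logits), softmax(speech_logits))
print(loss)
```

In this toy setting only `weight` would be updated during training; the encoder and decoder stand-ins stay fixed, mirroring the frozen components described in the abstract.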

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2407.14875 [cs.CL]
	(or arXiv:2407.14875v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.14875

Submission history

From: Shuyu Lei
[v1] Sat, 20 Jul 2024 13:28:12 UTC (1,657 KB)
