MCSD: An Efficient Language Model with Diverse Fusion

Yang, Hua; Li, Duohai; Li, Shiman

arXiv:2406.12230 (cs)

[Submitted on 18 Jun 2024 (v1), last revised 11 Jul 2024 (this version, v2)]

Abstract: Transformers excel in Natural Language Processing (NLP) because they capture long-term dependencies well, but their resource consumption grows quadratically with sequence length. To address this, we propose the MCSD model, an efficient language model with linear scaling and fast inference. The MCSD model relies on diverse feature fusion, primarily through the multi-channel slope and decay (MCSD) block, to represent features robustly. This block comprises slope and decay sections that extract features across diverse temporal receptive fields, which helps capture both local and global information. In addition, the MCSD block fuses these diverse features element-wise to further enhance its ability to extract fine-grained features. For inference, we reformulate the computation as a recurrent representation, reducing space complexity to $O(1)$ and time complexity to $O(N)$. Our experiments show that MCSD attains higher throughput and lower GPU memory consumption than Transformers, while maintaining performance comparable to larger-scale language models on benchmark tests. These attributes position MCSD as a promising base for edge deployment and embodied intelligence.
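
The recurrent reformulation mentioned in the abstract can be illustrated with a generic exponentially decayed sum. This is only a minimal sketch under stated assumptions, not the MCSD block itself: the function names, the scalar `decay` parameter, and the single-channel setup are all hypothetical and chosen for illustration. It shows how a weighted sum over the whole history can be replaced by one carried state, so each new token costs constant work and constant state memory.

```python
def decay_direct(xs, decay):
    """Direct form: y_t = sum over s <= t of decay**(t - s) * x_s.

    Recomputes the full history at every step, so total work is O(N^2).
    """
    return [
        sum(decay ** (t - s) * xs[s] for s in range(t + 1))
        for t in range(len(xs))
    ]


def decay_recurrent(xs, decay):
    """Recurrent form: state_t = decay * state_{t-1} + x_t.

    Only one scalar state is carried between steps (O(1) state memory),
    and the whole sequence is processed in O(N) time.
    """
    state = 0.0
    ys = []
    for x in xs:
        state = decay * state + x  # fold the new input into the running state
        ys.append(state)
    return ys


# Hypothetical inputs, used only to compare the two forms.
xs = [1.0, 2.0, 3.0]
direct = decay_direct(xs, 0.5)
recurrent = decay_recurrent(xs, 0.5)
print(direct)
print(recurrent)
```

Both functions produce the same outputs; the recurrent version simply avoids revisiting earlier inputs. The actual block described in the paper combines several channels with slope and decay sections, which this sketch does not attempt to reproduce.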

Comments:	8 pages, 9 figures
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2406.12230 [cs.CL]
	(or arXiv:2406.12230v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.12230

Submission history

From: Hua Yang
[v1] Tue, 18 Jun 2024 03:08:01 UTC (794 KB)
[v2] Thu, 11 Jul 2024 03:29:19 UTC (794 KB)
