Word Embedding Dimension Reduction via Weakly-Supervised Feature Selection

Xue, Jintang; Wang, Yun-Cheng; Wei, Chengwei; Kuo, C. -C. Jay

arXiv:2407.12342 (cs)

[Submitted on 17 Jul 2024 (v1), last revised 4 Nov 2024 (this version, v2)]

Abstract: As a fundamental task in natural language processing, word embedding converts each word into a representation in a vector space. A challenge with word embedding is that as the vocabulary grows, the vector space's dimension increases, which can lead to a vast model size. Storing and processing word vectors is resource-demanding, especially for mobile edge-device applications. This paper explores word embedding dimension reduction. To balance computational costs and performance, we propose an efficient and effective weakly-supervised feature selection method named WordFS. It has two variants, each utilizing novel criteria for feature selection. Experiments on various tasks (e.g., word and sentence similarity, and binary and multi-class classification) indicate that the proposed WordFS model outperforms other dimension reduction methods at lower computational costs. We have released the code for reproducibility along with the paper.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2407.12342 [cs.CL]
	(or arXiv:2407.12342v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.12342

Submission history

From: Jintang Xue
[v1] Wed, 17 Jul 2024 06:36:09 UTC (68 KB)
[v2] Mon, 4 Nov 2024 09:52:25 UTC (946 KB)
