Quantized Wasserstein Procrustes Alignment of Word Embedding Spaces

Aboagye, Prince O; Zheng, Yan; Yeh, Michael; Wang, Junpeng; Zhuang, Zhongfang; Chen, Huiyuan; Wang, Liang; Zhang, Wei; Phillips, Jeff

Computer Science > Computation and Language

arXiv:2212.02468 (cs)

[Submitted on 5 Dec 2022]

Abstract: Optimal Transport (OT) provides a useful geometric framework for estimating the permutation matrix in unsupervised cross-lingual word embedding (CLWE) models that pose the alignment task as a Wasserstein-Procrustes problem. However, linear programming algorithms and approximate OT solvers based on Sinkhorn iterations carry a significant computational burden when computing the permutation matrix, since they scale cubically and quadratically, respectively, in the input size. This makes exact computation of OT distances slow or infeasible for larger inputs, which results in a poor approximation of the permutation matrix and, in turn, a less robust learned transfer function (mapper). This paper proposes an unsupervised projection-based CLWE model called quantized Wasserstein Procrustes (qWP). qWP relies on a quantization step applied to both the source and target monolingual embedding spaces to estimate the permutation matrix given a cheap sampling procedure. This approach substantially improves the approximation quality of empirical OT solvers at a fixed computational cost. We demonstrate that qWP achieves state-of-the-art results on the Bilingual Lexicon Induction (BLI) task.
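
The abstract does not give the algorithm in detail, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that quantization can be approximated by plain k-means, that the coupling between the two sets of centroids is estimated with a basic Sinkhorn iteration, and that the orthogonal map is then updated by a weighted Procrustes step. All function names (`kmeans`, `sinkhorn`, `procrustes`, `quantized_wp`) and hyperparameters (`k`, `reg`, `n_rounds`) are hypothetical and chosen for illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (assumed quantizer); returns centroids and cluster masses."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    # Fraction of points assigned to each centroid, used as OT marginals.
    weights = np.bincount(labels, minlength=k) / len(X)
    return centroids, weights


def sinkhorn(a, b, C, reg=0.05, n_iter=200):
    """Entropy-regularized transport plan between histograms a and b for cost C."""
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]


def procrustes(X, Y):
    """Orthogonal Q minimizing ||X Q - Y||_F, obtained from the SVD of X^T Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def quantized_wp(X, Y, k=50, n_rounds=5):
    """Alternate OT on quantized spaces with a Procrustes update (illustrative only)."""
    Xc, a = kmeans(X, k, seed=0)
    Yc, b = kmeans(Y, k, seed=1)
    Q = np.eye(X.shape[1])  # identity start; the problem is non-convex
    for _ in range(n_rounds):
        XQ = Xc @ Q
        C = ((XQ[:, None, :] - Yc[None, :, :]) ** 2).sum(-1)
        P = sinkhorn(a, b, C / C.max())  # soft matching between centroids
        Q = procrustes(Xc, P @ Yc)       # best rotation given that matching
    return Q, P


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 8))                 # toy "source" embeddings
    R, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # hidden orthogonal map
    Y = X @ R                                     # toy "target" embeddings
    Q, P = quantized_wp(X, Y, k=20)
    print("alignment error:", np.linalg.norm(X @ Q - Y) / np.linalg.norm(Y))
```

In this sketch the Sinkhorn step runs on a k x k cost matrix between centroids rather than an n x n matrix between all words, which is the cost saving the abstract attributes to quantization. Because the objective is non-convex, the identity initialization used here is not guaranteed to recover the hidden map on the toy data.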

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2212.02468 [cs.CL]
	(or arXiv:2212.02468v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2212.02468
Journal reference:	AMTA 2022

Submission history

From: Prince Aboagye
[v1] Mon, 5 Dec 2022 18:23:59 UTC (204 KB)
