Word Rotator's Distance

Yokoi, Sho; Takahashi, Ryo; Akama, Reina; Suzuki, Jun; Inui, Kentaro

arXiv:2004.15003 (cs)

[Submitted on 30 Apr 2020 (v1), last revised 16 Nov 2020 (this version, v3)]

Abstract: A key principle in assessing textual similarity is measuring the degree of semantic overlap between two texts by considering the word alignment. Such alignment-based approaches are intuitive and interpretable; however, they are empirically inferior to the simple cosine similarity between general-purpose sentence vectors. To address this issue, we focus on and demonstrate the fact that the norm of word vectors is a good proxy for word importance, and their angle is a good proxy for word similarity. Alignment-based approaches do not distinguish them, whereas sentence-vector approaches automatically use the norm as the word importance. Accordingly, we propose a method that first decouples word vectors into their norm and direction, and then computes alignment-based similarity using earth mover's distance (i.e., optimal transport cost), which we refer to as word rotator's distance. Besides, we find how to grow the norm and direction of word vectors (vector converter), which is a new systematic approach derived from sentence-vector estimation methods. On several textual similarity datasets, the combination of these simple proposed methods outperformed not only alignment-based approaches but also strong baselines. The source code is available at this https URL
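
The decoupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' released code. It assumes one reading of the abstract: each word's transport mass is its vector norm (normalized to sum to 1 within the sentence), and the cost of moving mass between two words is the cosine distance between their directions. The function names `decouple` and `word_rotators_distance` are hypothetical, the optimal transport problem is solved as a generic linear program with SciPy's `linprog`, and all word vectors are assumed to be non-zero.

```python
import numpy as np
from scipy.optimize import linprog


def decouple(word_vectors):
    """Split each row vector into its norm (length) and unit direction."""
    norms = np.linalg.norm(word_vectors, axis=1)
    directions = word_vectors / norms[:, None]  # assumes no zero vectors
    return norms, directions


def word_rotators_distance(vectors_a, vectors_b):
    """Optimal transport cost between two sentences given as word-vector matrices.

    Assumed setup (a reading of the abstract, not the reference implementation):
      mass of a word  = its norm, normalized to sum to 1 within the sentence
      cost of a move  = cosine distance between the two word directions
    """
    norms_a, dirs_a = decouple(np.asarray(vectors_a, dtype=float))
    norms_b, dirs_b = decouple(np.asarray(vectors_b, dtype=float))
    mass_a = norms_a / norms_a.sum()
    mass_b = norms_b / norms_b.sum()
    cost = 1.0 - dirs_a @ dirs_b.T  # shape (n, m)

    n, m = cost.shape
    # The transport plan T (n x m, flattened row-major) must be non-negative,
    # with row sums equal to mass_a and column sums equal to mass_b.
    row_constraints = np.kron(np.eye(n), np.ones((1, m)))
    col_constraints = np.kron(np.ones((1, n)), np.eye(m))
    result = linprog(
        c=cost.ravel(),
        A_eq=np.vstack([row_constraints, col_constraints]),
        b_eq=np.concatenate([mass_a, mass_b]),
        bounds=(0, None),
        method="highs",
    )
    if not result.success:
        raise RuntimeError(result.message)
    return float(result.fun)


if __name__ == "__main__":
    # Toy 2-d "word vectors"; real embeddings would have hundreds of dimensions.
    sentence_a = [[2.0, 0.0], [0.0, 1.0]]  # a long (important) and a short word
    sentence_b = [[1.0, 0.0]]              # one word aligned with the long one
    print(word_rotators_distance(sentence_a, sentence_b))
```

In this toy case the long word carries two thirds of the mass and already points in the same direction as the single word of the other sentence, so only the remaining third has to be moved across an orthogonal direction, giving a distance of about 0.333. A dedicated optimal transport solver would be a more natural choice for real sentences; the linear program is used here only to keep the sketch short.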

Comments:	17 pages, accepted at EMNLP 2020
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2004.15003 [cs.CL]
	(or arXiv:2004.15003v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2004.15003
Journal reference:	EMNLP 2020

Submission history

From: Sho Yokoi
[v1] Thu, 30 Apr 2020 17:48:42 UTC (769 KB)
[v2] Wed, 7 Oct 2020 17:56:57 UTC (149 KB)
[v3] Mon, 16 Nov 2020 17:57:08 UTC (154 KB)
