Inconsistency Ranking-based Noisy Label Detection for High-quality Data

Yuan, Ruibin; Yin, Hanzhi; Wang, Yi; He, Yifan; Ye, Yushi; Zhang, Lei; Wu, Zhizheng

Computer Science > Computation and Language

arXiv:2212.00239 (cs)

[Submitted on 1 Dec 2022 (v1), last revised 15 Jun 2023 (this version, v2)]

Title: Inconsistency Ranking-based Noisy Label Detection for High-quality Data

Authors: Ruibin Yuan, Hanzhi Yin, Yi Wang, Yifan He, Yushi Ye, Lei Zhang, Zhizheng Wu


Abstract: The success of deep learning requires massive amounts of high-quality annotated data. In practice, however, dataset size and quality usually trade off against each other, because data collection and cleaning are expensive and time-consuming. In real-world applications, especially those that rely on crowdsourced datasets, it is important to exclude noisy labels. To address this, this paper proposes an automatic noisy label detection (NLD) technique based on inconsistency ranking for obtaining high-quality data. We apply this technique to the automatic speaker verification (ASV) task as a proof of concept. We investigate both inter-class and intra-class inconsistency ranking and compare several metric learning loss functions under different noise settings. Experimental results confirm that the proposed solution could improve both the efficiency and the effectiveness of cleaning large-scale speaker recognition datasets.
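
The abstract does not spell out how the two inconsistency scores are computed. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes each utterance already has a speaker embedding from a model trained with some metric learning loss, takes each speaker's centroid as the mean of that speaker's L2-normalised embeddings, and then hypothetically defines the intra-class score as one minus the cosine similarity to the sample's own labelled centroid and the inter-class score as the rank of the labelled speaker among all centroids sorted by similarity. The function names (`class_centroids`, `intra_class_scores`, `inter_class_ranks`, `rank_suspects`) and the toy data are illustrative only.

```python
import math

def normalize(v):
    """Return v scaled to unit L2 norm (an unchanged copy if v is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def class_centroids(embeddings, labels):
    """Hypothetical helper: mean of the L2-normalised embeddings for each label."""
    sums, counts = {}, {}
    for emb, lab in zip(embeddings, labels):
        e = normalize(emb)
        acc = sums.setdefault(lab, [0.0] * len(e))
        sums[lab] = [s + x for s, x in zip(acc, e)]
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}

def intra_class_scores(embeddings, labels, centroids):
    """Assumed intra-class inconsistency: 1 - cos(sample, own labelled centroid).
    A higher score means the sample agrees less with the rest of its labelled class."""
    return [1.0 - cosine(e, centroids[l]) for e, l in zip(embeddings, labels)]

def inter_class_ranks(embeddings, labels, centroids):
    """Assumed inter-class inconsistency: position of the labelled class when all
    centroids are sorted by similarity to the sample (0 = labelled class is closest)."""
    ranks = []
    for e, l in zip(embeddings, labels):
        sims = {c: cosine(e, v) for c, v in centroids.items()}
        ordered = sorted(sims, key=sims.get, reverse=True)
        ranks.append(ordered.index(l))
    return ranks

def rank_suspects(scores):
    """Sample indices ordered from most to least suspicious (highest score first)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Toy 2-D "speaker embeddings": speaker a lies near (1, 0), speaker b near (0, 1).
# The last sample sits in speaker b's region but is labelled "a" (simulated label noise).
embeddings = [[1.0, 0.1], [0.9, 0.0], [1.0, -0.1], [0.0, 1.0], [0.1, 0.9], [0.05, 1.0]]
labels = ["a", "a", "a", "b", "b", "a"]

cents = class_centroids(embeddings, labels)
intra = intra_class_scores(embeddings, labels, cents)
inter = inter_class_ranks(embeddings, labels, cents)
print(rank_suspects(intra)[0])  # 5
print(inter)                    # [0, 0, 0, 0, 0, 1]
```

Under these assumptions, cleaning proceeds from the top of the ranking down, inspecting or discarding the most inconsistent samples first. One design caveat of this simple version: a mislabelled sample still contributes to its labelled speaker's centroid, which pulls that centroid slightly toward it.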

Comments:	5 pages
Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2212.00239 [cs.CL]
	(or arXiv:2212.00239v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2212.00239

Submission history

From: Hanzhi Yin
[v1] Thu, 1 Dec 2022 03:09:33 UTC (1,574 KB)
[v2] Thu, 15 Jun 2023 14:08:55 UTC (2,374 KB)
