ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification

Ma, Yi; Wang, Shuai; Liu, Tianchi; Li, Haizhou

arXiv:2501.05729 (cs)

[Submitted on 10 Jan 2025 (v1), last revised 14 Jan 2025 (this version, v2)]

Abstract: In speaker verification, we use computational methods to verify whether an utterance matches the identity of an enrolled speaker. This task is similar to the manual task of forensic voice comparison, where linguistic analysis is combined with auditory measurements to compare and evaluate voice samples. Despite much success, we have yet to develop a speaker verification system that offers explainable results comparable to those from manual forensic voice comparison. This paper proposes a novel approach, the Explainable Phonetic Trait-Oriented (ExPO) network, which introduces the speaker's phonetic traits, describing the speaker's characteristics at the phonetic level, much as forensic comparison does. ExPO not only generates utterance-level speaker embeddings but also allows fine-grained analysis and visualization of phonetic traits, offering an explainable speaker verification process. Furthermore, we investigate phonetic traits from the perspectives of within-speaker and between-speaker variation to determine which trait is most effective for speaker verification, marking an important step towards explainable speaker verification. Our code is available at this https URL.
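As a rough illustration only (not the ExPO implementation), the sketch below shows two generic ideas the abstract refers to. First, embedding-based speaker verification commonly accepts or rejects a trial by computing the cosine similarity between the enrolled speaker's embedding and the test utterance's embedding and comparing it to a threshold. Second, the F-ratio (between-speaker variance divided by mean within-speaker variance) is one common way to judge how discriminative a single scalar trait is. Whether the authors use either measure is an assumption; the function names `cosine_score`, `verify`, `f_ratio` and the threshold value 0.5 are hypothetical.

```python
import numpy as np
from typing import Dict


def cosine_score(enroll_emb: np.ndarray, test_emb: np.ndarray) -> float:
    """Cosine similarity between two utterance-level speaker embeddings."""
    num = float(np.dot(enroll_emb, test_emb))
    den = float(np.linalg.norm(enroll_emb) * np.linalg.norm(test_emb))
    return num / den


def verify(enroll_emb: np.ndarray, test_emb: np.ndarray, threshold: float = 0.5) -> bool:
    """Accept the trial if the score reaches a (hypothetical) decision threshold."""
    return cosine_score(enroll_emb, test_emb) >= threshold


def f_ratio(features_by_speaker: Dict[str, np.ndarray]) -> float:
    """Between-speaker variance over mean within-speaker variance for one scalar trait.

    features_by_speaker maps a speaker id to a 1-D array holding that trait's
    value for each of the speaker's utterances.
    """
    # Variance of the per-speaker means: how far apart the speakers are.
    means = np.array([v.mean() for v in features_by_speaker.values()])
    between = means.var()
    # Average spread of the trait inside each speaker's own utterances.
    within = np.mean([v.var() for v in features_by_speaker.values()])
    return float(between / within)
```

Under this measure, a trait with a larger F-ratio separates speakers more strongly relative to how much it varies across one speaker's own utterances.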

Comments:	Accepted by IEEE Signal Processing Letters
Subjects:	Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2501.05729 [cs.SD]
	(or arXiv:2501.05729v2 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2501.05729

Submission history

From: Yi Ma
[v1] Fri, 10 Jan 2025 05:53:37 UTC (39,472 KB)
[v2] Tue, 14 Jan 2025 07:28:10 UTC (36,312 KB)
