Unified Hypersphere Embedding for Speaker Recognition

Hajibabaei, Mahdi; Dai, Dengxin

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:1807.08312 (eess)

[Submitted on 22 Jul 2018]

Abstract: Incremental improvements in the accuracy of Convolutional Neural Networks are usually achieved through the use of deeper and more complex models trained on larger datasets. However, enlarging datasets and models increases computation and storage costs and cannot be done indefinitely. In this work, we seek to improve the identification and verification accuracy of a text-independent speaker recognition system without using extra data or deeper and more complex models, by augmenting the training and testing data, finding the optimal dimensionality of the embedding space, and using more discriminative loss functions. Results of experiments on the VoxCeleb dataset suggest that: (i) simple repetition and random time-reversion of utterances can reduce prediction errors by up to 18%; (ii) lower-dimensional embeddings are more suitable for verification; (iii) use of the proposed logistic margin loss function leads to unified embeddings with state-of-the-art identification and competitive verification accuracies.
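
The augmentation named in point (i), repetition and random time-reversion of utterances, could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `augment_utterance`, the `target_len` and `reverse_prob` parameters, and the choice to operate on a raw 1-D NumPy waveform are all hypothetical, since the abstract does not say at which stage (waveform or spectrogram) the augmentation is applied.

```python
import numpy as np


def augment_utterance(waveform, target_len, rng, reverse_prob=0.5):
    """Repeat a 1-D waveform up to target_len samples, then randomly reverse it.

    Hypothetical helper: names and defaults are assumptions made for
    illustration and do not come from the paper.
    """
    if waveform.ndim != 1 or waveform.size == 0:
        raise ValueError("expected a non-empty 1-D waveform")
    if target_len <= 0:
        raise ValueError("target_len must be positive")

    # Repetition: tile the utterance until it covers target_len, then truncate.
    repeats = -(-target_len // waveform.size)  # ceiling division
    extended = np.tile(waveform, repeats)[:target_len]

    # Random time-reversion: flip the sample order with probability reverse_prob.
    if rng.random() < reverse_prob:
        extended = extended[::-1].copy()
    return extended


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    short_clip = np.array([1.0, 2.0, 3.0])
    print(augment_utterance(short_clip, 7, rng, reverse_prob=0.0))
    print(augment_utterance(short_clip, 7, rng, reverse_prob=1.0))
```

Setting `reverse_prob` to 0.0 or 1.0 makes the behaviour deterministic, which is convenient for checking the repetition and reversal steps separately.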

Subjects:	Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:1807.08312 [eess.AS]
	(or arXiv:1807.08312v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.1807.08312

Submission history

From: Mahdi Hajibabaei
[v1] Sun, 22 Jul 2018 16:26:31 UTC (19 KB)
