Showing 1–2 of 2 results for author: Kuzmin, N

Search v0.5.6 released 2020-02-24

arXiv:2302.09523

eess.AS cs.LG cs.SD eess.SP

Probabilistic Back-ends for Online Speaker Recognition and Clustering

Authors: Alexey Sholokhov, Nikita Kuzmin, Kong Aik Lee, Eng Siong Chng

Abstract: This paper focuses on multi-enrollment speaker recognition which naturally occurs in the task of online speaker clustering, and studies the properties of different scoring back-ends in this scenario. First, we show that popular cosine scoring suffers from poor score calibration with a varying number of enrollment utterances. Second, we propose a simple replacement for cosine scoring based on an extremely constrained version of probabilistic linear discriminant analysis (PLDA). The proposed model improves over the cosine scoring for multi-enrollment recognition while keeping the same performance in the case of one-to-one comparisons. Finally, we consider an online speaker clustering task where each step naturally involves multi-enrollment recognition. We propose an online clustering algorithm allowing us to take benefits from the PLDA model such as the ability to handle uncertainty and better score calibration. Our experiments demonstrate the effectiveness of the proposed algorithm.

Submitted 19 February, 2023; originally announced February 2023.

Comments: Accepted to ICASSP 2023

arXiv:2202.13826

eess.AS cs.LG cs.SD

doi 10.21437/Odyssey.2022-1

Magnitude-aware Probabilistic Speaker Embeddings

Authors: Nikita Kuzmin, Igor Fedorov, Alexey Sholokhov

Abstract: Recently, hyperspherical embeddings have established themselves as a dominant technique for face and voice recognition. Specifically, Euclidean space vector embeddings are learned to encode person-specific information in their direction while ignoring the magnitude. However, recent studies have shown that the magnitudes of the embeddings extracted by deep neural networks may indicate the quality of the corresponding inputs. This paper explores the properties of the magnitudes of the embeddings related to quality assessment and out-of-distribution detection. We propose a new probabilistic speaker embedding extractor using the information encoded in the embedding magnitude and leverage it in the speaker verification pipeline. We also propose several quality-aware diarization methods and incorporate the magnitudes in those. Our results indicate significant improvements over magnitude-agnostic baselines both in speaker verification and diarization tasks.

Submitted 23 October, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

Comments: Accepted to Odyssey 2022: The Speaker and Language Recognition Workshop, camera-ready version
