Unsupervised spoken word retrieval using Gaussian-Bernoulli restricted Boltzmann machines

Authors

Raghavendra Reddy Pappagari, Shekhar Nayak, K Sri Rama Murty

Publication date

2014

Conference

Fifteenth Annual Conference of the International Speech Communication Association

Description

The objective of this work is to explore a novel unsupervised framework, using restricted Boltzmann machines, for spoken word retrieval (SWR). In the absence of labelled speech data, SWR is typically performed by matching sequences of feature vectors from the query and test utterances using dynamic time warping (DTW). In such a scenario, the performance of an SWR system depends critically on the representation of the speech signal. Typical features, such as mel-frequency cepstral coefficients (MFCCs), carry significant speaker-specific information and hence may not be suitable for direct use in an SWR system. To overcome this issue, we propose to capture the joint density of the acoustic space spanned by MFCCs using a Gaussian-Bernoulli restricted Boltzmann machine (GBRBM). In this work, we use the hidden activations of the GBRBM as features for the SWR system. Since the GBRBM is trained with speech data collected from a large number of speakers, the hidden activations are more robust to speaker variability than MFCCs. The performance of the proposed features is evaluated on Telugu broadcast news data, and an absolute improvement of 12% was observed compared to MFCCs.
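The two steps the description names, mapping each MFCC frame to GBRBM hidden activations and then comparing query and test sequences with DTW, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the parameterization of the hidden-unit conditional (weights scaled by a per-dimension standard deviation `sigma`), the Euclidean frame distance, and the whole-sequence alignment are all assumptions made for illustration; a retrieval system would more likely use a subsequence variant of DTW to locate a query inside a longer utterance.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gbrbm_hidden_activations(v, W, b, sigma):
    """Return P(h_j = 1 | v) for every hidden unit j.

    Assumed parameterization (one common convention, not taken from the paper):
        P(h_j = 1 | v) = sigmoid(b[j] + sum_i W[i][j] * v[i] / sigma[i])
    v: one feature frame, W: visible-by-hidden weights,
    b: hidden biases, sigma: per-dimension standard deviations.
    """
    return [
        sigmoid(b[j] + sum(W[i][j] * v[i] / sigma[i] for i in range(len(v))))
        for j in range(len(b))
    ]


def euclidean(x, y):
    # Frame-level distance; the choice of Euclidean distance is an assumption.
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(x, y)))


def dtw_distance(query, test, dist=euclidean):
    """Accumulated cost of the best whole-sequence DTW alignment."""
    n, m = len(query), len(test)
    inf = float("inf")
    # D[i][j] = cost of the best alignment of query[:i] with test[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(query[i - 1], test[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Under these assumptions, each frame of the query and test utterances would first be passed through `gbrbm_hidden_activations` with trained parameters, and the resulting activation sequences would then be compared with `dtw_distance`.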

Total citations

Cited by 10

Citations per year: 2015: 1, 2016: 1, 2017: 4, 2018: 1, 2019: 1, 2020: 1, 2021: 1
