Unsupervised Filterbank Learning Using Convolutional Restricted Boltzmann Machine for Environmental Sound Classification

HB Sailor, DM Agrawal, HA Patil - InterSpeech, 2017 - isca-archive.org
Abstract
In this paper, we propose to use a Convolutional Restricted Boltzmann Machine (ConvRBM) to learn a filterbank from raw audio signals. ConvRBM is a generative model, trained in an unsupervised manner, that can model audio signals of arbitrary length. It is trained with the annealed dropout technique, and its parameters are optimized with Adam. The subband filters that ConvRBM learns from the ESC-50 database resemble Fourier basis functions in the mid-frequency range, while some of the low-frequency subband filters resemble Gammatone basis functions. The learned filterbank has an auditory-like scale: the center frequencies of its subband filters are spaced nonlinearly, following the standard auditory scales. We use the proposed model as a front-end for the Environmental Sound Classification (ESC) task, with a supervised Convolutional Neural Network (CNN) as the back-end. With the CNN classifier, the ConvRBM filterbank (ConvRBM-BANK) alone and its score-level fusion with Mel filterbank energies (FBEs) gave absolute improvements in classification accuracy of 10.65% and 18.70%, respectively, over FBEs alone on the ESC-50 database. This shows that, beyond its standalone gain, the ConvRBM filterbank carries information highly complementary to the Mel filterbank, which is helpful for the ESC task.
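The pipeline described above can be sketched roughly as follows. This is a minimal, non-authoritative illustration under stated assumptions, not the authors' implementation. `convrbm_style_features` assumes already-learned subband filters (ConvRBM training with annealed dropout and Adam is not shown) and assumes a common front-end pattern: cross-correlate the raw waveform with each filter, apply a ReLU, average-pool over short frames, and log-compress. The sampling rate, 25 ms window, and 10 ms hop are assumptions. `score_level_fusion` shows one common form of score-level fusion, a weighted sum of per-class scores from two classifiers (e.g., a CNN on ConvRBM-BANK features and a CNN on FBEs). The weight `alpha` is a hypothetical parameter that would typically be tuned on held-out data.

```python
import numpy as np


def convrbm_style_features(signal, filters, sr=44100, win_ms=25.0, hop_ms=10.0, eps=1e-8):
    """Hypothetical ConvRBM-style front-end (inference only; filters are assumed pre-trained).

    signal:  1-D raw waveform, shape (T,)
    filters: learned subband filters, shape (K, L)
    Returns a (n_frames, K) matrix of log-compressed, pooled subband activations.
    """
    signal = np.asarray(signal, dtype=float)
    # Cross-correlate the waveform with every subband filter ('valid' avoids edge padding).
    responses = np.stack([np.correlate(signal, f, mode="valid") for f in filters])
    # ReLU nonlinearity on the hidden-unit responses (an assumption for inference).
    acts = np.maximum(responses, 0.0)

    win = int(sr * win_ms / 1000)  # pooling window in samples (assumed 25 ms)
    hop = int(sr * hop_ms / 1000)  # pooling hop in samples (assumed 10 ms)
    if acts.shape[1] < win:
        raise ValueError("signal too short for one pooling window")
    n_frames = 1 + (acts.shape[1] - win) // hop

    # Average-pool each subband over short frames, then log-compress.
    pooled = np.stack([acts[:, i * hop:i * hop + win].mean(axis=1) for i in range(n_frames)])
    return np.log(pooled + eps)


def score_level_fusion(scores_a, scores_b, alpha=0.5):
    """Weighted sum of per-class scores from two classifiers (alpha is hypothetical)."""
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * scores_a + (1.0 - alpha) * scores_b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # One second of synthetic audio at 8 kHz and four random 64-tap "filters".
    feats = convrbm_style_features(rng.standard_normal(8000), rng.standard_normal((4, 64)), sr=8000)
    print(feats.shape)  # (97, 4)

    # Posteriors from two hypothetical CNNs (ConvRBM-BANK and FBE) for three classes.
    fused = score_level_fusion([0.7, 0.2, 0.1], [0.3, 0.6, 0.1], alpha=0.5)
    print(int(np.argmax(fused)))  # 0
```

A weighted sum keeps fusion independent of the two back-ends, so each CNN can be trained separately on its own features. The fused scores can gain from complementary information only when the two classifiers make different errors.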