DOI: 10.1145/2964284.2970377
Short paper · Public Access

A Discriminative and Compact Audio Representation for Event Detection

Published: 01 October 2016

Abstract

This paper presents a novel two-phase method for audio representation: Discriminative and Compact Audio Representation (DCAR). In the first phase, each audio track is modeled using a Gaussian mixture model (GMM) that includes several components to capture the variability within that track. The second phase takes into account both global structure and local structure. In this phase, the components are rendered more discriminative and compact by formulating an optimization problem on Grassmannian manifolds, which we found represents the structure of audio effectively. Experimental results on the YLI-MED dataset show that the proposed DCAR representation consistently outperforms state-of-the-art audio representations: i-vector, mv-vector, and GMM.
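
The first phase described above (one GMM per audio track) can be illustrated with a minimal sketch. The abstract does not specify the features, the covariance structure, or the fitting procedure, so everything here is an assumption: `fit_diag_gmm` is a hypothetical helper that runs plain EM with diagonal covariances, and the synthetic 2-D frames stand in for real per-frame audio features such as MFCCs. The second phase (the optimization on Grassmannian manifolds) is not sketched.

```python
import math
import random


def fit_diag_gmm(frames, k, n_iter=30, var_floor=1e-3):
    """Fit a k-component diagonal-covariance GMM to one track's frames via EM.

    frames: list of equal-length feature vectors (one per audio frame).
    Returns (weights, means, variances), each with one entry per component.
    """
    n, d = len(frames), len(frames[0])
    # Deterministic initialisation (an assumption): seed means with frames
    # spread evenly over the track.
    means = [list(frames[(i * n) // k]) for i in range(k)]
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: per-frame responsibilities, computed in the log domain.
        resp = []
        for x in frames:
            log_p = []
            for w, mu, var in zip(weights, means, variances):
                ll = math.log(w)
                for xj, mj, vj in zip(x, mu, var):
                    ll -= 0.5 * (math.log(2.0 * math.pi * vj) + (xj - mj) ** 2 / vj)
                log_p.append(ll)
            top = max(log_p)
            p = [math.exp(v - top) for v in log_p]
            total = sum(p)
            resp.append([v / total for v in p])

        # M-step: re-estimate weights, means, and floored variances.
        for c in range(k):
            mass = sum(r[c] for r in resp) + 1e-12  # guard against empty components
            weights[c] = mass / n
            means[c] = [
                sum(r[c] * x[j] for r, x in zip(resp, frames)) / mass for j in range(d)
            ]
            variances[c] = [
                max(
                    sum(r[c] * (x[j] - means[c][j]) ** 2 for r, x in zip(resp, frames)) / mass,
                    var_floor,
                )
                for j in range(d)
            ]
        norm = sum(weights)
        weights = [w / norm for w in weights]

    return weights, means, variances


# Synthetic "track": 100 frames around (0, 0) followed by 100 around (5, 5).
rng = random.Random(0)
track = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(100)]
track += [[rng.gauss(5.0, 1.0), rng.gauss(5.0, 1.0)] for _ in range(100)]

weights, means, variances = fit_diag_gmm(track, k=2)
print("weights:", weights)
print("means:", means)
```

In this sketch the fitted components (weights, means, variances) would be the per-track summary that a second phase then refines. A real pipeline would more likely use an established GMM implementation and real acoustic features; the pure-Python version is used here only to keep the example self-contained.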


Information & Contributors

Information

Published In

MM '16: Proceedings of the 24th ACM international conference on Multimedia
October 2016
1542 pages
ISBN:9781450336031
DOI:10.1145/2964284
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. audio data
  2. discriminative and compact representation
  3. event detection

Qualifiers

  • Short-paper

Funding Sources

  • NSFC
  • PCSIRT
  • LDRD

Conference

MM '16: ACM Multimedia Conference
October 15 - 19, 2016
Amsterdam, The Netherlands

Acceptance Rates

MM '16 Paper Acceptance Rate 52 of 237 submissions, 22%;
Overall Acceptance Rate 995 of 4,171 submissions, 24%

Cited By

  • (2020) Speeding up training of automated bird recognizers by data reduction of audio features. PeerJ, 8:e8407. DOI: 10.7717/peerj.8407. Online publication date: 27-Jan-2020
  • (2020) Recurrent Compressed Convolutional Networks for Short Video Event Detection. IEEE Access, 8:114162-114171. DOI: 10.1109/ACCESS.2020.3003939. Online publication date: 2020
  • (2017) DCAR: A Discriminative and Compact Audio Representation for Audio Processing. IEEE Transactions on Multimedia, 19(12):2637-2650. DOI: 10.1109/TMM.2017.2703939. Online publication date: Dec-2017
  • (2017) Unified Embedding and Metric Learning for Zero-Exemplar Event Detection. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2087-2096. DOI: 10.1109/CVPR.2017.225. Online publication date: Jul-2017
