Abstract
Typical methods for overlapping sound event detection (SED) do not fully exploit the joint spectral and temporal transition characteristics of the audio signal: they generally train models either on isolated data from each event class or on mixed signals containing simultaneous sound events. This paper introduces a new approach to SED in real-life audio based on Nonnegative Matrix Factor 2-D Deconvolution (NMF2D) and RUSBoost. The idea is to capture the two-dimensional joint spectral and temporal information in the time-frequency representation while simultaneously separating the sound mixture into several sources. In addition, RUSBoost is used to address the class imbalance of the training data. The proposed approach is evaluated on the TUT Sound Events 2016 and 2017 datasets, where it outperforms the baseline methods: on TUT Sound Events 2016 it reduces the total error rate by 5% while increasing the F1 score by 13.8%, and on TUT Sound Events 2017 it reduces the total error rate by 3% while increasing the F1 score by 8.1%.
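For orientation beyond the abstract: NMF2D generalizes nonnegative matrix factorization with shifts along both the time and frequency axes, so a single basis can model a sound event whose spectrum evolves over time. A common formulation of the model, shown below in illustrative notation that may differ from the paper's own, approximates the (log-frequency) magnitude spectrogram V as a sum of doubly shifted factor products.

    % Standard NMF2D model (illustrative, not copied from the paper):
    % the down-arrow shifts W^tau down by phi rows (a frequency shift);
    % the right-arrow shifts H^phi right by tau columns (a time shift).
    \mathbf{V} \approx \boldsymbol{\Lambda}
        = \sum_{\tau=0}^{T-1} \sum_{\phi=0}^{P-1}
          \overset{\downarrow \phi}{\mathbf{W}^{\tau}}\,
          \overset{\rightarrow \tau}{\mathbf{H}^{\phi}}

On the classification side, RUSBoost couples AdaBoost with random undersampling of the majority class in every boosting round. A minimal, self-contained sketch of that step follows, using the open-source imbalanced-learn package rather than the authors' implementation; the feature matrix and labels are placeholders standing in for per-frame features (e.g., NMF2D activations) and per-frame event labels.

    # Hedged sketch: RUSBoost for one binary event detector; real SED
    # would train one such classifier per event class.
    import numpy as np
    from imblearn.ensemble import RUSBoostClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import f1_score

    rng = np.random.default_rng(0)
    X = rng.standard_normal((5000, 40))        # placeholder frame features
    y = (rng.random(5000) < 0.05).astype(int)  # ~5% positives: heavy imbalance

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    # Each boosting round re-balances the training set by randomly
    # undersampling the majority (non-event) class, so the weak learners
    # are not swamped by negative frames.
    clf = RUSBoostClassifier(n_estimators=50, random_state=0)
    clf.fit(X_tr, y_tr)
    print("frame-level F1:", f1_score(y_te, clf.predict(X_te)))

Undersampling inside the boosting loop, rather than once before training, exposes each round to a different majority-class subsample; that is what distinguishes RUSBoost from plain random undersampling followed by boosting.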
Cite this article
Yang, W., Krishnan, S. Sound event detection in real-life audio using joint spectral and temporal features. SIViP 12, 1345–1352 (2018). https://doi.org/10.1007/s11760-018-1288-7