Robust scream sound detection via sound event partitioning

Lei, Baiying; Mak, Man-Wai

doi:10.1007/s11042-015-2555-z

Published: 25 March 2015

Volume 75, pages 6071–6089, (2016)

Multimedia Tools and Applications

Abstract

This paper proposes a robust scream-sound detection scheme for acoustic surveillance applications. To enhance the discriminability between scream and non-scream sounds, a sound-event partitioning (SEP) method is developed that facilitates the extraction of multiple acoustic vectors from a single sound event. Regularized principal component analysis (PCA) and normalization are applied to the acoustic vectors, which are then classified by support vector machines (SVMs). Experimental results based on 1000 sound events show that the proposed scheme remains effective even when there are severe mismatches between the training and testing conditions. The results also show that the proposed scheme can reduce the equal error rate (EER) by up to 60% compared with a classical approach that uses mel-frequency cepstral coefficients (MFCCs) as features. Extensive analyses of the different processing stages of the proposed scheme further suggest that sound partitioning and feature normalization play important roles in boosting detection performance.
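The abstract reports performance as an equal error rate (EER), the operating point at which the miss rate equals the false-alarm rate. The sketch below shows one simple way to estimate it from detector scores, and how a relative reduction such as "60%" can be computed. The function names, the threshold sweep, and the example numbers are illustrative assumptions and are not taken from the paper; the sketch also assumes the reported reduction is relative rather than absolute.

```python
def equal_error_rate(scream_scores, nonscream_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Assumption: higher scores mean "more scream-like". Returns (eer, threshold)
    at the threshold where miss rate and false-alarm rate are closest.
    """
    thresholds = sorted(set(scream_scores) | set(nonscream_scores))
    best = None
    for t in thresholds:
        # Miss: a scream that scores below the threshold.
        miss = sum(s < t for s in scream_scores) / len(scream_scores)
        # False alarm: a non-scream that scores at or above the threshold.
        fa = sum(s >= t for s in nonscream_scores) / len(nonscream_scores)
        gap = abs(miss - fa)
        if best is None or gap < best[0]:
            best = (gap, (miss + fa) / 2.0, t)
    return best[1], best[2]


def relative_reduction(baseline_eer, proposed_eer):
    """Relative EER reduction of the proposed system over the baseline."""
    return (baseline_eer - proposed_eer) / baseline_eer


# Hypothetical scores, for illustration only.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.6])
print(eer, threshold)

# Hypothetical EERs: dropping from 10% to 4% would be a 60% relative reduction.
print(relative_reduction(0.10, 0.04))
```

A production evaluation would normally interpolate between operating points on the DET curve instead of picking the closest observed threshold, but the closest-point estimate is enough to show the idea.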

Notes

Individual frames do not contain sufficient information for differentiating scream and non-scream sounds. In fact, individual frames of scream and non-scream sounds overlap heavily in the feature space, which causes problems if they are used directly for training SVM classifiers.
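Because single frames are not discriminative enough, a classifier is better trained on vectors that summarize many frames. The sketch below illustrates the general idea of deriving several such vectors from one sound event by partitioning its frame sequence. The contiguous equal-length split and the mean pooling are assumptions made for illustration; the paper's actual partitioning and acoustic-vector construction are not described in this excerpt, and `partition_event` is a hypothetical name.

```python
def partition_event(frames, num_partitions):
    """Return one pooled vector for the whole event plus one per partition.

    frames: list of per-frame feature vectors (lists of floats, equal length).
    Assumption: partitions are contiguous and pooled by averaging.
    """
    if not 1 <= num_partitions <= len(frames):
        raise ValueError("num_partitions must be between 1 and len(frames)")

    def mean_vector(block):
        dim = len(block[0])
        return [sum(frame[d] for frame in block) / len(block) for d in range(dim)]

    n = len(frames)
    # Integer boundaries that split n frames into near-equal contiguous blocks.
    bounds = [i * n // num_partitions for i in range(num_partitions + 1)]

    vectors = [mean_vector(frames)]  # vector for the full sound event
    for start, end in zip(bounds, bounds[1:]):
        vectors.append(mean_vector(frames[start:end]))  # vector for one partition
    return vectors


# Four hypothetical 2-dimensional frames split into two partitions.
frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
for vector in partition_event(frames, 2):
    print(vector)
```

With N partitions, each event contributes N + 1 vectors instead of one, which gives the SVM more training examples per class while each example still aggregates many frames.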

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (No. 61402296), the Motorola Solutions Foundation (ID: 7186445), and The Hong Kong Polytechnic University (Grant No. G-YL78). The authors would like to thank Wing-Lung Leung for developing the sound recording system and part of the Android app.

Author information

Authors and Affiliations

Department of Biomedical Engineering, School of Medicine, National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, and Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, Shenzhen University, Shenzhen, 518060, Guangdong, China
Baiying Lei
Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Man-Wai Mak

Corresponding author

Correspondence to Baiying Lei.

About this article

Cite this article

Lei, B., Mak, MW. Robust scream sound detection via sound event partitioning. Multimed Tools Appl 75, 6071–6089 (2016). https://doi.org/10.1007/s11042-015-2555-z

Received: 08 September 2014
Revised: 13 January 2015
Accepted: 06 March 2015
Published: 25 March 2015
Issue Date: June 2016
DOI: https://doi.org/10.1007/s11042-015-2555-z
