Pitch-density-based features and an SVM binary tree approach for multi-class audio classification in broadcast news

Xie, Lei; Fu, Zhong-Hua; Feng, Wei; Luo, Yong

doi:10.1007/s00530-010-0205-x

Original Research
Published: 28 September 2010

Multimedia Systems, volume 17, pages 101–112 (2011)

Lei Xie¹,
Zhong-Hua Fu¹,
Wei Feng² &
Yong Luo¹

Abstract

Audio classification is an essential task in multimedia content analysis and a prerequisite for a variety of tasks such as segmentation, indexing and retrieval. This paper describes our study on multi-class audio classification of broadcast news, a popular multimedia repository with rich audio types. Motivated by the tonal regulations of music, we propose two pitch-density-based features, namely average pitch-density (APD) and relative tonal power density (RTPD). We use an SVM binary tree (SVM-BT) to hierarchically classify an audio clip into five classes: pure speech, music, environment sound, speech with music and speech with environment sound. Because an SVM is a binary classifier, we use the SVM-BT architecture to realize coarse-to-fine multi-class classification with high accuracy and efficiency. Experiments show that the proposed one-dimensional APD and RTPD features achieve accuracy comparable to that of popular high-dimensional features in speech/music discrimination, and that the SVM-BT approach demonstrates superior performance in multi-class audio classification. With the help of the pitch-density-based features, we achieve a high average accuracy of 94.2% in the five-class audio classification task.
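The coarse-to-fine routing that a binary tree of binary classifiers performs can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which classes are separated at which node, so the tree layout here is hypothetical, the three-value feature tuple is made up for the example, and each node uses a hand-written threshold rule as a stand-in for a trained SVM.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union

Features = Sequence[float]


@dataclass
class Leaf:
    label: str  # final audio class


@dataclass
class Node:
    # Binary decision: True routes to `yes`, False routes to `no`.
    # In an SVM-BT this would be the sign of a trained SVM's decision value;
    # here it is a hand-written threshold rule (a stand-in, not an SVM).
    decide: Callable[[Features], bool]
    yes: "Tree"
    no: "Tree"


Tree = Union[Leaf, Node]


def classify(tree: Tree, x: Features) -> str:
    """Walk from the root to a leaf, applying one binary decision per level."""
    while isinstance(tree, Node):
        tree = tree.yes if tree.decide(x) else tree.no
    return tree.label


def count_nodes(tree: Tree) -> int:
    """Number of internal nodes, i.e. binary classifiers that would be trained."""
    if isinstance(tree, Leaf):
        return 0
    return 1 + count_nodes(tree.yes) + count_nodes(tree.no)


# Hypothetical toy features: (speech, music, environment) presence scores in [0, 1].
# Hypothetical layout: the order of the splits is an assumption for illustration.
SVM_BT = Node(
    decide=lambda x: x[0] > 0.5,                  # does the clip contain speech?
    yes=Node(
        decide=lambda x: max(x[1], x[2]) <= 0.5,  # no background sound?
        yes=Leaf("pure speech"),
        no=Node(
            decide=lambda x: x[1] >= x[2],        # is the background music?
            yes=Leaf("speech with music"),
            no=Leaf("speech with environment sound"),
        ),
    ),
    no=Node(
        decide=lambda x: x[1] >= x[2],            # music rather than environment?
        yes=Leaf("music"),
        no=Leaf("environment sound"),
    ),
)

if __name__ == "__main__":
    print(classify(SVM_BT, (0.9, 0.8, 0.1)))
    print(count_nodes(SVM_BT))
```

Any binary tree with five leaves has four internal nodes, so four binary classifiers cover the five classes, and in this assumed layout a clip passes through at most three of them. A one-vs-one scheme over five classes would instead need ten pairwise classifiers.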

Acknowledgments

This work was supported by the National Natural Science Foundation of China (60802085), the Program for New Century Excellent Talents in University (2008) supported by the Ministry of Education (MOE) of China, the Research Fund for the Doctoral Program of Higher Education in China (20070699015), the Natural Science Basic Research Plan of Shaanxi Province (2007F15) and the NPU Foundation for Fundamental Research (W018103).

Author information

Authors and Affiliations

Shaanxi Provincial Key Laboratory of Speech and Image Information Processing, School of Computer Science, Northwestern Polytechnical University, Xi’an, China
Lei Xie, Zhong-Hua Fu & Yong Luo
Media Computing Group, School of Creative Media, City University of Hong Kong, Kowloon, Hong Kong, China
Wei Feng

Corresponding author

Correspondence to Lei Xie.

Additional information

Communicated by T. Haenselmann.

About this article

Cite this article

Xie, L., Fu, ZH., Feng, W. et al. Pitch-density-based features and an SVM binary tree approach for multi-class audio classification in broadcast news. Multimedia Systems 17, 101–112 (2011). https://doi.org/10.1007/s00530-010-0205-x

Received: 01 April 2010
Accepted: 10 September 2010
Published: 28 September 2010
Issue Date: March 2011
DOI: https://doi.org/10.1007/s00530-010-0205-x
