An overview of the CATE algorithms for real-time pitch determination

  • Original Paper
Signal, Image and Video Processing

Abstract

In this paper, we present a recent pitch detection algorithm based on an implicit circular autocorrelation of the glottal excitation signal. The algorithm operates in real time without any post-processing. The article focuses on correcting the estimated pitch contours and on reducing classification errors in speech signals using simple voicing decision techniques. To evaluate the performance of our algorithms, we used the Bagshaw and Keele databases. For the male Bagshaw corpus, the sum of the percentage of unvoiced errors and the percentage of voiced errors reaches a very good score of 14.67. For the female corpus, our results are also competitive with those of other algorithms evaluated on the same database. On the Keele database, we obtain very good gross pitch error, voicing decision error and F0 frame error rates of 0.44 %, 0.65 % and 1.55 %, respectively, over the whole corpus.
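
The three Keele figures quoted above are frame-level error rates. As a rough illustration of how such rates are commonly computed, the sketch below compares a reference F0 track with an estimated one. It is not taken from the article: the function name `pitch_error_rates`, the convention that 0 Hz marks an unvoiced frame, and the 20 % deviation threshold for a gross error are all assumptions based on common practice, and the authors' exact settings are not visible in this preview.

```python
def pitch_error_rates(ref_f0, est_f0, tol=0.20):
    """Return (GPE, VDE, FFE) in percent for two frame-aligned F0 tracks.

    Assumptions (not from the article): a value of 0 marks an unvoiced
    frame, and a gross pitch error is a deviation of more than `tol`
    (20 % by default) from the reference F0.
    """
    if len(ref_f0) != len(est_f0):
        raise ValueError("tracks must have the same number of frames")
    if not ref_f0:
        raise ValueError("tracks must not be empty")

    n_frames = len(ref_f0)
    both_voiced = 0   # frames voiced in reference and estimate
    gross = 0         # gross pitch errors among those frames
    voicing = 0       # frames where the voicing decisions disagree

    for ref, est in zip(ref_f0, est_f0):
        ref_voiced = ref > 0
        est_voiced = est > 0
        if ref_voiced != est_voiced:
            voicing += 1
        elif ref_voiced:
            both_voiced += 1
            if abs(est - ref) > tol * ref:
                gross += 1

    # GPE is relative to frames voiced in both tracks;
    # VDE and FFE are relative to all frames.
    gpe = 100.0 * gross / both_voiced if both_voiced else 0.0
    vde = 100.0 * voicing / n_frames
    ffe = 100.0 * (gross + voicing) / n_frames
    return gpe, vde, ffe


if __name__ == "__main__":
    # Hypothetical 10-frame tracks in Hz, for illustration only.
    ref = [0, 100, 100, 100, 100, 0, 0, 200, 200, 200]
    est = [0, 100, 150, 100, 0, 0, 120, 200, 205, 200]
    print(pitch_error_rates(ref, est))
```

Under these definitions the F0 frame error counts every frame that has either a voicing error or a gross pitch error, so it can never be smaller than the voicing decision error, which is consistent with the ordering of the Keele figures in the abstract.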

References

  1. Bagshaw, P.C., Hiller, S.M., Jack, M.A.: Enhanced pitch tracking and the processing of F0 contours for computer aided intonation teaching. In: Proceedings of the European Conference on Speech Technology, Berlin, 2, pp. 1000–1003 (1993)

  2. Bahja, F., Di Martino, J., Ibn Elhaj, E.: Real-time pitch tracking using the eCATE algorithm. Presented at the ISIVC, Rabat, Morocco, 1–2 Oct 2010

  3. Bahja, F., Di Martino, J., Ibn Elhaj, E., Aboutajdine, D.: An improvement of the eCATE algorithm for F0 detection. Presented at the 10th International Symposium on Communications and Information Technologies, Tokyo, Japan, 26–29 Oct 2010

  4. Camacho, A.: SWIPE: a sawtooth waveform inspired pitch estimator for speech and music. PhD thesis, University of Florida, USA (2007)

  5. Chu, W., Alwan, A.: Reducing F0 frame error of F0 tracking algorithms under noisy conditions with an unvoiced/voiced classification frontend. ICASSP (2009)

  6. Chu, W., Alwan, A.: SAFE: a statistical approach to F0 estimation under clean and noisy conditions. IEEE Trans. Audio Speech Lang. Process. 20(3), 933–967 (2012)

  7. De Cheveigne, A., Kawahara, H.: YIN, a fundamental frequency estimator for speech and music. J. Acoust. Soc. Am. 111(4), 1917–1930 (2002)

  8. Di Martino, J., Laprie, Y.: An efficient F0 determination algorithm based on the implicit calculation of the autocorrelation of the temporal excitation signal. Presented at the 6th European Conference on Speech Communication and Technology EUROSPEECH, Budapest, Hungary (1999)

  9. Gold, B., Rabiner, L.R.: Parallel processing techniques for estimating pitch periods of speech in the time domain. J. Acoust. Soc. Am. 46(2), 442–448 (1969)

  10. Krubsack, D., Niederjohn, R.: An autocorrelation pitch detector and voicing decision with confidence measures developed for noise-corrupted speech. IEEE Trans. Signal Process. 39(2), 319–329 (1991)

  11. Mahadevan, V., Espy-Wilson, C.Y.: Maximum likelihood pitch estimation using sinusoidal modeling. Presented at the International Conference on Communications and Signal Processing (ICCSP) (2011)

  12. Markel, J.D.: The SIFT algorithm for fundamental frequency estimation. IEEE Trans. Audio Electroacoust. 20, 367–377 (1972)

  13. Medan, Y., Yair, E., Chazan, D.: Super resolution pitch determination of speech signals. IEEE Trans. Signal Process. 39(1), 40–48 (1991)

  14. Messaoud, M.A.B., Bouzid, A., Ellouze, N.: Using multi-scale product spectrum for single and multi-pitch estimation. IET Signal Process. J. 5(3), 344–355 (2011)

  15. Messaoud, M.A.B., Bouzid, A., Ellouze, N.: Pitch estimation and voiced decision by spectral autocorrelation compression of multi-scale product. In: JEP-TALN-RECITAL conference, vol. 1: JEP, Grenoble, June 4–8, pp. 201–208 (2012)

  16. Nakatani, T., Amano, S., Irino, T., Ishizuka, K., Kondo, T.: A method for fundamental frequency estimation and voicing decision: application to infant utterances recorded in real acoustical environments. Speech Commun. 50(3), 203–214 (2008)

  17. Ney, H.: A dynamic programming algorithm for nonlinear smoothing. Signal Process. 5(2), 163–173 (1983)

  18. Noll, A.M.: Cepstrum pitch determination. J. Acoust. Soc. Am. 41(2), 293–309 (1967)

  19. Noll, A.M.: Pitch determination of human speech by the harmonic product spectrum, the harmonic sum spectrum and a maximum likelihood estimate. In: Proceedings of the Symposium on Computer Processing in Communication, pp. 779–798 (1969)

  20. Oppenheim, A.V.: A speech analysis synthesis system based on homomorphic filtering. J. Acoust. Soc. Am. 45, 458–465 (1969)

  21. Oppenheim, A.V., Schafer, R.W.: Digital signal processing. Prentice Hall, Englewood Cliffs (1975)

  22. Philips, M.S.: A feature-based time domain pitch tracker. J. Acoust. Soc. Am. 77, S9–S10 (1985)

  23. Plante, F., Meyer, G., Ainsworth, W.A.: A pitch extraction reference database. In: Proceedings of the Eurospeech, pp. 837–840 (1995)

  24. Rabiner, L.R., Sambur, M.R.: Voiced-unvoiced-silence detection using the Itakura LPC distance measure. In: Proceedings of ICASSP, pp. 323–326 (1977)

  25. Saul, L.K., Lee, D.D., Isbell, C.L., LeCun, Y.: Real time voice processing with audiovisual feedback: toward autonomous agents with perfect pitch. In: Proceedings of NIPS (2002)

  26. Schroeder, M.R.: Period histogram and product spectrum: new methods for fundamental frequency measurement. J. Acoust. Soc. Am. 43(4), 829–834 (1968)

  27. Secrest, B.G., Doddington, G.R.: An integrated pitch tracking algorithm for speech systems. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Boston, pp. 1352–1355 (1983)

  28. Talkin, D.: A robust algorithm for pitch tracking (RAPT). In: Kleijn, W.B., Paliwal, K.K. (eds.) Speech Coding and Synthesis, chapter 14. Elsevier Science B.V, Amsterdam (1995)

  29. Yegnanarayana, B., Murty, K.S.R.: Event-based instantaneous fundamental frequency estimation from speech signals. IEEE Trans. Audio Speech Lang. Process. 17(4), 614–624 (2009)


Author information

Corresponding author

Correspondence to Fadoua Bahja.

About this article

Cite this article

Bahja, F., Di Martino, J., Ibn Elhaj, E. et al. An overview of the CATE algorithms for real-time pitch determination. SIViP 9, 589–599 (2015). https://doi.org/10.1007/s11760-013-0488-4

  • DOI: https://doi.org/10.1007/s11760-013-0488-4
