The Effects of Windowing on the Calculation of MFCCs for Different Types of Speech Sounds

  • Conference paper
Advances in Nonlinear Speech Processing (NOLISP 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7015)

Abstract

Unit selection speech synthesis involves concatenating segments of speech contained in a large database in such a way as to create novel utterances. The sequence of speech segments is chosen using a cost function. In particular, the join cost determines how well consecutive speech segments fit together by extracting acoustic parameters from frames of speech on either side of a potential join point and calculating the distance between them. Although many different metrics have been proposed, there is very little agreement on what constitutes an appropriate window length, with values in the literature ranging from 5 ms to 30 ms. Clearly, it is not possible to compare the performance of different metrics when the role of such a fundamental parameter as window length has not been properly investigated with real speech signals. Here we address this shortcoming by focusing on one of the most common metrics, based on mel-frequency cepstral coefficients (MFCCs) [1]. We show with experimental results that the choice of window length has a direct impact on the MFCC values calculated, and that the ability of the distance measure to predict discontinuity depends both on the width of the windowing function and on whether the sounds are vowels, voiceless fricatives or voiced fricatives.
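The MFCC-based join cost described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a Hamming window, a 26-filter triangular mel filterbank spanning 0 Hz to half the sampling rate, 13 cepstral coefficients from a DCT-II of the log filterbank energies, a single frame taken from the end of the left unit and from the start of the right unit, and a Euclidean distance between the two MFCC vectors. All function names (`mfcc`, `join_cost`, and the helpers) are hypothetical. The window length `win_ms` is the parameter under study, so the example evaluates it at 5, 10, 20 and 30 ms. The test units are synthetic tones, not speech.

```python
import cmath
import math


def hamming(length):
    """Symmetric Hamming window of the given length (length >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame, n_fft):
    """|DFT|^2 for bins 0..n_fft/2; samples past len(frame) are treated as zero padding."""
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * n / n_fft) for n, x in enumerate(frame))) ** 2
            for k in range(n_fft // 2 + 1)]


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(sr / 2.0)
    mels = [lo + i * (hi - lo) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sr)) for m in mels]
    fbank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):  # rising edge of the triangle
            weights[k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge of the triangle
            weights[k] = (right - k) / (right - centre)
        fbank.append(weights)
    return fbank


def mfcc(frame, sr, n_filters=26, n_coeffs=13):
    """Hamming window -> power spectrum -> mel filterbank energies -> log -> DCT-II."""
    n_fft = 1
    while n_fft < len(frame):  # next power of two at or above the window length
        n_fft *= 2
    windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]
    spec = power_spectrum(windowed, n_fft)
    energies = [sum(w * p for w, p in zip(filt, spec)) for filt in mel_filterbank(n_filters, n_fft, sr)]
    log_e = [math.log(max(e, 1e-10)) for e in energies]  # floor guards against empty filters
    n = len(log_e)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(log_e))
            for k in range(n_coeffs)]


def join_cost(left_unit, right_unit, sr, win_ms):
    """Euclidean distance between the MFCCs of the last frame of left_unit
    and the first frame of right_unit, each win_ms milliseconds long."""
    n = int(round(sr * win_ms / 1000.0))
    a = mfcc(left_unit[-n:], sr)
    b = mfcc(right_unit[:n], sr)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Synthetic test units (not speech): 100 ms tones at 16 kHz.
sr = 16000
length = sr // 10
left = [math.sin(2 * math.pi * 400 * n / sr) for n in range(length)]
right_same = [math.sin(2 * math.pi * 400 * (n + length) / sr) for n in range(length)]  # phase-continuous
right_diff = [math.sin(2 * math.pi * 2000 * n / sr) for n in range(length)]

# Window lengths spanning the 5-30 ms range mentioned in the abstract.
costs = {w: (join_cost(left, right_same, sr, w), join_cost(left, right_diff, sr, w))
         for w in (5, 10, 20, 30)}
for w, (same, diff) in costs.items():
    print(f"{w:2d} ms window: same-tone join {same:.3g}, different-tone join {diff:.3g}")
```

The 400 Hz tone has a 2.5 ms period, so every window tested spans a whole number of cycles. The two frames at the phase-continuous join are therefore identical and the cost is essentially zero, while the join to the 2000 Hz tone gives a large cost. Real speech is not stationary, so the frames on either side of a join, and hence the MFCCs and the distance, change as `win_ms` changes. That window-length dependence is the parameter the abstract sets out to examine.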

References

  1. Davis, S.B., Mermelstein, P.: Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing 28(4), 357–366 (1980)

  2. Hunt, A., Black, A.W.: Unit selection in a concatenative speech synthesis system using a large speech database. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, pp. 373–376 (1996)

  3. O’Shaughnessy, D.: Speech communication: human and machine, p. 150. Addison-Wesley (1987)

  4. Memon, S., Lech, M., Maddage, N., He, L.: Application of the Vector Quantization Methods and the Fused MFCC-IMFCC Features in the GMM based Speaker Recognition. In: Zaher, A.A. (ed.) Recent Advances in Signal Processing. InTech (2009)

  5. Kirkpatrick, B., O’Brien, D., Scaife, R.: A comparison of spectral continuity measures as a join cost in concatenative speech synthesis. In: Proceedings of the IET Irish Signals and Systems Conference, ISSC (2006)

  6. Kelly, A.C.: Join Cost Optimisation for Unit Selection Speech Synthesis, Sao Paulo School of Advanced Studies in Speech Dynamics, Brazil (2010), poster, http://www.dinafon.iel.unicamp.br/spsassd_files/posterAmeliaKelly.pdf

  7. Wouters, J., Macon, M.W.: Perceptual evaluation of distance measures for concatenative speech synthesis. In: International Conference on Spoken Language Processing, ICSLP (1998)

  8. Klabbers, E., Veldhuis, R.: On the Reduction of Concatenation Artefacts in Diphone Synthesis. In: ICASLP 1998: Proceedings of the Acoustics, Speech, and Language Processing (1998)

  9. Chen, J.D., Campbell, N.: Objective distance measures for assessing concatenative speech synthesis. In: Proceedings of Eurospeech (1999)

  10. Stylianou, Y., Syrdal, A.: Perceptual and Objective Detection of Discontinuities in Concatenative Speech Synthesis. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (2001)

  11. Vepa, J., King, S., Taylor, P.: New objective distance measures for spectral discontinuities in concatenative speech synthesis. In: Proceedings of 2002 IEEE Workshop (2002)

  12. Pantazis, Y., Stylianou, Y., Klabbers, E.: Discontinuity Detection in Concatenated Speech Synthesis based on Nonlinear Speech Analysis. In: Interspeech (2005)

  13. Kominek, J., Black, A.: The CMU ARCTIC speech databases for speech synthesis research. Tech. Rep. CMU-LTI-03-177. Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA (2003), http://www.festvox.org/cmuarctic/

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kelly, A.C., Gobl, C. (2011). The Effects of Windowing on the Calculation of MFCCs for Different Types of Speech Sounds. In: Travieso-González, C.M., Alonso-Hernández, J.B. (eds) Advances in Nonlinear Speech Processing. NOLISP 2011. Lecture Notes in Computer Science, vol 7015. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25020-0_15

  • DOI: https://doi.org/10.1007/978-3-642-25020-0_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25019-4

  • Online ISBN: 978-3-642-25020-0

  • eBook Packages: Computer Science, Computer Science (R0)