Consistent Modeling of the Static and Time-Derivative Cepstrums for Speech Recognition Using HSPTM

  • Conference paper
Chinese Spoken Language Processing (ISCSLP 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4274)

Abstract

Most speech models represent the static and derivative cepstral features with separate models, which can be inconsistent with each other. In our previous work, we proposed the hidden spectral peak trajectory model (HSPTM), in which the static cepstral trajectories are derived from a set of hidden trajectories of the spectral peaks (captured as spectral poles) in the time-frequency domain. In this work, the HSPTM is generalized so that both the static and derivative features are derived from a single set of hidden pole trajectories, using the well-known relationship between spectral poles and cepstral coefficients. Because the pole trajectories represent the resonance frequencies across time, they can be interpreted as formant tracks in voiced speech, which have been shown to contain important cues for phonemic identification. To preserve the common recognition framework, the likelihood functions are still defined in the cepstral domain, with the acoustic models defined by the static and derivative cepstral trajectories. However, these trajectories are no longer estimated separately but derived jointly, and are therefore guaranteed to be consistent with each other. Vowel classification experiments were performed on the TIMIT corpus using low-complexity (2-mixture) models. They showed a 3% (absolute) reduction in classification error compared to a standard HMM of the same complexity.
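As an illustration of how static and derivative cepstra can both follow from one set of poles, the sketch below uses the textbook pole-to-cepstrum identity for a stable all-pole spectrum: for complex-conjugate pole pairs with radius r_i and angle theta_i, c_n = (2/n) * sum_i r_i^n * cos(n * theta_i). The abstract does not state which form of the relationship the paper uses, so this identity, the function names `static_cepstrum` and `delta_cepstrum`, and the (radius, angle) parameterization are assumptions made for illustration, not the authors' implementation. The derivative is obtained by applying the chain rule to the same poles, which is what makes the two feature streams consistent by construction.

```python
import math

def static_cepstrum(poles, num_coeffs):
    """Cepstral coefficients c_1..c_N of an all-pole spectrum.

    poles: list of (r, theta) tuples, one per complex-conjugate pole pair,
           with radius 0 < r < 1 and angle theta in radians.
    Uses c_n = (2/n) * sum_i r_i**n * cos(n * theta_i).
    """
    return [
        (2.0 / n) * sum(r ** n * math.cos(n * theta) for r, theta in poles)
        for n in range(1, num_coeffs + 1)
    ]

def delta_cepstrum(poles, pole_rates, num_coeffs):
    """Time derivative of c_1..c_N from the same poles (chain rule).

    pole_rates: list of (dr/dt, dtheta/dt) tuples aligned with `poles`.
    d(c_n)/dt = 2 * sum_i [ r**(n-1) * dr * cos(n*theta)
                            - r**n * dtheta * sin(n*theta) ]
    """
    deltas = []
    for n in range(1, num_coeffs + 1):
        total = 0.0
        for (r, theta), (dr, dtheta) in zip(poles, pole_rates):
            total += r ** (n - 1) * dr * math.cos(n * theta)
            total -= r ** n * dtheta * math.sin(n * theta)
        deltas.append(2.0 * total)
    return deltas

# Hypothetical example values: two pole pairs and their rates of change.
poles = [(0.9, 0.5), (0.8, 1.2)]
pole_rates = [(0.01, 0.02), (-0.02, 0.03)]
c = static_cepstrum(poles, 12)
dc = delta_cepstrum(poles, pole_rates, 12)
```

Under this parameterization, a resonance at frequency F (in Hz) with sampling rate fs corresponds to the angle theta = 2 * pi * F / fs, so a pole-angle trajectory can be read as a formant-like track. Because `dc` is computed from the same `poles` as `c`, no separately estimated derivative model is involved in this sketch.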

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lai, YP., Siu, MH. (2006). Consistent Modeling of the Static and Time-Derivative Cepstrums for Speech Recognition Using HSPTM. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_34

  • DOI: https://doi.org/10.1007/11939993_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49665-6

  • Online ISBN: 978-3-540-49666-3

  • eBook Packages: Computer Science (R0)
