DOI: 10.5555/2167540.2167547 (book chapter)

Identification of nonlinear oscillator models for speech analysis and synthesis

Published: 01 January 2005

Abstract

More than ten years ago, the first successful application of a nonlinear oscillator model to high-quality speech signal processing was reported (Kubin and Kleijn, 1994). Since then, numerous efforts have aimed to turn nonlinear oscillators into a standard tool for speech technology. This contribution reviews and compares several of these attempts, with special emphasis on adaptive model identification from data and on approaches to the associated machine learning problems. These include Bayesian methods for regularizing the parameter estimation problem (including the pruning of irrelevant parameters) and methods based on Ansatz libraries (Lainscsek et al., 2001) for selecting the model structure. We conclude that these advanced identification methods must be combined with a thorough background in speech science to succeed in practical modeling tasks.
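A minimal sketch of the kind of identification problem described above, assuming one common form of the nonlinear oscillator model: a Gaussian radial basis function (RBF) network that predicts the next sample from a time-delay embedded state and is then run autonomously with its own output fed back. The regularization step uses MacKay-style evidence updates of the weight-prior precision `alpha` and the noise precision `beta` (cf. MacKay, 1992). The function names (`embed`, `rbf_features`, `train_evidence`, `synthesize`), the embedding dimension and delay, the RBF width, the choice of centers and the synthetic two-harmonic test signal are all illustrative assumptions, not the chapter's settings. Neither parameter pruning nor Ansatz-library structure selection is shown.

```python
import numpy as np

# Hypothetical sketch: Gaussian RBF oscillator on a time-delay embedding,
# with weights regularized by evidence-framework updates of alpha and beta.


def embed(x, dim, delay):
    """Row i holds [x[t], x[t-delay], ..., x[t-(dim-1)*delay]] with t = i + (dim-1)*delay."""
    n_rows = len(x) - (dim - 1) * delay
    cols = [x[(dim - 1 - k) * delay:(dim - 1 - k) * delay + n_rows] for k in range(dim)]
    return np.stack(cols, axis=1)


def rbf_features(states, centers, width):
    """Gaussian RBF activations plus a constant bias column."""
    d2 = ((states[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((len(states), 1))])


def train_evidence(phi, t, n_iter=50, alpha=1.0, beta=100.0):
    """Regularized least squares; alpha (weight prior precision) and beta
    (noise precision) are re-estimated from the data at every iteration."""
    n, m = phi.shape
    gram = phi.T @ phi
    eig = np.clip(np.linalg.eigvalsh(gram), 0.0, None)  # eigenvalues of Phi^T Phi
    for _ in range(n_iter):
        w = beta * np.linalg.solve(alpha * np.eye(m) + beta * gram, phi.T @ t)
        lam = beta * eig
        gamma = np.sum(lam / (lam + alpha))  # effective number of parameters
        alpha = gamma / (w @ w)
        beta = (n - gamma) / np.sum((t - phi @ w) ** 2)
    # Final posterior-mean weights for the last hyperparameter estimates.
    w = beta * np.linalg.solve(alpha * np.eye(m) + beta * gram, phi.T @ t)
    return w, alpha, beta


def synthesize(w, centers, width, history, n_samples, dim, delay):
    """Run the trained model autonomously, feeding each output back as input."""
    hist = list(history)  # needs at least (dim-1)*delay + 1 past samples
    out = []
    for _ in range(n_samples):
        state = np.array([[hist[-1 - k * delay] for k in range(dim)]])
        y = float((rbf_features(state, centers, width) @ w)[0])
        hist.append(y)
        out.append(y)
    return np.array(out)


# Demo on a synthetic periodic signal (an assumption, not recorded speech).
rng = np.random.default_rng(0)
n = np.arange(2000)
x = np.sin(2 * np.pi * n / 40) + 0.3 * np.sin(4 * np.pi * n / 40 + 0.5)
x = x + 0.01 * rng.standard_normal(len(x))

dim, delay, width = 3, 5, 0.5
states = embed(x[:-1], dim, delay)      # state at time t
targets = x[(dim - 1) * delay + 1:]     # sample at time t + 1
centers = states[:40]                   # one period of states as RBF centers
phi = rbf_features(states, centers, width)
w, alpha, beta = train_evidence(phi, targets)

rmse = np.sqrt(np.mean((phi @ w - targets) ** 2))
synth = synthesize(w, centers, width, x[-((dim - 1) * delay + 1):], 400, dim, delay)
print(f"one-step RMSE {rmse:.4f}, alpha {alpha:.3g}, beta {beta:.3g}")
```

Because every weight shares a single `alpha` here, this sketch only shrinks weights; pruning irrelevant parameters would instead need a separate precision per weight, as in sparse Bayesian learning (Tipping, 2001).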

References

[1]
Kubin, G.: Nonlinear processing of speech. In Kleijn, W.B., Paliwal, K.K., eds.: Speech Coding and Synthesis. Elsevier, Amsterdam etc. (1995) 557-610.
[2]
Kubin, G.: Synthesis and coding of continuous speech with the nonlinear oscillator model. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Atlanta, GA (1996) 267-270.
[3]
Kubin, G., Kleijn, W.B.: Time-scale modification of speech based on a nonlinear oscillator model. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. Volume 1., Adelaide, South Australia (1994) 453-456.
[4]
Sauer, T.: A noise reduction method for signals from nonlinear systems. Physica D 52 (1992) 193-201.
[5]
Hegger, R., Kantz, H., Matassini, L.: Noise reduction for human speech signals by local projection in embedding spaces. IEEE Transactions on Circuits and Systems 48 (2001) 1454-1461.
[6]
Terez, D.E.: Robust pitch determination using nonlinear state-space embedding. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. Volume 1., Orlando (FL), USA (2002) 345-348.
[7]
Mann, I., McLaughlin, S.: A nonlinear algorithm for epoch marking in speech signals using Poincaré maps. In: Proceedings of the European Signal Processing Conference. Volume 2. (1998) 701-704.
[8]
Lindgren, A.C., Johnson, M.T., Povinelli, R.J.: Joint frequency domain and reconstructed phase space features for speech recognition. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. Volume 1., Montreal, Quebec, Canada (2004) 533-536.
[9]
Birgmeier, M.: A fully Kalman-trained radial basis function network for nonlinear speech modeling. In: Proceedings of the IEEE International Conference on Neural Networks, Perth, Australia (1995) 259-264.
[10]
Kubin, G.: Synthesis and coding of continuous speech with the nonlinear oscillator model. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. Volume 1., Atlanta (GA) (1996) 267-270.
[11]
Haas, H., Kubin, G.: A multi-band nonlinear oscillator model for speech. In: Proceedings of the 32nd Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA (1998).
[12]
Mann, I., McLaughlin, S.: Stable speech synthesis using recurrent radial basis functions. In: Proceedings of the European Conference on Speech Communication and Technology. Volume 5., Budapest, Hungary (1999) 2315-2318.
[13]
Narasimhan, K., Príncipe, J.C., Childers, D.G.: Nonlinear dynamic modeling of the voiced excitation for improved speech synthesis. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Phoenix, Arizona (1999) 389-392.
[14]
Rank, E., Kubin, G.: Nonlinear synthesis of vowels in the LP residual domain with a regularized RBF network. In Mira, J., Prieto, A., eds.: Lecture Notes in Computer Science. Volume 2085., Springer (2001) 746-753, part II.
[15]
Mann, I., McLaughlin, S.: Synthesising natural-sounding vowels using a nonlinear dynamical model. Signal Processing 81 (2001) 1743-1756.
[16]
Rank, E.: Application of Bayesian trained RBF networks to nonlinear time-series modeling. Signal Processing 83 (2003) 1393-1410.
[17]
Takens, F.: Detecting strange attractors in turbulence. Lecture Notes in Mathematics 898 (1981) 366.
[18]
Sauer, T., Yorke, J.A., Casdagli, M.: Embedology. Journal of Statistical Physics 65 (1991) 579-616.
[19]
Haykin, S., Príncipe, J.: Making sense of a complex world. IEEE Signal Processing Magazine 15 (1998) 66-81.
[20]
Judd, K., Mees, A.: Embedding as a modeling problem. Physica D 120 (1998) 273-286.
[21]
Bernhard, H.P.: The Mutual Information Function and its Application to Signal Processing. PhD thesis, Vienna University of Technology (1997).
[22]
Hegger, R., Kantz, H., Schreiber, T.: Practical implementation of nonlinear time series methods: The TISEAN package. Chaos 9 (1999) 413-435.
[23]
Bernhard, H.P., Kubin, G.: Detection of chaotic behaviour in speech signals using Fraser's mutual information algorithm. In: Proc. 13th GRETSI Symp. Signal and Image Process., Juan-les-Pins, France (1991) 1301-1311.
[24]
Mann, I.: An Investigation of Nonlinear Speech Synthesis and Pitch Modification Techniques. PhD thesis, University of Edinburgh (1999).
[25]
Rank, E., Kubin, G.: Nonlinear synthesis of vowels in the LP residual domain with a regularized RBF network. In Mira, J., Prieto, A., eds.: Lecture Notes in Computer Science. Volume 2085. Springer (2001) 746-753, part II.
[26]
Li, J., Zhang, B., Lin, F.: Nonlinear speech model based on support vector machine and wavelet transform. In: Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'03), Sacramento, CA (2003) 259-264.
[27]
Haas, H., Kubin, G.: A multi-band nonlinear oscillator model for speech. In: Proc. 32nd Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA (1998).
[28]
Townshend, B.: Nonlinear prediction of speech. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. (1991) 425-428.
[29]
Tikhonov, A.N., Arsenin, V.Y.: Solutions of Ill-posed Problems. V.H. Winston & Sons (1977).
[30]
Poggio, T., Girosi, F.: A theory of networks for approximation and learning. A.I. Memo 1140, Massachusetts Institute of Technology (1989).
[31]
Stone, M.: Cross-validation choice and assessment of statistical predictions. Journal of the Royal Statistical Society B 36 (1974) 111-147.
[32]
MacKay, D.J.: Bayesian interpolation. Neural Computation 4 (1992) 415-447.
[33]
MacKay, D.J.: A practical Bayesian framework for backprop networks. Neural Computation 4 (1992) 448-472.
[34]
MacKay, D.J.: The evidence framework applied to classification networks. Neural Computation 4 (1992) 698-714.
[35]
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B 39 (1977) 1-38.
[36]
Tipping, M.E.: Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research 1 (2001) 211-244.
[37]
Fant, G., Liljencrants, J., Lin, Q.G.: A four parameter model of glottal flow. Quarterly Progress Status Report 4, Speech Transmission Laboratory/Royal Institute of Technology, Stockholm, Sweden (1985).
[38]
Köppl, H., Kubin, G., Paoli, G.: Bayesian methods for sparse RLS adaptive filters. In: Thirty-Seventh IEEE Asilomar Conference on Signals, Systems and Computers. Volume 2. (2003) 1273-1277.
[39]
Kubin, G., Atal, B.S., Kleijn, W.B.: Performance of noise excitation for unvoiced speech. In: Proc. IEEE Workshop on Speech Coding for Telecommunication, St.Jovite, Québec, Canada (1993) 1-2.
[40]
Holm, S.: Automatic generation of mixed excitation in a linear predictive speech synthesizer. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. Volume 6., Atlanta (GA) (1981) 118-120.
[41]
Hermes, D.J.: Synthesis of breathy vowels: Some research methods. Speech Communication 10 (1991) 497-502.
[42]
Skoglund, J., Kleijn, W.B.: On the significance of temporal masking in speech coding. In: Proceedings of the International Conference on Spoken Language Processing. Volume 5., Sydney (1998) 1791-1794.
[43]
Jackson, P.J., Shadle, C.H.: Aero-acoustic modelling of voiced and unvoiced fricatives based on MRI data. In: Proceedings of 5th Speech Production Seminar, Kloster Seeon, Germany (2000) 185-188.
[44]
Jackson, P.J., Shadle, C.H.: Frication noise modulated by voicing, as revealed by pitch-scaled decomposition. Journal of the Acoustical Society of America 108 (2000) 1421-1434.
[45]
Stylianou, Y., Laroche, J., Moulines, E.: High-quality speech modification based on a harmonic + noise model. In: Proceedings of the European Conference on Speech Communication and Technology, Madrid, Spain (1995) 451-454.
[46]
Bailly, G.: A parametric harmonic+noise model. In Keller, E., Bailly, G., Monaghan, A., Terken, J., Huckvale, M., eds.: Improvements in Speech Synthesis. Wiley (2002) 22-38.
[47]
Rank, E., Kubin, G.: An oscillator-plus-noise model for speech synthesis. Speech Communication (2005) Accepted for publication.
[48]
Lu, H.L., Smith, III, J.O.: Glottal source modeling for singing voice. In: Proc. International Computer Music Conference, Berlin, Germany (2000) 90-97.
[49]
Lainscsek, C., Letellier, C., Schürrer, F.: Ansatz library for global modeling with a structure selection. Physical Review E 64 (2001) 016206:1-15.
[50]
Lainscsek, C., Letellier, C., Gorodnitsky, I.: Global modeling of the Rössler system from the z-variable. Physics Letters A 314(5-6) (2003) 409-427.
[51]
Judd, K., Mees, A.: On selecting models for nonlinear time series. Physica D 82 (1995) 426-444.
[52]
Gouesbet, G., Letellier, C.: Global vector-field reconstruction by using a multivariate polynomial L2 approximation on nets. Phys. Rev. E 49 (1994) 4955.
[53]
Press, W., Flannery, B., Teukolsky, S., Vetterling, W.: Numerical Recipes in C. Cambridge University Press (1990).
[54]
Lainscsek, C., Gorodnitsky, I.: Ansatz libraries for systems with quadratic and cubic non-linearities. http://cloe.ucsd.edu/claudia/poster DD 2002.pdf (2002).
[55]
Eichhorn, R., Linz, S., Hänggi, P.: Transformations of nonlinear dynamical systems to jerky motion and its application to minimal chaotic flows. Physical Review E 58 (6) (1998) 7151-7164.
[56]
Goldberg, D.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley (1998).
[57]
Holland, J.H.: Adaptation in natural and artificial systems. MIT Press (1992).
[58]
Ishizaka, K., Flanagan, J.L.: Synthesis of voiced sounds from a two-mass model of the vocal cords. Bell Systems Technical Journal 51 (1972) 1233-1267.

Published In

Nonlinear Speech Modeling and Applications: Advanced Lectures and Revised Selected Papers
January 2005
431 pages
ISBN: 3540274413
Editors: Gérard Chollet, Anna Esposito, Marcos Faundez-Zanuy, Maria Marinaro

Publisher

Springer-Verlag

Berlin, Heidelberg
