chapter

Identification of nonlinear oscillator models for speech analysis and synthesis

Authors:

Claudia Lainscsek,

Erhard Rank

Nonlinear Speech Modeling and Applications: Advanced Lectures and Revised Selected Papers

January 2005

Pages 74 - 113

Published: 01 January 2005

Abstract

More than ten years ago, the first successful application of a nonlinear oscillator model to high-quality speech signal processing was reported (Kubin and Kleijn, 1994). Since then, numerous developments have been initiated to turn nonlinear oscillators into a standard tool for speech technology. The present contribution reviews and compares several of these attempts, with special emphasis on adaptive model identification from data and on approaches to the associated machine learning problems. These include Bayesian methods for regularizing the parameter estimation problem (including the pruning of irrelevant parameters) and methods based on the Ansatz library (Lainscsek et al., 2001) for selecting the model structure. We conclude with the observation that these advanced identification methods need to be combined with a thorough background in speech science to succeed in practical modeling tasks.

References

[1]

Kubin, G.: Nonlinear processing of speech. In Kleijn, W.B., Paliwal, K.K., eds.: Speech Coding and Synthesis. Elsevier, Amsterdam etc. (1995) 557-610.

[2]

Kubin, G.: Synthesis and coding of continuous speech with the nonlinear oscillator model. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Atlanta, GA (1996) 267-270.

[3]

Kubin, G., Kleijn, W.B.: Time-scale modification of speech based on a nonlinear oscillator model. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. Volume 1., Adelaide, South Australia (1994) 453-456.

[4]

Sauer, T.: A noise reduction method for signals from nonlinear systems. Physica D 52 (1992) 193-201.

[5]

Hegger, R., Kantz, H., Matassini, L.: Noise reduction for human speech signals by local projection in embedding spaces. IEEE Transactions on Circuits and Systems 48 (2001) 1454-1461.

[6]

Terez, D.E.: Robust pitch determination using nonlinear state-space embedding. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. Volume 1., Orlando (FL), USA (2002) 345-348.

[7]

Mann, I., McLaughlin, S.: A nonlinear algorithm for epoch marking in speech signals using Poincaré maps. In: Proceedings of the European Signal Processing Conference. Volume 2. (1998) 701-704.

[8]

Lindgren, A.C., Johnson, M.T., Povinelli, R.J.: Joint frequency domain and reconstructed phase space features for speech recognition. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. Volume 1., Montreal, Quebec, Canada (2004) 533-536.

[9]

Birgmeier, M.: A fully Kalman-trained radial basis function network for nonlinear speech modeling. In: Proceedings of the IEEE International Conference on Neural Networks, Perth, Australia (1995) 259-264.

[10]

Kubin, G.: Synthesis and coding of continuous speech with the nonlinear oscillator model. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. Volume 1., Atlanta (GA) (1996) 267-270.

[11]

Haas, H., Kubin, G.: A multi-band nonlinear oscillator model for speech. In: Proceedings of the 32nd Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA (1998).

[12]

Mann, I., McLaughlin, S.: Stable speech synthesis using recurrent radial basis functions. In: Proceedings of the European Conference on Speech Communication and Technology. Volume 5., Budapest, Hungary (1999) 2315-2318.

[13]

Narasimhan, K., Príncipe, J.C., Childers, D.G.: Nonlinear dynamic modeling of the voiced excitation for improved speech synthesis. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Phoenix, Arizona (1999) 389-392.

[14]

Rank, E., Kubin, G.: Nonlinear synthesis of vowels in the LP residual domain with a regularized RBF network. In Mira, J., Prieto, A., eds.: Lecture Notes in Computer Science. Volume 2085., Springer (2001) 746-753, part II.

[15]

Mann, I., McLaughlin, S.: Synthesising natural-sounding vowels using a nonlinear dynamical model. Signal Processing 81 (2001) 1743-1756.

[16]

Rank, E.: Application of Bayesian trained RBF networks to nonlinear time-series modeling. Signal Processing 83 (2003) 1393-1410.

[17]

Takens, F.: Detecting strange attractors in turbulence. Lecture Notes in Mathematics 898 (1981) 366.

[18]

Sauer, T., Yorke, J.A., Casdagli, M.: Embedology. Journal of Statistical Physics 65 (1991) 579-616.

[19]

Haykin, S., Príncipe, J.: Making sense of a complex world. IEEE Signal Processing Magazine 15 (1998) 66-81.

[20]

Judd, K., Mees, A.: Embedding as a modeling problem. Physica D 120 (1998) 273-286.

[21]

Bernhard, H.P.: The Mutual Information Function and its Application to Signal Processing. PhD thesis, Vienna University of Technology (1997).

[22]

Hegger, R., Kantz, H., Schreiber, T.: Practical implementation of nonlinear time series methods: The TISEAN package. CHAOS 9 (1999) 413-435.

[23]

Bernhard, H.P., Kubin, G.: Detection of chaotic behaviour in speech signals using Fraser's mutual information algorithm. In: Proc. 13th GRETSI Symp. Signal and Image Process., Juan-les-Pins, France (1991) 1301-1311.

[24]

Mann, I.: An Investigation of Nonlinear Speech Synthesis and Pitch Modification Techniques. PhD thesis, University of Edinburgh (1999).

[25]

Rank, E., Kubin, G.: Nonlinear synthesis of vowels in the LP residual domain with a regularized RBF network. In Mira, J., Prieto, A., eds.: Lecture Notes in Computer Science. Volume 2085. Springer (2001) 746-753, part II.

[26]

Li, J., Zhang, B., Lin, F.: Nonlinear speech model based on support vector machine and wavelet transform. In: Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'03), Sacramento, CA (2003) 259-264.

[27]

Haas, H., Kubin, G.: A multi-band nonlinear oscillator model for speech. In: Proc. 32nd Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA (1998).

[28]

Townshend, B.: Nonlinear prediction of speech. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. (1991) 425-428.

[29]

Tikhonov, A.N., Arsenin, V.Y.: Solutions of Ill-posed Problems. W.H. Winston (1977).

[30]

Poggio, T., Girosi, F.: A theory of networks for approximation and learning. A.I. Memo 1140, Massachusetts Institute of Technology (1989).

[31]

Stone, M.: Cross-validation choice and assessment of statistical predictions. Journal of the Royal Statistical Society B 36 (1974) 111-147.

[32]

MacKay, D.J.: Bayesian interpolation. Neural Computation 4 (1992) 415-447.

[33]

MacKay, D.J.: A practical Bayesian framework for backprop networks. Neural Computation 4 (1992) 448-472.

[34]

MacKay, D.J.: The evidence framework applied to classification networks. Neural Computation 4 (1992) 698-714.

[35]

Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B 39 (1977) 1-38.

[36]

Tipping, M.E.: Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research 1 (2001) 211-244.

[37]

Fant, G., Liljencrants, J., Lin, Q.G.: A four parameter model of glottal flow. Quarterly Progress Status Report 4, Speech Transmission Laboratory/Royal Institute of Technology, Stockholm, Sweden (1985).

[38]

Köppl, H., Kubin, G., Paoli, G.: Bayesian methods for sparse RLS adaptive filters. In: Thirty-Seventh IEEE Asilomar Conference on Signals, Systems and Computers. Volume 2. (2003) 1273-1277.

[39]

Kubin, G., Atal, B.S., Kleijn, W.B.: Performance of noise excitation for unvoiced speech. In: Proc. IEEE Workshop on Speech Coding for Telecommunication, St.Jovite, Québec, Canada (1993) 1-2.

[40]

Holm, S.: Automatic generation of mixed excitation in a linear predictive speech synthesizer. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. Volume 6., Atlanta (GA) (1981) 118-120.

[41]

Hermes, D.J.: Synthesis of breathy vowels: Some research methods. Speech Communication 10 (1991) 497-502.

[42]

Skoglund, J., Kleijn, W.B.: On the significance of temporal masking in speech coding. In: Proceedings of the International Conference on Spoken Language Processing. Volume 5., Sydney (1998) 1791-1794.

[43]

Jackson, P.J., Shadle, C.H.: Aero-acoustic modelling of voiced and unvoiced fricatives based on MRI data. In: Proceedings of 5th Speech Production Seminar, Kloster Seeon, Germany (2000) 185-188.

[44]

Jackson, P.J., Shadle, C.H.: Frication noise modulated by voicing, as revealed by pitch-scaled decomposition. Journal of the Acoustical Society of America 108 (2000) 1421-1434.

[45]

Stylianou, Y., Laroche, J., Moulines, E.: High-quality speech modification based on a harmonic + noise model. In: Proceedings of the European Conference on Speech Communication and Technology, Madrid, Spain (1995) 451-454.

[46]

Bailly, G.: A parametric harmonic+noise model. In Keller, E., Bailly, G., Monaghan, A., Terken, J., Huckvale, M., eds.: Improvements in Speech Synthesis. Wiley (2002) 22-38.

[47]

Rank, E., Kubin, G.: An oscillator-plus-noise model for speech synthesis. Speech Communication (2005) Accepted for publication.

[48]

Lu, H.L., Smith, III, J.O.: Glottal source modeling for singing voice. In: Proc. International Computer Music Conference, Berlin, Germany (2000) 90-97.

[49]

Lainscsek, C., Letellier, C., Schürrer, F.: Ansatz library for global modeling with a structure selection. Physical Review E 64 (2001) 016206:1-15.

[50]

Lainscsek, C., Letellier, C., Gorodnitsky, I.: Global modeling of the Rössler system from the z-variable. Physics Letters A 314(5-6) (2003) 409-427.

[51]

Judd, K., Mees, A.: On selecting models for nonlinear time series. Physica D 82 (1995) 426-444.

[52]

Gouesbet, G., Letellier, C.: Global vector-field reconstruction by using a multivariate polynomial L₂ approximation on nets. Phys. Rev. E 49 (1994) 4955.

[53]

Press, W., Flannery, B., Teukolsky, S., Vetterling, W.: Numerical Recipes in C. Cambridge University Press (1990).

[54]

Lainscsek, C., Gorodnitsky, I.: Ansatz libraries for systems with quadratic and cubic non-linearities. http://cloe.ucsd.edu/claudia/poster DD 2002.pdf (2002).

[55]

Eichhorn, R., Linz, S., Hänggi, P.: Transformations of nonlinear dynamical systems to jerky motion and its application to minimal chaotic flows. Physical Review E 58 (6) (1998) 7151-7164.

[56]

Goldberg, D.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley (1998).

[57]

Holland, J.H.: Adaptation in natural and artificial systems. MIT Press (1992).

[58]

Ishizaka, K., Flanagan, J.L.: Synthesis of voiced sounds from a two-mass model of the vocal cords. Bell System Technical Journal 51 (1972) 1233-1267.

Identification of nonlinear oscillator models for speech analysis and synthesis
1. Computing methodologies
  1. Modeling and simulation
    1. Model development and analysis
2. Hardware
  1. Communication hardware, interfaces and storage

Published In

Nonlinear Speech Modeling and Applications: Advanced Lectures and Revised Selected Papers

January 2005

431 pages

ISBN: 3540274413

Editors:
Gérard Chollet (CNRS LTCI/TSI Paris, 46 rue Barrault, Paris Cedex 13, France)
Anna Esposito (Department of Psychology, Second University of Naples, and IIASS, Via Pellegrino 19, Vietri sul Mare, SA, Italy)
Marcos Faundez-Zanuy (Escola Universitària Politècnica de Mataró, Universitat Politècnica de Catalunya, Via Pellegrino 19, Barcelona, SA, Spain)
Maria Marinaro (Dipartimento di Fisica "E.R. Caianiello", Università degli Studi di Salerno, Via S. Allende, Baronissi, SA, Italy)

Publisher

Springer-Verlag

Berlin, Heidelberg
