Speaker Individualities in Speech Spectral Envelopes and Fundamental Frequency Contours

Kitamura, Tatsuya; Akagi, Masato

doi:10.1007/978-3-540-74122-0_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4441)

Abstract

Perceptual cues for speaker individualities embedded in spectral envelopes of vowels and fundamental frequency (F0) contours of words were investigated through psychoacoustic experiments. First, the frequency bands carrying speaker individualities were estimated using stimuli created by systematically varying the spectral shape in specific frequency bands. The results suggest that speaker individualities of vowel spectral envelopes mainly exist in higher frequency regions, including and above the peak around 20–23 ERB rate (1,740–2,489 Hz). Second, three experiments were performed to clarify the relationship between the physical characteristics of F0 contours, extracted using Fujisaki and Hirose’s F0 model, and the perception of speaker identity. The results indicate that some specific parameters related to the dynamics of F0 contours carry many speaker individuality features. The results also show that although the time-averaged F0 contains speaker individuality features, it contributes less to speaker identification than the dynamics of the F0 contours do.
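The correspondence between ERB rate and frequency quoted in the abstract can be checked with a short sketch. It assumes the ERB-rate scale of Glasberg and Moore (1990), which appears in the reference list; the abstract itself does not state which formula was used, so the formula choice and the function names `hz_to_erb_rate` and `erb_rate_to_hz` are illustrative assumptions, not the authors' code.

```python
import math


def hz_to_erb_rate(freq_hz: float) -> float:
    """Convert frequency in Hz to ERB rate.

    Assumes the Glasberg and Moore (1990) scale:
    ERB rate = 21.4 * log10(4.37 * f / 1000 + 1), with f in Hz.
    """
    return 21.4 * math.log10(4.37 * freq_hz / 1000.0 + 1.0)


def erb_rate_to_hz(erb_rate: float) -> float:
    """Invert the formula above: ERB rate back to frequency in Hz."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) / 4.37 * 1000.0


if __name__ == "__main__":
    # Band edges mentioned in the abstract (20 and 23 ERB rate).
    for erb in (20.0, 23.0):
        print(f"{erb:.0f} ERB rate corresponds to about {erb_rate_to_hz(erb):.1f} Hz")
```

Under this assumed formula, 20 and 23 ERB rate map to values within about 1 Hz of the 1,740 Hz and 2,489 Hz given in the abstract.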

References

Takahashi, M., Yamamoto, G.: On the physical characteristics of Japanese vowels. Res. Electrotech. Lab. 326 (1931)
Furui, S., Akagi, M.: Perception of voice individuality and physical correlates. Trans. Tech. Com. Psychol. Physiol. Acoust. H85-18 (1985)
Zhu, W., Kasuya, H.: Perceptual contributions of static and dynamic features of vocal tract characteristics to talker individuality. IEICE Trans. Fundamentals E81-A, 268–274 (1998)
Imai, S.: Log magnitude approximation (LMA) filter. IEICE Trans. Fundamentals J63-A, 886–893 (1980)
Fujisaki, H., Hirose, K.: Analysis of voice fundamental frequency contours for declarative sentences of Japanese. J. Acoust. Soc. Jpn (E) 5, 233–242 (1984)
Takeda, K., Sagisaka, Y., Katagiri, S., Abe, M., Kuwabara, H.: Speech database user’s manual. ATR Tech. Rep. TR-I-0028 (1988)
Glasberg, B.R., Moore, B.C.J.: Derivation of auditory filter shapes from notched-noise data. Hear. Res. 47, 103–138 (1990)
Li, K.-P., Hughes, G.W.: Talker differences as they appear in correlation matrices of continuous speech spectra. J. Acoust. Soc. Am. 55, 833–873 (1974)
Mokhtari, P., Clermont, F.: Contributions of selected spectral regions to vowel classification accuracy. In: Proc. ICSLP 1994, pp. 1923–1926 (1994)
Kitamura, T., Akagi, M.: Speaker individualities in speech spectral envelopes. J. Acoust. Soc. Jpn (E) 16, 283–289 (1995)
Kitamura, T., Akagi, M.: Speaker individualities of vowels in continuous speech. Trans. Tech. Com. Psycho. Physio. Acoust. H-96-98 (1996)
Kitamura, T., Akagi, M.: Frequency bands suited to speaker identification by simple similarity method. In: Proc. Autumn Meet. Acoust. Soc. Jpn, pp. 237–238 (1996)
Iijima, T.: Theory of Pattern Recognition. Morikita, Tokyo (1989)
Kitamura, Y., Iwaki, M., Iijima, T.: Pluralizing method of similarity for speaker-independent vowel recognition. IEICE Tech. Rep. Sp. 95, 47–54 (1996)
Hayakawa, S., Itakura, F.: Text-dependent speaker recognition using the information in the higher frequency band. In: Proc. ICSLP 1994, pp. 137–140 (1994)
Lin, Q., Jan, E.-E., Che, C.-W., Yuk, D.-S., Flanagan, J.: Selective use of the speech spectrum and a VQGMM method for speaker identification. In: Proc. ICSLP 1996, pp. 2415–2418 (1996)
Sivakumaran, P., Ariyaeeinia, A.M., Loomes, M.J.: Sub-band based text-dependent speaker verification. Speech Commun. 41, 485–509 (2003)
Kitamura, T., Honda, K., Takemoto, H.: Individual variation of the hypopharyngeal cavities and its acoustic effects. Acoust. Sci. & Tech. 26, 16–26 (2005)
Akagi, M., Ienaga, T.: Speaker individuality in fundamental frequency contours and its control. J. Acoust. Soc. Jpn (E) 18, 73–80 (1997)
Fujisaki, H., Ohno, S., Nakamura, K., Guirao, M., Gurlekian, J.: Analysis of accent and intonation in Spanish based on a quantitative model. In: Proc. ICSLP 1994, pp. 355–358 (1994)

Author information

Authors and Affiliations

ATR Cognitive Information Science Laboratories, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan
Tatsuya Kitamura
School of Information Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi-shi, Ishikawa 923-1292, Japan
Masato Akagi

Editor information

Christian Müller

Cite this chapter

Kitamura, T., Akagi, M. (2007). Speaker Individualities in Speech Spectral Envelopes and Fundamental Frequency Contours. In: Müller, C. (eds) Speaker Classification II. Lecture Notes in Computer Science, vol 4441. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74122-0_14

DOI: https://doi.org/10.1007/978-3-540-74122-0_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74121-3
Online ISBN: 978-3-540-74122-0
eBook Packages: Computer Science, Computer Science (R0)