Abstract
Perceptual cues for speaker individualities embedded in the spectral envelopes of vowels and the fundamental frequency (F0) contours of words were investigated through psychoacoustic experiments. First, the frequency bands that carry speaker individualities were estimated using stimuli created by systematically varying the spectral shape in specific frequency bands. The results suggest that speaker individualities in vowel spectral envelopes reside mainly in the higher frequency region, including and above the peak around 20–23 ERB rate (1,740–2,489 Hz). Second, three experiments were performed to clarify the relationship between the physical characteristics of F0 contours, extracted using Fujisaki and Hirose’s F0 model, and the perception of speaker identity. The results indicate that specific parameters related to the dynamics of F0 contours carry much of the speaker individuality. They also show that, although the time-averaged F0 carries speaker individuality as well, it contributes less to speaker identification than the dynamics of the F0 contours.
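The band edges quoted in the abstract can be checked with a short conversion between ERB rate and frequency. The sketch below assumes the ERB-rate formula of Glasberg and Moore (1990), which appears in the reference list; the abstract itself does not state which formula was used, and the function names are illustrative only. Under that assumption, 20 and 23 ERB rate come out within about a hertz of the 1,740 Hz and 2,489 Hz given above.

```python
import math


def hz_to_erb_rate(f_hz: float) -> float:
    """Convert frequency in Hz to ERB rate.

    Assumes the Glasberg and Moore (1990) formula:
    ERB rate = 21.4 * log10(4.37 * F + 1), with F in kHz.
    """
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_rate_to_hz(erb_rate: float) -> float:
    """Invert the formula above: ERB rate back to frequency in Hz."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) * 1000.0 / 4.37


if __name__ == "__main__":
    # Band edges named in the abstract (20 and 23 ERB rate).
    for erb in (20.0, 23.0):
        print(f"{erb:.0f} ERB rate is about {erb_rate_to_hz(erb):.1f} Hz")
```

The two functions are exact inverses of each other, so either the ERB-rate values or the frequencies in hertz can be taken as the starting point.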
References
Takahashi, M., Yamamoto, G.: On the physical characteristics of Japanese vowels. Res. Electrotech. Lab. 326 (1931)
Furui, S., Akagi, M.: Perception of voice individuality and physical correlates. Trans. Tech. Com. Psychol. Physiol. Acoust. H85-18 (1985)
Zhu, W., Kasuya, H.: Perceptual contributions of static and dynamic features of vocal tract characteristics to talker individuality. IEICE Trans. Fundamentals E81-A, 268–274 (1998)
Imai, S.: Log magnitude approximation (LMA) filter. IEICE Trans. Fundamentals J63-A, 886–893 (1980)
Fujisaki, H., Hirose, K.: Analysis of voice fundamental frequency contours for declarative sentences of Japanese. J. Acoust. Soc. Jpn (E) 5, 233–242 (1984)
Takeda, K., Sagisaka, Y., Katagiri, S., Abe, M., Kuwabara, H.: Speech database user’s manual. ATR Tech. Rep. TR-I-0028 (1988)
Glasberg, B.R., Moore, B.C.J.: Derivation of auditory filter shapes from notched-noise data. Hear. Res. 47, 103–138 (1990)
Li, K.-P., Hughes, G.W.: Talker differences as they appear in correlation matrices of continuous speech spectra. J. Acoust. Soc. Am. 55, 833–837 (1974)
Mokhtari, P., Clermont, F.: Contributions of selected spectral regions to vowel classification accuracy. In: Proc. ICSLP 1994, pp. 1923–1926 (1994)
Kitamura, T., Akagi, M.: Speaker individualities in speech spectral envelopes. J. Acoust. Soc. Jpn (E) 16, 283–289 (1995)
Kitamura, T., Akagi, M.: Speaker individualities of vowels in continuous speech. Trans. Tech. Com. Psychol. Physiol. Acoust. H-96-98 (1996)
Kitamura, T., Akagi, M.: Frequency bands suited to speaker identification by simple similarity method. In: Proc. Autumn Meet. Acoust. Soc. Jpn, pp. 237–238 (1996)
Iijima, T.: Theory of Pattern Recognition. Morikita, Tokyo (1989)
Kitamura, Y., Iwaki, M., Iijima, T.: Pluralizing method of similarity for speaker-independent vowel recognition. IEICE Tech. Rep. Sp. 95, 47–54 (1996)
Hayakawa, S., Itakura, F.: Text-dependent speaker recognition using the information in the higher frequency band. In: Proc. ICSLP 1994, pp. 137–140 (1994)
Lin, Q., Jan, E.-E., Che, C.-W., Yuk, D.-S., Flanagan, J.: Selective use of the speech spectrum and a VQGMM method for speaker identification. In: Proc. ICSLP 1996, pp. 2415–2418 (1996)
Sivakumaran, P., Ariyaeeinia, A.M., Loomes, M.J.: Sub-band based text-dependent speaker verification. Speech Commun. 41, 485–509 (2003)
Kitamura, T., Honda, K., Takemoto, H.: Individual variation of the hypopharyngeal cavities and its acoustic effects. Acoust. Sci. & Tech. 26, 16–26 (2005)
Akagi, M., Ienaga, T.: Speaker individuality in fundamental frequency contours and its control. J. Acoust. Soc. Jpn (E) 18, 73–80 (1997)
Fujisaki, H., Ohno, S., Nakamura, K., Guirao, M., Gurlekian, J.: Analysis of accent and intonation in Spanish based on a quantitative model. In: Proc. ICSLP 1994, pp. 355–358 (1994)
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Kitamura, T., Akagi, M. (2007). Speaker Individualities in Speech Spectral Envelopes and Fundamental Frequency Contours. In: Müller, C. (ed.) Speaker Classification II. Lecture Notes in Computer Science, vol 4441. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74122-0_14
DOI: https://doi.org/10.1007/978-3-540-74122-0_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74121-3
Online ISBN: 978-3-540-74122-0
eBook Packages: Computer Science, Computer Science (R0)