DOI: 10.1145/2708463.2709042
research-article

Combining Evidences from Bark Scale and Mel Scale Warped Features for VTLN

Published: 26 February 2015

Abstract

Vocal tract length normalization (VTLN) reduces the effect of vocal tract length (VTL) differences among speakers. This paper presents frequency warping based on the Bark scale to obtain VTL-normalized features and proposes Bark scale based features (BSBFs). BSBFs improve accuracy on a vowel recognition task by 1.05% over mel-frequency cepstral coefficients (MFCCs). A feature-level fusion of BSBFs with MFCCs performs better than MFCCs alone under both matched and mismatched conditions in phoneme recognition.
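
To make the two frequency scales concrete, the following is a minimal sketch and not the authors' implementation. It assumes the Zwicker and Terhardt (1980) analytical expression for the Bark scale and the common 2595 log10(1 + f/700) formula for the mel scale, and it assumes that feature-level fusion means concatenating the two feature vectors frame by frame. The function names hz_to_bark, hz_to_mel, and fuse_features are hypothetical.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Critical-band rate in Bark (Zwicker-Terhardt approximation, assumed here)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def hz_to_mel(f_hz: float) -> float:
    """Mel value using the common 2595 * log10(1 + f / 700) formula (assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def fuse_features(bsbf_frame, mfcc_frame):
    """Feature-level fusion sketched as plain concatenation of one frame's vectors.

    This is an assumption: the abstract does not say how the fusion is performed.
    """
    return list(bsbf_frame) + list(mfcc_frame)


if __name__ == "__main__":
    # Compare how the two scales compress the frequency axis.
    for f in (100, 500, 1000, 4000, 8000):
        print(f"{f:5d} Hz  bark={hz_to_bark(f):6.2f}  mel={hz_to_mel(f):8.2f}")

    # Toy vectors only; real BSBF and MFCC frames would come from a front end.
    fused = fuse_features([0.1, 0.2, 0.3], [1.0, 2.0])
    print(len(fused))
```

Both mappings are monotonic and compress high frequencies, but they do so at different rates, which is one plausible reason features derived from them could carry complementary evidence.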

    Published In

    ACM Other conferences
    PerMIn '15: Proceedings of the 2nd International Conference on Perception and Machine Intelligence
    February 2015
    269 pages
    ISBN: 9781450320023
    DOI: 10.1145/2708463

    In-Cooperation

    • Department of Science and Technology, Government of India

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Bark scale
    2. speech recognition
    3. vocal tract length normalization
    4. warped linear prediction

    Qualifiers

    • Research-article
    • Research
    • Refereed limited
