
Analysis and Recognition of NAM Speech Using HMM Distances and Visual Information

Published: 01 August 2010

Abstract

Non-audible murmur (NAM) is an unvoiced speech signal that can be received through the body tissue with the use of special acoustic sensors (i.e., NAM microphones) attached behind the talker's ear. The authors had previously reported experimental results for NAM recognition using a stethoscopic and a silicon NAM microphone. Using a small amount of training data from a single speaker and adaptation approaches, 93.9% word accuracy was achieved on a 20k-word Japanese vocabulary dictation task. In this paper, NAM speech is analyzed further using distance measures between hidden Markov models (HMMs). It is shown that, owing to the reduced spectral space of NAM speech, the HMM distances are also reduced compared with those of normal speech. For Japanese vowels and fricatives, the distance measures in NAM speech follow the same relative inter-phoneme relationships as in normal speech, without significant differences. Significant differences are found, however, for Japanese plosives: in NAM speech, the distances between voiced/unvoiced consonant pairs with the same place of articulation decrease drastically. As a result, the inter-phoneme relationships change significantly compared with normal speech, causing a substantial decrease in recognition accuracy. A speaker-dependent phoneme recognition experiment was conducted, yielding 81.5% phoneme correctness for NAM and showing a relationship between the HMM distance measures and phoneme accuracy. In a NAM microphone, body transmission and the loss of lip radiation act as a low-pass filter, so higher-frequency components are attenuated in the NAM signal. Because of this spectral reduction, NAM's unvoiced nature, and the type of articulation, NAM sounds become similar to one another, causing a larger number of confusions than in normal speech. Yet many of those sounds differ visually on the face, mouth, and lips, and integrating visual information increases their discrimination and, in turn, recognition accuracy. In this paper, visual information extracted from the talker's facial movements is therefore fused with NAM speech. The experimental results show a relative improvement of 10.5% on average when fused NAM speech and facial information are used compared with NAM speech alone.
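The abstract does not state which inter-HMM distance the authors use, so the sketch below illustrates one standard choice only: the Juang-Rabiner Monte Carlo approximation of the Kullback-Leibler divergence between two HMMs, computed here for toy discrete-observation phoneme models whose parameters are entirely hypothetical.

```python
# Minimal sketch (not the paper's exact measure) of the Juang-Rabiner
# Monte Carlo distance between two discrete HMMs:
#   D(a, b) ~= (1/T) * [log P(O | a) - log P(O | b)],  O sampled from a.
import numpy as np

def sample_sequence(pi, A, B, T, rng):
    """Draw an observation sequence of length T from a discrete HMM (pi, A, B)."""
    obs = np.empty(T, dtype=int)
    state = rng.choice(len(pi), p=pi)
    for t in range(T):
        obs[t] = rng.choice(B.shape[1], p=B[state])   # emit a symbol
        state = rng.choice(A.shape[1], p=A[state])    # transition
    return obs

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model)."""
    alpha = pi * B[:, obs[0]]
    log_p = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        log_p += np.log(alpha.sum())
        alpha /= alpha.sum()
    return log_p

def hmm_distance(model_a, model_b, T=2000, seed=0):
    """Asymmetric Juang-Rabiner distance D(a, b)."""
    rng = np.random.default_rng(seed)
    obs = sample_sequence(*model_a, T, rng)
    return (log_likelihood(obs, *model_a) - log_likelihood(obs, *model_b)) / T

# Two toy 2-state, 3-symbol "phoneme" models (hypothetical parameters).
model_p = (np.array([0.6, 0.4]),
           np.array([[0.7, 0.3], [0.4, 0.6]]),
           np.array([[0.8, 0.1, 0.1], [0.1, 0.2, 0.7]]))
model_b = (np.array([0.5, 0.5]),
           np.array([[0.6, 0.4], [0.3, 0.7]]),
           np.array([[0.6, 0.2, 0.2], [0.2, 0.3, 0.5]]))

d = 0.5 * (hmm_distance(model_p, model_b) + hmm_distance(model_b, model_p))
print(f"symmetrized HMM distance: {d:.4f}")
```

Because the raw measure is asymmetric, a common convention is to average D(a, b) and D(b, a), as above. Under such a measure, the abstract's finding would appear as smaller distances between NAM phoneme models, especially for voiced/unvoiced plosive pairs at the same place of articulation.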
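The reported 10.5% relative improvement comes from fusing NAM audio with visual features of the talker's face. The abstract does not detail the fusion scheme; the sketch below shows one common option, weighted (multi-stream) log-likelihood combination at the decision level. The stream weight and all scores are hypothetical.

```python
# Minimal sketch of decision-level audio-visual fusion under a
# multi-stream assumption; lam and the scores below are hypothetical.
import numpy as np

def fuse_streams(ll_audio, ll_visual, lam=0.7):
    """Combine per-class log-likelihoods with a stream weight lam in [0, 1]."""
    fused = lam * np.asarray(ll_audio) + (1.0 - lam) * np.asarray(ll_visual)
    return int(np.argmax(fused)), fused

# Toy scores for two phoneme classes that NAM renders nearly identical
# but that differ in a place of articulation visible on the lips
# (e.g., a bilabial vs. a velar plosive).
ll_audio = np.array([-102.3, -102.1])   # NAM stream: almost tied
ll_visual = np.array([-55.0, -61.2])    # visual stream: clearly separated
best, fused = fuse_streams(ll_audio, ll_visual)
print(best, fused)   # class 0 wins once the visual stream is weighted in
```

This illustrates the mechanism the abstract describes: where the NAM stream alone barely separates two confusable sounds, a visually distinct pair is disambiguated by the second stream.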

    Published In

    IEEE Transactions on Audio, Speech, and Language Processing  Volume 18, Issue 6
    August 2010
    573 pages

    Publisher

    IEEE Press


    Author Tags

    1. Audio-visual NAM speech recognition
    2. hidden Markov model (HMM) distance measures
    3. non-audible murmur (NAM)
    4. speech recognition

    Qualifiers

    • Research-article
