Volume 159, Issue C, Apr 2024 (Current Issue)
research-article
Symmetric and asymmetric Gaussian weighted linear prediction for voice inverse filtering
Abstract

Weighted linear prediction (WLP) has demonstrated its significance in voice inverse filtering, contributing to enhanced methods for estimating both the vocal tract filter and the glottal source. WLP provides a mechanism to mitigate the effect on ...

Highlights

  • Study of linear prediction (LP) with Gaussian weighting for voice inverse filtering.
  • Gaussian attenuation minimizes the adverse effects in LP due to glottal excitations.
  • Pitch synchronous symmetric/asymmetric Gaussian windows are ...

research-article
Speech-driven head motion generation from waveforms
Abstract

The head motion generation task for speech-driven virtual agent animation is commonly explored in the literature with handcrafted audio features as input, such as MFCCs, plus additional features such as energy and F0. In this paper, we ...

Highlights

  • Handcrafted audio features correlate weakly with head motion in virtual agent animation.
  • Direct use of the waveform to create highly correlated features for head motion estimation.
  • The proposed feature outperforms MFCC for objective and ...

research-article
The effect of musical expertise on whistled vowel identification
Highlights

  • Musicians and non-musicians process the whistled speech signal differently.
  • Musical expertise affects whistled vowel perception with advantages for lower vowels.
  • Inter-whistler variation affects both musicians and non-musicians.

Abstract

In this paper, we looked at the impact of musical experience on whistled vowel categorization by native French speakers. Whistled speech, a natural, yet modified speech type, augments speech amplitude while transposing the signal to a range of ...

research-article
A multimodal model for predicting feedback position and type during conversation
Highlights

  • Study of conversational feedback in spontaneous human-human conversations.
  • A new fine-grained classification of multimodal generic and specific feedback.
  • Automatic classification based on prosodic, morpho-syntactic and gestural ...

Abstract

This study investigates conversational feedback, that is, a listener's reaction in response to a speaker, a phenomenon which occurs in all natural interactions. Feedback depends on the main speaker's productions and in return supports the ...

research-article
An ensemble technique to predict Parkinson's disease using machine learning algorithms
Highlights

  • Parkinson's Disease (PD) prediction is based on optimised machine learning algorithms and an ensemble feature selection algorithm.
  • Three datasets containing voice samples were utilised to analyse performance metrics comprehensively.

Abstract

Parkinson's Disease (PD) is a progressive neurodegenerative disorder with motor and non-motor symptoms. Its symptoms develop slowly, making early identification difficult. Machine learning has significant potential to predict Parkinson's ...

research-article
Speech intelligibility prediction using generalized ESTOI with fine-tuned parameters
Abstract

In this article, a lightweight and interpretable speech intelligibility prediction network is proposed. It is based on the ESTOI metric with several extensions: learned modulation filterbank, temporal attention, and taking into account robustness ...

Highlights

  • Fine-tuning of ESTOI results in improved speech intelligibility prediction.
  • Simple extensions improve performance further at a small cost in complexity.
  • FT-GESTOI has low memory requirements and can be used to train speech ...

research-article
Automatic speaker and age identification of children from raw speech using sincNet over ERB scale
Abstract

This paper presents the newly developed non-native children’s English speech (NNCES) corpus to reveal the findings of automatic speaker and age recognition from raw speech. Convolutional neural networks (CNN), which have the ability to learn low-...

Highlights

  • The study proposes the use of the SincNet model to extract significant speech cues from children’s raw speech, and evaluates its effectiveness for automatic speaker and age identification tasks.
  • The article highlights the benefits of ...
