Motivation: The convolutional neural network (CNN) has been applied to the classification problem of DNA sequences, with the additional purpose of motif discovery. The training of CNNs with distributed representations of four nucleotides has successfully derived position weight matrices on the learned kernels that corresponded to sequence motifs such as protein-binding sites. Results: We propose a novel application of CNNs to classification of pairwise alignments of sequences for accurate clustering of sequences and show the benefits of the CNN method of inputting pairwise alignments for clustering of non-coding RNA (ncRNA) sequences and for motif discovery. Classification of a pairwise alignment of two sequences into positive and negative classes corresponds to the clustering of the input sequences. After we combined the distributed representation of RNA nucleotides with the secondary-structure information specific to ncRNAs and furthermore with mapping profiles of next-generation se...
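A minimal sketch of the pairwise-alignment classification idea, assuming a simple one-hot encoding of aligned columns and a small 1-D CNN in PyTorch; the encoding, network sizes and gap handling are illustrative assumptions and omit the secondary-structure and read-mapping channels described above.

```python
# Sketch only (not the authors' implementation): classify a pairwise alignment
# of two RNA sequences with a 1-D CNN; positive class = same cluster.
import torch
import torch.nn as nn

VOCAB = "ACGU-"  # four nucleotides plus the gap symbol (assumed encoding)

def encode_alignment(seq_a: str, seq_b: str) -> torch.Tensor:
    """One-hot encode each aligned column of both sequences and stack them
    channel-wise -> tensor of shape (2 * len(VOCAB), alignment_length)."""
    assert len(seq_a) == len(seq_b)
    x = torch.zeros(2 * len(VOCAB), len(seq_a))
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        x[VOCAB.index(a), i] = 1.0
        x[len(VOCAB) + VOCAB.index(b), i] = 1.0
    return x

class PairwiseAlignmentCNN(nn.Module):
    """Binary classifier over pairwise alignments (illustrative sizes)."""
    def __init__(self, in_channels=2 * len(VOCAB), n_kernels=32, kernel_size=11):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, n_kernels, kernel_size,
                              padding=kernel_size // 2)
        self.pool = nn.AdaptiveMaxPool1d(1)          # global max pooling over positions
        self.fc = nn.Linear(n_kernels, 1)

    def forward(self, x):                            # x: (batch, channels, length)
        h = torch.relu(self.conv(x))
        h = self.pool(h).squeeze(-1)
        return self.fc(h).squeeze(-1)                # logit; > 0 -> same cluster

model = PairwiseAlignmentCNN()
pair = encode_alignment("GGCA-UCGU", "GGCAAU-GU").unsqueeze(0)
print(torch.sigmoid(model(pair)))                    # untrained model, shape check only
```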
The goal of this study is to investigate advanced signal processing approaches [single frequency filtering (SFF) and zero-time windowing (ZTW)] with modern deep neural networks (DNNs) [convolutional neural networks (CNNs), temporal convolutional networks (TCNs), time-delay neural networks (TDNNs), and emphasized channel attention, propagation and aggregation in TDNN (ECAPA-TDNN)] for dialect classification of major dialects of English. Previous studies indicated that the SFF and ZTW methods provide higher spectro-temporal resolution. To capture the intrinsic variations in articulations among dialects, four feature representations [spectrogram (SPEC), cepstral coefficients, mel filter-bank energies, and mel-frequency cepstral coefficients (MFCCs)] are derived from the SFF and ZTW methods. Experiments with and without data augmentation using CNN classifiers revealed that the proposed features performed better than baseline short-time Fourier transform (STFT)-based features on the UT-Podcast d...
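As a concrete illustration of the SFF front end (the ZTW branch is not sketched here), the snippet below computes SFF temporal envelopes by frequency shifting followed by a single-pole filter near the unit circle; the pole radius r and the analysis-frequency grid are assumed values, not necessarily those used in the study.

```python
# Sketch of single frequency filtering (SFF) envelopes under assumed parameters.
import numpy as np
from scipy.signal import lfilter

def sff_envelopes(x, fs, freqs, r=0.99):
    """Return an array of shape (len(freqs), len(x)) with one temporal
    envelope per analysis frequency."""
    n = np.arange(len(x))
    env = np.empty((len(freqs), len(x)))
    for k, f in enumerate(freqs):
        # Shift frequency f to fs/2, where the single-pole filter has its pole.
        w_shift = np.pi - 2 * np.pi * f / fs
        x_shift = x * np.exp(1j * w_shift * n)
        # Single-pole filter: y[n] = -r * y[n-1] + x_shift[n]
        y = lfilter([1.0], [1.0, r], x_shift)
        env[k] = np.abs(y)
    return env

# Example: SFF "spectrogram" of a synthetic two-tone signal
fs = 16000
t = np.arange(0, 0.5, 1 / fs)
x = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)
envelopes = sff_envelopes(x, fs, freqs=np.arange(100, 4000, 100))
print(envelopes.shape)  # (39, 8000)
```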
In low-resource children's automatic speech recognition (ASR), performance is degraded due to the limited acoustic and speaker variability available in small datasets. In this paper, we propose a spectral warping based data augmentation method to capture more acoustic and speaker variability. This is carried out by warping the linear prediction (LP) spectra computed from speech data. The warped LP spectra, computed in a frame-based manner, are used with the corresponding LP residuals to synthesize speech that captures more variability. The proposed augmentation method is shown to improve the ASR system performance over the baseline system. We have compared the proposed method with four well-known data augmentation methods: pitch scaling, speaking rate, SpecAug and vocal tract length perturbation (VTLP), and found that the proposed method performs the best. Further, we have combined the proposed method with these existing data augmentation methods to improve the ASR system performance even m...
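A rough per-frame sketch of the spectral-warping idea, under assumed settings: estimate an LP envelope, warp its frequency axis by a factor alpha, rebuild an all-pole filter from the warped envelope, and re-excite it with the original LP residual. The warping function, LP order, frame handling and gain handling here are illustrative, not the authors' exact recipe.

```python
# Sketch of LP-spectrum warping for one frame (no gain normalization, no overlap-add).
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lp_coefficients(frame, order=16):
    """LP coefficients a (a[0] == 1) via the autocorrelation method."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])
    return np.concatenate(([1.0], -a))

def warp_frame(frame, alpha=1.1, order=16, nfft=512):
    """Warp the LP envelope of one frame by factor alpha and resynthesize it."""
    a = lp_coefficients(frame, order)
    residual = lfilter(a, [1.0], frame)                     # inverse filtering
    # LP magnitude envelope on a linear frequency grid
    w = np.linspace(0, np.pi, nfft // 2 + 1)
    H = 1.0 / np.abs(np.polyval(a[::-1], np.exp(1j * w)))   # |1 / A(e^{jw})|
    # Warp the frequency axis: sample the envelope at w / alpha
    H_warped = np.interp(np.clip(w / alpha, 0, np.pi), w, H)
    # Rebuild an all-pole filter from the warped power spectrum
    power = np.concatenate((H_warped, H_warped[-2:0:-1])) ** 2
    r_w = np.real(np.fft.ifft(power))[:order + 1]
    a_w = solve_toeplitz((r_w[:-1], r_w[:-1]), r_w[1:])
    a_warped = np.concatenate(([1.0], -a_w))
    return lfilter([1.0], a_warped, residual)               # re-excite warped envelope

frame = np.random.randn(400)                                # stand-in for a 25 ms frame
print(warp_frame(frame).shape)                              # (400,)
```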
Understanding the perception of emotions or affective states in humans is important for developing emotion-aware systems that work in realistic scenarios. In this paper, the perception of emotions in naturalistic human interaction (audio–visual data) is studied using perceptual evaluation. For this purpose, a naturalistic audio–visual emotion database collected from TV broadcasts such as soap operas and movies, called the IIIT-H Audio–Visual Emotion (IIIT-H AVE) database, is used. The database consists of audio-alone, video-alone, and audio–visual data in English. Using data of all three modes, perceptual tests are conducted for four basic emotions (angry, happy, neutral, and sad) based on category labeling and for two dimensions, namely arousal (active or passive) and valence (positive or negative), based on dimensional labeling. The results indicated that the participants' perception of emotions was remarkably different between the audio-alone, video-alone, and audio–video data. Th...
End-to-end (E2E) neural network models have shown significant performance benefits on different INTERSPEECH ComParE tasks. Prior work has applied either a single instance of an E2E model for a task or the same E2E architecture for different tasks. However, applying a single model can be unstable, and using the same architecture under-utilizes task-specific information. On the ComParE 2020 tasks, we investigate applying an ensemble of E2E models for robust performance and developing task-specific modifications for each task. ComParE 2020 introduces three sub-challenges: the breathing sub-challenge to predict the output of a respiratory belt worn by a patient while speaking, the elderly sub-challenge to estimate the elderly speaker's arousal and valence levels, and the mask sub-challenge to classify whether the speaker is wearing a mask or not. On each of these tasks, an ensemble outperforms the single E2E model. On the breathing sub-challenge, we study the impact of multi-loss strategies on task...
During the production of emotional speech, there are deviations in the components of the speech production mechanism when compared to normal speech. The objective of this study is to capture the deviations in features related to the excitation source component of speech, and to develop a system for automatic recognition of emotions based on these deviations. The emotions considered for this study are anger, happy, neutral and sad. The study shows that there is useful information in the deviations of the excitation source features at the subsegmental level, and it can be exploited to develop an emotion recognition system. A hierarchical binary decision tree approach is used for classification.
In this paper, we present a multispeaker localization method using time delay estimates obtained from spectral features derived from the single frequency filtering (SFF) representation. The mixture signals are transformed into the SFF domain, from which the temporal envelopes are extracted at each frequency. Subsequently, spectral features such as the mean and variance of the temporal envelopes across frequencies are correlated to extract the time delay estimates. Since these features emphasize the high-SNR regions of the mixtures, correlating the corresponding features across the channels leads to robust delay estimates in real acoustic environments. We study the efficacy of the developed approach by comparing its performance with existing correlation-based time delay estimation techniques. Both a standard dataset recorded in real-room acoustic environments and a simulated dataset are used for evaluation. It is observed that the localization performance of the proposed alg...
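The delay-estimation step can be illustrated with a short cross-correlation sketch: given one feature sequence per microphone channel (for example, the mean of the SFF temporal envelopes across frequencies), the time delay is taken as the lag maximizing their cross-correlation within a plausible range. The feature choice and parameters below are assumptions for illustration.

```python
# Sketch of cross-correlation-based time delay estimation between two channels.
import numpy as np

def estimate_delay(feat_ch1, feat_ch2, max_lag):
    """Lag (in samples) by which feat_ch2 is delayed relative to feat_ch1."""
    f1 = feat_ch1 - feat_ch1.mean()
    f2 = feat_ch2 - feat_ch2.mean()
    xcorr = np.correlate(f2, f1, mode="full")        # peak at the delay of f2 w.r.t. f1
    lags = np.arange(-len(f1) + 1, len(f2))
    keep = np.abs(lags) <= max_lag                   # restrict to physically plausible lags
    return lags[keep][np.argmax(xcorr[keep])]

# Example: channel 2 is channel 1 delayed by 7 samples plus a little noise
rng = np.random.default_rng(0)
feat = rng.random(2000)
delayed = np.roll(feat, 7) + 0.01 * rng.standard_normal(2000)
print(estimate_delay(feat, delayed, max_lag=40))     # expected: 7
```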
In this paper, we address the issue of speaker-specific emotion detection (neutral vs emotion) from speech signals, using models for neutral speech as reference. As emotional speech is produced by the human speech production mechanism, the emotion information is expected to lie in the features of both the excitation source and the vocal tract system. The Linear Prediction residual is used as the excitation source component and Linear Prediction Coefficients as the vocal tract system component. A pitch-synchronous analysis is performed. Separate Autoassociative Neural Network models are developed to capture the information specific to neutral speech from the excitation and the vocal tract system components. Experimental results show that the excitation source carries more information than the vocal tract system. The accuracy of neutral vs emotion classification using excitation source information is 91%, which is 8% higher than the accuracy obtained using vocal tract system information. The Berl...
In this paper, we address the issue of speech polarity detection using the strength of impulse-like excitation around the epoch. The correct detection of speech polarity is a crucial step for many speech processing algorithms to extract suitable information, and errors in polarity detection can degrade the performance of speech systems. Automatic detection of speech polarity has therefore become an important preliminary step for many speech processing algorithms. We propose a method based on the knowledge of the impulse-like excitation of the speech production mechanism. The impulse-like excitation is reflected across all frequencies, including the zero frequency (0 Hz). Using the slope around the zero crossings of the zero frequency filtered signal, an automatic speech polarity detection method is proposed. Performance of the proposed method is demonstrated on 8 different speech corpora. The proposed method is compared with three existing techniques: gradient of the spurious glottal waveforms (GSGW), oscillating moments-based polarity detection (OMPD) and residual excitation skewness (RESKEW). From the experimental results, it is observed that the performance of the proposed method is comparable to or better than that of the existing methods for the experiments considered.
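The sketch below illustrates zero-frequency filtering and a slope-based polarity decision. The ZFF computation follows the commonly used recipe (cascaded zero-frequency resonators plus repeated local-mean subtraction); the final decision rule, which compares slopes at rising versus falling zero crossings, is an assumption made for illustration and not necessarily the exact rule of the paper.

```python
# Sketch of ZFF-based polarity detection under assumed parameters and decision rule.
import numpy as np
from scipy.signal import lfilter

def zff(x, fs, win_ms=10, passes=3):
    """Zero-frequency filtered signal: cascaded 0-Hz resonators followed by
    repeated local-mean subtraction for trend removal."""
    d = np.diff(x, prepend=x[0])                      # difference to remove DC offset
    y = d
    for _ in range(2):                                # two zero-frequency resonators
        y = lfilter([1.0], [1.0, -2.0, 1.0], y)
    n = int(fs * win_ms / 1000)
    kernel = np.ones(2 * n + 1) / (2 * n + 1)
    for _ in range(passes):
        y = y - np.convolve(y, kernel, mode="same")   # trend removal
    return y

def detect_polarity(x, fs):
    z = zff(x, fs)[fs // 20: -fs // 20]               # drop edge artifacts
    slope = np.diff(z)
    rising = (z[:-1] < 0) & (z[1:] >= 0)
    falling = (z[:-1] >= 0) & (z[1:] < 0)
    # Assumed rule: steeper slopes at rising crossings -> positive polarity
    return +1 if np.abs(slope[rising]).mean() > np.abs(slope[falling]).mean() else -1

# Example on a crude synthetic voiced signal: a 120 Hz impulse train exciting
# a single resonance. Inverting the waveform flips the decision.
fs = 8000
excitation = np.zeros(fs)
excitation[::fs // 120] = 1.0
speech = lfilter([1.0], [1.0, -1.8 * np.cos(2 * np.pi * 500 / fs), 0.81], excitation)
print(detect_polarity(speech, fs), detect_polarity(-speech, fs))   # opposite signs
```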
Progress in research areas such as emotion recognition, identification and synthesis relies heavily on the development and structure of databases. This paper addresses some of the key issues in the development of emotion databases. A new audio-visual emotion (AVE) database is developed. The database consists of audio, video and audio-visual clips sourced from TV broadcasts such as movies and soap operas in the English language. The data clips are manually segregated in an emotion- and speaker-specific way. This database is developed to address emotion recognition in actual human interaction. The database is structured in such a way that it can be useful in a variety of applications, such as emotion analysis based on speaker or gender and emotion identification in multiple emotive dialogue scenarios.
In this paper, we propose spectral modification by sharpening formants and by reducing the spectral tilt to recognize children's speech with automatic speech recognition (ASR) systems developed using adult speech. In this type of mismatched condition, ASR performance is degraded due to the acoustic and linguistic mismatch between child and adult speakers. The proposed method improves speech intelligibility in order to enhance children's speech recognition with an acoustic model trained on adult speech. In the experiments, WSJCAM0 and PFSTAR are used as the databases for adults' and children's speech, respectively. The proposed technique gives a significant improvement in the context of DNN-HMM-based ASR. Furthermore, we validate the robustness of the technique by showing that it also performs well in mismatched noise conditions.
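One plausible way to realize the two modifications, sketched under assumed parameters: per-frame LP analysis, moving the LP poles closer to the unit circle to narrow formant bandwidths (formant sharpening), and a first-order high-frequency emphasis to reduce spectral tilt. The pole-radius exponent and tilt filter are illustrative choices, not the paper's exact processing.

```python
# Sketch of LP-based formant sharpening and spectral-tilt reduction for one frame.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(frame, order=16):
    """LP coefficients a (a[0] == 1) via the autocorrelation method."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])
    return np.concatenate(([1.0], -a))

def sharpen_formants(frame, order=16, gamma=0.7, tilt=0.4):
    a = lpc(frame, order)
    residual = lfilter(a, [1.0], frame)                    # inverse filtering
    poles = np.roots(a)
    poles = poles * (np.abs(poles) ** (gamma - 1.0))       # radius r -> r**gamma (> r)
    a_sharp = np.real(np.poly(poles))                      # sharpened all-pole model
    y = lfilter([1.0], a_sharp, residual)                  # resynthesize the frame
    return lfilter([1.0, -tilt], [1.0], y)                 # mild high-frequency emphasis

frame = np.random.randn(400)                               # stand-in for a 25 ms frame
print(sharpen_formants(frame).shape)                       # (400,)
```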
Glottal source characteristics vary between phonation types due to the tension of the laryngeal muscles together with the respiratory effort. Previous studies on the classification of phonation type have mainly used speech signals recorded by microphone. Recently, two studies were published on the classification of phonation type using neck surface accelerometer (NSA) signals. However, there are no previous studies comparing the use of the acoustic speech signal vs. the NSA signal as input in classifying phonation type. Therefore, the current study investigates simultaneously recorded speech and NSA signals in the classification of three phonation types (breathy, modal, pressed). The general goal is to understand which of the two signals (speech vs. NSA) is more effective in the classification task. We hypothesize that, using the same feature set for both signals, classification accuracy is higher for the NSA signal, which is more closely related to the physical vibration of the vocal folds an...
The speech production mechanism also produces speech with different voice qualities, such as different phonation types, emotions, expressive singing and other paralinguistic sounds. These sounds owe their characteristics mostly to the excitation component (vibration of the vocal folds at the glottis), whereas the dynamic vocal tract system primarily conveys the message. Hence, excitation source processing is especially significant for the analysis, detection and representation of expressive voices. Most of the existing excitation source information extraction methods are not reliable when applied to expressive voices, mainly due to significant source–system coupling. Hence, there is a need for new signal processing methods that can capture the dynamic variations in the excitation source so that different types of sounds can be better analyzed and represented. The objective of this work is to derive new signal processing methods to extract the excitation source...
Most speech applications use mel-frequency spectral coefficients (MFSC) as features because they match the human perceptual mechanism, where the emphasis is on vocal tract characteristics. However, in accent classification, a mel-scale distribution of filters may not always be the best representation, e.g., for pitch-accented languages, where the emphasis should also be on vocal source information. Motivated by this, we use end-to-end classification of accents directly from waveforms, which reduces the effort of designing features specific to each corpus. The convolutional neural network (CNN) model architecture is designed in such a way that the initial layers perform an operation similar to MFSC extraction, by initializing their weights with a time-domain approximation of MFSC. The entire network, along with the initial layers, is trained to learn accent classification. We observed that learning directly from the waveform improved the performance of accent classification when compared to a CNN trained on hand-engine...
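A sketch of the first-layer initialization idea: each 1-D convolution kernel is set to a windowed cosine at the centre frequency of one mel filter, a crude time-domain approximation of a mel filterbank. The kernel length, stride, number of filters and Hamming window are assumptions made for illustration.

```python
# Sketch of initializing a waveform-input Conv1d layer with mel-like kernels.
import numpy as np
import torch
import torch.nn as nn

def mel_center_frequencies(n_filters, fmin, fmax):
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    return inv_mel(np.linspace(mel(fmin), mel(fmax), n_filters))

def mel_initialized_conv(n_filters=40, kernel_size=400, fs=16000):
    conv = nn.Conv1d(1, n_filters, kernel_size, stride=160, bias=False)
    n = np.arange(kernel_size)
    window = np.hamming(kernel_size)
    kernels = np.stack([window * np.cos(2 * np.pi * fc * n / fs)
                        for fc in mel_center_frequencies(n_filters, 50, fs / 2)])
    with torch.no_grad():                              # copy the mel-like init weights
        conv.weight.copy_(torch.tensor(kernels, dtype=torch.float32).unsqueeze(1))
    return conv

conv = mel_initialized_conv()
waveform = torch.randn(1, 1, 16000)                    # 1 s of audio at 16 kHz
print(conv(waveform).shape)                            # torch.Size([1, 40, 98])
```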
In this paper, we propose a new method for the accurate estimation and tracking of formants in speech signals using time-varying quasi-closed-phase (TVQCP) analysis. Conventional formant tracking methods typically adopt a two-stage estimate-and-track strategy wherein an initial set of formant candidates are estimated using short-time analysis (e.g., 10–50 ms), followed by a tracking stage based on dynamic programming or a linear state-space model. One of the main disadvantages of these approaches is that the tracking stage, however good it may be, cannot improve upon the formant estimation accuracy of the first stage. The proposed TVQCP method provides a single-stage formant tracking that combines the estimation and tracking stages into one. TVQCP analysis combines three approaches to improve formant estimation and tracking: (1) it uses temporally weighted quasi-closed-phase analysis to derive closed-phase estimates of the vocal tract with reduced interference from the excitation source, (2) it increases the residual sparsity by using the $L_1$ optimization and (3) it uses time-varying linear prediction analysis over long time windows (e.g., 100–200 ms) to impose a continuity constraint on the vocal tract model and hence on the formant trajectories. Formant tracking experiments with a wide variety of synthetic and natural speech signals show that the proposed TVQCP method performs better than conventional and popular formant tracking tools, such as Wavesurfer and Praat (based on dynamic programming), the KARMA algorithm (based on Kalman filtering), and DeepFormants (based on deep neural networks trained in a supervised manner). Matlab scripts for the proposed method can be found at: https://github.com/njaygowda/ftrack
Studies on the emotion recognition task indicate that there is confusion in discriminating among higher activation states such as 'anger' and 'happy'. In this study, features related to the excitation source of speech are examined for discriminating the 'anger' and 'happy' emotions. The objective is to explore features which are independent of lexical content, language, channel and speaker. Features such as the strength of excitation from the zero frequency filtering method and spectral band magnitude energies from short-time spectral analysis are used. Experimental results show that these features can discriminate the 'anger' and 'happy' emotion states to a good extent. Index Terms: emotion recognition, zero frequency filtering method, KL distance measure.
Current ASR systems show poor performance in recognition of children’s speech in noisy environments because recognizers are typically trained with clean adults’ speech and therefore there are two mismatches between training and testing phases (i.e., clean speech in training vs. noisy speech in testing and adult speech in training vs. child speech in testing). This article studies methods to tackle the effects of these two mismatches in recognition of noisy children’s speech by investigating two techniques: data augmentation and time-scale modification. In the former, clean training data of adult speakers are corrupted with additive noise in order to obtain training data that better correspond to the noisy testing conditions. In the latter, the fundamental frequency (F0) and speaking rate of children’s speech are modified in the testing phase in order to reduce differences in the prosodic characteristics between the testing data of child speakers and the training data of adult speake...
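The additive-noise part of the augmentation can be sketched as mixing noise into the clean adult training data at a chosen SNR; the noise source and SNR value below are assumptions, and the F0 and speaking-rate modifications are not sketched here.

```python
# Sketch of additive-noise data augmentation at a target SNR.
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Mix `noise` into `clean` so the result has the requested SNR (in dB)."""
    if len(noise) < len(clean):                        # loop the noise if it is too short
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise

rng = np.random.default_rng(1)
clean = rng.standard_normal(16000)                     # stand-in for a 1 s utterance
noisy = add_noise_at_snr(clean, rng.standard_normal(8000), snr_db=10)
```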
Emotional speech is produced when a speaker is in a state different from the normal state. The objective of this study is to explore the deviations in the excitation source features of emotional speech compared to normal speech. The features used for analysis are extracted at the subsegmental level (1-3 ms) of speech. A comparative study of these features across different emotions indicates that there are significant deviations in the subsegmental-level features of speech in the emotional state when compared to the normal state.