The authors propose a novel technique for unwrapping the phase of the higher-order spectrum (HOS) of an image for the purpose of reconstructing the Fourier phase of the image. This technique solves the combined problem of phase unwrapping and reconstruction in the least-squares sense. It uses all distinct HOS samples and is based on alternating projections onto two constraint sets. The results obtained can easily be extended to one dimension and multiple dimensions.
Visual Communications and Image Processing '98, 1998
This paper introduces a novel method for subpixel-accuracy stabilization of unsteady digital films and video sequences. The proposed method offers a near-closed-form solution to the estimation of the global subpixel displacement between two frames that causes their misregistration. The criterion function is the mean-squared error over the displaced frames, in which image intensities at subpixel locations are evaluated using bilinear interpolation. The proposed algorithm is both faster and more accurate than the search-based solutions found in the literature. Experimental results also demonstrate the superiority of the proposed method over the spatio-temporal differentiation and surface fitting algorithms. Furthermore, the proposed algorithm is designed to be insensitive to frame-to-frame intensity variations. It is also possible to estimate any affine motion between two frames by applying the proposed algorithm at three non-collinear points in the unsteady frame.
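The criterion described above can be illustrated with a minimal sketch. It only evaluates the mean-squared error between a reference frame and a second frame sampled at a candidate subpixel displacement using bilinear interpolation. It does not reproduce the paper's near-closed-form solver or its handling of intensity variations. The function names, the synthetic test image, and the displacement values are assumptions made for illustration.

```python
import numpy as np

def sample_bilinear(img, dx, dy):
    """Evaluate img at (x + dx, y + dy) for interior pixels.

    Assumes |dx| < 1 and |dy| < 1 so that all four neighbours stay in bounds.
    """
    h, w = img.shape
    ys, xs = np.mgrid[1:h - 2, 1:w - 2]
    x = xs + dx
    y = ys + dy
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    fx = x - x0  # fractional offsets in [0, 1)
    fy = y - y0
    return ((1 - fx) * (1 - fy) * img[y0, x0]
            + fx * (1 - fy) * img[y0, x0 + 1]
            + (1 - fx) * fy * img[y0 + 1, x0]
            + fx * fy * img[y0 + 1, x0 + 1])

def mse_criterion(ref, cur, dx, dy):
    """Mean-squared error between ref and cur displaced by (dx, dy)."""
    h, w = ref.shape
    diff = ref[1:h - 2, 1:w - 2] - sample_bilinear(cur, dx, dy)
    return float(np.mean(diff ** 2))

# Hypothetical smooth test pattern; cur is ref shifted by (0.4, 0.25) pixels.
def pattern(x, y):
    return np.sin(0.3 * x) + np.cos(0.2 * y)

yy, xx = np.mgrid[0:64, 0:64].astype(float)
ref = pattern(xx, yy)
cur = pattern(xx - 0.4, yy - 0.25)

err_true = mse_criterion(ref, cur, 0.4, 0.25)  # candidate equals the true shift
err_zero = mse_criterion(ref, cur, 0.0, 0.0)   # no compensation
```

On this synthetic pair the criterion should be markedly smaller at the true displacement than at zero displacement, which is the property a displacement estimator exploits.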
We present a 2-D mesh-based mosaic representation, consisting of an object mesh and a mosaic mesh for each frame and a final mosaic image, for video objects with mildly deformable motion in the presence of self and/or object-to-object (external) occlusion. Unlike classical mosaic representations, where successive frames are registered using global motion models, we map the uncovered regions in the successive frames onto the mosaic reference frame using local affine models, i.e., those of the neighboring mesh patches. The proposed method to compute this mosaic representation is tightly coupled with an occlusion-adaptive 2-D mesh tracking procedure, which consists of propagating the object mesh from frame to frame and updating both the object and mosaic meshes to optimize texture mapping from the mosaic to each instance of the object. The proposed representation has been applied to video object rendering and editing, including self transfiguration, synthetic transfiguration, and 2-D augmented rea...
We propose the use of line spectral frequency (LSF) features for emotion recognition from speech. To the best of our knowledge, these features have not previously been employed for emotion recognition. Spectral features such as mel-scaled cepstral coefficients have already been used successfully for the parameterization of speech signals for emotion recognition. The LSF features also offer a spectral representation of speech. Moreover, they carry intrinsic information on the formant structure, which is related to the emotional state of the speaker. We use the Gaussian mixture model (GMM) classifier architecture, which captures the static color of the spectral features. Experimental studies performed over the Berlin Emotional Speech Database and the FAU Aibo Emotion Corpus demonstrate that decision fusion configurations with LSF features bring a consistent improvement over the MFCC-based emotion classification rates.
We present a speech-signal-driven emotion recognition system. Our system is trained and tested with the INTERSPEECH 2009 Emotion Challenge corpus, which includes spontaneous and emotionally rich recordings. The challenge includes classifier and feature sub-challenges with five-class and two-class classification problems. We investigate prosody-related, spectral and HMM-based features for the evaluation of emotion recognition with Gaussian mixture model (GMM) based classifiers. Spectral features consist of mel-scale cepstral coefficients (MFCC), line spectral frequency (LSF) features and their derivatives, whereas prosody-related features consist of mean-normalized values of pitch, first derivative of pitch and intensity. Unsupervised training of HMM structures is employed to define prosody-related temporal features for the emotion recognition problem. We also investigate data fusion of different features and decision fusion of different classifiers, which are not we...
In this paper, we propose novel spectrally weighted mel-frequency cepstral coefficient (WMFCC) features for emotion recognition from speech. The idea is based on the fact that formant locations carry emotion-related information, and therefore critical spectral bands around formant locations can be emphasized during the calculation of MFCC features. The spectral weighting is derived from the normalized inverse harmonic mean function of the line spectral frequency (LSF) features, which are known to be localized around formant frequencies. The above approach can be considered as an early data fusion of spectral content and formant location information. We also investigate methods for late decision fusion of unimodal classifiers. We evaluate the proposed WMFCC features together with the standard spectral and prosody features using HMM based classifiers on the spontaneous FAU Aibo emotional speech corpus. The results show that unimodal classifiers with the WMFCC features perform signific...
Training datasets containing spontaneous emotional speech are often imperfect due to the ambiguities and difficulties of labeling such data by human observers. In this paper, we present a Random Sampling Consensus (RANSAC) based training approach for the problem of emotion recognition from spontaneous speech recordings. Our motivation is to insert a data cleaning process into the training phase of the Hidden Markov Models (HMMs) in order to remove suspicious label instances that may exist in the training dataset. Our experiments using HMMs with Mel Frequency Cepstral Coefficients (MFCC) and Line Spectral Frequency (LSF) features indicate that using RANSAC in the training phase improves the unweighted recall rates on the test set. Experimental studies performed over the FAU Aibo Emotion Corpus demonstrate that decision fusion configurations with LSF and MFCC based classifiers provide further significant performance improvements.
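The general idea of RANSAC-style label cleaning can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: a nearest-class-mean classifier stands in for the HMMs, the data are synthetic 2-D points rather than MFCC or LSF features, and all names and parameter values (`ransac_clean`, `n_iter`, `per_class`) are hypothetical. Each iteration fits a model on a small random subset, counts the training instances whose labels agree with that model, and keeps the largest consensus set.

```python
import numpy as np

def fit_class_means(X, y, classes):
    """Stand-in model: one mean vector per class (the paper uses HMMs)."""
    return np.stack([X[y == c].mean(axis=0) for c in classes])

def predict(means, X, classes):
    """Assign each row of X to the class with the nearest mean."""
    d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d, axis=1)]

def ransac_clean(X, y, n_iter=20, per_class=5, seed=0):
    """Return a boolean mask of instances in the largest consensus set."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    best = np.zeros(len(y), dtype=bool)
    for _ in range(n_iter):
        # Draw a small random subset with the same number of samples per label.
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c), size=per_class, replace=False)
            for c in classes
        ])
        means = fit_class_means(X[idx], y[idx], classes)
        inliers = predict(means, X, classes) == y  # labels the model agrees with
        if inliers.sum() > best.sum():
            best = inliers
    return best

# Synthetic data: two well-separated clusters with ten deliberately flipped labels.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(-5, 0.5, (50, 2)),
               data_rng.normal(5, 0.5, (50, 2))])
y_true = np.repeat([0, 1], 50)
flipped = np.r_[0:5, 50:55]
y_noisy = y_true.copy()
y_noisy[flipped] = 1 - y_noisy[flipped]

keep = ransac_clean(X, y_noisy)  # mask of instances retained for training
```

A final model would then be trained only on the instances where `keep` is true. With overlapping classes, as in real emotional speech, the consensus set is far less clear-cut than in this toy example.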
Proceedings of 1st International Conference on Image Processing, 1994
... Eastman Kodak Company, Rochester, NY 14650-1816, erdem@kodak.com and misezan@kodak.com ... 2 (top) shows, as a function of the number of generations, the average (over the sequence) number (out of a maximum of 330) of macroblocks in P Pictures whose coding ...
Papers by Tanju Erdem