
Kais Ouni

The method described in this paper addresses the problem of T-wave detection in an ECG. Determining the position of a T-wave is complicated by its low amplitude and by the ambiguous and changing form of the complex. A wavelet transform approach handles these complications, so a method based on this concept was developed. The resulting method detects T-waves with a sensitivity of 93% and a correct-detection ratio of 93%, even with a substantial amount of baseline drift and noise.
An electrocardiogram (ECG) data compression algorithm is needed that reduces the amount of data to be transmitted, stored and analyzed without losing the clinical information content. A wavelet ECG data codec based on the Set Partitioning In Hierarchical Trees (SPIHT) compression algorithm is proposed in this paper. The SPIHT algorithm has achieved notable success in still image coding. We modified the algorithm for the one-dimensional (1-D) case and applied it to the compression of ECG data. This compression method achieves a small percent root mean square difference (PRD) and a high compression ratio with low implementation complexity. Experiments on selected records from the MIT-BIH arrhythmia database revealed that the proposed codec is significantly more efficient in compression and in computation than previously proposed ECG compression schemes. Compression ratios of up to 48:1 for ECG signals lead to acceptable results for visual inspection.
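The two figures of merit named in the abstract above can be illustrated with a minimal sketch. It assumes the common PRD definition without mean removal (other variants exist, and the abstract does not say which one the paper uses). The synthetic signal, the 11-bit sample size and the function names are hypothetical and serve only as an illustration.

```python
import numpy as np

def prd(original, reconstructed):
    """Percent root-mean-square difference between two equal-length signals.

    Assumed definition: 100 * sqrt(sum((x - y)^2) / sum(x^2)), no mean removal.
    """
    x = np.asarray(original, dtype=float)
    y = np.asarray(reconstructed, dtype=float)
    return 100.0 * np.sqrt(np.sum((x - y) ** 2) / np.sum(x ** 2))

def compression_ratio(original_bits, compressed_bits):
    """Original size divided by compressed size (48.0 corresponds to 48:1)."""
    return original_bits / compressed_bits

# Hypothetical data: a synthetic sinusoid and a slightly distorted copy,
# standing in for an ECG record and its decoded version.
t = np.linspace(0.0, 1.0, 360, endpoint=False)
signal = np.sin(2 * np.pi * 5 * t)
decoded = signal + 0.01 * np.cos(2 * np.pi * 50 * t)

print(prd(signal, decoded))                      # distortion in percent
print(compression_ratio(360 * 11, 360 * 11 / 48))  # size ratio
```

A lower PRD at a given compression ratio indicates a better codec, which is the trade-off the abstract reports.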
In this paper we present a new formant tracking algorithm in which formant frequency estimation is based on local maxima detection in a time-frequency representation. This representation is a scalogram obtained from a complex wavelet transform. The formant frequency candidates are validated as local maxima of the scalogram, which correspond to wavelet ridges. We then introduce the computation of the center of gravity as a tracking constraint. We tested the algorithm on synthesized and natural voiced speech signals. The formant trajectories obtained were compared with the manually edited trajectories of our Arabic database, used as reference, and with those given by the Fourier transform method and by the LPC analysis used in Praat. Overall, the comparison showed that the first three formant trajectories obtained with the complex Morlet wavelet agree with the manually edited formant tracks.
A phoneme is the smallest contrastive unit in the sound system of a language, and it plays a meaningful role in speech recognition. In this study, we are interested in phoneme recognition on the TIMIT database using the HTK toolkit for HMMs. The main goal is to determine the optimal parameters for the recognizer. To this end, different speech analysis techniques were used, such as Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC) and Perceptual Linear Prediction (PLP). These techniques were improved by adding temporal derivatives and energy to capture the temporal dynamics of the parameters. Results revealed that the MFCC and PLP techniques gave reliable recognition rates using 39 coefficients. Keywords: Features extraction, HMM, HTK, LPC, MFCC, PLP, TIMIT
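As an illustration of how temporal derivatives and energy can bring a feature vector to 39 coefficients, here is a minimal sketch. It assumes the common layout of 12 cepstral coefficients plus one energy term, extended with first and second derivatives computed by a regression formula over a two-frame window. The abstract does not state the exact layout or window, so these choices, the function names and the random input are hypothetical.

```python
import numpy as np

def delta(features, window=2):
    """Regression-based temporal derivative of a (frames, coeffs) array."""
    feats = np.asarray(features, dtype=float)
    n_frames = feats.shape[0]
    # Repeat the edge frames so every frame has `window` neighbours per side.
    padded = np.pad(feats, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, window + 1))
    out = np.zeros_like(feats)
    for k in range(1, window + 1):
        ahead = padded[window + k : window + k + n_frames]
        behind = padded[window - k : window - k + n_frames]
        out += k * (ahead - behind)
    return out / denom

def stack_dynamic(static):
    """Append first and second derivatives to the static coefficients."""
    d1 = delta(static)
    d2 = delta(d1)
    return np.hstack([static, d1, d2])

# Hypothetical input: 100 frames of 12 cepstral coefficients plus 1 energy term.
static = np.random.default_rng(0).normal(size=(100, 13))
print(stack_dynamic(static).shape)  # 13 static + 13 delta + 13 delta-delta
```

With 13 static values per frame, the stacked vector has 3 x 13 = 39 entries, which matches the coefficient count mentioned in the abstract under the stated assumption.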
In this paper we present a feature extractor for speech recognition. The proposed feature extraction method is based on auditory filter modelling. It uses a Gammachirp Filterbank (GcFB) whose center frequencies are selected according to one of three scales: the ERB-rate scale, the MEL scale or the BARK scale. The performance of the proposed features is evaluated, in the context of isolated word recognition, on the TIMIT database. With the ERB-rate scale, our feature extraction method gives interesting recognition rates compared with the other two scales. The HTK (HMM Toolkit) recognizer is used for the recognition task. It is based on Hidden Markov Models with Gaussian mixture densities (HMM-GM). Keywords: Gammachirp auditory filterbank, Features extraction, Speech Recognition
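The choice of center frequencies on a perceptual scale can be sketched as follows. This is only an illustration: it uses the widely cited ERB-rate and Mel conversion formulas, omits the Bark scale, and assumes a hypothetical bank of 16 filters between 100 Hz and 8000 Hz. None of these settings are taken from the paper.

```python
import numpy as np

def hz_to_erb_rate(f):
    """ERB-rate value for a frequency in Hz (Glasberg and Moore formula)."""
    return 21.4 * np.log10(0.00437 * f + 1.0)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def hz_to_mel(f):
    """Mel value for a frequency in Hz."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def center_frequencies(f_low, f_high, n_filters, to_scale, from_scale):
    """Frequencies in Hz equally spaced on the chosen perceptual scale."""
    points = np.linspace(to_scale(f_low), to_scale(f_high), n_filters)
    return from_scale(points)

# Hypothetical settings: 16 filters covering 100 Hz to 8000 Hz.
print(center_frequencies(100.0, 8000.0, 16, hz_to_erb_rate, erb_rate_to_hz))
print(center_frequencies(100.0, 8000.0, 16, hz_to_mel, mel_to_hz))
```

Both scales place more filters at low frequencies than a linear spacing would. They differ in how quickly the spacing widens, which is one plausible reason recognition rates vary with the scale.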
Speaker identification systems aim to determine the speaker's identity from a set of speech parameters. Thus, a relevant speech representation is required. For this purpose, we propose combining spectral parameters, namely the Mel frequency cepstral coefficients (MFCC) and the perceptual linear predictive (PLP) coefficients, with a prosodic parameter, the fundamental frequency of the signal (F0). F0 estimation methods fall into two main classes, temporal and spectral. We employ the sawtooth waveform inspired pitch estimator (SWIPE) algorithm, which estimates pitch in the frequency domain. In addition, we evaluate the Gaussian mixture model-universal background model (GMM-UBM) for modelling. Experiments are carried out on the TIMIT database. Identification rates are promising and show the benefit of combining MFCC and PLP rather than using each feature separately, mainly for noisy data.
Applications of HMMs show that they are an effective and powerful modelling tool, especially for stochastic signals. For this reason, we use HMMs for TIMIT phoneme recognition. The main goal is to study the performance of an HMM phoneme recognizer in order to settle on optimal signal parameters. We apply different speech parameterization techniques, such as MFCC, LPCC and PLP, and then compare the recognition rates obtained to identify the optimal features. We varied the number of coefficients per sample from 12 to 39 for all features. Experimental results show that 39 PLP coefficients are the most appropriate parameters for our recognizer. Keywords: HMM, HTK, LPCC, MFCC, PLP, TIMIT
In this paper we develop a new approach to ECG analysis, combining the Pitch Synchronous Wavelet Transform (PSWT) and a Hidden Semi-Markov Model (HSMM) to track the typical ECG cycle. The two techniques are combined so that the PSWT of an ECG signal is the input to the HSMM. This approach was tested and evaluated on the manually annotated QT database. Experimental results show the accuracy of the proposed technique on all corrupted ECG signals tested, reaching a sensitivity of Se = 99.95% for QRS detection and Se = 97.79% for T-wave detection.
This work concerns the development of an esophageal voice conversion system whose aim is to make esophageal voice more intelligible. Voice conversion is a technique for transforming the speech signal of a source speaker so that, to the listener, it seems to be spoken by a target speaker. Given the specificity of esophageal voice, we propose in this study to apply a new voice conversion technique that takes into account the particularity of the vocal apparatus of patients who have undergone removal of the larynx. Indeed, removal of the vocal cords profoundly disturbs the glottal signal, and consequently the esophageal voice acquired by the laryngectomized patient is difficult to understand, hoarse and weak in intensity. In the literature, several voice conversion techniques have been proposed, among them the linear predictive coding technique for voice conversion [1] and...
Despite the advances of information technology tools in speech recognition, finding a fast and efficient approach remains a principal research topic. In this paper, we apply the k-nearest neighbors (kNN) algorithm to TIMIT phoneme recognition with two models: crisp and fuzzy. Essentially, we explore what the fuzzy aspect contributes to the crisp version of the kNN algorithm. The kNN algorithm is characterized by simple implementation, efficiency and speed of execution. The recognition approach consists of extracting a mean reference vector from each phoneme signal in order to assign a crisp or a fuzzy membership degree by measuring the distance to its k nearest neighbors. The average recognition rates obtained show that the kNN algorithm is an effective approach to phoneme recognition, particularly in its fuzzy variant.
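The difference between crisp and fuzzy membership in kNN can be sketched as follows. This is an assumption-laden illustration, not the authors' implementation: the fuzzy variant here uses inverse-distance weighting with a fuzzifier m, in the style of the classical fuzzy kNN rule, and the toy two-dimensional vectors stand in for the mean reference vectors mentioned in the abstract. The function name and all parameter values are hypothetical.

```python
import numpy as np

def knn_memberships(train_x, train_y, query, k=3, m=2.0, fuzzy=True):
    """Return (classes, membership degrees) of `query` from its k nearest neighbours."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    classes = np.unique(train_y)
    dist = np.linalg.norm(train_x - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]
    if fuzzy:
        # Closer neighbours count more; the floor avoids division by zero.
        weights = 1.0 / np.maximum(dist[nearest], 1e-12) ** (2.0 / (m - 1.0))
    else:
        # Crisp voting: every neighbour counts equally.
        weights = np.ones(len(nearest))
    scores = np.array([weights[train_y[nearest] == c].sum() for c in classes])
    return classes, scores / scores.sum()

# Hypothetical toy data: two classes of 2-D reference vectors.
train_x = [[0, 0], [1, 0], [0, 1], [5, 5], [5, 6]]
train_y = [0, 0, 0, 1, 1]
print(knn_memberships(train_x, train_y, [4, 4], fuzzy=False))
print(knn_memberships(train_x, train_y, [4, 4], fuzzy=True))
```

In the crisp case each neighbour casts one vote, while the fuzzy case lets distant neighbours contribute less, so the resulting membership degrees reflect how close the query is to each class.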
Despite the significant advances in semantic segmentation of aerial imagery, a considerable limitation is blocking its adoption in real cases. If a segmentation model is tested on a new area that is not included in its initial training set, accuracy decreases markedly. This is caused by the domain shift between the new target domain and the source domain used to train the model. In this paper, we address this challenge and propose a new algorithm that uses a Generative Adversarial Network (GAN) architecture to minimize the domain shift and increase the ability of the model to work on new target domains. The proposed architecture contains two GANs. The first GAN converts the chosen image from the target domain into a semantic label. The second GAN converts this generated semantic label into an image that belongs to the source domain but preserves the semantic map of the target image. This resulting image will be used by the semantic segment...
Segmenting aerial images has great potential for surveillance and scene understanding of urban areas. It provides a means of automatically reporting the different events that happen in inhabited areas, which markedly benefits public safety and traffic management applications. After the wide adoption of convolutional neural network methods, the accuracy of semantic segmentation algorithms can easily surpass 80% if a robust dataset is provided. Despite this success, deploying a pretrained segmentation model to survey a new city that is not included in the training set significantly decreases accuracy. This is due to the domain shift between the source dataset on which the model is trained and the target domain of the new city images. In this paper, we address this issue and consider the challenge of domain adaptation in semantic segmentation of aerial images. We designed an algorithm that reduces the impact of domain shift using generative adversarial networks (GANs). In ...
SUMMARY. In this paper we propose to study the performance of a new time-frequency representation, called the tight framelet packet transform, in speech coding. To this end, we carried out a comparative study with the wavelet packet transform. Performance was evaluated using different objective criteria: coding gain, normalized root mean square error, peak signal-to-noise ratio, segmental signal-to-noise ratio, frequency-weighted segmental signal-to-noise ratio and PESQ. The results show that speech coding based on the framelet packet transform provides higher quality than coding based on the wavelet packet transform. ABSTRACT. Study for improving the coded speech by tight framelet packet transform. In this paper we propose to study the performance of a new time-frequency representation call...
This paper develops a formant tracking technique based on Fourier ridge detection. The method introduces a tracking constraint based on the centre of gravity of a set of formant frequency candidates, which connects each speech frame to its neighbours and thus improves the robustness of tracking. The formant trajectories obtained by the proposed algorithm are compared with those of a hand-edited Arabic formant database, created especially for this work, and with those given by Praat with LPC data.
In this paper we compare our proposed digital audio watermarking technique with those developed by Luigi Rosa and Rolf Brigola. The proposed technique operates in the frequency domain. The time-frequency mapping is done using a Modified Discrete Cosine Transform (MDCT). The technique developed by Luigi Rosa also operates in the frequency domain but uses the Discrete Cosine
... Thus, Bladon and Fant [BLA, 78], Paliwal, Lindsay and Ainsworth [BLA, 83] used formulas to estimate F2' as a weighted average of the first four formants. ...
Publication in the conference proceedings of EUSIPCO, Antalya, Turkey, 2005
In this paper we describe a systematic procedure for implementing a two-stage keyword spotting (KWS) system. In the first stage, a phonetic decoding of continuous speech is obtained using a CD-DNN-HMM model built with the Kaldi toolkit. In the second stage, the resulting phonetic transcriptions are used to build a system that searches for keywords embedded in continuous speech using a classification and regression tree (CART) implemented in MATLAB. The work uses the TIMIT database.
In this paper, we present a watermarking technique for images. The proposed technique operates in the frequency domain. We apply the MDCT (Modified Discrete Cosine Transform) to the original image to pass from the spatial domain to the frequency domain. We exploit the properties of the JND (Just Noticeable Difference) model to find the insertion positions for the watermark, which optimizes the imperceptibility of the mark during the insertion phase. To ensure maximum insertion capacity, we duplicate the bits of the mark N times, and each bit is then inserted in the LSB (Least Significant Bit) of the components selected by the JND model. To increase the detection rate, we use a Hamming code as the error correction code. We evaluated the invisibility of the technique by calculating the PSNR and the Structural Similarity Metric (SSIM). We studied the robustness of this technique against the different attacks provided by Checkmark. Finally, to highligh...
This paper presents a novel approach for enhancing esophageal speech using voice conversion techniques. Esophageal speech (ES) is an alternative voice that allows a patient with no vocal cords to produce sounds after total laryngectomy. Although it does not need any external device, this voice sounds unnatural when compared to laryngeal speech. ES is frequently described as harsh speech with low pitch frequency and loudness. Consequently, ES has poor intelligibility and poor quality. To improve the naturalness and intelligibility of esophageal speech, we propose a speaking-aid system that enhances ES to make it clearer and more natural. Given the specificity of ES, in this study we propose to apply a new voice conversion technique that takes into account the particularity of the pathological vocal apparatus. The vocal tract and excitation cepstral coefficients are estimated separately. We trained deep neural networks (DNNs) and Gaussian mixture models (GMMs) to pre...
The goal of this paper is to compare the performance of two time-frequency decompositions in the context of speech coding. These decompositions are based on wavelet and wavelet frame theory. The main advantages of wavelet frames compared to wavelets are perfect reconstruction, resilience to quantization noise, near shift-invariance, symmetry and good time-frequency localization. The evaluation tests reveal that the quality of coded speech using the tight framelet packet transform outperforms that of the wavelet packet transform.
In this paper, we study and evaluate the task of semantic segmentation of the spinal cord in ultrasound medical imagery. This task helps neurosurgeons analyze the movement of the spinal cord during and after laminectomy. Laminectomy is performed on patients who suffer from abnormal pressure on the spinal cord. The surgeon operates by cutting the bones of the laminae and the intervening ligaments to relieve this pressure. During the surgery, ultrasound waves can pass through the laminectomy area to give real-time exploitable images of the spinal cord. The surgeon uses them to confirm spinal cord decompression or, occasionally, to assess a tumor adjacent to the spinal cord. A freely pulsating spinal cord is a sign of adequate decompression. To evaluate the semantic segmentation approaches chosen in this study, we constructed two datasets using images collected from 10 different patients undergoing laminectomy. We found that the best so...
The innovation of our method is a new technique with multiple capabilities, autonomy and automatic applications in the diagnosis of some cardiovascular parameters for certain categories of patients (elderly, children, pregnant women) with cardiovascular diseases. The objective of this work is to perform automatic diagnosis by processing the bioimpedance signal, or impedance cardiography (ICG) signal, which represents the variation of the aorta impedance during the heart cycle. Our analysis is based on the determination of seven ICG cepstral parameters. Applying the discriminant analysis method allowed us to perform the diagnosis of 168 anonymous cases. The seven cepstral parameters are sufficient to perform automatic diagnosis of cardiovascular system abnormalities, with 97.2% of cases correctly classified.
Given the development of the Internet in the 1990s and the world's move into an era in which digital media play an increasingly important role, we face serious problems of copyright, piracy and rights, for which watermarking is an effective solution. Watermarking a signal (audio, image, video) means inserting a message (audio, image, beep, text) in an inaudible manner in the original audio signal in order to protect it. In this paper, we present a digital audio watermarking technique operating in the frequency domain. This technique uses the modified discrete cosine transform (MDCT) to switch to the frequency domain and the characteristics of psychoacoustic model 2 of the MPEG standard in the insertion phase. The bits of the mark are duplicated to increase the insertion capacity and then inserted into the samples selected by the MPH. To improve reliability in the decoding phase, an error correction code (Hamming) was introduced. The inaudibility of this technique is evaluated using the PEAQ algorithm and by calculating the SNR. The robustness of this technique is shown against different types of attacks.
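The role of the Hamming error correction code in the decoding phase can be illustrated with a minimal sketch. The abstract does not say which Hamming code is used, so the (7,4) code below is an assumption chosen for brevity. The function names and the sample bits are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword laid out as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3   # 0 means no error detected
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

# Hypothetical watermark bits; one bit is flipped to mimic an attack.
mark = [1, 0, 1, 1]
received = hamming74_encode(mark)
received[4] ^= 1
print(hamming74_decode(received))
```

Any single bit error per 7-bit codeword is corrected, so an attack that flips isolated bits of the embedded mark does not necessarily change the decoded message.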
In this paper, we present a study of an isolated-word speech recognition system. The adopted system is based on the Hidden Markov Model with Gaussian Mixture (HMM-GM). We studied the recognition rate by varying the number of states (3, 4, 5, 6 and 7 states) and the number of Gaussians per state (2, 4, 8, 12, 14 and 16 Gaussians) of the Hidden Markov Model. We evaluated these recognition rates using two parameterization techniques, Mel Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Prediction (PLP). We introduced the dynamic coefficients and the energy of the signal in order to improve the recognition rate.
We report on a comparison of fuzzy and HMM phoneme recognizers on data from the TIMIT corpus. We aimed to find an optimal number of signal cepstral coefficients for both approaches. For this purpose, we used different parameterization techniques such as MFCC, LPCC and PLP. The number of coefficients was varied from 12 to 39, including first and second derivatives and signal energy to capture the temporal variation of the signal. Results showed that an appropriate number of acoustic parameters leads to good recognition performance for both systems.
In this paper we present the design and implementation of a speech processing interface for an auditory prosthesis. This module is based on a digital speech processing algorithm that models the affected ear and generates the stimulus signals for the hair cells. The interface uses a gammachirp filter bank (G.F.B) consisting of 16 band-pass filters based on IIR filters.
In this paper, we present a new design of a psychoacoustic model that follows the example of the model used in the MPEG-1 layer 3 audio standard, with the Gammachirp wavelet packet decomposition. The essential characteristic of this model is that it proposes an analysis by wavelet packet transform on frequency bands that come closer to the critical bands of the ear, unlike the existing model, which is based on a short-term Fourier transform analysis. This study shows that the Gammachirp coder performs best with the highest chirp term.
In this paper we present a new design of a psychoacoustic model for audio coding that follows the model used in the MPEG-1 audio layer 3 standard. This architecture is based on an appropriate wavelet packet decomposition instead of a short-term Fourier transform. Its important characteristic is that it proposes an analysis of frequency bands that come closer to the critical bands of the ear. This study shows that the Gammachirp coder gives the best performance.
