Zied MNASRI

With the development of multi-modal man-machine interaction, audio signal analysis is gaining importance in a field traditionally dominated by video. In particular, anomalous sound event detection offers novel options to improve audio-based man-machine interaction, in many useful applications such as surveillance systems, industrial fault detection and especially safety monitoring, either indoor or outdoor. Event detection from audio can fruitfully integrate visual information and can outperform it in some respects, thus representing a complementary perceptual modality. However, it also presents specific issues and challenges. In this paper, a comprehensive survey of anomalous sound event detection is presented, covering various aspects of the topic, i.e., feature extraction methods, datasets, evaluation metrics, methods, applications, and some open challenges and improvement ideas that have been recently raised in the literature.

DOI: 10.1007/s11042-021-11817-9

Publication Date: 2021

Publication Name: Multimedia Tools and Applications

Research Interests:
Machine Learning, Anomaly Detection, Audio, Speech and Language Processing, and Audio Event Detection

Audio signal processing is moving towards detecting and/or defining rare or anomalous sounds, and this anomaly detection problem extends naturally to audio surveillance systems. This paper therefore proposes a rare sound event detection method for road traffic monitoring, including the detection of hazardous events, i.e., road accidents. The method combines anomaly detection techniques, namely variational autoencoders (VAE) and interval-valued fuzzy sets. The VAE is used to calculate the reconstruction error of the input audio segment. Based on this reconstruction error, a fuzzy membership function, composed of an optimistic/upper component and a pessimistic/lower component, is calculated. A probabilistic method for interval comparison is then used to calculate the membership score, and hence to evaluate the interval-valued fuzzy sets. Finally, classification into anomalous/normal events is obtained by defuzzification. Results show that, with careful parameter setting, the proposed method outperforms the state-of-the-art one-class SVM for anomaly detection.
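The abstract does not say which probabilistic interval comparison is used. As a hedged illustration only, the Python sketch below implements one commonly used possibility-degree formula for comparing two intervals. The function name `possibility_degree` and the example intervals are hypothetical, and the paper's actual method may differ.

```python
def possibility_degree(a, b):
    """Degree to which interval a = (a_low, a_up) is >= interval b = (b_low, b_up).

    Assumed formula (one common choice, not taken from the paper):
        p(a >= b) = clamp((a_up - b_low) / (len(a) + len(b)), 0, 1)
    """
    a_low, a_up = a
    b_low, b_up = b
    total_length = (a_up - a_low) + (b_up - b_low)
    if total_length == 0.0:
        # Both intervals are single points: fall back to a crisp comparison.
        if a_low > b_low:
            return 1.0
        return 0.5 if a_low == b_low else 0.0
    raw = (a_up - b_low) / total_length
    return min(max(raw, 0.0), 1.0)


# Hypothetical membership intervals (lower, upper) for two classes.
anomalous = (0.4, 0.8)
normal = (0.5, 0.7)
score = possibility_degree(anomalous, normal)      # degree that "anomalous" wins
complement = possibility_degree(normal, anomalous)  # degree that "normal" wins
```

With this formula the two degrees sum to one whenever the intervals overlap, so a single score is enough to rank the two classes.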

Publication Date: 2021

Publication Name: Proceedings of the 13th International Workshop on Fuzzy Logic and Applications (WILF 2021)

Research Interests:
Fuzzy Logic, Anomaly Detection, Fuzzy Set, Audio Event Detection, and Variational Autoencoders

In this paper, a novel relationship between instantaneous frequency (IF) and fundamental frequency (F0) in voiced parts of speech signals is presented. IF is calculated as the time derivative of the phase of the analytic signal, obtained from the Hilbert transform, whereas F0 can be extracted using any classical pitch-tracking technique (e.g., autocorrelation, cepstrum, or subharmonic-to-harmonic ratio (SHR)). The relationship, which holds independently of the tool used to extract F0, states that the envelope of the residual of the instantaneous frequency, defined as the difference between IF and the maximum of harmonics, tends to F0. Such a direct relationship may be useful for further developments of F0 extraction directly from the speech signal, avoiding the approximation that exists in most pitch extraction techniques.
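As a rough illustration of the IF computation described above, the following Python sketch builds the analytic signal with the standard FFT-based Hilbert construction and differentiates its unwrapped phase. It is not the paper's implementation: the function names, the 8 kHz sampling rate and the 200 Hz test tone are assumptions made for the example, and the residual-envelope step that links IF to F0 is not reproduced.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal of a real sequence, built by zeroing negative frequencies."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the DC bin
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def instantaneous_frequency(x, fs):
    """IF in Hz: discrete time derivative of the unwrapped analytic phase."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.diff(phase) * fs / (2.0 * np.pi)


fs = 8000                                  # assumed sampling rate in Hz
t = np.arange(fs) / fs                     # one second of samples
tone = np.cos(2.0 * np.pi * 200.0 * t)     # assumed 200 Hz test tone
inst_freq = instantaneous_frequency(tone, fs)
median_if = float(np.median(inst_freq))
```

For a single stationary tone the IF is simply the tone frequency; for voiced speech, which contains several harmonics, the IF fluctuates, and it is this fluctuation that the abstract relates to F0.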

Publication Date: 2018

Publisher: Springer Science and Business Media LLC

Publication Name: International Journal of Computational Intelligence Systems

Research Interests:
Computer Science

Speech synthesis quality depends on its naturalness and intelligibility. These abstract concepts are the concern of phonology. In terms of phonetics, they are transmitted by prosodic components, mainly the fundamental frequency (F0) contour. F0 contour modeling is performed either by setting rules or by investigating databases, with or without parameters, and following either a time-sequential path or a parallel, superpositional scheme. In this study, we opted to model the F0 contour for Arabic using the Fujisaki parameters, learned by neural networks. Statistical evaluation was carried out to measure the accuracy of the predicted parameters and the closeness of the synthesized F0 contour to the natural one. Findings concerning the adoption of Fujisaki parameters for Arabic F0 contour modeling in text-to-speech synthesis are discussed.

Publication Date: 2011

Sound duration is responsible for rhythm and speech rate. Furthermore, in some languages phoneme length is an important phonetic and prosodic factor. For example, in Arabic, gemination and vowel quantity are two important characteristics of the language. Therefore, accurate duration modelling is crucial for Arabic TTS systems. This paper aims to improve the modelling of phone duration for Arabic statistical parametric speech synthesis using DNN-based models. In recent years, DNNs have frequently been used for parametric speech synthesis instead of HMMs. Therefore, several variants of DNN-based duration models for Arabic are investigated. The novelty consists in training a specific DNN model for each class of sounds, i.e., short vowels, long vowels, simple consonants and geminated consonants. The main idea behind this choice is the improvement that we already achieved in the quality of Arabic parametric speech synthesis by the introduction of two specific features...

Publisher: Multim. Tools Appl.

Publication Date: 2021

Publication Name: Multim. Tools Appl.

Research Interests:
Information Systems, Computer Science, and Computer Software

This paper investigates statistical parametric speech synthesis of Modern Standard Arabic (MSA). A Hidden Markov Model (HMM)-based speech synthesis system relies on a description of speech segments corresponding to phonemes, with a large set of features that represent phonetic, phonological, linguistic and contextual aspects. When it is applied to MSA, two specific phenomena have to be taken into account: vowel lengthening and consonant gemination. This paper thoroughly studies the modeling of these phenomena through various approaches, for example the use of different units for modeling short vs. long vowels and the use of different units for modeling simple vs. geminated consonants. These approaches are compared to another one which merges the short and long variants of a vowel into a single unit, and the simple and geminated variants of a consonant into a single unit (these characteristics being handled through the features associated with the sound). Results of subjective evaluation show ...

Publication Date: 2017

Research Interests:
Computer Science

Spectrogram inversion, or phase retrieval, is an old topic in digital signal processing that has been revisited in recent years for its proven relevance to many recent applications, such as source separation, speech enhancement and compressive sensing. Spectrogram inversion aims to reconstruct a signal from partial spectral information, such as the magnitude spectrum only or the phase spectrum only, obtained by the short-time Fourier transform (STFT). In this work, the relevance of signal reconstruction is studied. First, the proposed algorithm, based on recent theoretical relationships between STFT magnitude and phase, is presented. Secondly, the proposed method is tested on clean and simulated-noisy speech. Finally, the relevance of spectrogram inversion, as implemented either in our proposal or in state-of-the-art algorithms, is evaluated for the particular application of speech enhancement. The results show the advantages and the limits of using spectrogram inversion i...

Publisher: 2021 18th International Multi-Conference on Systems, Signals & Devices (SSD)

Publication Date: 2021

Publication Name: 2021 18th International Multi-Conference on Systems, Signals & Devices (SSD)

Research Interests:
Computer Science

Road safety has always been a major concern, involving a variety of competences ranging from government and local authorities to medical caregivers and other service providers. Prompt intervention in emergency cases is one of the key factors in minimizing damage. Therefore, real-time surveillance is proposed as an efficient means to detect problems on roads. Video surveillance alone is not enough to detect serious accidents, since any hazardous behavior on the road may be confused with an accident, which may lead to many false alarms. Audio processing, by contrast, has the potential to recognize sounds coming from different sources, such as crashes, tire skidding, harsh braking, etc. In recent years, deep learning has become the state of the art for audio event detection. However, because the absence of events usually dominates road surveillance recordings, the training process would be biased. Therefore, a novel method that initializes the neural network's weights using an autoencoder trained only on event-related data is used to balance the data distribution.

Publisher: IEEE

Publication Name: 2020 IEEE 20th Mediterranean Electrotechnical Conference (MELECON)

Research Interests:
Computer Science

This paper describes a gemination prediction model for Arabic consonants, based on deep neural networks (DNN). Despite the importance of gemination for understanding the correct meaning of a word, the gemination sign (shadda) is very often omitted in Modern Standard Arabic printed/typed texts, which generates errors in automatic text applications such as text-to-speech synthesis and automatic translation. Therefore, gemination prediction for Arabic consonants has been implemented as part of an automatic diacritization module for DNN-based Arabic text-to-speech synthesis. Different DNN models were trained using feedforward and recurrent architectures. The reported results show the ability of recurrent DNNs to detect, with very high accuracy, the consonants which have to be geminated in a non-diacritized Arabic text.

Publisher: IEEE

Publication Name: 2019 16th International Multi-Conference on Systems, Signals & Devices (SSD)

Research Interests:
Computer Science

Surveillance systems are increasingly exploiting multimodal information for improved effectiveness. This paper presents an audio event detection method for road traffic surveillance, combining generative deep autoencoders and fuzzy modelling to perform anomaly detection. Baseline deep autoencoders are used to compute the reconstruction error of each audio segment, which provides a primary estimation of outlierness. To account for the uncertainty associated with this decision-making step, an interval type-2 fuzzy membership function composed of an optimistic/upper component and a pessimistic/lower component is used. The final class attribution employs a probabilistic method for interval comparison. Evaluation results obtained after defuzzification show that, with careful parameter setting, the proposed membership function effectively improves the performance of the baseline autoencoder, and performs better than the state-of-the-art one-class SVM in anomaly detection.

Publisher: Atlantis Press

Publication Name: Joint Proceedings of the 19th World Congress of the International Fuzzy Systems Association (IFSA), the 12th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT), and the 11th International Summer School on Aggregation Operators (AGOP)

Research Interests:
Audio Signal Processing, Anomaly Detection, and Audio Event Detection

Arabic text-to-speech synthesis needs to be developed in order to be integrated into many IT applications, such as email and SMS reading, automatic information delivery, and helping disabled people to use such sophisticated services. However, a standalone text-to-speech system needs automatic generation of prosody, including F0 contour prediction. Thus, the F0 contour is linked to the text data via the Fujisaki model, which divides the F0 contour into phrase and accent components. Furthermore, the parametric structure of the Fujisaki model reduces the problem to the estimation of parameters. Hence, regression techniques, such as MARS, are useful to map the text-retrieved features to the speech-signal-extracted parameters. Then, the overall F0 contour is reconstructed and compared to the original one, to validate the model.
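The superposition of phrase and accent components can be sketched with the standard Fujisaki formulation, in which log F0 is a base level plus phrase-command impulse responses and accent-command step responses. The Python code below is a minimal illustration, not the paper's code: the time constants `alpha` and `beta`, the ceiling `gamma`, the base frequency and all command values are assumed, illustrative numbers.

```python
import math


def phrase_response(t, alpha=2.0):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0.0 else 0.0


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, capped at gamma."""
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, base_f0, phrases, accents):
    """F0 in Hz at time t (seconds).

    phrases: list of (amplitude, onset_time)
    accents: list of (amplitude, start_time, end_time)
    """
    log_f0 = math.log(base_f0)
    for amplitude, onset in phrases:
        log_f0 += amplitude * phrase_response(t - onset)
    for amplitude, start, end in accents:
        log_f0 += amplitude * (accent_response(t - start) - accent_response(t - end))
    return math.exp(log_f0)


# Illustrative commands (assumed values): one phrase and one accent.
phrases = [(0.5, 0.0)]
accents = [(0.4, 0.3, 0.6)]
contour = [fujisaki_f0(0.01 * i, 120.0, phrases, accents) for i in range(150)]
```

In this parametrization, predicting an F0 contour from text amounts to predicting the command amplitudes and timings, which is the estimation problem the abstract assigns to regression.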

Research Interests:
Computer Science, SSD, Data Model, Multivariate adaptive regression splines, and Parametric Model

Publication Date: 2011

Research Interests:
Computer Science and Computer Applications

In this paper, a novel pitch detection algorithm (PDA) is presented. Though pitch detection is a classical problem that has been investigated since the very beginning of speech processing, the proposed algorithm is based on a novel approach relying on a proposed empirical relationship between fundamental frequency (f0) and instantaneous frequency (fi). Basically, f0 is defined for periodic signals only, whereas fi can be calculated for any type of signal using the Hilbert transform. Notwithstanding this substantial difference, the relationship described in this paper shows some interaction between them, at least empirically. Once validated on a large set of speech signals, this relationship was exploited to implement an algorithm that (a) detects voiced parts of speech and (b) extracts the f0 contour from the fi pattern in the voiced regions. The results of the proposed method were compared to those of some well-rated state-of-the-art PDAs of different backgrounds, showing that the quality of pitch detection yielded by the proposed approach is quite satisfactory, both in clean and simulated noisy speech.

Publication Date: 2021

Publication Name: 29th European Signal Processing Conference (EUSIPCO 2021), Dublin, Ireland, 23-27 Aug, 2021

Research Interests:
Digital Signal Processing, Speech analysis, and Audio, Speech and Language Processing

Surveillance systems are becoming increasingly multimodal. The availability of audio motivates the method for anomalous audio event detection (anomalous AED) for road traffic surveillance proposed in this paper. The method combines anomaly detection techniques, namely reconstruction deep autoencoders and fuzzy membership functions. A baseline deep autoencoder is used to compute the reconstruction error of each audio segment. The comparison of this error to a preset threshold provides a primary estimation of outlierness. To account for the uncertainty associated with this decision-making step, a fuzzy membership function composed of an optimistic/upper component and a pessimistic/lower component is used. Evaluation results obtained after defuzzification show that, with careful parameter setting, the proposed membership function improves the performance of the baseline autoencoder for anomaly detection, and yields results better than, or at least similar to, those of other state-of-the-art anomaly detection methods such as one-class SVM.
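The decision chain described above (reconstruction error, threshold, upper/lower membership, defuzzification) can be sketched as follows. This is a hypothetical Python illustration under stated assumptions, not the paper's membership function: the sigmoid shape, the `margin`, `slope` and `optimism` parameters, and the error values are all invented for the example, and the autoencoder that would produce the errors is not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def membership_interval(error, threshold, slope=20.0, margin=0.1):
    """Return (lower, upper) membership of a segment in the 'anomalous' class.

    Assumed construction: two sigmoids centred slightly above and slightly
    below the preset threshold, so that lower <= upper always holds.
    """
    upper = sigmoid(slope * (error - (threshold - margin)))  # optimistic
    lower = sigmoid(slope * (error - (threshold + margin)))  # pessimistic
    return lower, upper


def is_anomalous(error, threshold, optimism=0.7):
    """Defuzzify with a weighted mean of the two bounds (assumed rule)."""
    lower, upper = membership_interval(error, threshold)
    score = optimism * upper + (1.0 - optimism) * lower
    return score >= 0.5


# Hypothetical reconstruction errors for three audio segments.
errors = [0.20, 0.48, 0.90]
threshold = 0.5
crisp_labels = [e >= threshold for e in errors]
fuzzy_labels = [is_anomalous(e, threshold) for e in errors]
```

With these assumed parameters, the borderline segment with error 0.48 is labelled normal by the crisp threshold but anomalous after defuzzification, which shows how the parameter setting shifts decisions near the threshold.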

Publication Date: 2021

Publication Name: Advances in Computational Intelligence Systems - Contributions Presented at the 20th UK Workshop on Computational Intelligence, September 8-10, 2021, Aberystwyth, Wales, UK

Research Interests:
Audio Signal Processing, Intelligent Transportation Systems, Decision Making Under Uncertainty, Anomaly Detection, and Fuzzy Clustering

Surveillance systems are increasingly exploiting multimodal information for improved effectiveness. This paper presents an audio event detection method for road traffic surveillance, combining generative deep autoencoders and fuzzy modelling to perform anomaly detection. Baseline deep autoencoders are used to compute the reconstruction error of each audio segment, which provides a primary estimation of outlierness. To account for the uncertainty associated with this decision-making step, an interval type-2 fuzzy membership function composed of an optimistic/upper component and a pessimistic/lower component is used. The final class attribution employs a probabilistic method for interval comparison. Evaluation results obtained after defuzzification show that, with careful parameter setting, the proposed membership function effectively improves the performance of the baseline autoencoder, and performs better than the state-of-the-art one-class SVM in anomaly detection.

DOI: 10.2991/asum.k.210827.059

Publication Date: 2021

Publication Name: 19th World Congress of the International Fuzzy Systems Association (IFSA), 12th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT), and 11th International Summer School on Aggregation Operators (AGOP)

Research Interests:
Audio Signal Processing, Anomaly Detection, and Audio Event Detection
