Search Results (14)

Search Parameters:
Keywords = GFCC

24 pages, 11826 KiB  
Article
Generation of High Temporal Resolution Fractional Forest Cover Data and Its Application in Accurate Time Detection of Forest Loss
by Wenxi Shi, Xiang Zhao, Hua Yang, Longping Si, Qian Wang, Siqing Zhao and Yinkun Guo
Remote Sens. 2024, 16(13), 2387; https://doi.org/10.3390/rs16132387 - 28 Jun 2024
Viewed by 730
Abstract
Fractional forest cover is significant for characterizing the ecological condition of forests and serves as a crucial input parameter for climate and hydrological models. This research introduces a novel approach for generating a 250 m fractional forest cover product with an 8-day temporal resolution based on the updated GLASS FVC product and the annualized MODIS VCF product, thereby facilitating the development of a high-quality, long-time-series forest cover product on a global scale. Validation of the proposed product against high-spatial-resolution GFCC data demonstrates its high accuracy across various continents and forest cover scenarios globally, with an average coefficient of determination (R²) of 0.9085 and an average root-mean-square error of 7.22%. Furthermore, to assess the availability and credibility of forest cover data with high temporal resolution, this study integrates the CCDC algorithm to map forest disturbances and quantify the yearly and even monthly disturbed area within two sub-study areas of the Amazon region. The achieved sample validation accuracy is over 86%, which substantiates the reliability of the data. This investigation offers a fresh perspective on monitoring forest changes and disturbances by combining data from diverse sources, enabling the mapping of dynamic forest cover over an extensive time series with high temporal resolution, thereby mitigating data gaps and enhancing the precision of existing products.
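
The reported validation reduces to a pixel-wise comparison between the generated product and a reference map. Below is a minimal sketch of that computation, assuming two co-registered fractional cover arrays in percent; the names and the synthetic data are placeholders, not the study's pipeline.

```python
import numpy as np
from sklearn.metrics import r2_score

def validate_cover(predicted, reference):
    """Compare co-registered fractional cover maps (percent, 0-100)."""
    mask = ~np.isnan(predicted) & ~np.isnan(reference)  # drop fill values
    p, r = predicted[mask], reference[mask]
    return r2_score(r, p), float(np.sqrt(np.mean((p - r) ** 2)))

# Synthetic stand-ins for real rasters
rng = np.random.default_rng(0)
ref = rng.uniform(0, 100, size=(256, 256))
pred = ref + rng.normal(0, 7, size=ref.shape)  # ~7% noise, mimicking the reported RMSE
r2, rmse = validate_cover(pred, ref)
print(f"R2 = {r2:.4f}, RMSE = {rmse:.2f}%")
```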

14 pages, 3915 KiB  
Article
Sound Identification Method for Gas and Coal Dust Explosions Based on MLP
by Xingchen Yu and Xiaowei Li
Entropy 2023, 25(8), 1184; https://doi.org/10.3390/e25081184 - 9 Aug 2023
Viewed by 992
Abstract
To address the outdated alarm technology and single monitoring means for gas and coal dust explosions in coal mines, and to improve the accuracy of gas and coal dust explosion identification, a sound identification method for gas and coal dust explosions based on an MLP is proposed. The distributions of the mean short-time energy, zero-crossing rate, spectral centroid, spectral spread, roll-off, 16-dimensional time-frequency features, MFCC, GFCC, and short-time Fourier coefficients of gas explosion sounds, coal dust sounds, and other underground sounds were analyzed. To select the most suitable feature vector for characterizing the sound signal, an optimal feature extraction model based on the Relief algorithm was established, and the cross-entropy distributions of MLP models trained with different numbers of feature values were analyzed. To further refine the selection, the recognition results of models trained with different numbers of sound features were compared, and the first 35 feature dimensions were finally chosen as the feature vector. These feature vectors are input into the MLP to establish the sound recognition model for coal mine gas and coal dust explosions. An analysis of the time consumed by feature extraction, optimal feature selection, model training, and model recognition shows that the proposed algorithm is computationally efficient and meets the requirements of a real-time coal mine safety monitoring and alarm system. Recognition experiments show that the algorithm distinguishes each kind of sound involved accurately: the average recognition rate, recall rate, and accuracy of the model reach 95%, 95%, and 95.8%, respectively, clearly better than the comparison algorithm and sufficient for coal mine gas and coal dust explosion sensing and alarming.
(This article belongs to the Special Issue Entropy and Information Theory in Acoustics III)
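
As a rough illustration of the pipeline above, the sketch below ranks candidate acoustic features, keeps the top 35 dimensions, and trains an MLP. Mutual information stands in for the paper's Relief ranking, and the feature matrix is synthetic; only the overall shape of the workflow is taken from the abstract.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 60))    # 600 clips x 60 candidate acoustic features
y = rng.integers(0, 3, size=600)  # gas explosion / coal dust / other sounds

# Keep the 35 best-ranked features (mutual information as a Relief stand-in)
selector = SelectKBest(mutual_info_classif, k=35)
X_sel = selector.fit_transform(X, y)

X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```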

22 pages, 6530 KiB  
Article
Infant Cry Signal Diagnostic System Using Deep Learning and Fused Features
by Yara Zayed, Ahmad Hasasneh and Chakib Tadj
Diagnostics 2023, 13(12), 2107; https://doi.org/10.3390/diagnostics13122107 - 19 Jun 2023
Cited by 7 | Viewed by 3375
Abstract
Early diagnosis of medical conditions in infants is crucial for ensuring timely and effective treatment. However, infants are unable to verbalize their symptoms, making it difficult for healthcare professionals to accurately diagnose their conditions. Crying is often the only way for infants to [...] Read more.
Early diagnosis of medical conditions in infants is crucial for ensuring timely and effective treatment. However, infants are unable to verbalize their symptoms, making it difficult for healthcare professionals to accurately diagnose their conditions. Crying is often the only way for infants to communicate their needs and discomfort. In this paper, we propose a medical diagnostic system for interpreting infants’ cry audio signals (CAS) using a combination of different audio domain features and deep learning (DL) algorithms. The proposed system utilizes a dataset of labeled audio signals from infants with specific pathologies. The dataset includes two infant pathologies with high mortality rates, neonatal respiratory distress syndrome (RDS), sepsis, and crying. The system employed the harmonic ratio (HR) as a prosodic feature, the Gammatone frequency cepstral coefficients (GFCCs) as a cepstral feature, and image-based features through the spectrogram which are extracted using a convolution neural network (CNN) pretrained model and fused with the other features to benefit multiple domains in improving the classification rate and the accuracy of the model. The different combination of the fused features is then fed into multiple machine learning algorithms including random forest (RF), support vector machine (SVM), and deep neural network (DNN) models. The evaluation of the system using the accuracy, precision, recall, F1-score, confusion matrix, and receiver operating characteristic (ROC) curve, showed promising results for the early diagnosis of medical conditions in infants based on the crying signals only, where the system achieved the highest accuracy of 97.50% using the combination of the spectrogram, HR, and GFCC through the deep learning process. The finding demonstrated the importance of fusing different audio features, especially the spectrogram, through the learning process rather than a simple concatenation and the use of deep learning algorithms in extracting sparsely represented features that can be used later on in the classification problem, which improves the separation between different infants’ pathologies. The results outperformed the published benchmark paper by improving the classification problem to be multiclassification (RDS, sepsis, and healthy), investigating a new type of feature, which is the spectrogram, using a new feature fusion technique, which is fusion, through the learning process using the deep learning model. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
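
To picture what is being fused, the sketch below concatenates per-recording vectors from the three domains and classifies them. This is only the simple concatenation baseline that the paper improves upon with learned fusion; the CNN embedding, GFCC statistics, HR values, and labels are all synthetic placeholders.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
n = 300
cnn_embedding = rng.normal(size=(n, 128))   # spectrogram features from a pretrained CNN
gfcc_stats = rng.normal(size=(n, 26))       # e.g., mean and std of 13 GFCCs per recording
harmonic_ratio = rng.uniform(0, 1, size=(n, 1))  # prosodic scalar
y = rng.integers(0, 3, size=n)              # RDS / sepsis / healthy

X = np.hstack([cnn_embedding, gfcc_stats, harmonic_ratio])  # simple concatenation
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
print(clf.score(X, y))  # training fit only; real evaluation needs held-out data
```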

24 pages, 4423 KiB  
Article
Using CCA-Fused Cepstral Features in a Deep Learning-Based Cry Diagnostic System for Detecting an Ensemble of Pathologies in Newborns
by Zahra Khalilzad and Chakib Tadj
Diagnostics 2023, 13(5), 879; https://doi.org/10.3390/diagnostics13050879 - 24 Feb 2023
Cited by 6 | Viewed by 1645
Abstract
Crying is one of the means of communication for a newborn. Newborn cry signals convey precious information about the newborn's health condition and emotions. In this study, cry signals of healthy and pathologic newborns were analyzed for the purpose of developing an automatic, non-invasive, and comprehensive Newborn Cry Diagnostic System (NCDS) that identifies pathologic newborns among healthy infants. For this purpose, Mel-frequency Cepstral Coefficients (MFCC) and Gammatone Frequency Cepstral Coefficients (GFCC) were extracted as features. These feature sets were also combined and fused through Canonical Correlation Analysis (CCA), which provides a manipulation of the features that, to the best of our knowledge, has not yet been explored in the literature on NCDS designs. All the mentioned feature sets were fed to a Support Vector Machine (SVM) and a Long Short-Term Memory (LSTM) network. Furthermore, two hyperparameter optimization methods, Bayesian and grid search, were examined to enhance the system's performance. The performance of the proposed NCDS was evaluated with two different datasets of inspiratory and expiratory cries. The CCA-fused feature set using the LSTM classifier accomplished the best F-score in the study, 99.86%, for the inspiratory cry dataset. The best F-score for the expiratory cry dataset, 99.44%, belonged to the GFCC feature set with the LSTM classifier. These experiments suggest the high potential and value of using newborn cry signals for the detection of pathologies. The framework proposed in this study can be implemented as an early diagnostic tool for clinical studies and can help identify pathologic newborns.
(This article belongs to the Special Issue Diagnosis of Neonatal Diseases)
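
CCA fusion itself is compact with a standard library. A hedged sketch, assuming per-sample MFCC and GFCC feature matrices; the component count and the choice to concatenate the two projections are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(3)
n = 400
mfcc = rng.normal(size=(n, 13))  # per-sample MFCC features
gfcc = rng.normal(size=(n, 13))  # per-sample GFCC features

# Project both views into maximally correlated subspaces, then concatenate
cca = CCA(n_components=8)
mfcc_c, gfcc_c = cca.fit_transform(mfcc, gfcc)
fused = np.hstack([mfcc_c, gfcc_c])  # CCA-fused feature set
print(fused.shape)  # (400, 16)
```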

16 pages, 2143 KiB  
Article
Analyzing Noise Robustness of Cochleogram and Mel Spectrogram Features in Deep Learning Based Speaker Recognition
by Wondimu Lambamo, Ramasamy Srinivasagan and Worku Jifara
Appl. Sci. 2023, 13(1), 569; https://doi.org/10.3390/app13010569 - 31 Dec 2022
Cited by 4 | Viewed by 2374
Abstract
Speaker recognition systems perform very well on datasets without noise or mismatch, but performance degrades with environmental noise, channel variation, and physical or behavioral changes in the speaker. The type of speaker-related feature plays a crucial role in improving the performance of speaker recognition systems. Gammatone Frequency Cepstral Coefficient (GFCC) features have been widely used to develop robust speaker recognition systems with conventional machine learning, achieving better performance than Mel Frequency Cepstral Coefficient (MFCC) features in noisy conditions. Recently, deep learning models have shown better speaker recognition performance than conventional machine learning. Most previous deep learning-based speaker recognition models have used the Mel spectrogram and similar inputs rather than handcrafted features such as MFCC and GFCC. However, Mel spectrogram performance degrades at high noise ratios and under utterance mismatch. Like the Mel spectrogram, the cochleogram is an important input for deep learning speaker recognition models; like GFCC features, it represents utterances on the Equivalent Rectangular Bandwidth (ERB) scale, which matters in noisy conditions. However, no study has analyzed the noise robustness of the cochleogram versus the Mel spectrogram in speaker recognition, and only a few studies have used the cochleogram for speech-based deep learning models under noisy and mismatched conditions. In this study, the noise robustness of cochleogram and Mel spectrogram features in deep learning speaker recognition is analyzed at Signal to Noise Ratio (SNR) levels from −5 dB to 20 dB. Experiments are conducted on the VoxCeleb1 and noise-added VoxCeleb1 datasets using basic 2D CNN, ResNet-50, VGG-16, ECAPA-TDNN, and TitaNet architectures. Speaker identification and verification performance is evaluated for both the cochleogram and the Mel spectrogram. The results show that the cochleogram performs better than the Mel spectrogram in both speaker identification and verification under noisy and mismatched conditions.
(This article belongs to the Special Issue AI, Machine Learning and Deep Learning in Signal Processing)
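
The cochleogram discussed above can be approximated with an ERB-spaced gammatone filterbank. The sketch below is a from-scratch illustration, assuming a 16 kHz mono signal; the filter order, bank size, and framing parameters are illustrative choices, not the paper's configuration.

```python
import numpy as np

def erb_space(lo, hi, n, ear_q=9.26449, min_bw=24.7):
    """n center frequencies equally spaced on the ERB-rate scale."""
    c = ear_q * min_bw
    return np.exp(np.linspace(np.log(lo + c), np.log(hi + c), n)) - c

def gammatone_ir(cf, sr, dur=0.05):
    """4th-order gammatone impulse response at center frequency cf (Hz)."""
    t = np.arange(int(dur * sr)) / sr
    erb = 24.7 + cf / 9.26449
    env = t ** 3 * np.exp(-2 * np.pi * 1.019 * erb * t)
    ir = env * np.cos(2 * np.pi * cf * t)
    return ir / np.max(np.abs(ir))

def cochleogram(x, sr, n_bands=64, frame=400, hop=160):
    """Log framewise band energies after gammatone filtering: (bands, frames)."""
    cfs = erb_space(50.0, 0.9 * sr / 2, n_bands)
    rows = []
    for cf in cfs:
        band = np.convolve(x, gammatone_ir(cf, sr), mode="same")
        frames = np.lib.stride_tricks.sliding_window_view(band, frame)[::hop]
        rows.append(np.log(np.mean(frames ** 2, axis=1) + 1e-10))
    return np.array(rows)

sr = 16000
x = np.random.default_rng(4).normal(size=sr)  # 1 s of noise as a stand-in signal
print(cochleogram(x, sr).shape)
```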

21 pages, 1647 KiB  
Article
Newborn Cry-Based Diagnostic System to Distinguish between Sepsis and Respiratory Distress Syndrome Using Combined Acoustic Features
by Zahra Khalilzad, Ahmad Hasasneh and Chakib Tadj
Diagnostics 2022, 12(11), 2802; https://doi.org/10.3390/diagnostics12112802 - 15 Nov 2022
Cited by 11 | Viewed by 3098
Abstract
Crying is the only means of communication for a newborn baby with its surrounding environment, but it also provides significant information about the newborn's health, emotions, and needs. The cries of newborn babies have long been known as a biomarker for the diagnosis of pathologies. However, to the best of our knowledge, exploring the discrimination of two pathology groups by means of cry signals is unprecedented. Therefore, this study aimed to distinguish septic newborns from those with Neonatal Respiratory Distress Syndrome (RDS) by employing the Machine Learning (ML) methods of Multilayer Perceptron (MLP) and Support Vector Machine (SVM). Furthermore, the cry signal was analyzed from two different perspectives: (1) the musical perspective, by studying the spectral feature set of the Harmonic Ratio (HR), and (2) the speech processing perspective, using the short-term feature set of Gammatone Frequency Cepstral Coefficients (GFCCs). To assess the role of employing features from both short-term and spectral modalities in distinguishing the two pathology groups, they were fused into one feature set named the combined features. The hyperparameters (HPs) of the implemented ML approaches were fine-tuned to fit each experiment. Finally, by normalizing and fusing the features originating from the two modalities, the overall performance of the proposed design improved across all evaluation measures, achieving accuracies of 92.49% and 95.3% with the MLP and SVM classifiers, respectively. The MLP classifier was outperformed on all evaluation measures presented in this study except the Area Under the Receiver Operating Characteristic Curve (AUC-ROC), which signifies the design's class separation ability. The achieved results highlight the value of combining features from different levels and modalities for a more powerful analysis of cry signals, as well as of including a neural network (NN)-based classifier. Attaining 95.3% accuracy for the separation of the two entangled pathology groups of RDS and sepsis elucidates the promising potential for further studies with larger datasets and more pathology groups.
(This article belongs to the Special Issue Artificial Intelligence in Clinical Medical Imaging Analysis)
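
The Harmonic Ratio can be illustrated with its common autocorrelation definition: the height of the strongest normalized autocorrelation peak within a plausible pitch lag range. The sketch below follows that textbook form; the paper's exact HR implementation may differ.

```python
import numpy as np

def harmonic_ratio(frame, sr, f_lo=80.0, f_hi=1000.0):
    """HR of one frame: ~1.0 for strongly periodic (harmonic) signals, ~0 for noise."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags >= 0
    if ac[0] <= 0:
        return 0.0
    ac = ac / ac[0]                          # normalize by zero-lag energy
    lo, hi = int(sr / f_hi), int(sr / f_lo)  # search plausible pitch lags
    return float(ac[lo:hi].max())

sr = 16000
t = np.arange(1024) / sr
voiced = np.sin(2 * np.pi * 220 * t)         # harmonic test frame
noise = np.random.default_rng(5).normal(size=1024)
print(harmonic_ratio(voiced, sr), harmonic_ratio(noise, sr))
```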

19 pages, 12444 KiB  
Article
Detection of COVID-19 from Deep Breathing Sounds Using Sound Spectrum with Image Augmentation and Deep Learning Techniques
by Olusola O. Abayomi-Alli, Robertas Damaševičius, Aaqif Afzaal Abbasi and Rytis Maskeliūnas
Electronics 2022, 11(16), 2520; https://doi.org/10.3390/electronics11162520 - 11 Aug 2022
Cited by 8 | Viewed by 2717
Abstract
The COVID-19 pandemic is one of the most disruptive outbreaks of the 21st century, considering its impact on our freedoms and social lifestyle. Several methods have been used to monitor and diagnose this virus, including RT-PCR tests and chest CT/CXR scans. Recent studies have employed various crowdsourced sound data types, such as coughing, breathing, and sneezing, for the detection of COVID-19. However, applying artificial intelligence methods and machine learning algorithms to these sound datasets still suffers from limitations, such as poor test performance due to an increase in misclassified data, limited datasets leading to overfitting of deep learning methods, the high computational cost of some augmentation models, and feature-extracted images of varying quality resulting in poor reliability. We propose a simple yet effective deep learning model, called DeepShufNet, for COVID-19 detection. A data augmentation method based on color transformation and noise addition was used to generate synthetic image datasets from sound data. The efficiency of the synthetic datasets was evaluated using two feature extraction approaches, namely the Mel spectrogram and GFCC. The performance of the proposed DeepShufNet model was evaluated using the deep-breathing COSWARA dataset, showing improved performance with a lower misclassification rate for the minority class. The proposed model achieved an accuracy, precision, recall, specificity, and F-score of 90.1%, 77.1%, 62.7%, 95.98%, and 69.1%, respectively, for positive COVID-19 detection using the Mel COCOA-2 augmented training datasets. The proposed model showed improved performance compared to some state-of-the-art methods.
(This article belongs to the Special Issue Artificial Intelligence (AI) for Image Processing)
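
The augmentation idea, noise addition plus a color-like brightness/contrast transform, can be sketched directly on a spectrogram image. The jitter ranges below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def augment(spec_img, rng):
    """spec_img: 2-D array scaled to [0, 1], e.g., a Mel or GFCC image."""
    noisy = spec_img + rng.normal(0.0, 0.02, size=spec_img.shape)  # noise addition
    gain = rng.uniform(0.8, 1.2)                                   # contrast jitter
    bias = rng.uniform(-0.1, 0.1)                                  # brightness jitter
    return np.clip(gain * noisy + bias, 0.0, 1.0)

rng = np.random.default_rng(6)
spec = rng.uniform(size=(128, 128))  # stand-in spectrogram image
batch = np.stack([augment(spec, rng) for _ in range(8)])
print(batch.shape)  # eight synthetic variants of one training image
```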

30 pages, 10696 KiB  
Article
A Two-Level Speaker Identification System via Fusion of Heterogeneous Classifiers and Complementary Feature Cooperation
by Mohammad Al-Qaderi, Elfituri Lahamer and Ahmad Rad
Sensors 2021, 21(15), 5097; https://doi.org/10.3390/s21155097 - 28 Jul 2021
Cited by 11 | Viewed by 2678
Abstract
We present a new architecture to address the challenges of speaker identification that arise in the interaction of humans with social robots. Though deep learning systems have led to impressive performance in many speech applications, limited speech data at the training stage and short utterances with background noise at the test stage present challenges and remain open problems, as no optimum solution has been reported to date. The proposed design employs a generative model, the Gaussian mixture model (GMM), and a discriminative model, support vector machine (SVM) classifiers, as well as prosodic and short-term spectral features, to concurrently classify a speaker's gender and identity. The architecture works in a semi-sequential manner consisting of two stages: the first classifier exploits the prosodic features to determine the speaker's gender, which in turn is used together with the short-term spectral features as input to the second classifier system in order to identify the speaker. The second classifier system employs two types of short-term spectral features, namely mel-frequency cepstral coefficients (MFCC) and gammatone frequency cepstral coefficients (GFCC), as well as the gender information, as inputs to two different classifiers (GMM and GMM supervector-based SVM), which in total leads to the construction of four classifiers. The outputs from the second-stage classifiers, namely the GMM-MFCC maximum likelihood classifier (MLC), GMM-GFCC MLC, GMM-MFCC supervector SVM, and GMM-GFCC supervector SVM, are fused at the score level by the weighted Borda count approach. The weight factors are computed on the fly via a Mamdani fuzzy inference system whose inputs are the signal-to-noise ratio and the length of the utterance. Experimental evaluations suggest that the proposed architecture and fusion framework are promising and can improve recognition performance in challenging environments where the signal-to-noise ratio is low and the utterance is short; such scenarios often arise in social robot interactions with humans.
(This article belongs to the Section Biomedical Sensors)
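
Score-level fusion by weighted Borda count is compact enough to sketch. Here the four classifier scores and the weights are placeholders; the paper computes the weights on the fly from a fuzzy inference over SNR and utterance length.

```python
import numpy as np

def weighted_borda(score_matrix, weights):
    """score_matrix: (n_classifiers, n_speakers) raw scores, higher = better."""
    points = np.empty_like(score_matrix, dtype=float)
    for i, scores in enumerate(score_matrix):
        # Borda points = rank position (0 for worst candidate, n-1 for best)
        points[i] = np.argsort(np.argsort(scores))
    fused = (np.asarray(weights)[:, None] * points).sum(axis=0)
    return int(np.argmax(fused)), fused

scores = np.array([[2.1, 0.3, 1.7],    # GMM-MFCC MLC
                   [1.0, 0.8, 2.2],    # GMM-GFCC MLC
                   [0.4, 0.2, 0.9],    # GMM-MFCC supervector SVM
                   [1.5, 0.1, 1.9]])   # GMM-GFCC supervector SVM
winner, fused = weighted_borda(scores, weights=[0.3, 0.3, 0.2, 0.2])
print(winner, fused)  # index of the identified speaker and fused points
```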

18 pages, 4543 KiB  
Article
An Underwater Acoustic Target Recognition Method Based on Restricted Boltzmann Machine
by Xinwei Luo and Yulin Feng
Sensors 2020, 20(18), 5399; https://doi.org/10.3390/s20185399 - 21 Sep 2020
Cited by 27 | Viewed by 2989
Abstract
This article focuses on an underwater acoustic target recognition method based on target radiated noise. The difficulty of underwater acoustic target recognition lies mainly in the extraction of effective classification features and in pattern classification. Traditional feature extraction methods based on Low Frequency Analysis Recording (LOFAR), Mel-Frequency Cepstral Coefficients (MFCC), Gammatone-Frequency Cepstral Coefficients (GFCC), etc., essentially compress data according to a pre-set model, artificially discarding part of the information in the data and often losing information helpful for classification. This paper presents a target recognition method based on feature auto-encoding. The method takes the normalized frequency spectrum of the signal as input, uses a restricted Boltzmann machine to perform unsupervised automatic encoding of the data, extracts the deep data structure layer by layer, and classifies the acquired features with a BP neural network. The method was tested on an actual ship-radiated noise database, and the results show that the proposed classification system has better recognition accuracy and adaptability than methods based on hand-crafted feature extraction.
(This article belongs to the Section Sensor Networks)
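
A hedged sketch of the auto-encoding idea: stack BernoulliRBMs over normalized spectra and classify the learned representation, with an MLP standing in for the BP network. Data are synthetic and layer sizes are illustrative; sklearn's RBM expects inputs in [0, 1].

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM, MLPClassifier
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(7)
X = rng.uniform(size=(500, 257))  # normalized magnitude spectra (stand-ins)
y = rng.integers(0, 4, size=500)  # target classes, e.g., ship types

model = Pipeline([
    # Unsupervised layer-by-layer encoding of the spectrum
    ("rbm1", BernoulliRBM(n_components=128, learning_rate=0.05, n_iter=15, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=15, random_state=0)),
    # Supervised classifier on the deep features (BP-network stand-in)
    ("bp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=400, random_state=0)),
])
model.fit(X, y)
print(model.score(X, y))  # training accuracy on synthetic stand-in data
```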

18 pages, 2718 KiB  
Article
Individual Violin Recognition Method Combining Tonal and Nontonal Features
by Qi Wang and Changchun Bao
Electronics 2020, 9(6), 950; https://doi.org/10.3390/electronics9060950 - 8 Jun 2020
Cited by 2 | Viewed by 2435
Abstract
Individual recognition among instruments of the same type is a challenging problem that has rarely been investigated. In this study, the individual recognition of violins is explored. Based on the source-filter model, the spectrum can be divided into tonal content and nontonal content, which reflect the timbre from complementary aspects. Tonal and nontonal gammatone frequency cepstral coefficients (GFCC) are combined to describe the corresponding spectrum contents in this study. In the recognition system, a Gaussian mixture model-universal background model (GMM-UBM) is employed to parameterize the distribution of the combined features. To evaluate the recognition of violin individuals, a solo dataset including 86 violins was developed for this study. Compared with other features, the combined features show better performance in both individual violin recognition and violin grade classification. Experimental results also show that the GMM-UBM outperforms a CNN, especially when the training data are limited. Finally, the effect of players on individual violin recognition is investigated.
(This article belongs to the Special Issue Recent Advances in Multimedia Signal Processing and Communications)
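
GMM-UBM scoring can be illustrated with log-likelihood ratios against a background model. The sketch below trains per-violin GMMs from scratch, which is a shortcut for illustration; real GMM-UBM systems MAP-adapt the UBM to each target. All features are synthetic.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(8)
# Per-violin GFCC frames (synthetic, with class-dependent means)
features = {v: rng.normal(loc=v, size=(300, 20)) for v in range(3)}

# Universal background model on pooled data
ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
ubm.fit(np.vstack(list(features.values())))

# One model per violin (MAP adaptation replaced by direct fitting here)
models = {v: GaussianMixture(n_components=8, covariance_type="diag",
                             random_state=0).fit(f) for v, f in features.items()}

test = rng.normal(loc=1, size=(50, 20))  # excerpt actually from violin "1"
llr = {v: m.score(test) - ubm.score(test) for v, m in models.items()}
print(max(llr, key=llr.get), llr)        # identified violin and its ratios
```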

17 pages, 3438 KiB  
Article
Underwater Acoustic Target Recognition: A Combination of Multi-Dimensional Fusion Features and Modified Deep Neural Network
by Xingmei Wang, Anhua Liu, Yu Zhang and Fuzhao Xue
Remote Sens. 2019, 11(16), 1888; https://doi.org/10.3390/rs11161888 - 13 Aug 2019
Cited by 64 | Viewed by 4684
Abstract
A method combining multi-dimensional fusion features and a modified deep neural network (MFF-MDNN) is proposed in this paper to recognize underwater acoustic targets. Because the underwater environment is complex and changeable, it is difficult to describe underwater acoustic signals with a single feature; the Gammatone frequency cepstral coefficient (GFCC) and modified empirical mode decomposition (MEMD) are therefore developed to extract multi-dimensional features. Moreover, to ensure the same time dimension, a dimension reduction method is proposed to obtain multi-dimensional fusion features from the original underwater acoustic signals. Then, to reduce redundant features and further improve recognition accuracy, a Gaussian mixture model (GMM) is used to modify the structure of the deep neural network (DNN). The proposed underwater acoustic target recognition method attains an accuracy of 94.3% within a maximum of 800 iterations when the dataset contains underwater background noise with weak targets. Compared with other methods, the recognition results demonstrate that the proposed method has higher accuracy and strong adaptability.
(This article belongs to the Section Ocean Remote Sensing)
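
One simple way to give two feature streams the same time dimension is to resample each to a common frame count before stacking. The interpolation below is an assumed stand-in for the paper's own dimension reduction method; shapes and hop rates are illustrative.

```python
import numpy as np

def resample_frames(feat, n_frames):
    """Resample a (T, D) feature matrix along time to (n_frames, D)."""
    t_old = np.linspace(0.0, 1.0, feat.shape[0])
    t_new = np.linspace(0.0, 1.0, n_frames)
    return np.stack([np.interp(t_new, t_old, feat[:, d])
                     for d in range(feat.shape[1])], axis=1)

rng = np.random.default_rng(9)
gfcc = rng.normal(size=(312, 13))  # GFCC frames
memd = rng.normal(size=(845, 6))   # MEMD-derived features at a finer hop

n = 256
fused = np.hstack([resample_frames(gfcc, n), resample_frames(memd, n)])
print(fused.shape)  # (256, 19) multi-dimensional fusion feature
```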

13 pages, 2650 KiB  
Article
Enhanced Automatic Speech Recognition System Based on Enhancing Power-Normalized Cepstral Coefficients
by Mohamed Tamazin, Ahmed Gouda and Mohamed Khedr
Appl. Sci. 2019, 9(10), 2166; https://doi.org/10.3390/app9102166 - 27 May 2019
Cited by 22 | Viewed by 4280
Abstract
Many new consumer applications are based on automatic speech recognition (ASR) systems, such as voice command interfaces, speech-to-text applications, and data entry processes. Although ASR systems have improved remarkably in recent decades, speech recognition performance still degrades significantly in noisy environments. Developing a robust ASR system that can work in real-world noise and other acoustically distorting conditions is an attractive research topic. Many advanced algorithms have been developed in the literature to deal with this problem; most are based on modeling the behavior of the human auditory system under perceived noisy speech. In this research, the power-normalized cepstral coefficient (PNCC) system is modified to increase robustness against different types of environmental noise, where a new technique based on gammatone channel filtering combined with channel bias minimization is used to suppress noise effects. The TIDIGITS database is used to evaluate the performance of the proposed system against state-of-the-art techniques in the presence of additive white Gaussian noise (AWGN) and seven different types of environmental noise. In this research, one word is recognized from a set containing only 11 possibilities. The experimental results show that the proposed method provides significant improvements in recognition accuracy at low signal-to-noise ratios (SNR). In the case of subway noise at SNR = 5 dB, the proposed method outperforms the mel-frequency cepstral coefficient (MFCC) and relative spectral (RASTA)-perceptual linear predictive (PLP) methods by 55% and 47%, respectively. Moreover, the recognition rate of the proposed method is higher than that of the gammatone frequency cepstral coefficient (GFCC) and PNCC methods in the case of car noise: it improves on the GFCC method by 40% at an SNR of 0 dB and on the PNCC method by 20% at an SNR of −5 dB.
(This article belongs to the Section Electrical, Electronics and Communications Engineering)
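
A key departure of the PNCC family from log-cepstra is a power-law nonlinearity (commonly cited with a 1/15 exponent) applied to gammatone band energies before the cepstral DCT. A toy comparison on synthetic band energies, not the paper's modified system:

```python
import numpy as np
from scipy.fftpack import dct

rng = np.random.default_rng(10)
band_energy = rng.uniform(1e-6, 1.0, size=(100, 40))  # frames x gammatone bands

# MFCC/GFCC-style log compression vs. PNCC-style power-law compression
log_ceps = dct(np.log(band_energy), type=2, axis=1, norm="ortho")[:, :13]
pow_ceps = dct(band_energy ** (1 / 15), type=2, axis=1, norm="ortho")[:, :13]

# The power law keeps small energies bounded, while log diverges near zero,
# which is one intuition for PNCC's better behavior at low SNR.
print(log_ceps.min(), pow_ceps.min())
```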

625 KiB  
Article
A Vocal-Based Analytical Method for Goose Behaviour Recognition
by Kim Arild Steen, Ole Roland Therkildsen, Henrik Karstoft and Ole Green
Sensors 2012, 12(3), 3773-3788; https://doi.org/10.3390/s120303773 - 21 Mar 2012
Cited by 21 | Viewed by 8617
Abstract
Since human-wildlife conflicts are increasing, the development of cost-effective methods for reducing damage or conflict levels is important in wildlife management. A wide range of devices to detect and deter animals causing conflict are used for this purpose, although their effectiveness is often highly variable due to habituation to disruptive or disturbing stimuli. Automated recognition of behaviours could form a critical component of a system capable of altering the disruptive stimuli to avoid this. In this paper we present a novel method to automatically recognise goose behaviour based on vocalisations from flocks of free-living barnacle geese (Branta leucopsis). The geese were observed and recorded in a natural environment using a shielded shotgun microphone. The classification used Support Vector Machines (SVMs), which had been trained with labeled data. Greenwood Function Cepstral Coefficients (GFCC) were used as features for the pattern recognition algorithm, as they can be adjusted to the hearing capabilities of different species. Three behaviours were classified based on this approach, and the method achieves good recognition of foraging behaviour (86–97% sensitivity, 89–98% precision) and reasonable recognition of flushing (79–86%, 66–80%) and landing behaviour (73–91%, 79–92%). The Support Vector Machine has proven to be a robust classifier for this kind of task, as generality and non-linear capabilities are important. We conclude that vocalisations can be used to automatically detect the behaviour of conflict wildlife species and, as such, may be used as an integrated part of a wildlife management system.
(This article belongs to the Section Physical Sensors)
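
Note that GFCC here denotes Greenwood Function Cepstral Coefficients, not the gammatone variant used elsewhere in these results. Their species adjustability comes from the Greenwood map f(x) = A(10^(a·x) - k). The sketch below uses the commonly cited human constants and shows one way to refit A and k to a species' hearing range; the goose range is a hypothetical example.

```python
import numpy as np

def greenwood_centers(n_filters, A=165.4, a=2.1, k=0.88):
    """Center frequencies (Hz) at equally spaced cochlear positions x in [0, 1]."""
    x = np.linspace(0.0, 1.0, n_filters)
    return A * (10 ** (a * x) - k)

def fit_greenwood(f_min, f_max, a=2.1):
    """Solve A and k so that f(0) = f_min and f(1) = f_max for a fixed slope a."""
    r = f_max / f_min
    k = (r - 10 ** a) / (r - 1.0)
    A = f_min / (1.0 - k)
    return A, k

print(np.round(greenwood_centers(5)))           # human-tuned filter centers
A, k = fit_greenwood(100.0, 8000.0)             # hypothetical goose hearing range
print(np.round(greenwood_centers(5, A=A, k=k))) # retuned filter centers
```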

239 KiB  
Article
A Framework for Bioacoustic Vocalization Analysis Using Hidden Markov Models
by Yao Ren, Michael T. Johnson, Patrick J. Clemins, Michael Darre, Sharon Stuart Glaeser, Tomasz S. Osiejuk and Ebenezer Out-Nyarko
Algorithms 2009, 2(4), 1410-1428; https://doi.org/10.3390/a2041410 - 18 Nov 2009
Cited by 42 | Viewed by 12319
Abstract
Using Hidden Markov Models (HMMs) as a recognition framework for automatic classification of animal vocalizations has a number of benefits, including the ability to handle duration variability through nonlinear time alignment, the ability to incorporate complex language or recognition constraints, and easy extendibility to continuous recognition and detection domains. In this work, we apply HMMs to several different species and bioacoustic tasks using generalized spectral features that can be easily adjusted across species and HMM network topologies suited to each task. This experimental work includes a simple call type classification task using one HMM per vocalization for repertoire analysis of Asian elephants, a language-constrained song recognition task using syllable models as base units for ortolan bunting vocalizations, and a stress stimulus differentiation task in poultry vocalizations using a non-sequential model via a one-state HMM with Gaussian mixtures. Results show strong performance across all tasks and illustrate the flexibility of the HMM framework for a variety of species, vocalization types, and analysis tasks.
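
A per-call-type HMM classifier is straightforward to sketch with hmmlearn (assumed here as the HMM library): train one GaussianHMM per class and label a new sequence by the highest-scoring model. Feature sequences below are synthetic; a one-state GMMHMM would recover the paper's non-sequential case.

```python
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(11)

def train_model(sequences, n_states=3):
    """Fit one left-to-right-style HMM on a list of (T_i, D) feature sequences."""
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=20, random_state=0)
    return model.fit(X, lengths)

# Synthetic stand-ins for per-call-type cepstral feature sequences
calls = {c: [rng.normal(loc=c, size=(rng.integers(40, 80), 12))
             for _ in range(10)] for c in range(3)}
models = {c: train_model(seqs) for c, seqs in calls.items()}

test = rng.normal(loc=1, size=(60, 12))    # unknown vocalization
scores = {c: m.score(test) for c, m in models.items()}
print(max(scores, key=scores.get), scores)  # predicted call type and log-likelihoods
```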
