AI, Machine Learning and Deep Learning in Signal Processing, 2nd Edition

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 25 March 2025 | Viewed by 6342

Special Issue Editors


Dr. Mara Pistellato
Guest Editor
Department of Environmental Sciences, Informatics and Statistics, Ca’ Foscari University of Venice, 30170 Venice, Italy
Interests: computer vision; 3D reconstruction; machine learning; deep learning

Prof. Dr. Byung-Gyu Kim
Guest Editor
Department of IT Engineering, Sookmyung Women’s University, Seoul 04310, Republic of Korea
Interests: image/video signal processing; pattern recognition; computer vision; deep learning; artificial intelligence

Special Issue Information

Dear Colleagues,

Recently, the entire field of signal processing has been facing new challenges and paradigm shifts, driven by dramatic improvements in hardware computational performance and an exponential increase in the number of devices interconnected via the Internet. As a consequence, the tremendous data volume generated by such applications has to be analyzed and processed to provide useful, reliable, and meaningful information.

Artificial intelligence (AI), and in particular machine learning (including deep learning), provides novel tools for the field of signal processing. Consequently, the signal processing community has to develop new approaches, methods, theories, and tools to analyze and account for these growing data volumes.

This Special Issue aims to attract manuscripts presenting novel methods and innovative applications of AI and machine learning (including deep learning) to topics in the signal processing area. Such topics include (but are not limited to) multimedia systems, audio and video processing, and augmented and virtual reality. The objective is to bring together recent high-quality work in AI, to promote key advances in the signal processing areas covered by the journal, and to provide reviews of the state of the art in these emerging domains.

Dr. Mara Pistellato
Prof. Dr. Byung-Gyu Kim
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence (AI)
  • deep learning
  • machine learning
  • signal processing
  • image and video processing
  • audio and acoustic signal processing
  • biomedical signal processing
  • speech processing
  • multimedia signal processing
  • multidimensional signal processing
  • augmented reality
  • virtual reality

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.


Published Papers (9 papers)


Research


17 pages, 2458 KiB  
Article
Data Augmentation Method Using Room Transfer Function for Monitoring of Domestic Activities
by Minhan Kim and Seokjin Lee
Appl. Sci. 2024, 14(21), 9644; https://doi.org/10.3390/app14219644 - 22 Oct 2024
Abstract
Monitoring domestic activities helps us to understand user behaviors in indoor environments, and it has garnered interest because it aids the understanding of human activities in context-aware computing. In the field of acoustics, this goal has been achieved through studies employing machine learning techniques, which are widely used for classification tasks involving sound recognition and other objectives. Machine learning typically achieves better performance with large amounts of high-quality training data. Given the high cost of data collection, development datasets often suffer from imbalanced data or a lack of high-quality samples, leading to performance degradation in machine learning models. The present study addresses this data issue through data augmentation techniques. Specifically, since the proposed method targets indoor activities in domestic activity detection, room transfer functions were used for data augmentation. The results show that the proposed method achieves a 0.59% improvement in the F1-score (micro) over the baseline system on the development dataset. Additionally, on test data including microphones that were not used during training, the method achieved an F1-score improvement of 0.78% over the baseline system. This demonstrates the enhanced generalization performance of the proposed method on samples whose room transfer functions differ from those of the training dataset.
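To illustrate the general idea behind room-transfer-function augmentation (a sketch of the technique class, not the authors' implementation), the snippet below convolves a dry signal with a synthetic room impulse response; the decaying-noise RIR model, sample rate, and RT60 value are illustrative assumptions.

    import numpy as np
    from scipy.signal import fftconvolve

    def synthetic_rir(duration_s=0.4, fs=16000, rt60=0.3, seed=0):
        """Toy room impulse response: exponentially decaying white noise,
        a stand-in for a measured room transfer function."""
        rng = np.random.default_rng(seed)
        n = int(duration_s * fs)
        t = np.arange(n) / fs
        decay = np.exp(-6.908 * t / rt60)   # amplitude falls 60 dB after rt60 s
        return rng.standard_normal(n) * decay

    def augment_with_rir(signal, rir):
        """Simulate recording the same source in another room by
        convolving it with that room's impulse response."""
        wet = fftconvolve(signal, rir)[: len(signal)]
        return wet / (np.max(np.abs(wet)) + 1e-12)  # peak-normalize

    fs = 16000
    dry = np.random.default_rng(1).standard_normal(fs)  # 1 s placeholder clip
    augmented = augment_with_rir(dry, synthetic_rir(fs=fs))
    print(augmented.shape)  # (16000,)

Swapping synthetic_rir for responses measured in several rooms would yield one augmented copy of each clip per room, which is the spirit of the augmentation described above.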

15 pages, 11845 KiB  
Article
Situational Awareness Classification Based on EEG Signals and Spiking Neural Network
by Yakir Hadad, Moshe Bensimon, Yehuda Ben-Shimol and Shlomo Greenberg
Appl. Sci. 2024, 14(19), 8911; https://doi.org/10.3390/app14198911 - 3 Oct 2024
Viewed by 521
Abstract
Situational awareness detection and the characterization of mental states play a vital role in medicine and many other fields. An electroencephalogram (EEG) is one of the most effective tools for identifying and analyzing cognitive stress. Yet, the measurement, interpretation, and classification of EEG signals are challenging tasks. This study introduces a novel machine learning-based approach to assist in evaluating situational awareness detection using EEG signals and spiking neural networks (SNNs) based on a unique spike continuous-time neuron (SCTN). The implemented biologically inspired SNN architecture is used for effective EEG feature extraction by applying time–frequency analysis techniques, and it allows adept detection and analysis of the various frequency components embedded in the different EEG sub-bands. The EEG signal is encoded into spikes and then fed into an SNN model, which is well suited to the serial sequence order of the EEG data. We utilize the SCTN-based resonator for EEG feature extraction in the frequency domain, which demonstrates a high correlation with classical FFT features. A new SCTN-based 2D neural network is introduced for efficient EEG feature mapping, aiming to achieve a spatial representation of each EEG sub-band. To validate and evaluate the performance of the proposed approach, a common, publicly available EEG dataset is used. The experimental results show that, by using the extracted EEG frequency features and the SCTN-based SNN classifier, the mental state can be accurately classified with an average accuracy of 96.8% on this dataset. Our proposed method outperforms existing machine learning-based methods and demonstrates the advantages of using SNNs for situational awareness detection and mental state classification.
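As a rough illustration of how a continuous signal becomes spike events before entering an SNN (a generic delta/send-on-delta scheme, not the paper's SCTN resonator pipeline), consider the sketch below; the threshold and sampling rate are illustrative assumptions.

    import numpy as np

    def delta_spike_encode(x, threshold=0.5):
        """Send-on-delta encoding: emit a +1/-1 spike whenever the signal
        moves more than `threshold` away from the last encoded level."""
        spikes = np.zeros_like(x, dtype=int)
        level = x[0]
        for i, v in enumerate(x[1:], start=1):
            if v - level >= threshold:
                spikes[i] = 1
                level += threshold
            elif level - v >= threshold:
                spikes[i] = -1
                level -= threshold
        return spikes

    fs = 256                                  # a typical EEG sampling rate
    t = np.arange(fs) / fs
    eeg_like = np.sin(2 * np.pi * 10 * t)     # 10 Hz alpha-band surrogate
    spk = delta_spike_encode(eeg_like, threshold=0.2)
    print(int(np.abs(spk).sum()), "spikes in 1 s")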

18 pages, 2376 KiB  
Article
Markov-Modulated Poisson Process Modeling for Machine-to-Machine Heterogeneous Traffic
by Ahmad Hani El Fawal, Ali Mansour and Abbass Nasser
Appl. Sci. 2024, 14(18), 8561; https://doi.org/10.3390/app14188561 - 23 Sep 2024
Viewed by 573
Abstract
Theoretical mathematics is a key factor in the evolution of artificial intelligence (AI). Nowadays, representing a smart system as a mathematical model helps to analyze any system under development and supports different case studies found in real life. Additionally, the Markov chain has shown itself to be an invaluable tool for decision-making systems, natural language processing, and predictive modeling. In the Internet of Things (IoT), Machine-to-Machine (M2M) traffic necessitates new traffic models due to its unique pattern and different goals. In this context, we have two types of modeling: (1) source traffic modeling, used to design stochastic processes so that they match the behavior of physical quantities of measured data traffic (e.g., video, data, voice), and (2) aggregated traffic modeling, which refers to the process of combining multiple small packets into a single packet in order to reduce the header overhead in the network. In IoT studies, balancing the accuracy of the model while managing a large number of M2M devices is a heavy challenge for academia. On the one hand, source traffic models are more competitive than aggregated traffic models because of their dependability; on the other hand, their complexity is expected to make managing the exponential growth of M2M devices difficult. In this paper, we propose a Markov-Modulated Poisson Process (MMPP) framework to explore the effects of heterogeneous Human-to-Human (H2H) and M2M traffic. As a tool for stochastic processes, we employ Markov chains to characterize the coexistence of H2H and M2M traffic. Using the traditional evolved Node B (eNodeB), our simulation results show that the network's service completion rate suffers significantly: in the worst-case scenario, when an accumulative storm of M2M requests attempts to access the network simultaneously, the task completion rate degrades to 8%. However, using our “Coexistence of Heterogeneous traffic Analyzer and Network Architecture for Long term evolution” (CHANAL) solution, we can achieve a service completion rate of 96%.
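The following sketch shows a minimal discrete-time MMPP of the kind used to model such heterogeneous traffic: a Markov chain switches between a calm state and a bursty M2M-storm state, each with its own Poisson arrival rate. The rates and transition probabilities are illustrative, not values from the paper.

    import numpy as np

    def simulate_mmpp(rates, trans, n_slots, seed=0):
        """Discrete-time MMPP: a Markov chain picks the state for each
        slot; arrivals in that slot are drawn from Poisson(rates[state])."""
        rng = np.random.default_rng(seed)
        k = len(rates)
        state = 0
        states = np.empty(n_slots, dtype=int)
        arrivals = np.empty(n_slots, dtype=int)
        for s in range(n_slots):
            states[s] = state
            arrivals[s] = rng.poisson(rates[state])
            state = rng.choice(k, p=trans[state])
        return states, arrivals

    rates = [2.0, 50.0]                      # calm traffic vs. M2M burst (illustrative)
    trans = [[0.95, 0.05],                   # row-stochastic transition matrix
             [0.20, 0.80]]
    states, arr = simulate_mmpp(rates, trans, n_slots=1000)
    print("mean arrivals per slot:", arr.mean())

Fitting the rates and the transition matrix to measured traces is where source-traffic and aggregated-traffic models diverge in accuracy and complexity.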

26 pages, 7340 KiB  
Article
Versatile Video Coding-Post Processing Feature Fusion: A Post-Processing Convolutional Neural Network with Progressive Feature Fusion for Efficient Video Enhancement
by Tanni Das, Xilong Liang and Kiho Choi
Appl. Sci. 2024, 14(18), 8276; https://doi.org/10.3390/app14188276 - 13 Sep 2024
Viewed by 629
Abstract
Advanced video codecs such as High Efficiency Video Coding/H.265 (HEVC) and Versatile Video Coding/H.266 (VVC) are vital for streaming high-quality online video content, as they compress and transmit data efficiently. However, these codecs can occasionally degrade video quality by adding undesirable artifacts such as blockiness, blurriness, and ringing, which can detract from the viewer’s experience. To ensure a seamless and engaging video experience, it is essential to remove these artifacts, which improves viewer comfort and engagement. In this paper, we propose a deep feature fusion-based convolutional neural network (CNN) post-processing architecture (VVC-PPFF) to further enhance the performance of VVC. The proposed network, VVC-PPFF, harnesses the power of CNNs to enhance decoded frames, significantly improving the coding efficiency of the state-of-the-art VVC video coding standard. By combining deep features from early and later convolution layers, the network learns to extract both low-level and high-level features, resulting in more generalized outputs that adapt to different quantization parameter (QP) values. The proposed VVC-PPFF network achieves outstanding performance, with Bjøntegaard Delta Rate (BD-Rate) improvements of 5.81% and 6.98% for the luma component in the random access (RA) and low-delay (LD) configurations, respectively, while also boosting the peak signal-to-noise ratio (PSNR).
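A minimal sketch of the early/late feature-fusion idea (arbitrary layer sizes, not the actual VVC-PPFF architecture): shallow and deep feature maps are concatenated and used to predict a residual correction that is added back to the decoded frame.

    import torch
    import torch.nn as nn

    class TinyFusionNet(nn.Module):
        """Toy post-filter: concatenate an early (low-level) feature map
        with a late (high-level) one, then predict a residual that is
        added back to the decoded frame."""
        def __init__(self, ch=32):
            super().__init__()
            self.early = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
            self.deep = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            )
            self.fuse = nn.Conv2d(2 * ch, 1, 3, padding=1)

        def forward(self, x):
            f_early = self.early(x)
            f_deep = self.deep(f_early)
            fused = torch.cat([f_early, f_deep], dim=1)  # early + late fusion
            return x + self.fuse(fused)                  # residual enhancement

    decoded_luma = torch.rand(1, 1, 64, 64)   # stand-in for a decoded frame
    enhanced = TinyFusionNet()(decoded_luma)
    print(enhanced.shape)  # torch.Size([1, 1, 64, 64])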

19 pages, 10886 KiB  
Article
Advancing Nighttime Object Detection through Image Enhancement and Domain Adaptation
by Chenyuan Zhang and Deokwoo Lee
Appl. Sci. 2024, 14(18), 8109; https://doi.org/10.3390/app14188109 - 10 Sep 2024
Viewed by 642
Abstract
Due to the lack of annotations for nighttime low-light images, object detection in low-light images has always been a challenging problem, and achieving high-precision results at night remains an issue. Additionally, we aim to use a single nighttime dataset to complete the knowledge distillation task while improving the detection accuracy of object detection models under nighttime low-light conditions and reducing the computational cost of the model, especially for small targets and objects contaminated by special nighttime lighting. This paper proposes a Nighttime Unsupervised Domain Adaptation Network (NUDN) based on knowledge distillation to address these issues. To improve detection accuracy on nighttime images, high-confidence bounding box predictions from the teacher and region proposals from the student are first fused, allowing the teacher to perform better in subsequent training and thus generating a combination of high-confidence and low-confidence pseudo-labels. This combination of feature information is used to guide model training, enabling the model to extract feature information from nighttime low-light images similar to that of source images. Nighttime images and pseudo-labels undergo random size transformations before being used as input for the student, enhancing the model’s generalization across different scales. To address the scarcity of nighttime datasets, we propose a nighttime-specific augmentation pipeline called LightImg. This pipeline enhances nighttime features, transforming them into daytime-like features and reducing issues such as backlighting, uneven illumination, and dim nighttime light, enabling cross-domain research using existing nighttime datasets. Our experimental results show that NUDN can significantly improve nighttime low-light object detection accuracy on the SHIFT and ExDark datasets. We conduct extensive experiments and ablation studies to demonstrate the effectiveness and efficiency of our work.
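One ingredient of such pseudo-label schemes can be sketched as follows: teacher detections above a confidence threshold are kept as trusted pseudo-labels, while student proposals remain as lower-confidence candidates. This is a simplified illustration with hypothetical names and thresholds, not NUDN's actual fusion rule.

    import numpy as np

    def fuse_pseudo_labels(teacher_boxes, teacher_scores, student_boxes,
                           hi_thresh=0.8):
        """Keep high-confidence teacher detections as trusted pseudo-labels
        and pass the student proposals through as low-confidence candidates
        (one simple fusion rule; the paper's scheme is richer)."""
        keep = teacher_scores >= hi_thresh
        return teacher_boxes[keep], student_boxes

    teacher_boxes = np.array([[10, 10, 50, 50], [0, 0, 20, 20]], float)
    teacher_scores = np.array([0.92, 0.40])
    student_boxes = np.array([[12, 8, 48, 52]], float)
    trusted, candidates = fuse_pseudo_labels(teacher_boxes, teacher_scores,
                                             student_boxes)
    print(len(trusted), "trusted,", len(candidates), "candidate boxes")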

13 pages, 5330 KiB  
Article
ISAR Imaging Analysis of Complex Aerial Targets Based on Deep Learning
by Yifeng Wang, Jiaxing Hao, Sen Yang and Hongmin Gao
Appl. Sci. 2024, 14(17), 7708; https://doi.org/10.3390/app14177708 - 31 Aug 2024
Viewed by 683
Abstract
Traditional range–instantaneous Doppler (RID) methods for maneuvering target imaging are hindered by issues related to low resolution and inadequate noise suppression. To address this, we propose a novel ISAR imaging method enhanced by deep learning, which incorporates the fundamental architecture of CapsNet along with two additional convolutional layers. Pre-training is conducted through the deep learning network to establish the reference mapping function. Subsequently, the trained network is integrated into the electromagnetic simulation software Feko 2019, utilizing a combination of geometric forms such as corner reflectors and Luneberg spheres for analysis. The results indicate that the derived ISAR imaging effectively identifies complex aerial targets. A thorough analysis of the imaging results further corroborates the effectiveness and superiority of this approach. Both simulation and empirical data demonstrate that this method significantly enhances imaging resolution and noise suppression.
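For context, classical RID-style processing starts from a range–Doppler map formed by a Fourier transform over slow time; the toy sketch below images a single synthetic scatterer. It illustrates the baseline the paper improves on, not the proposed CapsNet-based method.

    import numpy as np

    def range_doppler_image(echo_matrix):
        """Classical ISAR image formation: an FFT along slow time (pulses)
        turns per-range-bin phase histories into Doppler; the magnitude is
        the range-Doppler map that RID-style methods start from."""
        return np.abs(np.fft.fftshift(np.fft.fft(echo_matrix, axis=0), axes=0))

    pulses, range_bins = 128, 64
    f_doppler = 0.1                       # cycles per pulse (toy scatterer)
    n = np.arange(pulses)[:, None]
    echo = np.zeros((pulses, range_bins), complex)
    echo[:, 30:31] = np.exp(2j * np.pi * f_doppler * n)   # one moving scatterer
    img = range_doppler_image(echo)
    peak = np.unravel_index(img.argmax(), img.shape)
    print("peak at (doppler_bin, range_bin) =", peak)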

20 pages, 5395 KiB  
Article
Detection and Segmentation of Mouth Region in Stereo Stream Using YOLOv6 and DeepLab v3+ Models for Computer-Aided Speech Diagnosis in Children
by Agata Sage and Pawel Badura
Appl. Sci. 2024, 14(16), 7146; https://doi.org/10.3390/app14167146 - 14 Aug 2024
Cited by 1 | Viewed by 743
Abstract
This paper describes a multistage framework for face image analysis in computer-aided speech diagnosis and therapy. Multimodal data processing frameworks have become a significant factor in supporting the treatment of speech disorders. Synchronous and asynchronous remote speech therapy approaches can use audio and video analysis of articulation to deliver robust indicators of disordered speech. Accurate segmentation of articulators in video frames is a vital step in this agenda. We use a dedicated data acquisition system to capture the stereovision stream during speech therapy examination in children. Our goal is to detect and accurately segment four objects in the mouth area (lips, teeth, tongue, and the whole mouth) during relaxed speech and speech therapy exercises. Our database contains 17,913 frames from 76 preschool children. We apply a sequence of procedures employing artificial intelligence. For detection, we train the YOLOv6 (you only look once) model to detect each of the three objects under consideration. Then, we prepare the DeepLab v3+ segmentation model in a semi-supervised training mode. As the preparation of reliable expert annotations for video labeling is exhausting, we first train the network using weak labels produced by an initial segmentation based on distance-regularized level set evolution over fuzzified images. Next, we fine-tune the model using a portion of manual ground-truth delineations. Each stage is thoroughly assessed using an independent test subset. The lips are detected almost perfectly (average precision and F1 score of 0.999), whereas the segmentation Dice index exceeds 0.83 for each articulator, with a top result of 0.95 for the whole mouth.
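The Dice index reported above is a standard overlap measure for segmentation; a minimal sketch of how it is computed from binary masks (toy masks, illustrative values):

    import numpy as np

    def dice_index(pred, truth, eps=1e-8):
        """Dice similarity: 2|A∩B| / (|A| + |B|) for two binary masks."""
        pred, truth = pred.astype(bool), truth.astype(bool)
        inter = np.logical_and(pred, truth).sum()
        return 2.0 * inter / (pred.sum() + truth.sum() + eps)

    truth = np.zeros((100, 100), int); truth[20:60, 20:60] = 1   # toy ground truth
    pred = np.zeros((100, 100), int);  pred[25:60, 20:60] = 1    # toy model output
    print(round(dice_index(pred, truth), 3))   # ~0.933 overlap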

15 pages, 2056 KiB  
Article
Robust DOA Estimation Using Multi-Scale Fusion Network with Attention Mask
by Yuting Yan and Qinghua Huang
Appl. Sci. 2024, 14(11), 4488; https://doi.org/10.3390/app14114488 - 24 May 2024
Viewed by 670
Abstract
To overcome the limitations of traditional methods in reverberant and noisy environments, a robust multi-scale fusion neural network with an attention mask is designed to improve direction-of-arrival (DOA) estimation accuracy for acoustic sources. It combines the benefits of deep learning and complex-valued operations to effectively deal with the interference of reverberation and noise in speech signals. The unique properties of complex-valued signals are exploited to fully capture inherent features, and rich information is preserved in the complex field. An attention mask module is designed to generate distinct masks for selectively focusing and masking based on the input. After that, the multi-scale fusion block efficiently captures multi-scale spatial features by stacking complex-valued convolutional layers with small kernels, and it reduces module complexity through special branching operations. Experimental results demonstrate that the model achieves significant improvements over other methods for speaker localization in reverberant and noisy environments. It provides a new solution for DOA estimation for acoustic sources in different scenarios, which has significant theoretical and practical implications.
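For comparison with such learned approaches, a classical DOA baseline estimates the time difference of arrival between two microphones with GCC-PHAT and converts it to an angle. The sketch below is that textbook baseline (illustrative signal, mic spacing, and sign convention), not the paper's complex-valued network.

    import numpy as np

    def gcc_phat_tdoa(x1, x2, fs):
        """Classical GCC-PHAT: whiten the cross-spectrum and pick the peak
        lag as the time difference of arrival between two microphones."""
        n = len(x1) + len(x2)
        X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
        cross = X1 * np.conj(X2)
        cc = np.fft.irfft(cross / (np.abs(cross) + 1e-12), n)
        max_lag = n // 2
        cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
        return (np.argmax(np.abs(cc)) - max_lag) / fs

    fs, c, d = 16000, 343.0, 0.1          # sample rate, speed of sound, mic spacing
    delay = 3                              # true offset in samples (toy signal)
    sig = np.random.default_rng(0).standard_normal(4096)
    x1, x2 = sig, np.roll(sig, delay)      # second mic hears a delayed copy
    tau = gcc_phat_tdoa(x1, x2, fs)
    theta = np.degrees(np.arcsin(np.clip(tau * c / d, -1, 1)))
    print(f"tau = {tau * 1e6:.1f} us, DOA ~ {theta:.1f} deg")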

Review


49 pages, 3154 KiB  
Review
An Investigation into the Utilisation of CNN with LSTM for Video Deepfake Detection
by Sarah Tipper, Hany F. Atlam and Harjinder Singh Lallie
Appl. Sci. 2024, 14(21), 9754; https://doi.org/10.3390/app14219754 - 25 Oct 2024
Abstract
Video deepfake detection has emerged as a critical field within the broader domain of digital technologies, driven by the rapid proliferation of AI-generated media and the increasing threat of its misuse for deception and misinformation. The integration of Convolutional Neural Networks (CNNs) with Long Short-Term Memory (LSTM) has proven to be a promising approach for improving video deepfake detection, achieving near-perfect accuracy. CNNs enable the effective extraction of spatial features from video frames, such as facial textures and lighting, while the LSTM analyses temporal patterns, detecting inconsistencies over time. This hybrid model enhances the ability to detect deepfakes by combining spatial and temporal analysis. However, the existing research lacks systematic evaluations that comprehensively assess the effectiveness and optimal configurations of these models. Therefore, this paper provides a comprehensive review of video deepfake detection techniques utilising hybrid CNN-LSTM models. It systematically investigates state-of-the-art techniques, highlighting common feature extraction approaches and widely used datasets for training and testing. This paper also evaluates model performance across different datasets, identifies key factors influencing detection accuracy, and explores how CNN-LSTM models can be optimised. It also compares CNN-LSTM models with non-LSTM approaches, addresses implementation challenges, and proposes solutions for them. Lastly, open issues and future research directions for video deepfake detection using CNN-LSTM are discussed. This paper provides valuable insights for researchers and cyber security professionals by reviewing CNN-LSTM models for video deepfake detection, contributing to the advancement of robust and effective deepfake detection systems.
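The hybrid pattern the review surveys can be sketched as a per-frame CNN encoder followed by an LSTM over the frame sequence and a sigmoid real/fake score. Layer sizes below are toy values, not a configuration from any surveyed paper.

    import torch
    import torch.nn as nn

    class CnnLstmDetector(nn.Module):
        """Per-frame CNN encoder -> LSTM over the frame sequence ->
        real/fake score: the common hybrid pattern, with toy sizes."""
        def __init__(self, feat_dim=64, hidden=32):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, feat_dim),
            )
            self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, clips):                 # (batch, time, 3, H, W)
            b, t = clips.shape[:2]
            feats = self.cnn(clips.flatten(0, 1)) # spatial features per frame
            feats = feats.view(b, t, -1)
            _, (h_n, _) = self.lstm(feats)        # temporal inconsistencies
            return torch.sigmoid(self.head(h_n[-1]))

    clip = torch.rand(2, 8, 3, 64, 64)            # 2 clips, 8 frames each
    print(CnnLstmDetector()(clip).shape)          # torch.Size([2, 1])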