Journal of Imaging

15 pages, 3743 KiB

Open AccessArticle

Blink Detection Using 3D Convolutional Neural Architectures and Analysis of Accumulated Frame Predictions

by George Nousias, Konstantinos K. Delibasis and Georgios Labiris

J. Imaging 2025, 11(1), 27; https://doi.org/10.3390/jimaging11010027 - 19 Jan 2025

Blink detection is considered a useful indicator both for clinical conditions and drowsiness state. In this work, we propose and compare deep learning architectures for the task of detecting blinks in video frame sequences. The first step is the training and application of [...] Read more.

Blink detection is considered a useful indicator both for clinical conditions and drowsiness state. In this work, we propose and compare deep learning architectures for the task of detecting blinks in video frame sequences. The first step is the training and application of an eye detector that extracts the eye regions from each video frame. The cropped eye regions are organized as three-dimensional (3D) input with the third dimension spanning time of 300 ms. Two different 3D convolutional neural networks are utilized (a simple 3D CNN and 3D ResNet), as well as a 3D autoencoder combined with a classifier coupled to the latent space. Finally, we propose the usage of a frame prediction accumulator combined with morphological processing and watershed segmentation to detect blinks and determine their start and stop frame in previously unseen videos. The proposed framework was trained on ten (9) different participants and tested on five (8) different ones, with a total of 162,400 frames and 1172 blinks for each eye. The start and end frame of each blink in the dataset has been annotate by specialized ophthalmologist. Quantitative comparison with state-of-the-art blink detection methodologies provide favorable results for the proposed neural architectures coupled with the prediction accumulator, with the 3D ResNet being the best as well as the fastest performer. Full article

(This article belongs to the Special Issue Deep Learning in Biomedical Image Segmentation and Classification: Advancements, Challenges and Applications)

► Show Figures

Figure 1

17 pages, 7356 KiB

Open AccessArticle

Increasing Neural-Based Pedestrian Detectors’ Robustness to Adversarial Patch Attacks Using Anomaly Localization

by Olga Ilina, Maxim Tereshonok and Vadim Ziyadinov

J. Imaging 2025, 11(1), 26; https://doi.org/10.3390/jimaging11010026 - 17 Jan 2025

Abstract

Object detection in images is a fundamental component of many safety-critical systems, such as autonomous driving, video surveillance systems, and robotics. Adversarial patch attacks, being easily implemented in the real world, provide effective counteraction to object detection by state-of-the-art neural-based detectors. It poses [...] Read more.

Object detection in images is a fundamental component of many safety-critical systems, such as autonomous driving, video surveillance systems, and robotics. Adversarial patch attacks, being easily implemented in the real world, provide effective counteraction to object detection by state-of-the-art neural-based detectors. It poses a serious danger in various fields of activity. Existing defense methods against patch attacks are insufficiently effective, which underlines the need to develop new reliable solutions. In this manuscript, we propose a method which helps to increase the robustness of neural network systems to the input adversarial images. The proposed method consists of a Deep Convolutional Neural Network to reconstruct a benign image from the adversarial one; a Calculating Maximum Error block to highlight the mismatches between input and reconstructed images; a Localizing Anomalous Fragments block to extract the anomalous regions using the Isolation Forest algorithm from histograms of images’ fragments; and a Clustering and Processing block to group and evaluate the extracted anomalous regions. The proposed method, based on anomaly localization, demonstrates high resistance to adversarial patch attacks while maintaining the high quality of object detection. The experimental results show that the proposed method is effective in defending against adversarial patch attacks. Using the YOLOv3 algorithm with the proposed defensive method for pedestrian detection in the INRIAPerson dataset under the adversarial attacks, the mAP50 metric reaches 80.97% compared to 46.79% without a defensive method. The results of the research demonstrate that the proposed method is promising for improvement of object detection systems security. Full article

(This article belongs to the Section Image and Video Processing)

► Show Figures

Figure 1

23 pages, 22211 KiB

Open AccessArticle

A Local Adversarial Attack with a Maximum Aggregated Region Sparseness Strategy for 3D Objects

by Ling Zhao, Xun Lv, Lili Zhu, Binyan Luo, Hang Cao, Jiahao Cui, Haifeng Li and Jian Peng

J. Imaging 2025, 11(1), 25; https://doi.org/10.3390/jimaging11010025 - 13 Jan 2025

Abstract

The increasing reliance on deep neural network-based object detection models in various applications has raised significant security concerns due to their vulnerability to adversarial attacks. In physical 3D environments, existing adversarial attacks that target object detection (3D-AE) face significant challenges. These attacks often [...] Read more.

The increasing reliance on deep neural network-based object detection models in various applications has raised significant security concerns due to their vulnerability to adversarial attacks. In physical 3D environments, existing adversarial attacks that target object detection (3D-AE) face significant challenges. These attacks often require large and dispersed modifications to objects, making them easily noticeable and reducing their effectiveness in real-world scenarios. To maximize the attack effectiveness, large and dispersed attack camouflages are often employed, which makes the camouflages overly conspicuous and reduces their visual stealth. The core issue is how to use minimal and concentrated camouflage to maximize the attack effect. Addressing this, our research focuses on developing more subtle and efficient attack methods that can better evade detection in practical settings. Based on these principles, this paper proposes a local 3D attack method driven by a Maximum Aggregated Region Sparseness (MARS) strategy. In simpler terms, our approach strategically concentrates the attack modifications to specific areas to enhance effectiveness while maintaining stealth. To maximize the aggregation of attack-camouflaged regions, an aggregation regularization term is designed to constrain the mask aggregation matrix based on the face-adjacency relationships. To minimize the attack camouflage regions, a sparseness regularization is designed to make the mask weights tend toward a U-shaped distribution and limit extreme values. Additionally, neural rendering is used to obtain gradient-propagating multi-angle augmented data and suppress the model’s detection to locate universal critical decision regions from multiple angles. These technical strategies ensure that the adversarial modifications remain effective across different viewpoints and conditions. We test the attack effectiveness of different region selection strategies. On the CARLA dataset, the average attack efficiency of attacking the YOLOv3 and v5 series networks reaches 1.724, which represents an improvement of 0.986 (134%) compared to baseline methods. These results demonstrate a significant enhancement in attack performance, highlighting the potential risks to real-world object detection systems. The experimental results demonstrate that our attack method achieves both stealth and aggressiveness from different viewpoints. Furthermore, we explore the transferability of the decision regions. The results indicate that our method can be effectively combined with different texture optimization methods, with the average precision decreasing by 0.488 and 0.662 across different networks, which indicates a strong attack effectiveness. Full article

► Show Figures

Figure 1

22 pages, 11474 KiB

Open AccessArticle

LittleFaceNet: A Small-Sized Face Recognition Method Based on RetinaFace and AdaFace

by Zhengwei Ren, Xinyu Liu, Jing Xu, Yongsheng Zhang and Ming Fang

J. Imaging 2025, 11(1), 24; https://doi.org/10.3390/jimaging11010024 - 13 Jan 2025

Abstract

For surveillance video management in university laboratories, issues such as occlusion and low-resolution face capture often arise. Traditional face recognition algorithms are typically static and rely heavily on clear images, resulting in inaccurate recognition for low-resolution, small-sized faces. To address the challenges of [...] Read more.

For surveillance video management in university laboratories, issues such as occlusion and low-resolution face capture often arise. Traditional face recognition algorithms are typically static and rely heavily on clear images, resulting in inaccurate recognition for low-resolution, small-sized faces. To address the challenges of occlusion and low-resolution person identification, this paper proposes a new face recognition framework by reconstructing Retinaface-Resnet and combining it with Quality-Adaptive Margin (adaface). Currently, although there are many target detection algorithms, they all require a large amount of data for training. However, datasets for low-resolution face detection are scarce, leading to poor detection performance of the models. This paper aims to solve Retinaface’s weak face recognition capability in low-resolution scenarios and its potential inaccuracies in face bounding box localization when faces are at extreme angles or partially occluded. To this end, Spatial Depth-wise Separable Convolutions are introduced. Retinaface-Resnet is designed for face detection and localization, while adaface is employed to address low-resolution face recognition by using feature norm approximation to estimate image quality and applying an adaptive margin function. Additionally, a multi-object tracking algorithm is used to solve the problem of moving occlusion. Experimental results demonstrate significant improvements, achieving an accuracy of 96.12% on the WiderFace dataset and a recognition accuracy of 84.36% in practical laboratory applications. Full article

(This article belongs to the Section Computer Vision and Pattern Recognition)

► Show Figures

Figure 1

20 pages, 7090 KiB

Open AccessArticle

An Infrared and Visible Image Alignment Method Based on Gradient Distribution Properties and Scale-Invariant Features in Electric Power Scenes

by Lin Zhu, Yuxing Mao, Chunxu Chen and Lanjia Ning

J. Imaging 2025, 11(1), 23; https://doi.org/10.3390/jimaging11010023 - 13 Jan 2025

Abstract

In grid intelligent inspection systems, automatic registration of infrared and visible light images in power scenes is a crucial research technology. Since there are obvious differences in key attributes between visible and infrared images, direct alignment is often difficult to achieve the expected [...] Read more.

In grid intelligent inspection systems, automatic registration of infrared and visible light images in power scenes is a crucial research technology. Since there are obvious differences in key attributes between visible and infrared images, direct alignment is often difficult to achieve the expected results. To overcome the high difficulty of aligning infrared and visible light images, an image alignment method is proposed in this paper. First, we use the Sobel operator to extract the edge information of the image pair. Second, the feature points in the edges are recognised by a curvature scale space (CSS) corner detector. Third, the Histogram of Orientation Gradients (HOG) is extracted as the gradient distribution characteristics of the feature points, which are normalised with the Scale Invariant Feature Transform (SIFT) algorithm to form feature descriptors. Finally, initial matching and accurate matching are achieved by the improved fast approximate nearest-neighbour matching method and adaptive thresholding, respectively. Experiments show that this method can robustly match the feature points of image pairs under rotation, scale, and viewpoint differences, and achieves excellent matching results. Full article

(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)

► Show Figures

Graphical abstract

19 pages, 4635 KiB

Open AccessArticle

ZooCNN: A Zero-Order Optimized Convolutional Neural Network for Pneumonia Classification Using Chest Radiographs

by Saravana Kumar Ganesan, Parthasarathy Velusamy, Santhosh Rajendran, Ranjithkumar Sakthivel, Manikandan Bose and Baskaran Stephen Inbaraj

J. Imaging 2025, 11(1), 22; https://doi.org/10.3390/jimaging11010022 - 13 Jan 2025

Abstract

Pneumonia, a leading cause of mortality in children under five, is usually diagnosed through chest X-ray (CXR) images due to its efficiency and cost-effectiveness. However, the shortage of radiologists in the Least Developed Countries (LDCs) emphasizes the need for automated pneumonia diagnostic systems. [...] Read more.

Pneumonia, a leading cause of mortality in children under five, is usually diagnosed through chest X-ray (CXR) images due to its efficiency and cost-effectiveness. However, the shortage of radiologists in the Least Developed Countries (LDCs) emphasizes the need for automated pneumonia diagnostic systems. This article presents a Deep Learning model, Zero-Order Optimized Convolutional Neural Network (ZooCNN), a Zero-Order Optimization (Zoo)-based CNN model for classifying CXR images into three classes, Normal Lungs (NL), Bacterial Pneumonia (BP), and Viral Pneumonia (VP); this model utilizes the Adaptive Synthetic Sampling (ADASYN) approach to ensure class balance in the Kaggle CXR Images (Pneumonia) dataset. Conventional CNN models, though promising, face challenges such as overfitting and have high computational costs. The use of ZooPlatform (ZooPT), a hyperparameter finetuning strategy, on a baseline CNN model finetunes the hyperparameters and provides a modified architecture, ZooCNN, with a 72% reduction in weights. The model was trained, tested, and validated on the Kaggle CXR Images (Pneumonia) dataset. The ZooCNN achieved an accuracy of 97.27%, a sensitivity of 97.00%, a specificity of 98.60%, and an F1 score of 97.03%. The results were compared with contemporary models to highlight the efficacy of the ZooCNN in pneumonia classification (PC), offering a potential tool to aid physicians in clinical settings. Full article

► Show Figures

Figure 1

17 pages, 2650 KiB

Open AccessArticle

Typical and Local Diagnostic Reference Levels for Chest and Abdomen Radiography Examinations in Dubai Health Sector

by Entesar Z. Dalah, Maitha M. Al Zarooni, Faryal Y. Binismail, Hashim A. Beevi, Mohammed Siraj and Subrahmanian Pottybindu

J. Imaging 2025, 11(1), 21; https://doi.org/10.3390/jimaging11010021 - 13 Jan 2025

Abstract

Chest and abdomen radiographs are the most common radiograph examinations conducted in the Dubai Health sector, with both involving exposure to several radiosensitive organs. Diagnostic reference levels (DRLs) are accepted as an effective safety, optimization, and auditing tool in clinical practice. The present [...] Read more.

Chest and abdomen radiographs are the most common radiograph examinations conducted in the Dubai Health sector, with both involving exposure to several radiosensitive organs. Diagnostic reference levels (DRLs) are accepted as an effective safety, optimization, and auditing tool in clinical practice. The present work aims to establish a comprehensive projection and weight-based structured DRL system that allows one to confidently highlight healthcare centers in need of urgent action. The data of a total of 5474 adult males and non-pregnant females who underwent chest and abdomen radiography examinations in five different healthcare centers were collected and retrospectively analyzed. The typical DRL (TDRL) for each healthcare center was established and defined per projection (chest: posterior–anterior (PA), anterior–posterior (AP) and lateral (LAT); abdomen: erect and supine) for a weight band (60–80 kg) and for the whole data (no weight band). Local DRL (LDRL) values were established per project for the selected radiograph for the whole data (no weight band) and the 60–80 kg population. Chest radiography data from 1755 (60–80 kg) images were used to build this comprehensive DRL system (PA: 1471, AP: 252, and LAT: 32). Similarly, 611 (60–80 kg) abdomen radiographs were used to establish a DRL system (erect: 286 and supine: 325). The LDRL values defined per chest and abdomen projection for the weight band group (60–80 kg) were as follows: chest—0.51 PA, 2.46 AP, and 2.13 LAT dGy·cm²; abdomen—8.08 for erect and 5.95 for supine dGy·cm². The LDRL defined per abdomen projection for the 60–80 kg weight band highlighted at least one healthcare center in need of optimization. Such a system is efficient, easy to use, and very effective clinically. Full article

(This article belongs to the Special Issue Tools and Techniques for Improving Radiological Imaging Applications)

► Show Figures

Figure 1

15 pages, 1946 KiB

Open AccessArticle

Enhanced Image Retrieval Using Multiscale Deep Feature Fusion in Supervised Hashing

by Amina Belalia, Kamel Belloulata and Adil Redaoui

J. Imaging 2025, 11(1), 20; https://doi.org/10.3390/jimaging11010020 - 12 Jan 2025

Abstract

In recent years, deep-network-based hashing has gained prominence in image retrieval for its ability to generate compact and efficient binary representations. However, most existing methods predominantly focus on high-level semantic features extracted from the final layers of networks, often neglecting structural details that [...] Read more.

In recent years, deep-network-based hashing has gained prominence in image retrieval for its ability to generate compact and efficient binary representations. However, most existing methods predominantly focus on high-level semantic features extracted from the final layers of networks, often neglecting structural details that are crucial for capturing spatial relationships within images. Achieving a balance between preserving structural information and maximizing retrieval accuracy is the key to effective image hashing and retrieval. To address this challenge, we introduce Multiscale Deep Feature Fusion for Supervised Hashing (MDFF-SH), a novel approach that integrates multiscale feature fusion into the hashing process. The hallmark of MDFF-SH lies in its ability to combine low-level structural features with high-level semantic context, synthesizing robust and compact hash codes. By leveraging multiscale features from multiple convolutional layers, MDFF-SH ensures the preservation of fine-grained image details while maintaining global semantic integrity, achieving a harmonious balance that enhances retrieval precision and recall. Our approach demonstrated a superior performance on benchmark datasets, achieving significant gains in the Mean Average Precision (MAP) compared with the state-of-the-art methods: 9.5% on CIFAR-10, 5% on NUS-WIDE, and 11.5% on MS-COCO. These results highlight the effectiveness of MDFF-SH in bridging structural and semantic information, setting a new standard for high-precision image retrieval through multiscale feature fusion. Full article

(This article belongs to the Special Issue Recent Techniques in Image Feature Extraction)

► Show Figures

Figure 1

21 pages, 6639 KiB

Open AccessArticle

Efficient Generative-Adversarial U-Net for Multi-Organ Medical Image Segmentation

by Haoran Wang, Gengshen Wu and Yi Liu

J. Imaging 2025, 11(1), 19; https://doi.org/10.3390/jimaging11010019 - 12 Jan 2025

Abstract

Manual labeling of lesions in medical image analysis presents a significant challenge due to its labor-intensive and inefficient nature, which ultimately strains essential medical resources and impedes the advancement of computer-aided diagnosis. This paper introduces a novel medical image-segmentation framework named Efficient Generative-Adversarial [...] Read more.

Manual labeling of lesions in medical image analysis presents a significant challenge due to its labor-intensive and inefficient nature, which ultimately strains essential medical resources and impedes the advancement of computer-aided diagnosis. This paper introduces a novel medical image-segmentation framework named Efficient Generative-Adversarial U-Net (EGAUNet), designed to facilitate rapid and accurate multi-organ labeling. To enhance the model’s capability to comprehend spatial information, we propose the Global Spatial-Channel Attention Mechanism (GSCA). This mechanism enables the model to concentrate more effectively on regions of interest. Additionally, we have integrated Efficient Mapping Convolutional Blocks (EMCB) into the feature-learning process, allowing for the extraction of multi-scale spatial information and the adjustment of feature map channels through optimized weight values. Moreover, the proposed framework progressively enhances its performance by utilizing a generative-adversarial learning strategy, which contributes to improvements in segmentation accuracy. Consequently, EGAUNet demonstrates exemplary segmentation performance on public multi-organ datasets while maintaining high efficiency. For instance, in evaluations on the CHAOS T2SPIR dataset, EGAUNet achieves approximately

2 %

higher performance on the Jaccard metric,

1 %

higher on the Dice metric, and nearly

3 %

higher on the precision metric in comparison to advanced networks such as Swin-Unet and TransUnet. Full article

(This article belongs to the Special Issue Image Segmentation Techniques: Current Status and Future Directions (2nd Edition))

► Show Figures

Figure 1

14 pages, 7427 KiB

Open AccessArticle

Spectral Bidirectional Reflectance Distribution Function Simplification

by Shubham Chitnis, Aditya Sole and Sharat Chandran

J. Imaging 2025, 11(1), 18; https://doi.org/10.3390/jimaging11010018 - 11 Jan 2025

Abstract

Non-diffuse materials (e.g., metallic inks, varnishes, and paints) are widely used in real-world applications. Accurate spectral rendering relies on the bidirectional reflectance distribution function (BRDF). Current methods of capturing the BRDFs have proven to be onerous in accomplishing quick turnaround time, from conception [...] Read more.

Non-diffuse materials (e.g., metallic inks, varnishes, and paints) are widely used in real-world applications. Accurate spectral rendering relies on the bidirectional reflectance distribution function (BRDF). Current methods of capturing the BRDFs have proven to be onerous in accomplishing quick turnaround time, from conception and design to production. We propose a multi-layer perceptron for compact spectral material representations, with 31 wavelengths for four real-world packaging materials. Our neural-based scenario reduces measurement requirements while maintaining significant saliency. Unlike tristimulus BRDF acquisition, this spectral approach has not, to our knowledge, been previously explored with neural networks. We demonstrate compelling results for diffuse, glossy, and goniochromatic materials. Full article

(This article belongs to the Special Issue Imaging Technologies for Understanding Material Appearance)

► Show Figures

Figure 1

23 pages, 10925 KiB

Open AccessArticle

Supervised and Self-Supervised Learning for Assembly Line Action Recognition

by Christopher Indris, Fady Ibrahim, Hatem Ibrahem, Götz Bramesfeld, Jie Huo, Hafiz Mughees Ahmad, Syed Khizer Hayat and Guanghui Wang

J. Imaging 2025, 11(1), 17; https://doi.org/10.3390/jimaging11010017 - 10 Jan 2025

Abstract

The safety and efficiency of assembly lines are critical to manufacturing, but human supervisors cannot oversee all activities simultaneously. This study addresses this challenge by performing a comparative study to construct an initial real-time, semi-supervised temporal action recognition setup for monitoring worker actions [...] Read more.

The safety and efficiency of assembly lines are critical to manufacturing, but human supervisors cannot oversee all activities simultaneously. This study addresses this challenge by performing a comparative study to construct an initial real-time, semi-supervised temporal action recognition setup for monitoring worker actions on assembly lines. Various feature extractors and localization models were benchmarked using a new assembly dataset, with the I3D model achieving an average mAP@IoU=0.1:0.7 of 85% without optical flow or fine-tuning. The comparative study was extended to self-supervised learning via a modified SPOT model, which achieved a mAP@IoU=0.1:0.7 of 65% with just 10% of the data labeled using extractor architectures from the fully-supervised portion. Milestones include high scores for both fully and semi-supervised learning on this dataset and improved SPOT performance on ANet1.3. This study identified the particularities of the problem, which were leveraged and referenced to explain the results observed in semi-supervised scenarios. The findings highlight the potential for developing a scalable solution in the future, providing labour efficiency and safety compliance for manufacturers. Full article

(This article belongs to the Special Issue Advancing Action Recognition: Novel Approaches, Techniques and Applications)

► Show Figures

Figure 1

8 pages, 3549 KiB

Open AccessCommunication

Unmasking the Area Postrema on MRI: Utility of 3D FLAIR, 3D-T2, and 3D-DIR Sequences in a Case–Control Study

by Javier Lara-García, Jessica Romo-Martínez, Jonathan Javier De-La-Cruz-Cisneros, Marco Antonio Olvera-Olvera and Luis Jesús Márquez-Bejarano

J. Imaging 2025, 11(1), 16; https://doi.org/10.3390/jimaging11010016 - 10 Jan 2025

Abstract

The area postrema (AP) is a key circumventricular organ involved in the regulation of autonomic functions. Accurate identification of the AP via MRI is essential in neuroimaging but it is challenging. This study evaluated 3D FSE Cube T2WI, 3D FSE Cube FLAIR, and [...] Read more.

The area postrema (AP) is a key circumventricular organ involved in the regulation of autonomic functions. Accurate identification of the AP via MRI is essential in neuroimaging but it is challenging. This study evaluated 3D FSE Cube T2WI, 3D FSE Cube FLAIR, and 3D DIR sequences to improve AP detection in patients with and without multiple sclerosis (MS). A case–control study included 35 patients with MS and 35 with other non-demyelinating central nervous system diseases (ND-CNSD). MRI images were acquired employing 3D DIR, 3D FSE Cube FLAIR, and 3D FSE Cube T2WI sequences. The evaluation of AP was conducted using a 3-point scale. Statistical analysis was performed with the chi-square test used to assess group homogeneity and differences between sequences. No significant differences were found in the visualization of the AP between the MS and ND-CNSD groups across the sequences or planes. The AP was not visible in 27.6% of the 3D FSE Cube T2WI sequences, while it was visualized in 99% of the 3D FSE Cube FLAIR sequences and 100% of the 3D DIR sequences. The 3D DIR sequence showed superior performance in identifying the AP. Full article

(This article belongs to the Section Medical Imaging)

► Show Figures

Figure 1

28 pages, 4795 KiB

Open AccessArticle

Skin Lesion Classification Through Test Time Augmentation and Explainable Artificial Intelligence

by Loris Cino, Cosimo Distante, Alessandro Martella and Pier Luigi Mazzeo

J. Imaging 2025, 11(1), 15; https://doi.org/10.3390/jimaging11010015 - 9 Jan 2025

Abstract

Despite significant advancements in the automatic classification of skin lesions using artificial intelligence (AI) algorithms, skepticism among physicians persists. This reluctance is primarily due to the lack of transparency and explainability inherent in these models, which hinders their widespread acceptance in clinical settings. [...] Read more.

Despite significant advancements in the automatic classification of skin lesions using artificial intelligence (AI) algorithms, skepticism among physicians persists. This reluctance is primarily due to the lack of transparency and explainability inherent in these models, which hinders their widespread acceptance in clinical settings. The primary objective of this study is to develop a highly accurate AI-based algorithm for skin lesion classification that also provides visual explanations to foster trust and confidence in these novel diagnostic tools. By improving transparency, the study seeks to contribute to earlier and more reliable diagnoses. Additionally, the research investigates the impact of Test Time Augmentation (TTA) on the performance of six Convolutional Neural Network (CNN) architectures, which include models from the EfficientNet, ResNet (Residual Network), and ResNeXt (an enhanced variant of ResNet) families. To improve the interpretability of the models’ decision-making processes, techniques such as t-distributed Stochastic Neighbor Embedding (t-SNE) and Gradient-weighted Class Activation Mapping (Grad-CAM) are employed. t-SNE is utilized to visualize the high-dimensional latent features of the CNNs in a two-dimensional space, providing insights into how the models group different skin lesion classes. Grad-CAM is used to generate heatmaps that highlight the regions of input images that influence the model’s predictions. Our findings reveal that Test Time Augmentation enhances the balanced multi-class accuracy of CNN models by up to 0.3%, achieving a balanced accuracy rate of 97.58% on the International Skin Imaging Collaboration (ISIC 2019) dataset. This performance is comparable to, or marginally better than, more complex approaches such as Vision Transformers (ViTs), demonstrating the efficacy of our methodology. Full article

(This article belongs to the Special Issue Computer Vision and Deep Learning: Trends and Applications (2nd Edition))

► Show Figures

Figure 1

24 pages, 1098 KiB

Open AccessArticle

Face Boundary Formulation for Harmonic Models: Face Image Resembling

by Hung-Tsai Huang, Zi-Cai Li, Yimin Wei and Ching Yee Suen

J. Imaging 2025, 11(1), 14; https://doi.org/10.3390/jimaging11010014 - 8 Jan 2025

Abstract

This paper is devoted to numerical algorithms based on harmonic transformations with two goals: (1) face boundary formulation by blending techniques based on the known characteristic nodes and (2) some challenging examples of face resembling. The formulation of the face boundary is imperative [...] Read more.

This paper is devoted to numerical algorithms based on harmonic transformations with two goals: (1) face boundary formulation by blending techniques based on the known characteristic nodes and (2) some challenging examples of face resembling. The formulation of the face boundary is imperative for face recognition, transformation, and combination. Mapping between the source and target face boundaries with constituent pixels is explored by two approaches: cubic spline interpolation and ordinary differential equation (ODE) using Hermite interpolation. The ODE approach is more flexible and suitable for handling different boundary conditions, such as the clamped and simple support conditions. The intrinsic relations between the cubic spline and ODE methods are explored for different face boundaries, and their combinations are developed. Face combination and resembling are performed by employing blending curves for generating the face boundary, and face images are converted by numerical methods for harmonic models, such as the finite difference method (FDM), the finite element method (FEM) and the finite volume method (FVM) for harmonic models, and the splitting–integrating method (SIM) for the resampling of constituent pixels. For the second goal, the age effects of facial appearance are explored to discover that different ages of face images can be produced by integrating the photos and images of the old and the young. Then, the following challenging task is targeted. Based on the photos and images of parents and their children, can we obtain an integrated image to resemble his/her current image as closely as possible? Amazing examples of face combination and resembling are reported in this paper to give a positive answer. Furthermore, an optimal combination of face images of parents and their children in the least-squares sense is introduced to greatly facilitate face resembling. Face combination and resembling may also be used for plastic surgery, finding missing children, and identifying criminals. The boundary and numerical techniques of face images in this paper can be used not only for pattern recognition but also for face morphing, morphing attack detection (MAD), and computer animation as Sora to greatly enhance further developments in AI. Full article

(This article belongs to the Special Issue Techniques and Applications in Face Image Analysis)

► Show Figures

Figure 1

13 pages, 1390 KiB

Open AccessArticle

Combined Input Deep Learning Pipeline for Embryo Selection for In Vitro Fertilization Using Light Microscopic Images and Additional Features

by Krittapat Onthuam, Norrawee Charnpinyo, Kornrapee Suthicharoenpanich, Supphaset Engphaiboon, Punnarai Siricharoen, Ronnapee Chaichaowarat and Chanakarn Suebthawinkul

J. Imaging 2025, 11(1), 13; https://doi.org/10.3390/jimaging11010013 - 7 Jan 2025

Abstract

The current process of embryo selection in in vitro fertilization is based on morphological criteria; embryos are manually evaluated by embryologists under subjective assessment. In this study, a deep learning-based pipeline was developed to classify the viability of embryos using combined inputs, including [...] Read more.

The current process of embryo selection in in vitro fertilization is based on morphological criteria; embryos are manually evaluated by embryologists under subjective assessment. In this study, a deep learning-based pipeline was developed to classify the viability of embryos using combined inputs, including microscopic images of embryos and additional features, such as patient age and developed pseudo-features, including a continuous interpretation of Istanbul grading scores by predicting the embryo stage, inner cell mass, and trophectoderm. For viability prediction, convolution-based transferred learning models were employed, multiple pretrained models were compared, and image preprocessing techniques and hyperparameter optimization via Optuna were utilized. In addition, a custom weight was trained using a self-supervised learning framework known as the Simple Framework for Contrastive Learning of Visual Representations (SimCLR) in cooperation with generated images using generative adversarial networks (GANs). The best model was developed from the EfficientNet-B0 model using preprocessed images combined with pseudo-features generated using separate EfficientNet-B0 models, and optimized by Optuna to tune the hyperparameters of the models. The designed model’s F1 score, accuracy, sensitivity, and area under curve (AUC) were 65.02%, 69.04%, 56.76%, and 66.98%, respectively. This study also showed an advantage in accuracy and a similar AUC when compared with the recent ensemble method. Full article

► Show Figures

Figure 1

66 pages, 2123 KiB

Open AccessSystematic Review

Hybrid Quality-Based Recommender Systems: A Systematic Literature Review

by Bihi Sabiri, Amal Khtira, Bouchra El Asri and Maryem Rhanoui

J. Imaging 2025, 11(1), 12; https://doi.org/10.3390/jimaging11010012 - 7 Jan 2025

Abstract

As technology develops, consumer behavior and how people search for what they want are constantly evolving. Online shopping has fundamentally changed the e-commerce industry. Although there are more products available than ever before, only a small portion of them are noticed; as a [...] Read more.

As technology develops, consumer behavior and how people search for what they want are constantly evolving. Online shopping has fundamentally changed the e-commerce industry. Although there are more products available than ever before, only a small portion of them are noticed; as a result, a few items gain disproportionate attention. Recommender systems can help to increase the visibility of lesser-known products. Major technology businesses have adopted these technologies as essential offerings, resulting in better user experiences and more sales. As a result, recommender systems have achieved considerable economic, social, and global advancements. Companies are improving their algorithms with hybrid techniques that combine more recommendation methodologies as these systems are a major research focus. This review provides a thorough examination of several hybrid models by combining ideas from the current research and emphasizing their practical uses, strengths, and limits. The review identifies special problems and opportunities for designing and implementing hybrid recommender systems by focusing on the unique aspects of big data, notably volume, velocity, and variety. Adhering to the Cochrane Handbook and the principles developed by Kitchenham and Charters guarantees that the assessment process is transparent and high in quality. The current aim is to conduct a systematic review of several recent developments in the area of hybrid recommender systems. The study covers the state of the art of the relevant research over the last four years regarding four knowledge bases (ACM, Google Scholar, Scopus, and Springer), as well as all Web of Science articles regardless of their date of publication. This study employs ASReview, an open-source application that uses active learning to help academics filter literature efficiently. This study aims to assess the progress achieved in the field of hybrid recommender systems to identify frequently used recommender approaches, explore the technical context, highlight gaps in the existing research, and position our future research in relation to the current studies. Full article

(This article belongs to the Section Document Analysis and Processing)

► Show Figures

Figure 1

20 pages, 32805 KiB

Open AccessArticle

Application of Generative Artificial Intelligence Models for Accurate Prescription Label Identification and Information Retrieval for the Elderly in Northern East of Thailand

by Parinya Thetbanthad, Benjaporn Sathanarugsawait and Prasong Praneetpolgrang

J. Imaging 2025, 11(1), 11; https://doi.org/10.3390/jimaging11010011 - 6 Jan 2025

Abstract

This study introduces a novel AI-driven approach to support elderly patients in Thailand with medication management, focusing on accurate drug label interpretation. Two model architectures were explored: a Two-Stage Optical Character Recognition (OCR) and Large Language Model (LLM) pipeline combining EasyOCR with Qwen2-72b-instruct [...] Read more.

This study introduces a novel AI-driven approach to support elderly patients in Thailand with medication management, focusing on accurate drug label interpretation. Two model architectures were explored: a Two-Stage Optical Character Recognition (OCR) and Large Language Model (LLM) pipeline combining EasyOCR with Qwen2-72b-instruct and a Uni-Stage Visual Question Answering (VQA) model using Qwen2-72b-VL. Both models operated in a zero-shot capacity, utilizing Retrieval-Augmented Generation (RAG) with DrugBank references to ensure contextual relevance and accuracy. Performance was evaluated on a dataset of 100 diverse prescription labels from Thai healthcare facilities, using RAG Assessment (RAGAs) metrics to assess Context Recall, Factual Correctness, Faithfulness, and Semantic Similarity. The Two-Stage model achieved high accuracy (94%) and strong RAGAs scores, particularly in Context Recall (0.88) and Semantic Similarity (0.91), making it well-suited for complex medication instructions. In contrast, the Uni-Stage model delivered faster response times, making it practical for high-volume environments such as pharmacies. This study demonstrates the potential of zero-shot AI models in addressing medication management challenges for the elderly by providing clear, accurate, and contextually relevant label interpretations. The findings underscore the adaptability of AI in healthcare, balancing accuracy and efficiency to meet various real-world needs. Full article

(This article belongs to the Section AI in Imaging)

► Show Figures

Figure 1

13 pages, 4901 KiB

Open AccessArticle

A New Deep Learning-Based Method for Automated Identification of Thoracic Lymph Node Stations in Endobronchial Ultrasound (EBUS): A Proof-of-Concept Study

by Øyvind Ervik, Mia Rødde, Erlend Fagertun Hofstad, Ingrid Tveten, Thomas Langø, Håkon O. Leira, Tore Amundsen and Hanne Sorger

J. Imaging 2025, 11(1), 10; https://doi.org/10.3390/jimaging11010010 - 5 Jan 2025

Abstract

Endobronchial ultrasound-guided transbronchial needle aspiration (EBUS-TBNA) is a cornerstone in minimally invasive thoracic lymph node sampling. In lung cancer staging, precise assessment of lymph node position is crucial for clinical decision-making. This study aimed to demonstrate a new deep learning method to classify [...] Read more.

Endobronchial ultrasound-guided transbronchial needle aspiration (EBUS-TBNA) is a cornerstone in minimally invasive thoracic lymph node sampling. In lung cancer staging, precise assessment of lymph node position is crucial for clinical decision-making. This study aimed to demonstrate a new deep learning method to classify thoracic lymph nodes based on their anatomical location using EBUS images. Bronchoscopists labeled lymph node stations in real-time according to the Mountain Dressler nomenclature. EBUS images were then used to train and test a deep neural network (DNN) model, with intraoperative labels as ground truth. In total, 28,134 EBUS images were acquired from 56 patients. The model achieved an overall classification accuracy of 59.5 ± 5.2%. The highest precision, sensitivity, and F1 score were observed in station 4L, 77.6 ± 13.1%, 77.6 ± 15.4%, and 77.6 ± 15.4%, respectively. The lowest precision, sensitivity, and F1 score were observed in station 10L. The average processing and prediction time for a sequence of ten images was 0.65 ± 0.04 s, demonstrating the feasibility of real-time applications. In conclusion, the new DNN-based model could be used to classify lymph node stations from EBUS images. The method performance was promising with a potential for clinical use. Full article

(This article belongs to the Special Issue Advances in Medical Imaging and Machine Learning)

► Show Figures

Figure 1

38 pages, 4397 KiB

Open AccessArticle

Visual Impairment Spatial Awareness System for Indoor Navigation and Daily Activities

by Xinrui Yu and Jafar Saniie

J. Imaging 2025, 11(1), 9; https://doi.org/10.3390/jimaging11010009 - 4 Jan 2025

Abstract

The integration of artificial intelligence into daily life significantly enhances the autonomy and quality of life of visually impaired individuals. This paper introduces the Visual Impairment Spatial Awareness (VISA) system, designed to holistically assist visually impaired users in indoor activities through a structured, [...] Read more.

The integration of artificial intelligence into daily life significantly enhances the autonomy and quality of life of visually impaired individuals. This paper introduces the Visual Impairment Spatial Awareness (VISA) system, designed to holistically assist visually impaired users in indoor activities through a structured, multi-level approach. At the foundational level, the system employs augmented reality (AR) markers for indoor positioning, neural networks for advanced object detection and tracking, and depth information for precise object localization. At the intermediate level, it integrates data from these technologies to aid in complex navigational tasks such as obstacle avoidance and pathfinding. The advanced level synthesizes these capabilities to enhance spatial awareness, enabling users to navigate complex environments and locate specific items. The VISA system exhibits an efficient human–machine interface (HMI), incorporating text-to-speech and speech-to-text technologies for natural and intuitive communication. Evaluations in simulated real-world environments demonstrate that the system allows users to interact naturally and with minimal effort. Our experimental results confirm that the VISA system efficiently assists visually impaired users in indoor navigation, object detection and localization, and label and text recognition, thereby significantly enhancing their spatial awareness and independence. Full article

(This article belongs to the Special Issue Image and Video Processing for Blind and Visually Impaired)

► Show Figures

Figure 1

Journal Description

Journal of Imaging

Latest Articles

Journal Menu

Journal Browser

Highly Accessed Articles

Latest Books

E-Mail Alert

News

Topics

Conferences

Special Issues

Further Information

Guidelines

MDPI Initiatives

Follow MDPI