Search Results (944)

Search Parameters:
Keywords = imbalanced datasets

38 pages, 3147 KiB  
Article
A Risk-Optimized Framework for Data-Driven IPO Underperformance Prediction in Complex Financial Systems
by Mazin Alahmadi
Systems 2025, 13(3), 179; https://doi.org/10.3390/systems13030179 - 6 Mar 2025
Abstract
Accurate predictions of Initial Public Offerings (IPOs) aftermarket performance are essential for making informed investment decisions in the financial sector. This paper attempts to predict IPO short-term underperformance during a month post-listing. The current research landscape lacks modern models that address the needs of small and imbalanced datasets relevant to emerging markets, as well as the risk preferences of investors. To fill this gap, we present a practical framework utilizing tree-based ensemble learning, including Bagging Classifier (BC), Random Forest (RF), AdaBoost (Ada), Gradient Boosting (GB), XGBoost (XG), Stacking Classifier (SC), and Extra Trees (ET), with Decision Tree (DT) as a base estimator. The framework leverages data-driven methodologies to optimize decision-making in complex financial systems, integrating ANOVA F-value for feature selection, Randomized Search for hyperparameter optimization, and SMOTE for class balance. The framework’s effectiveness is assessed using a hand-collected dataset that includes features from both pre-IPO prospectus and firm-specific financial data. We thoroughly evaluate the results using single-split evaluation and 10-fold cross-validation analysis. For the single-split validation, ET achieves the highest accuracy of 86%, while for the 10-fold validation, BC achieves the highest accuracy of 70%. Additionally, we compare the results of the proposed framework with deep-learning models such as MLP, TabNet, and ANN to assess their effectiveness in handling IPO underperformance predictions. These results demonstrate the framework’s capability to enable robust data-driven decision-making processes in complex and dynamic financial environments, even with limited and imbalanced datasets. The framework also proposes a dynamic methodology named Investor Preference Prediction Framework (IPPF) to match tree-based ensemble models to investors’ risk preferences when predicting IPO underperformance. It concludes that different models may be suitable for various risk profiles. For the dataset at hand, ET and Ada are more appropriate for risk-averse investors, while BC is suitable for risk-tolerant investors. The results underscore the framework’s importance in improving IPO underperformance predictions, which can better inform investment strategies and decision-making processes. Full article
(This article belongs to the Special Issue Data-Driven Decision Making for Complex Systems)
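As a rough illustration of the kind of pipeline this abstract describes, the sketch below chains ANOVA F-value feature selection, SMOTE oversampling, and a randomized hyperparameter search around an Extra Trees classifier using scikit-learn and imbalanced-learn. The synthetic data, feature counts, and parameter ranges are placeholders, not the authors' hand-collected IPO dataset or settings.

```python
# Sketch only: ANOVA F-value selection + SMOTE + randomized search around a tree ensemble.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import RandomizedSearchCV

# Small, imbalanced placeholder data (stand-in for pre-IPO prospectus + financial features).
X, y = make_classification(n_samples=300, n_features=30, weights=[0.8, 0.2], random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),  # ANOVA F-value feature selection
    ("smote", SMOTE(random_state=0)),               # oversample the minority class in training folds only
    ("clf", ExtraTreesClassifier(random_state=0)),  # one of the tree-based ensembles compared
])

param_dist = {
    "select__k": [10, 15, 20, "all"],
    "clf__n_estimators": [100, 300, 500],
    "clf__max_depth": [None, 5, 10],
}

search = RandomizedSearchCV(pipe, param_dist, n_iter=10, cv=10, scoring="accuracy", random_state=0)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```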

19 pages, 4910 KiB  
Article
A Novel SHAP-GAN Network for Interpretable Ovarian Cancer Diagnosis
by Jingxun Cai, Zne-Jung Lee, Zhihxian Lin and Ming-Ren Yang
Mathematics 2025, 13(5), 882; https://doi.org/10.3390/math13050882 - 6 Mar 2025
Abstract
Ovarian cancer stands out as one of the most formidable adversaries in women’s health, largely due to its typically subtle and nonspecific early symptoms, which pose significant challenges to early detection and diagnosis. Although existing diagnostic methods, such as biomarker testing and imaging, can help with early diagnosis to some extent, these methods still have limitations in sensitivity and accuracy, often leading to misdiagnosis or missed diagnosis. Ovarian cancer’s high heterogeneity and complexity increase diagnostic challenges, especially in disease progression prediction and patient classification. Machine learning (ML) has outperformed traditional methods in cancer detection by processing large datasets to identify patterns missed by conventional techniques. However, existing AI models still struggle with accuracy in handling imbalanced and high-dimensional data, and their “black-box” nature limits clinical interpretability. To address these issues, this study proposes SHAP-GAN, an innovative diagnostic model for ovarian cancer that integrates Shapley Additive exPlanations (SHAP) with Generative Adversarial Networks (GANs). The SHAP module quantifies each biomarker’s contribution to the diagnosis, while the GAN component optimizes medical data generation. This approach tackles three key challenges in medical diagnosis: data scarcity, model interpretability, and diagnostic accuracy. Results show that SHAP-GAN outperforms traditional methods in sensitivity, accuracy, and interpretability, particularly with high-dimensional and imbalanced ovarian cancer datasets. The top three influential features identified are PRR11, CIAO1, and SMPD3, which exhibit wide SHAP value distributions, highlighting their significant impact on model predictions. The SHAP-GAN network has demonstrated an impressive accuracy rate of 99.34% on the ovarian cancer dataset, significantly outperforming baseline algorithms, including Support Vector Machines (SVM), Logistic Regression (LR), and XGBoost. Specifically, SVM achieved an accuracy of 72.78%, LR achieved 86.09%, and XGBoost achieved 96.69%. These results highlight the superior performance of SHAP-GAN in handling high-dimensional and imbalanced datasets. Furthermore, SHAP-GAN significantly alleviates the challenges associated with intricate genetic data analysis, empowering medical professionals to tailor personalized treatment strategies for individual patients. Full article
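For readers unfamiliar with the SHAP side of such a model, the hedged sketch below computes per-feature SHAP attributions for a gradient-boosted classifier on synthetic, imbalanced placeholder data; the GAN-based data generation and the actual ovarian cancer biomarkers are not reproduced here.

```python
# Sketch only: SHAP attributions for a tree model on synthetic placeholder data.
import numpy as np
import shap
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=50, n_informative=10,
                           weights=[0.9, 0.1], random_state=0)
model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss").fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)       # per-sample, per-feature contributions
mean_abs = np.abs(shap_values).mean(axis=0)  # global importance ranking
print("Top-3 features by mean |SHAP|:", np.argsort(mean_abs)[::-1][:3])
```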

25 pages, 8345 KiB  
Article
Landslide Susceptibility Mapping in Xinjiang: Identifying Critical Thresholds and Interaction Effects Among Disaster-Causing Factors
by Xiangyang Feng, Zhaoqi Wu, Zihao Wu, Junping Bai, Shixiang Liu and Qingwu Yan
Land 2025, 14(3), 555; https://doi.org/10.3390/land14030555 - 6 Mar 2025
Abstract
Landslides frequently occur in the Xinjiang Uygur Autonomous Region of China due to its complex geological environment, posing serious risks to human safety and economic stability. Existing studies widely use machine learning models for landslide susceptibility prediction. However, they often fail to capture the threshold and interaction effects among environmental factors, limiting their ability to accurately identify high-risk zones. To address this gap, this study employed a gradient boosting decision tree (GBDT) model to identify critical thresholds and interaction effects among disaster-causing factors, while mapping the spatial distribution of landslide susceptibility based on 20 covariates. The performance of this model was compared with that of a support vector machine and deep neural network models. Results showed that the GBDT model achieved superior performance, with the highest AUC and recall values among the tested models. After applying clustering algorithms for non-landslide sample selection, the GBDT model maintained a high recall value of 0.963, demonstrating its robustness against imbalanced datasets. The GBDT model identified that 8.86% of Xinjiang’s total area exhibits extremely high or high landslide susceptibility, mainly concentrated in the Tianshan and Altai mountain ranges. Lithology, precipitation, profile curvature, the Modified Normalized Difference Water Index (MNDWI), and vertical deformation were identified as the primary contributing factors. Threshold effects were observed in the relationships between these factors and landslide susceptibility. The probability of landslide occurrence increased sharply when precipitation exceeded 2500 mm, vertical deformation was greater than 0 mm a−1, or the MNDWI values were extreme (<−0.4, >0.2). Additionally, this study confirmed bivariate interaction effects. Most interactions between factors exhibited positive effects, suggesting that combining two factors enhances classification performance compared with using each factor independently. This finding highlights the intricate and interdependent nature of these factors in landslide susceptibility. These findings emphasize the necessity of incorporating threshold and interaction effects in landslide susceptibility assessments, offering practical insights for disaster prevention and mitigation. Full article
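The sketch below mirrors, in miniature, the evaluation described above: a gradient boosting classifier scored by AUC and recall on an imbalanced binary problem, plus a partial-dependence query of the kind used to read off threshold effects. The covariates are synthetic placeholders, not the 20 landslide conditioning factors.

```python
# Sketch only: GBDT on an imbalanced binary task, scored with AUC and recall,
# plus a partial-dependence curve as a stand-in for the threshold-effect analysis.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import partial_dependence
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

gbdt = GradientBoostingClassifier(random_state=42).fit(X_tr, y_tr)
print("AUC:   ", roc_auc_score(y_te, gbdt.predict_proba(X_te)[:, 1]))
print("Recall:", recall_score(y_te, gbdt.predict(X_te)))

pd_result = partial_dependence(gbdt, X_tr, features=[0])  # response curve for one covariate
print(pd_result["average"].shape)
```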

21 pages, 5231 KiB  
Article
Stacked Ensembles Powering Smart Farming for Imbalanced Sugarcane Disease Detection
by Sahar Qaadan, Aiman Alshare, Abdullah Ahmed and Haneen Altartouri
Appl. Sci. 2025, 15(5), 2788; https://doi.org/10.3390/app15052788 - 5 Mar 2025
Abstract
Sugarcane is a vital crop, accounting for approximately 75% of the global sugar production. Ensuring its health through the early detection and classification of diseases is essential in maximizing crop yields and productivity. While recent deep learning advancements, such as Vision Transformers, have shown promise in sugarcane disease classification, these methods often rely on resource-intensive models, limiting their practical applicability. This study introduces a novel stacking-based ensemble framework that combines embeddings from multiple state-of-the-art deep learning methods. It offers a lightweight and accurate approach for sugarcane disease classification. Leveraging the publicly available sugarcane leaf dataset, which includes 7134 high-resolution images across 11 classes (nine diseases, healthy leaves, and dried leaves), the proposed framework integrates embeddings from InceptionV3, SqueezeNet, and DeepLoc models with stacked ensemble classifiers. This approach addresses the challenges posed by imbalanced datasets and significantly enhances the classification performance. In binary classification, the model accuracy is 98.89% and the weighted F1-score is 98.92%, while the multi-classification approach attains accuracy of 95.64% and a weighted F1-score of 95.62%. The stacking-based framework is superior to Transformer models, reducing the training time by 75% and demonstrating superior generalization across diverse and imbalanced classes. These findings directly contribute to the sustainability goals of zero hunger and responsible consumption and production by improving agricultural productivity and promoting resource-efficient farming practices. Full article
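A minimal sketch of the stacking idea described above is shown below: a stacking classifier trained on precomputed deep-network embeddings with a meta-learner on top. Random vectors stand in for the InceptionV3 / SqueezeNet / DeepLoc embeddings and the 11 leaf classes; base learners and scores are illustrative only.

```python
# Sketch only: a stacking ensemble over placeholder embedding vectors.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(700, 256))    # placeholder concatenated embeddings
y = rng.integers(0, 11, size=700)  # 11 classes: nine diseases, healthy, dried leaves

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-learner over base predictions
)
print(cross_val_score(stack, X, y, cv=3, scoring="f1_weighted").mean())
```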

20 pages, 3815 KiB  
Article
A Benchmark for Water Surface Jet Segmentation with MobileHDC Method
by Yaojie Chen, Qing Quan, Wei Wang and Yunhan Lin
Appl. Sci. 2025, 15(5), 2755; https://doi.org/10.3390/app15052755 - 4 Mar 2025
Abstract
Intelligent jet systems are widely used in various fields, including firefighting, marine operations, and underwater exploration. Accurate extraction and prediction of jet trajectories are essential for optimizing their performance, but challenges arise due to environmental factors such as climate, wind direction, and suction efficiency. To address these issues, we introduce two novel jet segmentation datasets, Libary and SegQinhu, which cover both indoor and outdoor environments under varying weather conditions and temporal intervals. These datasets present significant challenges, including occlusions and strong light reflections, making them ideal for evaluating jet trajectory segmentation methods. Through empirical evaluation of several state-of-the-art (SOTA) techniques on these datasets, we observe that general methods struggle with highly imbalanced pixel distributions in jet trajectory images. To overcome this, we propose a data-driven pipeline for jet trajectory extraction and segmentation. At its core is MobileHDC, a new baseline model that leverages the MobileNetV2 architecture and integrates dilated convolutions to enhance the receptive field without increasing computational cost. Additionally, we introduce a parallel convolutional block and a decoder to fuse multi-level features, enabling a better capture of contextual information and improving the continuity and accuracy of jet segmentation. The experimental results show that our method outperforms existing SOTA techniques on both jet-specific datasets, highlighting the effectiveness of our approach. Full article
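To make the dilated-convolution idea concrete, the sketch below builds a parallel dilated-convolution block that enlarges the receptive field without adding parameters relative to stacked 3x3 convolutions. Channel sizes and dilation rates are illustrative; this is not the authors' MobileHDC architecture.

```python
# Sketch only: a parallel dilated-convolution block fused by a 1x1 convolution.
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4)):
        super().__init__()
        # parallel 3x3 convolutions with growing dilation rates
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=d, dilation=d) for d in dilations
        )
        self.fuse = nn.Conv2d(out_ch * len(dilations), out_ch, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]
        return self.act(self.fuse(torch.cat(feats, dim=1)))

x = torch.randn(1, 32, 64, 64)
print(DilatedBlock(32, 64)(x).shape)  # torch.Size([1, 64, 64, 64])
```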

22 pages, 1569 KiB  
Systematic Review
A Review of Artificial Intelligence-Based Down Syndrome Detection Techniques
by Mujeeb Ahmed Shaikh, Hazim Saleh Al-Rawashdeh and Abdul Rahaman Wahab Sait
Life 2025, 15(3), 390; https://doi.org/10.3390/life15030390 - 1 Mar 2025
Abstract
Background: Down syndrome (DS) is one of the most prevalent chromosomal abnormalities affecting global healthcare. Recent advances in artificial intelligence (AI) and machine learning (ML) have enhanced DS diagnostic accuracy. However, there is a lack of thorough evaluations analyzing the overall impact and effectiveness of AI-based DS diagnostic approaches. Objectives: This review intends to identify methodologies and technologies used in AI-driven DS diagnostics. It evaluates the performance of AI models in terms of standard evaluation metrics, highlighting their strengths and limitations. Methodology: In order to ensure transparency and rigor, the authors followed the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines. They extracted 1175 articles from major academic databases. By leveraging inclusion and exclusion criteria, a final set of 25 articles was selected. Outcomes: The findings revealed significant advancements in AI-powered DS diagnostics across diverse data modalities. The modalities, including facial images, ultrasound scans, and genetic data, demonstrated strong potential for early DS diagnosis. Despite these advancements, this review outlined the limitations of AI approaches. Small and imbalanced datasets reduce the generalizability of the AI models. The authors present actionable strategies to enhance the clinical adoptions of these models. Full article

42 pages, 7989 KiB  
Article
Towards Robust SDN Security: A Comparative Analysis of Oversampling Techniques with ML and DL Classifiers
by Aboubakr Bajenaid, Maher Khemakhem, Fathy E. Eassa, Farid Bourennani, Junaid M. Qurashi, Abdulaziz A. Alsulami and Badraddin Alturki
Electronics 2025, 14(5), 995; https://doi.org/10.3390/electronics14050995 - 28 Feb 2025
Abstract
Software-defined networking (SDN) is becoming a predominant architecture for managing diverse networks. However, recent research has exhibited the susceptibility of SDN architectures to cyberattacks, which increases its security challenges. Many researchers have used machine learning (ML) and deep learning (DL) classifiers to mitigate cyberattacks in SDN architectures. Since SDN datasets could suffer from class imbalance issues, the classification accuracy of predictive classifiers is undermined. Therefore, this research conducts a comparative analysis of the impact of utilizing oversampling and principal component analysis (PCA) techniques on ML and DL classifiers using publicly available SDN datasets. This approach combines mitigating the class imbalance issue and maintaining the effectiveness of the performance when reducing data dimensionality. Initially, the oversampling techniques are used to balance the classes of the SDN datasets. Then, the classification performance of ML and DL classifiers is evaluated and compared to observe the effectiveness of each oversampling technique on each classifier. PCA is applied to the balanced dataset, and the classifier’s performance is evaluated and compared. The results demonstrated that Random Oversampling outperformed the other balancing techniques. Furthermore, the XGBoost and Transformer classifiers were the most sensitive models when using oversampling and PCA algorithms. In addition, macro and weighted averages of evaluation metrics were calculated to show the impact of imbalanced class datasets on each classifier. Full article
(This article belongs to the Special Issue Security in System and Software)
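The pipeline compared above can be sketched roughly as follows: random oversampling on the training split, PCA for dimensionality reduction, and an XGBoost classifier, with macro and weighted averages reported. The SDN traffic data are replaced here by a synthetic placeholder.

```python
# Sketch only: Random Oversampling -> PCA -> XGBoost on placeholder data.
from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, n_features=40, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

X_bal, y_bal = RandomOverSampler(random_state=0).fit_resample(X_tr, y_tr)  # balance the training split only
pca = PCA(n_components=0.95).fit(X_bal)                                    # keep 95% of the variance
clf = XGBClassifier(eval_metric="logloss").fit(pca.transform(X_bal), y_bal)
print(classification_report(y_te, clf.predict(pca.transform(X_te))))       # macro and weighted averages
```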

22 pages, 4042 KiB  
Article
Advanced Predictive Analytics for Fetal Heart Rate Variability Using Digital Twin Integration
by Tunn Cho Lwin, Thi Thi Zin, Pyke Tin, Emi Kino and Tsuyomu Ikenoue
Sensors 2025, 25(5), 1469; https://doi.org/10.3390/s25051469 - 27 Feb 2025
Abstract
Fetal heart rate variability (FHRV) is a critical indicator of fetal well-being and autonomic nervous system development during labor. Traditional monitoring methods often provide limited insights, potentially leading to delayed interventions and suboptimal outcomes. This study proposes an advanced predictive analytics approach by integrating approximate entropy analysis with a hidden Markov model (HMM) within a digital twin framework to enhance real-time fetal monitoring. We utilized a dataset of 469 fetal electrocardiogram (ECG) recordings, each exceeding one hour in duration, to ensure sufficient temporal information for reliable modeling. The FHRV data were preprocessed and partitioned into parasympathetic and sympathetic components based on downward and non-downward beat detection. Approximate entropy was calculated to quantify the complexity of FHRV patterns, revealing significant correlations with umbilical cord blood gas parameters, particularly pH levels. The HMM was developed with four hidden states representing discrete pH levels and eight observed states derived from FHRV data. By employing the Baum–Welch and Viterbi algorithms for training and decoding, respectively, the model effectively captured temporal dependencies and provided early predictions of the fetal acid–base status. Experimental results demonstrated that the model achieved 85% training and 79% testing accuracy on the balanced dataset distribution, improving from 78% and 71% on the imbalanced dataset. The integration of this predictive model into a digital twin framework offers significant benefits for timely clinical interventions, potentially improving prenatal outcomes. Full article
(This article belongs to the Special Issue Biomedical Sensing and Bioinformatics Processing)
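The approximate-entropy measure used above to quantify FHRV complexity can be written compactly; a minimal numpy version is sketched below with the usual parameters m and r. The HMM stage (Baum-Welch training, Viterbi decoding) and the real fetal ECG data are not reproduced here.

```python
# Sketch only: approximate entropy (ApEn) of a beat-interval series.
import numpy as np

def approximate_entropy(x, m=2, r=None):
    """ApEn(m, r) = phi(m) - phi(m + 1), with self-matches included as usual."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * np.std(x)  # common default tolerance

    def phi(m):
        windows = np.array([x[i:i + m] for i in range(n - m + 1)])
        # Chebyshev distance between every pair of embedded windows
        dist = np.max(np.abs(windows[:, None, :] - windows[None, :, :]), axis=2)
        return np.mean(np.log(np.mean(dist <= r, axis=1)))

    return phi(m) - phi(m + 1)

rr = np.random.default_rng(0).normal(0.45, 0.02, 600)  # placeholder beat-to-beat intervals (s)
print(approximate_entropy(rr))
```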

29 pages, 4497 KiB  
Article
Imbalanced Power Spectral Generation for Respiratory Rate and Uncertainty Estimations Based on Photoplethysmography Signal
by Soojeong Lee, Mugahed A. Al-antari, Gyanendra Prasad Joshi and Yeong Hyeon Gu
Sensors 2025, 25(5), 1437; https://doi.org/10.3390/s25051437 - 26 Feb 2025
Abstract
Respiratory rate (RR) changes in the elderly can indicate serious diseases. Thus, accurate estimation of RRs for cardiopulmonary function is essential for home health monitoring systems. However, machine learning (ML) algorithm errors embedded in health monitoring systems can be problematic in medical decision-making because some data have much larger sample sizes in the training set than others. This difference in sample size implies biosignal data imbalance. Therefore, we propose a novel methodology that combines bootstrap-based imbalanced continuous power spectral generation (IPSG) with ML approaches to estimate RRs and uncertainty to address data imbalance. The sample differences between normal breathing (12–20 breaths per minute (brpm)), dyspnea (≥20 brpm), and hypopnea (<8 brpm) show significant data imbalance, which can affect the learning of ML algorithms. Hence, the normal breathing part with a large amount of data is well-trained. In contrast, the dyspnea and hypopnea parts with relatively fewer data are not well-trained, and this data imbalance makes it difficult to estimate the reference variables of the actual dyspnea and hypopnea data parts, thus generating significant errors. Hence, we apply ML models by mixing artificial feature curves generated using a bootstrap model with the original feature curves to estimate RRs and solve this problem. As a result, the nonparametric bootstrap approach significantly increases the number of artificial feature curves. The generated artificial feature curves are selectively utilized in the highly imbalanced parts. Therefore, we confirm that IPSG is efficiently trained to predict the complex nonlinear relationship between the feature vectors obtained from the photoplethysmography signal and the reference RR. The proposed methodology shows more accurate prediction performance and uncertainty. Combining the proposed Gaussian process regression (GPR) with IPSG based on the Beth Israel Deaconess Medical Center dataset, the mean absolute error of the RR is 0.79 and 1.47 brpm. Our approach achieves high stability and accuracy by randomly mixing original and artificial feature curves. The proposed GPR-IPSG model can improve the performance of clinical home-based monitoring systems and design a reliable framework. Full article
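In the spirit of the IPSG idea described above, the sketch below bootstraps extra copies of the under-represented range of a regression target, mixes them with the original samples, and fits a Gaussian process regressor that reports predictive uncertainty. All features, targets, and thresholds are synthetic placeholders, not the PPG-derived feature curves.

```python
# Sketch only: bootstrap augmentation of rare samples before Gaussian process regression.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                    # placeholder feature vectors
y = 14 + 3 * X[:, 0] + rng.normal(0, 0.5, 200)   # placeholder reference RR (brpm)

# bootstrap extra copies of the rare high-RR samples, with small feature jitter
rare = np.flatnonzero(y > 18)
idx = rng.choice(rare, size=100, replace=True)
X_aug = np.vstack([X, X[idx] + rng.normal(0, 0.05, (100, 5))])
y_aug = np.concatenate([y, y[idx]])

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True).fit(X_aug, y_aug)
mean, std = gpr.predict(X[:5], return_std=True)  # predictive mean and uncertainty
print(mean.round(2), std.round(2))
```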

17 pages, 1923 KiB  
Article
Wind Turbine Fault Diagnosis with Imbalanced SCADA Data Using Generative Adversarial Networks
by Hong Wang, Taikun Li, Mingyang Xie, Wenfang Tian and Wei Han
Energies 2025, 18(5), 1158; https://doi.org/10.3390/en18051158 - 26 Feb 2025
Abstract
Wind turbine fault diagnostics is essential for enhancing turbine performance and lowering maintenance expenses. Supervisory control and data acquisition (SCADA) systems have been extensively recognized as a feasible technology for the realization of wind turbine fault diagnosis tasks due to their capacity to generate vast volumes of operation data. However, wind turbines generally operate normally, and fault data are rare or even impossible to collect. This makes the SCADA data distribution imbalanced, with significantly more normal data than abnormal data, resulting in a decrease in the performance of existing fault diagnosis techniques. This article presents an innovative deep learning-based fault diagnosis method to solve the SCADA data imbalance issue. First, a data generation module centered on generative adversarial networks is designed to create a balanced dataset. Specifically, the long short-term memory network that can handle time series data well is used in the generator network to learn the temporal correlations from SCADA data and thus generate samples with temporal dependencies. Meanwhile, the convolutional neural network (CNN), which has powerful feature learning and representation capabilities, is employed in the discriminator network to automatically capture data features and achieve sample authenticity discrimination. Then, another CNN is trained to perform fault classification using the augmented balanced dataset. The proposed approach is verified utilizing actual SCADA data derived from a wind farm. The comparative experiments show the presented approach is effective in diagnosing wind turbine faults. Full article
(This article belongs to the Section A3: Wind, Wave and Tidal Energy)
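The generator / discriminator pairing described above can be sketched as an LSTM generator producing SCADA-like sequences and a 1D-CNN discriminator scoring them; dimensions are illustrative placeholders and the adversarial training loop is omitted, so this is not the authors' exact architecture.

```python
# Sketch only: LSTM generator and 1D-CNN discriminator for sequence data.
import torch
import torch.nn as nn

class LSTMGenerator(nn.Module):
    def __init__(self, noise_dim=16, hidden=64, n_features=10):
        super().__init__()
        self.lstm = nn.LSTM(noise_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, z):                      # z: (batch, seq_len, noise_dim)
        h, _ = self.lstm(z)
        return self.out(h)                     # (batch, seq_len, n_features)

class CNNDiscriminator(nn.Module):
    def __init__(self, n_features=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_features, 32, 5, padding=2), nn.LeakyReLU(0.2),
            nn.Conv1d(32, 64, 5, padding=2), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, 1),
        )

    def forward(self, x):                      # x: (batch, seq_len, n_features)
        return self.net(x.transpose(1, 2))     # real/fake logit

z = torch.randn(8, 50, 16)
fake = LSTMGenerator()(z)
print(CNNDiscriminator()(fake).shape)          # torch.Size([8, 1])
```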

22 pages, 3932 KiB  
Article
Transferable Contextual Network for Rural Road Extraction from UAV-Based Remote Sensing Images
by Jian Wang, Renlong Wang, Yahui Liu, Fei Zhang and Ting Cheng
Sensors 2025, 25(5), 1394; https://doi.org/10.3390/s25051394 - 25 Feb 2025
Abstract
Road extraction from UAV-based remote sensing images in rural areas presents significant challenges due to the diverse and complex characteristics of rural roads. Additionally, acquiring UAV remote sensing data for rural areas is challenging due to the high cost of equipment, the lack of clear road boundaries requiring extensive manual annotation, and limited regional policy support for UAV operations. To address these challenges, we propose a transferable contextual network (TCNet), designed to enhance the transferability and accuracy of rural road extraction. We employ a Stable Diffusion model for data augmentation, generating diverse training samples and providing a new method for acquiring remote sensing images. TCNet integrates the clustered contextual Transformer (CCT) module, clustered cross-attention (CCA) module, and CBAM attention mechanism to ensure efficient model transferability across different geographical and climatic conditions. Moreover, we design a new loss function, the Dice-BCE-Lovasz loss (DBL loss), to accelerate convergence and improve segmentation performance in handling imbalanced data. Experimental results demonstrate that TCNet, with only 23.67 M parameters, performs excellently on the DeepGlobe and road datasets and shows outstanding transferability in zero-shot testing on rural remote sensing data. TCNet performs well on segmentation tasks without any fine-tuning for regions such as Burgundy, France, and Yunnan, China. Full article
(This article belongs to the Section Remote Sensors)
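A combined Dice + BCE segmentation loss of the kind referenced above is sketched below for sparse (imbalanced) road masks; the Lovasz term of the authors' DBL loss is omitted for brevity, so this is only an illustrative stand-in.

```python
# Sketch only: Dice + BCE loss for binary segmentation with sparse foreground pixels.
import torch
import torch.nn.functional as F

def dice_bce_loss(logits, targets, eps=1e-6):
    """logits, targets: (batch, 1, H, W); targets in {0, 1}."""
    probs = torch.sigmoid(logits)
    inter = (probs * targets).sum(dim=(1, 2, 3))
    denom = probs.sum(dim=(1, 2, 3)) + targets.sum(dim=(1, 2, 3))
    dice = 1 - (2 * inter + eps) / (denom + eps)
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none").mean(dim=(1, 2, 3))
    return (dice + bce).mean()

logits = torch.randn(2, 1, 64, 64)
targets = (torch.rand(2, 1, 64, 64) > 0.9).float()   # sparse road pixels (imbalanced)
print(dice_bce_loss(logits, targets))
```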

20 pages, 512 KiB  
Article
Applying Wearable Sensors and Machine Learning to the Diagnostic Challenge of Distinguishing Parkinson’s Disease from Other Forms of Parkinsonism
by Rana M. Khalil, Lisa M. Shulman, Ann L. Gruber-Baldini, Stephen G. Reich, Joseph M. Savitt, Jeffrey M. Hausdorff, Rainer von Coelln and Michael P. Cummings
Biomedicines 2025, 13(3), 572; https://doi.org/10.3390/biomedicines13030572 - 25 Feb 2025
Abstract
Background/Objectives: Parkinson’s Disease (PD) and other forms of parkinsonism share motor symptoms, including tremor, bradykinesia, and rigidity. The overlap in their clinical presentation creates a diagnostic challenge, as conventional methods rely heavily on clinical expertise, which can be subjective and inconsistent. This highlights the need for objective, data-driven approaches such as machine learning (ML) in this area. However, applying ML to clinical datasets faces challenges such as imbalanced class distributions, small sample sizes for non-PD parkinsonism, and heterogeneity within the non-PD group. Methods: This study analyzed wearable sensor data from 260 PD participants and 18 individuals with etiologically diverse forms of non-PD parkinsonism, which were collected during clinical mobility tasks using a single sensor placed on the lower back. We evaluated the performance of ML models in distinguishing these two groups and identified the most informative mobility tasks for classification. Additionally, we examined the clinical characteristics of misclassified participants and presented case studies of common challenges in clinical practice, including diagnostic uncertainty at the patient’s initial visit and changes in diagnosis over time. We also suggested potential steps to address the dataset challenges which limited the models’ performance. Results: Feature importance analysis revealed the Timed Up and Go (TUG) task as the most informative for classification. When using the TUG test alone, the models’ performance exceeded that of combining all tasks, achieving a balanced accuracy of 78.2%, which is within 0.2% of the balanced diagnostic accuracy of movement disorder experts. We also identified differences in some clinical scores between the participants correctly and falsely classified by our models. Conclusions: These findings demonstrate the feasibility of using ML and wearable sensors for differentiating PD from other parkinsonian disorders, addressing key challenges in its diagnosis and streamlining diagnostic workflows. Full article
(This article belongs to the Special Issue Challenges in the Diagnosis and Treatment of Parkinson’s Disease)
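As a small illustration of the class-imbalance setting above (260 PD vs. 18 non-PD participants), the sketch below uses class weighting and reports balanced accuracy under cross-validation. The mobility features are random placeholders, so the score is chance-level by construction.

```python
# Sketch only: class-weighted classification and balanced accuracy on a 260-vs-18 split.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.normal(size=(278, 30))                 # placeholder sensor-derived mobility features
y = np.array([0] * 260 + [1] * 18)             # 0 = PD, 1 = non-PD parkinsonism

clf = RandomForestClassifier(class_weight="balanced", random_state=0)
pred = cross_val_predict(clf, X, y, cv=5)      # stratified folds keep minority cases in each split
print("Balanced accuracy:", balanced_accuracy_score(y, pred))
```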

23 pages, 2539 KiB  
Article
Ensemble Learning for Network Intrusion Detection Based on Correlation and Embedded Feature Selection Techniques
by Ghalia Nassreddine, Mohamad Nassereddine and Obada Al-Khatib
Computers 2025, 14(3), 82; https://doi.org/10.3390/computers14030082 - 25 Feb 2025
Abstract
Recent advancements across various sectors have resulted in a significant increase in the utilization of smart gadgets. This augmentation has resulted in an expansion of the network and the devices linked to it. Nevertheless, the development of the network has concurrently resulted in a rise in policy infractions impacting information security. Finding intruders immediately is a critical component of maintaining network security. The intrusion detection system is useful for network security because it can quickly identify threats and give alarms. In this paper, a new approach for network intrusion detection was proposed. Combining the results of machine learning models like the random forest, decision tree, k-nearest neighbors, and XGBoost with logistic regression as a meta-model is what this method is based on. For the feature selection technique, the proposed approach creates an advanced method that combines the correlation-based feature selection with an embedded technique based on XGBoost. For handling the challenge of an imbalanced dataset, a SMOTE-TOMEK technique is used. The suggested algorithm is tested on the NSL-KDD and CIC-IDS datasets. It shows a high performance with an accuracy of 99.99% for both datasets. These results prove the effectiveness of the proposed approach. Full article
(This article belongs to the Special Issue Using New Technologies in Cyber Security Solutions (2nd Edition))
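The stacking-with-meta-model idea described above is sketched below: SMOTE-Tomek resampling on the training split, then RF, DT, k-NN, and XGBoost base learners combined by a logistic-regression meta-model. Placeholder data stand in for NSL-KDD / CIC-IDS; the correlation-based and embedded feature selection steps are omitted.

```python
# Sketch only: SMOTE-Tomek + stacking (RF, DT, k-NN, XGBoost) with logistic regression on top.
from imblearn.combine import SMOTETomek
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

X, y = make_classification(n_samples=3000, n_features=40, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_bal, y_bal = SMOTETomek(random_state=0).fit_resample(X_tr, y_tr)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
        ("xgb", XGBClassifier(eval_metric="logloss")),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-model over base-learner predictions
)
print(stack.fit(X_bal, y_bal).score(X_te, y_te))
```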

24 pages, 1605 KiB  
Article
CGFL: A Robust Federated Learning Approach for Intrusion Detection Systems Based on Data Generation
by Shu Feng, Luhan Gao and Leyi Shi
Appl. Sci. 2025, 15(5), 2416; https://doi.org/10.3390/app15052416 - 24 Feb 2025
Abstract
The implementation of comprehensive security measures is a critical factor in the rapid growth of industrial control networks. Federated Learning has emerged as a viable solution for safeguarding privacy in machine learning. The effectiveness of pattern detection in models is diminished as a result of the difficulty in extracting attack information from extremely large datasets and obtaining an adequate number of examples for specific types of attacks. A robust Federated Learning method, CGFL, is introduced in this study to resolve the challenges presented by data distribution discrepancies and client class imbalance. By employing a data generation strategy to generate balanced datasets for each client, CGFL enhances the global model. It employs a data generator that integrates artificially generated data with the existing data from local clients by employing label correction and data generation techniques. The geometric median aggregation technique was implemented to enhance the security of the aggregation process. The model was simulated and evaluated using the CIC-IDS2017 dataset, NSL-KDD dataset, and CSE-CIC-IDS2018 dataset. The experimental results indicate that CGFL does an effective job of enhancing the accuracy of ICS attack detection in Federated Learning under imbalanced sample conditions. Full article
(This article belongs to the Special Issue Advanced Computer Security and Applied Cybersecurity)
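The geometric-median aggregation step mentioned above can be sketched with a simple Weiszfeld iteration over flattened client weight vectors; the client updates here are random placeholders with one outlier, and this is not the authors' full CGFL procedure.

```python
# Sketch only: geometric-median aggregation of client model updates (Weiszfeld iteration).
import numpy as np

def geometric_median(points, n_iter=100, eps=1e-8):
    """points: (n_clients, n_params). Returns the point minimizing summed Euclidean distances."""
    median = points.mean(axis=0)
    for _ in range(n_iter):
        dists = np.linalg.norm(points - median, axis=1)
        w = 1.0 / np.maximum(dists, eps)
        new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(new - median) < eps:
            break
        median = new
    return median

clients = np.random.default_rng(0).normal(size=(10, 1000))  # flattened per-client weights
clients[0] += 50                                            # one outlier / corrupted update
honest = clients[1:].mean(axis=0)
# the robust aggregate stays far closer to the honest clients' average than the plain mean does
print(np.linalg.norm(geometric_median(clients) - honest),
      np.linalg.norm(clients.mean(axis=0) - honest))
```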

19 pages, 3169 KiB  
Article
Comparative Analysis of Perturbation Techniques in LIME for Intrusion Detection Enhancement
by Mantas Bacevicius, Agne Paulauskaite-Taraseviciene, Gintare Zokaityte, Lukas Kersys and Agne Moleikaityte
Mach. Learn. Knowl. Extr. 2025, 7(1), 21; https://doi.org/10.3390/make7010021 - 21 Feb 2025
Abstract
The growing sophistication of cyber threats necessitates robust and interpretable intrusion detection systems (IDS) to safeguard network security. While machine learning models such as Decision Tree (DT), Random Forest (RF), k-Nearest Neighbors (K-NN), and XGBoost demonstrate high effectiveness in detecting malicious activities, their interpretability decreases as their complexity and accuracy increase, posing challenges for critical cybersecurity applications. Local Interpretable Model-agnostic Explanations (LIME) is widely used to address this limitation; however, its reliance on normal distribution for perturbations often fails to capture the non-linear and imbalanced characteristics of datasets like CIC-IDS-2018. To address these challenges, we propose a modified LIME perturbation strategy using Weibull, Gamma, Beta, and Pareto distributions to better capture the characteristics of network traffic data. Our methodology improves the stability of different ML models trained on CIC-IDS datasets, enabling more meaningful and reliable explanations of model predictions. The proposed modifications allow for an increase in explanation fidelity by up to 78% compared to the default Gaussian approach. Pareto-based perturbations provide the best results. Among all distributions tested, Pareto consistently yielded the highest explanation fidelity and stability, particularly for K-NN (fidelity = 0.9971, S = 0.9907) and DT (fidelity = 0.9267, S = 0.9797). This indicates that heavy-tailed distributions fit well with real-world network traffic patterns, reducing the variance in attribute importance explanations and making them more robust. Full article
(This article belongs to the Special Issue Advances in Explainable Artificial Intelligence (XAI): 3rd Edition)
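To show where a different perturbation distribution enters, the sketch below runs the core LIME step with the Gaussian sampler swapped for a heavy-tailed (Pareto) one: perturb an instance, query the black-box model, and fit a locally weighted linear surrogate. The black-box model, kernel width, and instance are placeholders; this is not the authors' implementation.

```python
# Sketch only: a LIME-style local surrogate with Pareto-distributed perturbations.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

rng = np.random.default_rng(0)
x0 = X[0]
scale = X.std(axis=0)
# Pareto-distributed perturbations with random sign, instead of the default normal noise
noise = rng.pareto(a=3.0, size=(500, X.shape[1])) * rng.choice([-1, 1], size=(500, X.shape[1]))
Z = x0 + noise * scale

proba = black_box.predict_proba(Z)[:, 1]                             # black-box outputs on perturbed points
weights = np.exp(-np.linalg.norm((Z - x0) / scale, axis=1) ** 2 / 2)  # proximity kernel
surrogate = Ridge(alpha=1.0).fit(Z, proba, sample_weight=weights)
print("Local attributions:", surrogate.coef_.round(3))
```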