1 Introduction

Federated Learning (FL) is a major driver of current research trends. It emerged to address the limitations of centralized machine learning, enhancing privacy and security by training models locally on distributed data. To address growing security concerns, governments have introduced data-protection provisions, including the General Data Protection Regulation (GDPR) in Europe, the California Privacy Rights Act (CPRA) in the United States (Kim and Bodie 2020), and the Personal Data Protection Act in Singapore (Chik 2013). Approximately 49 percent of organizations in the United States are reported to have experienced at least one data breach. Google, for example, was fined $57 million under the GDPR, a penalty upheld in March 2020 (Satariano 2019).

To address these problems, Federated Learning (FL) was first proposed by Google in 2017 (McMahan et al. 2017). In this learning process, a model is built collaboratively by multiple clients, each of which keeps its data local during training. This approach draws on diverse data sources while enhancing model security (Ludwig and Baracaldo 2022; Ghani et al. 2024a) and maintaining data privacy. The potential of federated learning has been demonstrated in real-world applications such as predictive keyboards on smartphones (Upreti et al. 2024), healthcare (Si-Ahmed et al. 2024), agriculture (Friha et al. 2022), finance (Saura et al. 2022), smart cities (Friha et al. 2023), transportation (Li et al. 2023), and energy management (Yousaf et al. 2019, 2023). For example, a recent French deployment extended a biomedical ML model built with federated learning to keep patients' data under local control (Kuchler 2019a).

Motivation of the Study: Federated learning is an emerging area of research that integrates advanced privacy-preserving techniques. However, this growing field has attracted attackers and created security vulnerabilities. Consequently, various security mechanisms have been developed to protect user privacy and detect malicious clients in the FL environment, including intrusion detection systems (IDS), authentication, access control, key management, and encryption. This review concentrates exclusively on IDS that incorporate neural network models, feature engineering methods, and privacy-preserving techniques. An IDS is designed to detect malicious activities at both the host and network levels, activities that can lead to data breaches, unauthorized access, data alterations, and violations of user privacy. Recently, researchers have focused on IDS powered by Artificial Intelligence (AI), particularly those using Machine Learning (ML) and Deep Learning (DL) algorithms, because of their capacity to detect zero-day attacks, mitigate threats to confidentiality, integrity, and availability, and adapt to dynamic changes in the FL environment. Moreover, AI-based IDS have demonstrated promising results in addressing the challenges associated with FL security, including resource constraints, heterogeneity, scalability, and latency. A comprehensive review of FL is therefore necessary to identify the key strategies for developing an IDS: examining emerging AI-based methods, incorporating NN models, feature engineering, and advanced privacy techniques, and exploring new research challenges and potential future research directions.

Despite its potential benefits, FL still faces serious challenges. For example, current defenses against gradient inversion, label-flipping, poisoning, and leakage attacks, along with mechanisms based on statistical information for training models, consensus-based verification processes, and update-digest methods, have limitations (Kuchler 2019b; Gao et al. 2024; Nair et al. 2024; Alsulaimawi 2024; Li et al. 2024). These include reduced model accuracy due to data perturbations, susceptibility to adversarial attacks, and increased computational overhead. Such problems emerge in several areas:

  • Cyber Attacks: In federated learning, information exchange between the devices and aggregation servers can be exposed to network attacks, physical tampering, and poisoning attacks. Malicious clients may send incorrect model updates to compromise datasets and model accuracy.

  • Non-IID (Non-Independent and Identically Distributed) Data: In federated learning, data from different clients often exhibit non-IID characteristics, which can hinder model convergence and reduce overall accuracy. Techniques such as data augmentation, domain adaptation, or personalized models are employed to mitigate these effects.

  • Data Privacy Risks: Federated learning trains models locally, which helps to maintain privacy. However, the exchange of updated parameters and gradients during training can expose personal information, leading to privacy breaches.

  • Communication Inefficiencies: The rapid growth of IoT applications and AI models has resulted in a huge increase in data at edge nodes, causing high computational and communication costs.

These challenges highlight the critical need for advanced solutions to secure federated learning environments.
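To make the non-IID challenge above concrete, the label-skewed data splits typical of FL clients can be simulated with a short sketch (a hypothetical illustration; the function name, client counts, and sample sizes are our own, not from any cited study):

```python
import numpy as np

def label_skew_partition(labels, n_clients, classes_per_client, seed=0):
    """Assign each client samples drawn from only a few classes,
    producing the label-skewed (non-IID) splits discussed above."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    parts = {}
    for c in range(n_clients):
        chosen = rng.choice(classes, size=classes_per_client, replace=False)
        idx = np.flatnonzero(np.isin(labels, chosen))
        # Cap each client at 100 samples for the toy example
        parts[c] = rng.choice(idx, size=min(len(idx), 100), replace=False)
    return parts

# Toy labels: 10 classes, 1000 samples
labels = np.random.default_rng(1).integers(0, 10, size=1000)
parts = label_skew_partition(labels, n_clients=5, classes_per_client=2)
for c, idx in parts.items():
    print(c, np.unique(labels[idx]))  # each client sees at most 2 classes
```

Training a single global model over such skewed partitions is exactly where convergence and accuracy degrade, motivating the mitigation techniques listed above.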

Contributions of the Paper: This study presents the key contributions aimed at enhancing the security of federated learning. To put this review paper in its proper context and enable comparison with existing works, we refer readers to Table 2. To the best of our knowledge, no recent study has covered this subject in detail. To fill this gap, we performed a systematic study of multiple security models that have been proposed for FL, along with the associated security issues. Therefore, in this SLR, our aim is to uniquely integrate the neural networks and feature engineering methods used in IDS. In addition, we seek to advance privacy techniques to provide a comprehensive understanding of their contributions to enhancing FL security. The main contributions of this study are as follows.

  1. We provide a systematic literature review of 88 research articles related to securing FL with IDS.

  2. This study provides an in-depth analysis of NN-based IDS models proposed for securing federated learning, focusing on their strengths, weaknesses, and real-world applications.

  3. We present a comprehensive analysis of feature engineering techniques, such as dimensionality reduction and feature selection, along with other advanced techniques discussed in Sect. 6, and their impact on malicious client detection.

  4. We analyze advanced privacy techniques, such as differential privacy and federated averaging, together with their challenges in the federated learning environment, to assess their effectiveness in protecting FL.

  5. We highlight open research issues and provide insights into current limitations and future research directions.

The remainder of this paper is structured as follows: Sect. 2 presents the methodology of our systematic literature review. In Sect. 3, we discuss recent surveys. Sect. 4 presents a background analysis of the basic terms. A brief analysis of the findings of research studies is presented in Sect. 5. Sect. 6 examines feature engineering techniques and their relation to neural network models. Sect. 7 briefly analyzes these techniques. Advanced security models are discussed in Sect. 8. Recommendations, open challenges, and possible opportunities for future work are discussed in Sect. 9. Finally, Sect. 10 concludes this literature review.

2 Methodology of systematic literature review

This section presents the methodology adopted in this systematic review. The motivation for this study was to systematically find, analyze, and synthesize all research articles related to securing federated learning. Therefore, the SLR was conducted by following Kitchenham’s methodology Kitchenham et al. (2009). Fig. 1 presents the overall structure of our literature review.

Fig. 1 Overall structure of our systematic literature review

2.1 Research objectives

  1. To provide a systematic review of existing research on the development of neural network-based IDS in the context of federated learning.

  2. To examine and evaluate feature-engineering approaches that improve neural network performance, and to determine how these techniques can increase the accuracy of malicious client detection.

  3. To analyze the advanced privacy techniques used to enhance data privacy.

  4. To identify and analyze gaps in the existing literature where additional research could contribute to more robust, efficient, and scalable models addressing the security challenges in federated learning.

  5. To propose, based on the identified research gaps and challenges, future research directions that could lead to substantial advancements in the security of federated learning networks.

2.2 Search process

2.2.1 Databases for searching

In this SLR, relevant papers were explored from published and preprint repositories. A total of 5,546 papers were sourced from heterogeneous databases, such as Elsevier, IEEE Xplore, MDPI, Wiley, Springer, arXiv, Scopus, and the ACM Digital Library. In addition, Google Scholar, the most widely used academic search engine, was explored to obtain relevant papers. The retrieval period ranged from 2021 to October 2024.

2.2.2 Search string

To find the most relevant articles, we used multiple key research terms, such as “Federated Learning”, “Intrusion Detection Systems”, “Neural Networks”, “Feature Engineering”, and “Data Privacy-Preserving”. These key terms were combined with the Boolean operators AND and OR. The query searches for the specified terms in the title, abstract, or keywords of a paper.
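An illustrative reconstruction of the combined query (the exact field syntax varies by database, so this is a sketch rather than the verbatim string used):

```
("Federated Learning") AND
("Intrusion Detection System*" OR "IDS") AND
("Neural Network*" OR "Feature Engineering" OR "Privacy-Preserving")
```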

2.3 Inclusion and exclusion criteria

2.3.1 Inclusion criteria

The research articles were filtered using the following inclusion criteria:

  1. Studies that contain the keywords “Intrusion Detection Systems” AND “Federated Learning” in the title or abstract are selected.

  2. Research articles published from 2021 to October 2024 are considered.

  3. Papers that are well written in English are selected.

  4. Selected studies must be result-oriented: solid evidence and experimentation must support the proposed methodologies and their outcomes.

2.3.2 Exclusion criteria

The following rules were defined for the exclusion criteria:

  1. Articles written in a language other than English.

  2. Articles that were not available in full text.

  3. Articles that do not discuss IDS-based security models for FL.

2.4 Selection of studies

Our screening strategy comprised four steps for selecting relevant articles. In the identification step, 5546 papers were sourced from heterogeneous databases. The titles and abstracts of the extracted articles were reviewed to ensure there was no repetition, as it is common for an article to be published in one venue (journals or conferences) and also on a preprint server, such as arXiv, to gain more visibility, citations, and feedback. Additionally, we ensured that the selected papers were related to IDS in FL settings. However, since the title and abstract alone may not provide a clear reflection of the paper, we proceeded to the screening step, where the titles, abstracts, and full texts of the articles were reviewed. At this stage, 156 relevant papers were selected. In the eligibility step, 120 articles were chosen based on the inclusion and exclusion criteria. Ultimately, 88 articles were selected for in-depth analysis. Figure 2 illustrates the flow of the screening process for relevant articles.

Fig. 2 Kitchenham et al. (2009) flow diagram of the systematic literature review phases

2.5 Data extraction and synthesis

After finalizing the research articles, we read them in full detail to determine the basis for categorization. The key factors considered include the intrusion detection system, the neural network model, the feature engineering methods, privacy preservation techniques, security issues, the limitations of the study, the data set used, the domain of the study, the validation techniques, and the experimental results. Additionally, we analyzed the advantages, disadvantages, goals, approaches, system setup, and security aspects. This process enabled a comprehensive synthesis of current research trends and the identification of research gaps in the emerging field of federated learning. A detailed description is presented in Table 1:

Table 1 Description of review categories

3 Comparison with recent surveys

Table 2 provides a detailed analysis of recent surveys in the domain of federated learning (FL). Before 2024, survey papers (Çevik and Akleylek 2024; Gaber et al. 2024; Cui et al. 2023; Hernandez-Jaimes et al. 2023) focused on traditional intrusion detection systems for securing federated learning frameworks. With the rapid development of deep learning techniques, Li et al. (2021), Xie et al. (2024), and Nguyen et al. (2022) reviewed various deep learning models. In addition, some review articles (Issa et al. 2023; Belenguer et al. 2022; Ge et al. 2023; Gugueoth et al. 2023; Venkatasubramanian et al. 2023) have discussed federated learning, intrusion detection systems, and neural networks, as well as the integration of these technologies. The summary in Table 2 gives a comprehensive view of federated learning across different domains, such as IoMT, IoT, vehicular networks (IoVs), and Industrial IoT 4.0. This diversity shows the broad applicability of the technologies discussed, but it also highlights the need for more domain-specific surveys to address unique challenges and requirements. Our study distinguishes itself from existing work by presenting a more focused and comprehensive analysis of IDS in FL environments. While existing surveys explored the general use of IDS and NN in federated learning, they often did so within specific contexts. In contrast, our work integrates neural network models, feature engineering methods, and privacy-preserving approaches, offering a holistic framework to address the unique security challenges in FL environments. We then enumerate new research directions and strategies needed for the critical advancement of federated learning security by synthesizing the literature in this area.

4 Background analysis

When developing new models, whether in federated learning (FL) or other fields, having a good understanding of fundamental concepts is of utmost importance for developing advanced research. FL, IDS, NN, and FE are interconnected domains that collectively advance cybersecurity and distributed machine learning. In this section, we first present the basic concepts of federated learning (FL), intrusion detection systems (IDS), neural networks (NN), and feature engineering (FE) and discuss the detailed working principles of these domains.

Table 2 The summary of the related surveys and their key contributions

4.1 Fundamentals of federated learning (FL)

In FL, models are trained locally by each client using their private data, and only model updates are shared with a central server for aggregation, as illustrated in Fig. 3. Advanced data privacy techniques are applied at this stage to preserve client privacy. Finally, various aggregation methods, such as Federated Averaging (FedAvg) and Federated Model Heterogeneous Matryoshka Representation Learning (FedMRL), are used, with the aim of improving model privacy while minimizing communication costs (Yi et al. 2024). The most common aggregation process is executed using the Federated Averaging (FedAvg) algorithm, as in Eq. 1.

  • \(H_n\): The global model at the n-th iteration.

  • \(U_{k,n}\): The local model update from the k-th client at the n-th iteration.

  • K: The total number of clients.

  • \(w_k = D_k / \sum _{j=1}^{K} D_j\): The aggregation weight of the k-th client, based on the size of its local dataset.

The update of the global model can be represented by:

$$\begin{aligned} H_{n+1} = \sum _{k=1}^{K} \frac{D_k}{\sum _{j=1}^{K} D_j} U_{k,n} \end{aligned}$$
(1)

here, \(D_k\) denotes the size of the local dataset of the k-th client, ensuring that each client’s update contributes proportionally to its dataset size. A generalization and re-parameterization of FedAvg known as FedProx addresses system heterogeneity (Li et al. 2020). Moreover, the Federated Matched Averaging (FedMA) and Federated Optimization (FedOpt) aggregation algorithms have also been used in FL (Wang et al. 2020). In addition, federated learning is divided into three types based on data partitioning: Horizontal Federated Learning (HFL), Vertical Federated Learning (VFL), and Federated Transfer Learning (FTL). Horizontal partitioning trains general models across similar entities; vertical partitioning combines different feature sets for specific models; and federated transfer learning combines both approaches. Furthermore, these settings can be classified as cross-silo (few clients with large data volumes) or cross-device (many smaller devices). To enhance scalability and efficiency, only the model parameters are communicated during training, thereby reducing bandwidth usage.
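The weighted aggregation of Eq. 1 can be sketched in a few lines (an illustrative implementation; the function name and toy values are ours):

```python
import numpy as np

def fedavg(updates, sizes):
    """Weighted aggregation of client updates as in Eq. 1:
    H_{n+1} = sum_k (D_k / sum_j D_j) * U_{k,n}."""
    sizes = np.asarray(sizes, dtype=float)
    weights = sizes / sizes.sum()          # D_k / sum_j D_j
    return sum(w * u for w, u in zip(weights, updates))

# Three clients with local dataset sizes 100, 300, and 600
updates = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
global_model = fedavg(updates, sizes=[100, 300, 600])
print(global_model)  # [0.7 0.9]
```

The client with the largest dataset (weight 0.6) dominates the aggregate, which is exactly the proportional contribution described above.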

Fig. 3 Working principle of the federated learning process with an intrusion detection system (Gupta et al. 2021)

4.2 Intrusion detection systems (IDS)

Intrusion Detection Systems (IDS) are essential for detecting attacks and anomalies in FL through host-based (Lee et al. 2021), signature-based, and network-based monitoring (Wu et al. 2020). A comparison of these methods is discussed in Table 3. However, AI-based IDS can detect unknown or zero-day attacks using adaptive resonance theory (Bukhanov and Polyakov 2018), genetic algorithms (Yunwu 2009; Ghani et al. 2024b), clustering, fuzzy logic (Dave and Sharma 2014), and deep learning models.

Table 3 Comparison of host-based, network-based, and signature-based IDS

Figure 3 illustrates the decentralized training process, model aggregation, and anomaly detection while maintaining privacy. For example, distributed storage units can store data on smart devices; these units serve as repositories for the training and testing datasets. The IDS processes these data to recognize patterns and classify activities as normal or abnormal. A federated cloud server aggregates the model weights from all local IDS models to form a global model and then broadcasts the updated parameters back to the local units to fine-tune their IDS models. This iterative process allows participants with the required computational power to enhance the accuracy of anomaly detection without sacrificing privacy. The testing datasets within each local storage unit evaluate the performance of the IDS model after the updates.

4.3 Neural networks and feature engineering

Federated Learning (FL) integrates advanced neural networks into IDS to tackle the problems of privacy and data heterogeneity. Neural networks excel at extracting and learning patterns from data, making them effective for identifying malicious activities in distributed networks. NN architectures serve as function-approximation algorithms that transform data into information. When combined with feature engineering, these models extract deep features from data packets, enabling better classification and discrimination of malicious clients.

Figure 4 demonstrates the workflow of data processing in the federated learning (FL) environment with Neural Networks (NN) and feature engineering (scaling, encoding, selection) techniques. Advanced privacy techniques are applied at aggregation, emphasizing the collaboration between components in optimizing features and maintaining data privacy. Commonly used neural networks include Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Auto-Encoders (AEs), Deep Neural Networks (DNNs), Graph Neural Networks (GNNs), and hybrid neural networks.

Fig. 4 Architecture of the neural network and feature engineering within the federated learning framework

Federated Learning (FL), Intrusion Detection Systems (IDS), Neural Networks (NN), and Feature Engineering (FE) together form a robust framework to address critical challenges in distributed machine learning. FL ensures privacy-preserving model training by decentralizing data processing, while IDS enhances security by identifying anomalies and attacks across distributed systems. Neural networks provide advanced computational power for learning complex patterns in network activity, and feature engineering optimizes raw data for better model performance.
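The scaling, encoding, and selection steps shown in Fig. 4 can be sketched as follows (a minimal illustration on invented toy packet features; the helper names and the variance-based filter are our own choices, not a method from any cited study):

```python
import numpy as np

def standard_scale(X):
    """Zero-mean, unit-variance scaling of numeric features."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)

def one_hot(col):
    """Encode a categorical column (e.g. protocol type) as one-hot vectors."""
    cats = sorted(set(col))
    return np.array([[1.0 if v == c else 0.0 for c in cats] for v in col])

def select_by_variance(X, k):
    """Keep the k highest-variance features: a simple filter-style
    feature-selection step."""
    order = np.argsort(X.var(axis=0))[::-1][:k]
    return X[:, np.sort(order)]

# Toy packet features: [duration, bytes] plus a protocol column
num = np.array([[1.0, 500.0], [2.0, 1500.0], [1.5, 50000.0]])
proto = ["tcp", "udp", "tcp"]
X = np.hstack([standard_scale(num), one_hot(proto)])
X = select_by_variance(X, k=3)
print(X.shape)  # (3, 3)
```

In a real FL-based IDS, each client would apply the same transformations locally before training, so that feature spaces remain aligned across clients.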

5 Related work

This section summarizes the findings related to the research objectives defined in Sect. 2. It discusses the applications of FL-based IDS within various domains and provides a comprehensive review of the literature, including empirical results and analyses.

5.1 Artificial neural network (ANN)

This section reviews several studies that leverage Artificial Neural Networks (ANNs) as a security measure within the federated learning (FL) framework. Specifically, ANNs play an important role in an Evolvable IDS (EIDS) by detecting malicious clients. Moreover, federated learning improves privacy by distributing ANN across multiple devices and by aggregating updates from these devices. This approach increases the model efficiency and significantly reduces the latency associated with model updates. Table 4 summarizes the key studies, highlighting the balance between performance and privacy techniques for securing FL through ANN-based anomaly detection.

Table 4 Artificial neural networks (ANN)

Saura et al. (2022) presented the CyberSec4Europe project, aimed at improving anomaly detection in the Open Banking use case. This model integrates an anonymization technique with a malware information-sharing platform (MISP) to ensure a secure exchange of cyber threat information. MISP focuses on encrypting personal information to protect the privacy of individuals. However, the model is resilient to synthetic datasets and has limited evaluation metrics.

Extending the work to the healthcare domain, Ashraf et al. (2022) deployed the FIDChain model in the IoMT network, where blockchain was integrated with an ANN to preserve user privacy while training models on edge devices. A more secure model was developed by Si-Ahmed et al. (2024), which combined an ANN with explainable AI (XAI) to identify deviations from normal behavior and detect zero-day attacks. The integration of XAI enhances the transparency of the decision-making process in the neural networks. Comparatively, FIDChain achieved an impressive accuracy of 99.00% but faced scalability challenges due to its use of blockchain. In contrast, the XAI-based model improves regulatory transparency at the cost of added complexity and reduced performance on resource-constrained IoMT devices. Its recall rate of 83% indicates that the proposed model did not detect all malicious clients given the heterogeneous nature of the data.

Similarly, Awan et al. (2023) proposed the FedTrust framework for securing IoT networks. It detects anomalies using a combination of ANN and deep federated learning (DFL), classifying data by trust metrics such as reputation and experience and offering a decentralized, adaptive solution. Relatedly, the Federated Learning Adaptive Detection (FLAD) approach was employed to classify DDoS attacks in non-IID and unbalanced datasets. Packet- and flow-level features were used for classification, and the computational load was assigned to clients based on the complexity of learning their local attacks. Additionally, homomorphic encryption and secure aggregation protocols are employed to maintain privacy. The studies summarized in Table 4 consistently deliver high accuracy and precision, illustrating the trend toward employing various methods to uphold privacy standards. FedTrust reaches 93.00% by focusing on its trust-management system, whereas FLAD excels in advanced privacy techniques and adaptive learning for complex attacks.
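The secure-aggregation idea mentioned above can be illustrated with pairwise random masking, a standard construction in which masks cancel in the server's sum so individual updates stay hidden (a simplified sketch with invented toy values; real protocols add key agreement and dropout handling):

```python
import numpy as np

def pairwise_masks(n_clients, dim, seed=0):
    """Each client pair (i, j), i < j, shares a random mask; client i adds
    it and client j subtracts it, so all masks cancel in the server's sum."""
    rng = np.random.default_rng(seed)
    masks = [np.zeros(dim) for _ in range(n_clients)]
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            m = rng.normal(size=dim)
            masks[i] += m
            masks[j] -= m
    return masks

updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
masks = pairwise_masks(len(updates), dim=2)
masked = [u + m for u, m in zip(updates, masks)]   # what the server receives
aggregate = np.sum(masked, axis=0)                 # masks cancel: [9. 12.]
print(aggregate)
```

The server recovers the exact sum of updates while each individual masked vector reveals nothing about its client's raw update.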

Turning to the more critical network of the Industrial Internet of Things (IIoT), Salim et al. (2024) proposed FL-CTIF. In this framework, information fusion with an ANN improves the generalizability of the data, enhances data representation across different data streams, and improves the detection accuracy of cyber-attacks. In contrast, Gayathri and Surendran (2024) deployed a unified ensemble FL model in Wireless Sensor Networks (WSNs). Hybrid machine learning models identify anomalies and environmental changes such as temperature, humidity, and potential faults. Cloud computing complements this model by providing data confidentiality, although its reliance on local data quality and its communication overhead may affect real-world applicability.

Synthesis and Analysis: The integration of Artificial Neural Networks (ANN) with advanced privacy techniques has significantly improved data privacy and model performance in decentralized anomaly detection systems. Techniques such as blockchain and homomorphic encryption have shown promise in addressing critical privacy and security challenges. However, their application introduces notable scalability and computational efficiency concerns, particularly in resource-constrained environments. Trust management frameworks, like FedTrust, enhance model performance in decentralized networks, yet achieving a balance between accuracy and efficiency remains a persistent challenge in heterogeneous IoT systems. Furthermore, the lack of real-world validation across diverse datasets, such as Wireless Sensor Networks (WSN) and evolving financial fraud scenarios, limits their practical applicability in addressing dynamic cyber threat landscapes. To bridge these gaps, future research should develop hybrid approaches that optimize computational efficiency without compromising privacy. Validation on diverse real-world datasets and evolving cyber threat scenarios will be crucial to enhancing the generalizability of these models. Additionally, addressing communication overhead and ensuring high-quality data within federated learning (FL) systems, particularly in WSN and IoT environments, will further improve their scalability and applicability.

5.2 Multilayer perceptron (MLP)

MLPs are highly efficient in intrusion detection systems (IDS). For example, models such as MLP1 and MLP2 achieved detection accuracy rates exceeding 95.6% (Otoum et al. 2021). Furthermore, the integration of MLPs into federated reinforcement learning for IoT networks further enhances performance, with detection rates and accuracy reaching approximately 96.5% and 0.98, respectively (Chen et al. 2021). Current research trends focus on optimizing the MLP architecture using techniques such as hyperparameter optimization with genetic algorithms and backpropagation. The mathematical foundation of an MLP is its ability to map input values to output values through an architecture comprising an input layer, one or more hidden layers, and an output layer. During training, each connection between nodes is assigned a numeric weight, which is fine-tuned to improve performance.
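The input-hidden-output mapping described above can be sketched as a forward pass (an illustrative toy model with random weights; the layer sizes and class labels are our own, not from the cited studies):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def mlp_forward(x, weights, biases):
    """Forward pass of an MLP: input layer -> hidden layer(s) -> output.
    Each weight matrix holds the numeric connection weights tuned
    during training."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)
    logits = h @ weights[-1] + biases[-1]
    exp = np.exp(logits - logits.max())   # softmax for class probabilities
    return exp / exp.sum()

rng = np.random.default_rng(0)
# 4 input features -> 8 hidden units -> 2 classes (normal / attack)
weights = [rng.normal(size=(4, 8)), rng.normal(size=(8, 2))]
biases = [np.zeros(8), np.zeros(2)]
probs = mlp_forward(rng.normal(size=4), weights, biases)
print(probs, probs.sum())  # the two class probabilities sum to 1
```

In the federated setting, each client runs this same forward and backward pass locally; only the resulting weight updates are shared for aggregation.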

Table 5 examines the efficiency of federated learning with MLPs. For example, Neto et al. (2022c) proposed the FedSA (Federated Simulated Annealing) model, which achieves high performance by reducing the number of training rounds; hyperparameters such as learning rates and client selection are leveraged to increase performance. The need for a more secure IoT framework is addressed by Zhang et al. (2022a), who proposed a cloud-edge collaborative framework using FL while preserving data privacy and confidentiality: anomalies are detected locally by training deep learning-based IDS at each edge device. Similar to FedSA, this model also emphasizes optimizing learning efficiency. Extending the work on securing non-IID datasets, Lin et al. (2023) proposed a solution to handle vulnerabilities arising in IoT networks. The study integrated three methods (maximum value, union, and federated transfer learning) to manage variability in attacks and ensure more consistent performance in diverse environments. Another innovative approach was presented by Shen et al. (2024), who introduced the Federated Learning Ensemble Knowledge Distillation (FLEKD) framework to enhance IDS in heterogeneous IoT networks. This model effectively handles heterogeneous data in an IoT network; however, it incurs high computational overhead on big data. Finally, a lightweight mini-batch FL framework employing an MLP was presented by Ahmad and Shah (2024) to detect network attacks in IoT environments, focusing on privacy preservation and minimizing computational demands.

Djaidja et al. (2024) applied FL with an MLP to 5G and beyond networks, addressing their non-IID nature. Advanced aggregation algorithms such as FedAvg, FedProx, FedPer, and SCAFFOLD were used to build a global model that detects anomalies like Neptune (DoS). However, the paper highlights challenges in handling non-IID data, with models like FedAvg struggling with weight divergence. SCAFFOLD shows better convergence, though none of the FL models match the accuracy of centralized models.

Finally, Campos et al. (2024) proposed a model for securing intelligent transportation systems (ITS). This research addresses the limitations of traditional cryptographic methods for detecting position spoofing attacks. However, challenges such as imbalanced data and potential overfitting were identified in supervised learning. The authors suggest exploring unsupervised approaches and edge computing for partial aggregation to improve scalability and performance in real-world ITS deployments.

Synthesis and Analysis: The studies summarized in Table 5 demonstrate the diverse applications of MLPs within federated learning frameworks and their role in enhancing privacy, efficiency, and anomaly detection capabilities. Models like FedSA excel in computational efficiency, while frameworks like FLEKD focus on addressing data heterogeneity, although with higher computational demands. Advanced aggregation algorithms, such as FedAvg and SCAFFOLD, are promising solutions for handling non-IID data, though challenges such as imbalanced data and computational overhead persist.

From a privacy perspective, techniques such as cloud-edge federated learning reduce privacy leakage by processing data closer to its source, particularly in MCS tasks. Despite these advancements, further research is needed to address the trade-offs between scalability, computational efficiency, and performance. Exploring unsupervised approaches and edge computing could lead to more robust and scalable deployments in heterogeneous environments.

Table 5 Multilayer perceptron (MLP)

5.3 Convolutional neural networks (CNN)

Implementing Convolutional Neural Networks (CNN) within FL-based IDS is a significant advancement. In CNNs, data is first reduced to two dimensions for feature extraction and then processed through the Deep Convolutional Neural Network (DCNN). This approach optimizes the network structure by integrating a Maxout multilayer perceptron within the CNN model and enhances the convolutional layer’s capability to extract foreground target features. Experimental results show that this method significantly reduces training time while maintaining a high detection rate and improving data protection measures, outperforming traditional intrusion detection techniques (Liu et al. 2023a).
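The feature-extracting convolution at the heart of the models above can be illustrated on a toy two-dimensional input (a minimal sketch in plain numpy; real IDS models stack many such layers with learned kernels):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Single-channel 'valid' 2-D convolution (cross-correlation, as in
    most DL frameworks): the kernel slides over the input, producing a
    feature map of local responses."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Toy 4x4 "traffic feature map" and a 2x2 vertical-edge-style kernel
img = np.arange(16, dtype=float).reshape(4, 4)
k = np.array([[1.0, -1.0], [1.0, -1.0]])
print(conv2d_valid(img, k).shape)  # (3, 3)
```

Reducing intrusion data to such two-dimensional grids is what lets a CNN reuse this local-feature extraction for foreground target features, as described above.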

Table 6 Convolutional neural network (CNN)

Table 6 summarizes CNN's key aspects and effectiveness in the literature. For instance, Deng et al. (2022) proposed the CS-FL model, a novel FL-based IDS that integrates FL with Self-Attention Fusion Convolutional Neural Networks (CNNs) to focus on relevant features and improve classification. In contrast, Zhao et al. (2022) designed a semi-supervised FL-based IDS for securing edge devices in IoT. The proposed model employed knowledge distillation to enhance classifier performance on non-IID data, incorporating a combination of a hard-label strategy and a voting mechanism to reduce communication overhead. Knowledge distillation is a technique in which a compact model learns from a larger, more complex model, transferring knowledge to enhance model accuracy and privacy. In parallel, Thein et al. (2024) address the challenges of data heterogeneity and poisoning attacks in IoT networks through personalized federated learning (PFL). This model combines a mini-batch logit-adjustment loss with the server's cosine-similarity analysis of local models to identify malicious clients. For more critical infrastructure, Jia et al. (2024) proposed the Federated Dynamic Gravitational Search Algorithm (Fed-DGSA), designed to enhance the performance of IDS for the Artificial Intelligence of Things (AIoT). Fed-DGSA optimizes the weight-updating process by introducing dynamic weights and random perturbations, addressing inefficiencies in traditional aggregation methods like FedAvg.
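Knowledge distillation, as defined above, is typically implemented as a soft-label loss between temperature-softened teacher and student outputs; the logits and temperature below are illustrative:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max()                  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return float(np.sum(p_teacher * np.log(p_teacher / p_student)))

teacher = np.array([4.0, 1.0, 0.5])   # confident, larger "teacher" model
student = np.array([2.5, 1.2, 0.8])   # compact "student" model
loss = distillation_loss(student, teacher)
```

In training, this term is usually mixed with the ordinary cross-entropy on hard labels; a higher temperature exposes more of the teacher's inter-class structure to the student.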

In contrast to the semi-supervised learning approach, Onsu et al. (2023) proposed Score-Based Aggregation (SBA), an unsupervised federated learning model to tackle the challenge of data poisoning attacks. The proposed solution employs a two-step detection process: first, K-means clustering classifies clients based on their validation scores, excluding those with poor scores as potential malicious actors; second, SBA assigns participation weights to clients based on validation accuracy, reducing the influence of corrupted updates by giving more weight to trustworthy ones. However, the model still fails to deal with partially malicious clients in IoT networks. Finally, FedACNN (Man et al. 2021) was presented for edge-assisted IoT environments, using FL with a CNN to handle non-IID data and preserve privacy. The IDS detects attacks like DoS by dynamically weighting client contributions based on their impact on the global model's accuracy, which also helps identify potential malicious clients, while an attention mechanism reduces communication rounds and improves model convergence. The study highlights limitations in addressing complex attack patterns and non-IID data aggregation, with future work focusing on encrypted traffic analysis and improved detection in diverse IoT environments.
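The two-step SBA process could look roughly like the following sketch, assuming scikit-learn's KMeans; the validation scores and two-cluster split are illustrative, not the authors' exact procedure:

```python
import numpy as np
from sklearn.cluster import KMeans

def score_based_weights(val_scores, n_clusters=2):
    """Drop the low-score cluster, weight the rest by validation accuracy."""
    scores = np.asarray(val_scores, dtype=float).reshape(-1, 1)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(scores)
    # Keep only the cluster whose mean validation score is highest.
    best = max(range(n_clusters), key=lambda c: scores[labels == c].mean())
    weights = np.where(labels == best, scores.ravel(), 0.0)
    return weights / weights.sum()   # participation weights for aggregation

scores = [0.91, 0.89, 0.12, 0.93, 0.15]   # clients 2 and 4 look malicious
w = score_based_weights(scores)           # zero weight for the low-score cluster
```

The returned weights would then replace the uniform (or data-size) weights in the server's aggregation step.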

The DetectPMFL model proposed by Zhang et al. (2022b) targets the challenge of unreliable industrial agents in FL, which could degrade model accuracy by contributing low-quality data. The detection mechanism calculates the credibility of agents while ensuring privacy by integrating homomorphic encryption, which allows computations on encrypted data without decryption, enhancing privacy and securing data during transmission and storage. Blockchain integration was also suggested for additional privacy. Similarly, Liu et al. (2023b) designed a delay- and energy-efficient asynchronous federated learning (DEAFL-ID) model for intrusion detection in heterogeneous Industrial IoT (IIoT) networks. DEAFL-ID reduces the high training cost through a deep Q-network (DQN) algorithm. Furthermore, the hybrid-sampling-assisted DEAFL-ID addresses data imbalance while maintaining high detection accuracy in IIoT networks.

To secure smart airports, Chen et al. (2023) deployed a Knowledge Distillation (KD)-based CNN-GRU model. Their differentially private FL framework employs KD to enhance model performance, particularly in managing various cyber intrusions. It depends on optimal parameter settings in the adaptive update mechanism, indicating improvements in dynamic security networks. Similarly, Bin et al. (2023) presented a distributed intrusion detection framework to strengthen power systems by keeping data localized. Meanwhile, the Heterogeneous Federated Learning for Intrusion Detection based on Knowledge Distillation (MHFLID-KD) model was proposed by Gao et al. (2022). MHFLID-KD groups clients with similar model architectures and applies dual teacher-student knowledge distillation within each group: the top two performing models in each group act as teachers, distilling knowledge to weaker student models and thus improving detection accuracy. Rather than direct model aggregation, this "knowledge transfer" enhances model performance. However, the system lacks inter-group model aggregation; to enhance robustness and scalability, the authors suggest exploring "zero-data knowledge distillation" in future work.

Li et al. (2023) employed an advanced technique, Dynamic Weighted Aggregation Federated Learning (DAFL), with CNN to address the challenges of insufficient attack data and privacy protection. Backdoor attacks, a major security threat, were addressed by Backdoor Detection via Feedback-based Federated Learning (BaFFLe). Andreina et al. (2021) integrate a feedback loop to detect these attacks by having randomly selected clients validate the global model each round. In BaFFLe, detection is based on analyzing per-class error rates, with unusual misclassifications indicating potential backdoor activity. Additionally, the Local Outlier Factor (LOF) method is used to identify model outliers across rounds, flagging backdoor attacks. BaFFLe has limited ability to detect early-stage backdoor attacks, and it focuses primarily on backdoor detection, with plans to expand defenses against other poisoning attacks in future work.
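The LOF step used to flag outlying model updates across rounds can be sketched with scikit-learn; the flattened 3-parameter updates below are synthetic, not from BaFFLe's evaluation:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Flattened parameter updates from several rounds; the last one is poisoned.
updates = np.array([
    [0.10, -0.20, 0.05],
    [0.12, -0.18, 0.06],
    [0.09, -0.21, 0.04],
    [0.11, -0.19, 0.05],
    [0.10, -0.20, 0.06],
    [2.50,  1.90, -3.0],   # outlier: suspected backdoor update
])

lof = LocalOutlierFactor(n_neighbors=3)
labels = lof.fit_predict(updates)   # -1 marks an outlier, 1 an inlier
flagged = np.where(labels == -1)[0]
```

In practice the vectors compared would be full model deltas (or per-class error profiles), but the density-based flagging logic is the same.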

Synthesis and Analysis: Table 6 summarizes CNNs' key aspects, and the evolution of FL-based IDS models demonstrates a promising direction for securing IoT networks. Still, there is room for improvement: all models share an inability to fully address complex attacks such as model poisoning, data poisoning, and adversarial attacks in diverse networks. Additionally, FedACNN and SBA show promise in addressing data heterogeneity and communication efficiency through dynamic weighting and client classification, but these solutions are not always scalable or resilient against multi-modal attacks. The adoption of privacy-preserving FL, combined with techniques like CNNs, knowledge distillation, and attention mechanisms, enhances both detection accuracy and model efficiency, but at the cost of computational overhead. Knowledge distillation and attention mechanisms provide significant improvements in accuracy and efficiency, yet they still face scalability issues in large-scale deployments. Future research should focus on integrating encrypted traffic analysis, zero-data knowledge distillation, and more robust aggregation methods to handle a wider range of attack vectors and improve the real-time applicability of these systems.

5.4 Auto encoder (AE)

Autoencoders enable efficient feature extraction and model training in the IDS framework. Training an autoencoder on each client's data and transmitting only the encoded information can reduce data transfer by over 95% compared to traditional methods, thereby conserving bandwidth (Kasturi et al. 2022). Furthermore, this approach improves accuracy by more than 3% in non-IID data distribution scenarios, demonstrating its robustness in diverse data environments (Kumar and Babu 2022). An autoencoder learns a compressed representation of input data and reconstructs the original input in an unsupervised manner: an encoder maps the input into a latent representation, while a decoder reconstructs the input from that representation. This method can utilize both labeled and unlabeled data in a semi-supervised manner, achieving better accuracy than traditional supervised models while being more cost-effective and efficient.
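Reconstruction-error anomaly detection of this kind can be sketched with PCA standing in for a trained (linear) autoencoder; the synthetic traffic data and the 99th-percentile threshold are illustrative choices:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# "Normal" traffic: 2 latent factors embedded in 10 features, plus noise.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
normal = latent @ mixing + 0.05 * rng.normal(size=(500, 10))

# A PCA encoder/decoder pair acts as a linear autoencoder.
ae = PCA(n_components=2).fit(normal)

def reconstruction_error(model, x):
    recon = model.inverse_transform(model.transform(x))
    return np.mean((x - recon) ** 2, axis=1)

# Threshold set from the tail of the normal-data error distribution.
threshold = np.percentile(reconstruction_error(ae, normal), 99)
attack = rng.normal(size=(5, 10)) * 3.0    # off-manifold "attack" records
is_anomaly = reconstruction_error(ae, attack) > threshold
```

A deep autoencoder replaces the linear projection with learned nonlinear encode/decode networks, but the detection rule, reconstruction error above a threshold, is the same.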

Autoencoders are particularly effective in detecting botnet attacks. Sudhina Kumar et al. (2022) introduce a secure FL approach with IDS in an IoT environment. Using a deep autoencoder model, each IoT device locally detects anomalies, such as botnet attacks, by training on its own data; blockchain integration was suggested as future work to preserve privacy. In parallel, Regan et al. (2022) proposed a deep autoencoder model used in IoT to address anomalies such as source IP spoofing while maintaining data privacy. The advanced FedEx model by Huong et al. (2022) was introduced for anomaly detection in IoT-based Industrial Control Systems (ICS) within smart manufacturing environments. The system combines a Variational Autoencoder (VAE) and Support Vector Data Description (SVDD) to detect anomalies based on deviations from normal patterns, addressing both cyberattacks and operational anomalies. Huong et al. (2021) address the challenge of detecting cyberattacks in IIoT-based ICS. Their FL architecture uses a hybrid model of Variational Autoencoder (VAE) and Long Short-Term Memory (LSTM) networks to capture both local and long-term data dependencies. Anomalies are detected by calculating prediction errors and comparing them to an optimized threshold determined by the Kernel Quantile Estimator (KQE).

Further refining the AE approach, Aouedi et al. (2022a) introduce a semi-supervised learning model to overcome the lack of labeled data while maintaining privacy in IIoT networks. This model is integrated with joint announcement protocols to reduce communication overhead and enhance system resilience. In contrast, Aouedi et al. (2022b) deployed the same model to address challenges such as privacy concerns, bandwidth overhead, and the need for extensive labeled data in 5G and beyond networks. The approach combines unsupervised learning using Autoencoders (AE) on edge nodes to capture features from unlabeled local data with supervised learning on a small labeled dataset at the central server to refine the model.

In healthcare, AnoFed was introduced by Raza et al. (2023), using Transformer-based Autoencoders (AE), Variational Autoencoders (VAE), and Support Vector Data Description (SVDD). Anomalies were detected through a two-step process: first, edge devices locally train AEs/VAEs on their data to minimize reconstruction loss, signaling anomalies with higher losses. Second, SVDD is applied with kernel density estimation to adapt changes in data distribution, providing adaptive anomaly detection. This approach ensures sensitive health data remains local while maintaining effective detection accuracy for cardiac arrhythmias and other abnormal ECG signals. Lastly, Tian et al. (2021) addresses the challenge of distributed threat detection in IoT and Cyber-Physical Systems (CPS) environments, focusing on issues arising from non-IID data and asynchronous FL constraints. The authors propose a novel approach, Delay Compensated Adam (DC-Adam), which uses Taylor Expansion-based gradient compensation to handle gradient delay and resource limitations, enabling stable anomaly detection despite asynchronous updates. The system detects cyberattack-related anomalies using a Denoising Autoencoder (DAE). The server asynchronously aggregates these updates to enhance a global model, ensuring privacy through Federated Learning (FL).

Synthesis and Analysis: FL demonstrates its adaptability across diverse application domains. Table 7 highlights the significant role of federated learning (FL) as the predominant privacy-preserving technique, employed in 7 out of 8 studies. For instance, advanced encryption techniques, such as those in AnoFed (Raza et al. 2023), are crucial in healthcare, where real-world datasets like PhysioNet highlight the role of autoencoders in protecting sensitive patient information. Variants like the Variational Autoencoder (VAE), Transformer-based AEs, and Denoising Autoencoders (DAE) handle different data complexities, with LSTM-based models capturing long-term dependencies and SVDD refining anomaly thresholds. Scalability remains a significant challenge; approaches like Delay Compensated Adam (DC-Adam) have been proposed to address resource constraints on edge devices (Tian et al. 2021). Techniques such as reinforcement learning have improved the handling of non-IID data, particularly in client selection. Models like FedEx and AnoFed enhance explainability and adaptive detection, making autoencoders well-suited for real-time, high-interpretability applications in industrial control systems and healthcare. Future research should focus on integrating more advanced techniques, such as self-supervised learning and edge-optimized algorithms, to further enhance the scalability, explainability, and robustness of anomaly detection frameworks. By addressing these challenges, federated learning and autoencoder-based approaches can extend their applicability to a broader range of real-time, privacy-sensitive environments.

Table 7 Auto encoder (AE)

5.5 Recurrent neural network (RNN)

Combining the framework of Federated Learning (FL) with Gradient Boosting Decision Trees (GBDT), as implemented in FEDFOREST, significantly enhances the efficiency and interoperability of IDS (Dong et al. 2022). Recurrent Neural Networks (RNNs) are particularly effective for sequential data due to their "memory" capability, which captures dependencies over time, making them useful for applications such as language modeling, machine translation, and speech recognition. Variants like Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRUs) address issues like the vanishing gradient problem, further enhancing performance. Driss et al. (2022) presented a model that combines GRU with a Random Forest-based ensemble method to improve detection accuracy by combining model predictions in Vehicular Sensor Networks (VSNs). Similarly, Yakan et al. (2023) address the challenge of detecting misbehavior in V2X communication within 5G edge networks. The capacity of LSTMs to capture temporal dependencies proved useful in identifying deviations in vehicular speed and position, addressing integrity concerns in Cooperative Intelligent Transport Systems (C-ITS). To address cyberattacks in Fog-IoT networks, Radjaa et al. (2023) used LSTM in an FDL-IDS to classify attacks. For industrial IoT, Chander and Upendra Kumar (2023) introduce the MFSDL-ADIIoT model for anomaly detection. The model integrates several key components: data preprocessing, feature selection using a Deer Hunting Optimization Algorithm (DHOA), and classification through a Cascaded Recurrent Neural Network (CRNN). This approach captures complex patterns in high-dimensional IIoT data, improving the detection rate. A similar IIoT study (NR et al. 2024) introduces an FTL-based IDS applying Principal Component Analysis (PCA) for feature reduction and the Walrus Optimization Algorithm for feature selection.
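The misbehavior-detection idea of flagging deviations between observed and predicted vehicle telemetry can be sketched with a naive moving-average forecaster standing in for the LSTM; the speed trace, window, and threshold below are illustrative:

```python
import numpy as np

def predict_next(window):
    """Stand-in one-step forecast (an LSTM would be used in practice)."""
    return window.mean()   # naive moving-average prediction

def detect_misbehavior(series, window=5, threshold=10.0):
    """Flag time steps whose value deviates sharply from the forecast."""
    flags = []
    for t in range(window, len(series)):
        err = abs(series[t] - predict_next(series[t - window:t]))
        if err > threshold:
            flags.append(t)
    return flags

# Vehicle speed trace (km/h) with an implausible spoofed jump at t=8.
speed = np.array([50., 51., 52., 51., 50., 52., 51., 50., 120., 51.])
alerts = detect_misbehavior(speed, window=5, threshold=10.0)  # -> [8, 9]
```

Note that the spoofed value also contaminates the following prediction window, so t=9 is flagged as well; a learned temporal model with a robust training objective would mitigate this effect.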

Table 8 Recurrent neural network (RNN)

Rieger et al. (2022b) designed the advanced DeepSight model to combat backdoor attacks in FL, where malicious clients inject poisoned updates to compromise model integrity. DeepSight employs Normalized Update Energies (NEUPs) and Division Differences (DDifs) to analyze abnormal model updates. Additionally, a voting-based model filtering scheme combines classifiers with clustering algorithms to group similar updates and enhance the detection of poisoned contributions.
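A generic similarity-based filter in the spirit of such update-inspection schemes (and of the cosine-similarity checks used elsewhere in this section) might look as follows; the updates, median reference, and threshold are all illustrative, not DeepSight's actual metrics:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_updates(updates, threshold=0.5):
    """Keep client updates whose direction agrees with the element-wise median."""
    reference = np.median(updates, axis=0)   # robust reference direction
    sims = [cosine(u, reference) for u in updates]
    kept = [i for i, s in enumerate(sims) if s >= threshold]
    return kept, sims

# Four benign updates pointing one way, one flipped (possible poisoning).
updates = np.array([
    [1.0, 1.0, 0.9],
    [0.9, 1.1, 1.0],
    [1.1, 0.9, 1.0],
    [1.0, 1.0, 1.1],
    [-1.0, -1.0, -1.0],   # sign-flipped update
])
kept, sims = filter_updates(updates)   # client 4 is filtered out
```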

The work of Khan et al. (2022a) highlighted the use of Simple Recurrent Units to enhance the security of IoT-augmented industrial control systems (ICSs) by detecting cyberattacks in real time. In the context of securing smart buildings, Sater and Hamza (2021) designed an FL-based IDS model for temporal anomalies related to energy consumption and operational metrics, which may indicate equipment malfunctions or security threats. Another model was proposed by Mothukuri et al. (2021) to address the malicious-client issue in IoT networks; it employs LSTM and GRU to analyze time-series data.

Finally, Rieyan et al. (2024) introduce a multilayer IDS with behavior-based and anomaly-based detection techniques for the IoMT network. The system leverages an enhanced Simple Recurrent Unit (SRU) network, which includes skip connections and a bidirectional structure to mitigate gradient fading and improve the analysis of sequence data. A dynamic behavior aggregation strategy adjusts the number of participating clients based on training time and accuracy, optimizing communication in the FL framework. The system also integrates explainable AI and a Bloom filter module for efficient memory usage and reduced communication overhead, further enhancing its applicability in resource-constrained environments. Khan et al. (2021) present DFF-SC4N, a model that employs Gated Recurrent Units (GRUs) for intrusion detection in Supply Chain 4.0 networks, enabling local servers to train on their own data while preserving privacy.

Synthesis and Analysis: Table 8 provides a comprehensive view of the studies utilizing RNNs and their variants in FL-based IDS; federated learning was the dominant privacy technique. The use of real-world datasets like the Reddit dataset (Rieger et al. 2022b) and the Car Hacking dataset (Driss et al. 2022) demonstrates the applicability of RNNs in detecting anomalies in real-world environments. However, challenges like gradient vanishing, data complexity, and the need for scalable solutions persist. Future research should focus on further refining these models to better handle multi-modal data, asynchronous updates, and adaptive threat detection in large-scale distributed systems.

5.6 Deep neural network (DNN)

The deep architecture of Deep Neural Networks (DNNs) allows them to handle complex and hierarchical data representations, making them practical for challenging tasks such as image recognition, natural language processing, and anomaly detection. Typically, a DNN consists of an input layer that receives the data, multiple hidden layers that extract and combine features, and an output layer that produces the final predictions or classifications (Mills et al. 2021). Wang et al. (2023) detected anomalies by evaluating the mean squared reconstruction error of reconstructed data, using a threshold to classify malicious traffic in IoT networks. Privacy was preserved by processing data locally on IoT devices and sharing only model weights with a central server using a Mini-Batch Average Aggregation scheme. Addressing the important area of Android security, Mahindru and Arora (2022) present DNNdroid, an FL-based framework for malware detection in Android apps. The framework uses a Deep Belief Network (DBN) to detect malicious apps, including Trojans, worms, spyware, and botnets, by analyzing app behaviors and permissions, demonstrating the flexibility of DNNs in mobile security. Another critical security challenge, in Advanced Metering Infrastructure (AMI), was addressed by Liang et al. (2022). Toldinas et al. (2022) developed a federated transfer learning (FTL)-based IDS defense mechanism. The detection process begins by transforming network traffic features (NTF) into images, which are then used to train DNNs; differential privacy enhances security by injecting noise into the data. Friha et al. (2023) address the problem of enhancing security in Industrial Internet of Things (IIoT) environments by developing a secure, decentralized, and differentially private FL-based IDS. The specific interventions for malicious-client detection include quantum-based secure exchange to protect against external parties and Differential Privacy (DP) to protect against participating parties. Popoola et al. (2023) address intrusion detection within Consumer-Centric Internet of Things (CIoT) networks. The detection process involves training the FDL models to recognize normal behavioral patterns in the datasets, allowing them to discern deviations indicative of potential intrusions; the models are evaluated in both binary and multiclass classification scenarios.
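Several of the models above rely on differential privacy via noise injection; a minimal clip-and-noise sketch follows, where the clipping norm and noise scale are illustrative and not calibrated to a formal (ε, δ) guarantee:

```python
import numpy as np

def dp_sanitize(update, clip_norm=1.0, noise_std=0.1, rng=None):
    """Clip a model update to bound its sensitivity, then add Gaussian noise."""
    rng = rng or np.random.default_rng(0)
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / norm)   # L2 clipping
    return clipped + rng.normal(scale=noise_std, size=update.shape)

raw = np.array([3.0, 4.0])     # client update with L2 norm 5.0
private = dp_sanitize(raw)     # clipped to norm 1.0, then noised
```

In a real DP-SGD or DP-FL deployment, the noise standard deviation is derived from the clipping norm and the target privacy budget via an accountant, rather than fixed by hand.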

Table 9 Deep neural network (DNN)

Addressing the issue of backdoor attacks, Rieger et al. (2022a) presented CrowdGuard, an innovative defense mechanism. CrowdGuard effectively identifies anomalies and malicious clients through a multi-faceted detection process that analyzes the behavior of neurons in the hidden layers of local models. The detection process involves client-side analysis within Trusted Execution Environments (TEEs), which ensures the privacy of local models while enabling robust anomaly detection. Hybrid Learning Behavior Influence Maximization (HLBIM) is a hybrid federated learning system designed to ensure secure, privacy-aware learning and inference for detecting financial crimes. Guo et al. (2021) address the critical challenge of user activity analysis in mobile wireless networks; their model integrates GRU-LSTM to enhance attack detection.

Synthesis and Analysis: Table 9 reviews studies that showcase the diverse applications of DNNs in IDS, with federated learning as the primary privacy and security technique (Wang et al. 2023; Mahindru and Arora 2022; Popoola et al. 2023; Guo et al. 2021). The study proposed by Rieger et al. (2022a) effectively addresses the challenges posed by the distributed nature of financial data and the need for privacy and security in collaborative learning environments. Other approaches, such as those of Wang et al. (2023) and Mahindru and Arora (2022), may struggle to perform efficiently in large-scale, real-time environments due to the computational complexity of deep learning models. Additionally, communication overhead is a concern in FL-based systems, as seen in the work of Liang et al. (2022) and Toldinas et al. (2022), where frequent communication between devices and central servers can lead to delays and increased network congestion, especially in resource-constrained IoT environments. Another critical issue is the trade-off between privacy and accuracy: techniques like differential privacy, used by Friha et al. (2023) and Popoola et al. (2023), may reduce detection accuracy in exchange for enhanced privacy, which is particularly problematic in sensitive sectors such as IIoT and consumer IoT. Furthermore, while specific defenses like CrowdGuard (Rieger et al. 2022a) are effective against backdoor attacks, they may not generalize well to more sophisticated or evolving threats, necessitating frequent updates. Finally, data heterogeneity poses a challenge for models such as that of Guo et al. (2021), where variations in data across devices can hinder model performance and convergence, impacting the overall consistency and reliability of the system.

5.7 Graph neural networks (GNN)

Graph Neural Network (GNN) architectures are designed to handle graph-structured data effectively, capturing complex relationships and dependencies. These models can be trained collaboratively on traffic data across edge devices, preserving privacy and security. Table 10 highlights key studies that demonstrate advanced applications of GNNs. For example, Nakip et al. (2023) leverage GNNs' strengths in analyzing network traffic, ensuring robust intrusion detection even with diverse data distributions. Combining GNNs with federated learning efficiently enhances cybersecurity measures across various domains. For instance, Shen and Wang (2023) presented Blockchain-Assisted Cross-Silo Graph Federated Learning (B-CGFL), which proposes a solution for data siloed across multiple organizations. A novel algorithm, E-GraphSAGE, was used to detect anomalies by classification. Privacy preservation is ensured through secret-sharing mechanisms, where model parameters are encrypted with sub-secrets before being uploaded, and blockchain ensures data integrity. The reputation-aware model incentive system in the blockchain further secures and improves model quality through competitive collaboration between coordinators. Meanwhile, Zhang et al. (2023) address vulnerabilities in the Controller Area Network (CAN) bus with a GNN; this critical vehicle communication protocol lacks encryption and authentication, making it vulnerable to various cyberattacks. The study offers real-time anomaly detection within 3 milliseconds. Anomalies are detected using a two-stage classifier cascade based on GNNs: the first stage uses a one-class classifier for anomaly detection, designed to handle imbalanced data by focusing on regular CAN messages; the second stage classifies the specific type of attack using a multi-class classifier with an open-max layer to handle unknown anomalies.
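The neighborhood aggregation at the core of GraphSAGE-style models such as E-GraphSAGE can be sketched as one mean-aggregation layer; the toy graph, features, and identity weight matrices are illustrative:

```python
import numpy as np

def sage_mean_layer(features, adjacency, weight_self, weight_neigh):
    """One GraphSAGE-style layer: combine self and mean-of-neighbor features."""
    deg = adjacency.sum(axis=1, keepdims=True)
    neigh_mean = (adjacency @ features) / np.maximum(deg, 1)   # avoid /0
    out = features @ weight_self + neigh_mean @ weight_neigh
    return np.maximum(out, 0.0)   # ReLU

# Tiny 3-node graph (edges 0-1 and 1-2) with 2-dimensional node features.
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
x = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
w_self = np.eye(2)
w_neigh = np.eye(2)
h = sage_mean_layer(x, adj, w_self, w_neigh)
```

In an IDS setting, nodes might represent hosts or flows and edges their interactions; the aggregated embeddings then feed a downstream classifier, with only the weight matrices exchanged in federated training.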

Table 10 Graph neural networks (GNN)

Building on GNNs, Son et al. (2023) proposed the advanced XFedGraph-Hunter model, addressing the growing threat of Advanced Persistent Threats (APTs). By employing GraphSAGE integrated with a pre-trained transformer model, the system effectively handles the complexity of provenance-graph data for detecting APT attacks. It integrates GNNExplainer for interpretable AI, allowing transparency in the decision-making process. Thi et al. (2023) address Advanced Persistent Threat (APT) detection in distributed environments with XFedHunter. The system integrates a Network-based IDS (NIDS) with a Provenance-based IDS (PIDS) to enhance threat detection: the NIDS leverages GNN-GRU to monitor SDN traffic, while the PIDS uses GNNs to analyze provenance data. The framework improves optimization and accuracy, especially in GNN-based models, and includes explainability via the SHAP framework, enabling cybersecurity experts to interpret model decisions effectively.

Zhang et al. (2024) introduce MDD-FedGNN, a framework designed to enhance malicious domain detection (MDD) by leveraging vertical federated learning (VFL) and Graph Neural Networks (GNNs). By utilizing GNNs and contrastive learning, MDD-FedGNN aggregates node embeddings from different institutes, improving detection accuracy even in scenarios with noisy data. To ensure data privacy, federated learning with differential privacy is applied to the node embeddings, mitigating the risk of sensitive data exposure.

Synthesis and Analysis: In comparing GNN-based models, B-CGFL and MDD-FedGNN focus strongly on privacy, utilizing federated learning to ensure data remains confidential. The CAN-bus anomaly detection model excels in real-time performance but lacks privacy protection, limiting its application in privacy-sensitive or collaborative scenarios. XFedGraph-Hunter and XFedHunter focus more on specialized threat-detection techniques (e.g., for APTs) with additional layers of explainability (through GNNExplainer or SHAP), although their privacy features are less comprehensive. In summary, while B-CGFL and MDD-FedGNN stand out for privacy and collaborative learning across organizations, the other models offer strengths in real-time detection, explainability, and handling of complex threats, each tailored to specific cybersecurity challenges.

5.8 Hybrid neural networks (HNN)

Hybrid models represent significant advancements, integrating multiple neural network architectures to bolster protection against network attacks. Components such as Generative Adversarial Networks (GANs), Reinforcement Learning (RL), and Transformers introduce novel mechanisms to ensure security.

Table 11 Hybrid neural networks (HNN)

The model of Quyen et al. (2022) integrates Generative Adversarial Networks (GANs) for class-distribution balancing and Reinforcement Learning (RL) for optimal client selection in the aggregation process. A CNN- and GRU-based IDS provided security by extracting spatial and sequential features to detect anomalies in the Kitsune data. An advanced AI-based security model, GAADPSDNN, was provided by Farea et al. (2024) for IoT. The model leverages a DNN for classification and a genetic algorithm for feature selection while reducing computational complexity. GAADPSDNN combines the strengths of Random Forest (RF), Support Vector Machines (SVM), and deep learning models such as CNN and DNN to classify DDoS attacks, and it can be implemented in both FL and centralized machine learning settings. Despite high performance on heterogeneous IoT data, scalability issues limit its adoption in real-time implementations. Another model, the Collaborative IoT DDoS Detector (CIDD), was proposed by Neto et al. (2022a) to address the same challenge. CIDD trains local models on tenant-specific traffic data and aggregates updates through a central server to enhance classification. However, the model's adaptability to diverse IoT data is limited; hyperparameter optimization could be used in future work to cover a wider range of cyberattacks. ChandraUmakantham et al. (2024) used GhostNet to improve feature extraction and BiGRU to capture temporal dependencies in IoT network traffic. Homomorphic encryption encrypts local model updates for stronger security, although, due to the large model, computational overhead remains a challenge.

Focusing on the security of more critical IIoT infrastructure, Rashid et al. (2023) deploy CNNs and RNNs incorporating both anomaly-based and signature-based detection techniques, strengthening the classification process. However, challenges like outliers and adversarial attacks remain limitations of the model. To tackle the security challenge at large scale, Aouedi and Piamrat (2023) propose the Federated-Blending Intrusion Detection System (F-BIDS) model for IoT and IIoT networks. This model uses Decision Tree (DT) and Random Forest (RF) classifiers as base models and a neural-network meta-classifier trained in the federated learning approach to identify anomalies, including DDoS, SQL injection, and MITM attacks. Its real-time detection capabilities could be enhanced by introducing explainability features, improving decision-making within the system. Similarly, Fenanir and Semchedine (2023) propose a Smart Intrusion Detection (SID) system to enhance privacy and security by training deep learning models such as DNN, CNN, and LSTM in a distributed environment. The study identifies the lack of a real-time detection focus as a limitation and suggests exploring blockchain integration for further privacy and transparency in future work.

The security challenge in 5G AMI was tackled through the development of a Transformer-IDM with Hierarchical Federated Learning (HFed-IDS). Sun et al. (2022) designed Transformer-IDM to incorporate transformer layers for categorical data and convolutional layers for numerical data, followed by detection of DoS, R2L, U2R, and probing attacks. The study notes limitations regarding resource heterogeneity across devices. Li et al. (2022) integrate a Transformer-based IDS with secure federated learning to detect False Data Injection Attacks (FDIA) in smart grids. The Transformer's self-attention mechanism analyzes power data for anomalies, while federated learning and Paillier encryption ensure privacy and security; future work will focus on detecting other cyberattacks in more realistic settings. To address security in multi-host environments, Kwon et al. (2022) propose the Federated Hypersphere Classifier (FHC) model, which detects anomalies through a unique technique of mapping normal input into a hypersphere. However, a limitation persists when only some hosts contain normal data. Another study, FLDID (Verma et al. 2022), was suggested for securing smart manufacturing by detecting cyber anomalies like DoS, DDoS, and ransomware using a hybrid deep learning model (CNN, LSTM, MLP). Models are trained locally on edge devices, preserving privacy with Paillier-based encryption during transmission. Due to the high computational cost, future work aims to optimize encryption and cluster clients to mitigate data-poisoning attacks.

Friha et al. (2022) designed FELIDS to protect agricultural IoT. It leverages deep learning models (DNN, CNN, RNN) with gRPC encryption to secure communication. However, it assumes all edge nodes are trustworthy, with future work targeting data poisoning attacks and expanding to newer IoT datasets.

Hybrid models have also advanced security across different traffic domains. He et al. (2022) developed a novel collaborative IDS for UAV networks that utilizes CGAN-LSTM to address data skew and small datasets. Blockchain secures the federated learning aggregation process, while differential privacy enhances security by adding Gaussian noise to model updates. Future work should explore the impact of heterogeneous and non-IID data on model performance to improve applicability across diverse environments. An advanced model was proposed by Khan et al. (2022b) to address the security of the Controller Pilot Data Link Communication (CPDLC) system in aviation. This system detects attacks such as eavesdropping and injection by training on real and GAN-generated data. Using a DNN, gradient updates are shared between ATSCs. However, the model demands a large training dataset; an autoencoder could be used in future work to make the system more robust.
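The Gaussian-noise step described above can be sketched in a few lines. This is a minimal illustration, not the cited studies' implementation: the clipping threshold and noise multiplier (`clip_norm`, `sigma`) are illustrative assumptions, and real deployments calibrate them against a formal privacy budget.

```python
import math
import random

def dp_sanitize_update(update, clip_norm=1.0, sigma=0.5, rng=random):
    """Clip a client's model update to bound its L2 norm, then add
    Gaussian noise -- the Gaussian mechanism commonly used to give
    differential-privacy guarantees to shared updates."""
    norm = math.sqrt(sum(w * w for w in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [w * scale for w in update]
    # Noise standard deviation is proportional to the sensitivity (clip_norm).
    return [w + rng.gauss(0.0, sigma * clip_norm) for w in clipped]

# Example: sanitize one client's gradient before sending it to the server.
random.seed(0)
noisy = dp_sanitize_update([3.0, 4.0])  # L2 norm 5.0 is clipped to 1.0
```

Because clipping bounds each client's influence, the added noise masks any individual contribution while the server can still average many sanitized updates.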

Liu et al. (2022) provide the FedBatch framework for intrusion detection in Maritime Transportation Systems (MTS), leveraging federated learning to handle unstable communication and varying device performance across vessels. It addresses the issue of stragglers with an adaptive mechanism for batch aggregation. While FedBatch enhances system robustness, its reliance on periodic updates limits its real-time detection capabilities, making it more suitable for near-real-time or batch processing. Similarly, for another critical traffic infrastructure, the Controller Area Network (CAN), Shibly et al. (2022) designed a Personalized Federated Learning-based IDS. The model employs supervised (CNN, XGBoost, MLP) and unsupervised (Autoencoder) learning for classification, while privacy is preserved through Secure Multi-party Computation (SMC) during gradient aggregation.

Shi et al. (2023) deployed the Multimodal Hybrid Parallel Network (MHPN) model, which enhances network intrusion detection by combining traffic-statistics and raw-traffic-load modalities. CNNs and LSTMs are used for spatial and temporal feature extraction, while a CosMargin classifier improves anomaly detection by enforcing strict class constraints. However, its reliance on closed-set classification limits the detection of unknown attacks; blockchain integration with this model improves privacy. Similarly, the SP-IoUAV security framework (Ntizikira et al. 2023) protects UAV networks from threats such as DDoS and MitM attacks using a CNN-LSTM hybrid model, with differential privacy and secure multi-party computation protecting data during processing. However, computational cost remains a limitation for model deployment. Another novel method was proposed (Suresh et al. 2024) for real-time host intrusion detection. It utilizes an Adaptive Threshold-Correlation Algorithm (ATCA) to dynamically and accurately adjust detection thresholds based on traffic patterns, while privacy is maintained through traffic-similarity score metrics. Future work focusing on non-IID environments is expected to improve the model's performance. A significant security challenge was addressed by Ahsan et al. (2024), who proposed the FLBert model for malicious client detection in Software-Defined Vehicular Ad Hoc Networks (VANETs) by integrating Federated Learning (FL) with BERT for sequence classification. Data classification was performed to detect position falsification attacks, outperforming traditional models such as Random Forest and SVM in accuracy and privacy. Key privacy techniques include periodic re-initialization of the global model.

To prevent catastrophic forgetting in dynamic networks, Jin et al. (2024) presented FL-IIDS, which enables models to retain knowledge of past threats while adapting to new ones. It uses dynamic example memory and loss functions such as class gradient balancing and label smoothing to detect attacks such as DoS and backdoor attacks. Hybrid models are also utilized to enhance the security of Wireless Sensor Networks (WSNs). Bukhari et al. (2024) proposed the SCNN-Bi-LSTM model for intrusion detection, which combines federated learning with a Stacked Convolutional Neural Network (SCNN) and Bidirectional Long Short-Term Memory (Bi-LSTM) to detect Denial of Service (DoS) attacks by analyzing spatial and temporal data patterns. Although it reduces communication overhead, the model depends on synchronized data collection across nodes. Similarly, to tackle the challenge of 5G/6G IoT security, Luo et al. (2024) integrate a transformer-based neural network with personalized federated learning to detect attacks. Privacy is preserved through parameter sharing, with no raw-data transmission. The system's primary limitation is its reliance on synchronous communication rounds; asynchronous collaboration will be explored to enhance security in the future.

Hybrid models also excel in the medical field. Singh et al. (2022a) present a Dew-Cloud-based Hierarchical Federated Learning (HFL) model to detect intrusions in the Internet of Medical Things (IoMT). The system uses a Hierarchical Long Short-Term Memory (HLSTM) network to identify attacks such as XSS, MITM, backdoor, and DDoS, and leverages differential privacy to enhance data protection. Despite its strengths, the model faces scalability and latency challenges, particularly as client numbers increase.

Synthesis and Analysis: Table 11 shows that hybrid models have been increasingly utilized in various domains such as IoT, IIoT, IoMT, AIoT, Agriculture IoT, telecommunications, and vehicular traffic. This widespread adoption of hybrid models highlights their critical role in securing federated learning and points to advanced and emerging research trends. However, the literature analyzed above shows that challenges persist in many real-world deployments of hybrid models. Future research should explore lighter and more secure models that adapt to the dynamic and challenging nature of data, and more advanced privacy-preservation techniques for federated learning should be investigated.

The comparative analysis in Section 5 of neural networks within federated intrusion detection systems underscores the interplay between feature engineering and model performance. Foundational models like Artificial Neural Networks (ANNs) offer simplicity but limited scalability for complex tasks. Advanced architectures such as Multilayer Perceptrons (MLPs) and Autoencoders (AEs) optimize feature extraction and dimensionality reduction, enhancing detection accuracy for malicious clients. However, their limitations in managing high-dimensional or real-time data necessitate more robust solutions. Recurrent Neural Networks (RNNs) and Deep Neural Networks (DNNs) excel in identifying sequential and non-linear patterns, benefiting significantly from advanced feature engineering techniques like encoding and transformation. At the forefront, Graph Neural Networks (GNNs) and Hybrid Neural Networks (HNNs) demonstrate superior scalability and precision by leveraging relational data and multi-model designs, effectively addressing the complexities of heterogeneous federated learning systems. These insights reinforce the critical role of feature engineering in maximizing the utility of neural networks for malicious client detection, aligning performance enhancements with data heterogeneity and system constraints.

In parallel, privacy-preserving techniques play a crucial role in strengthening federated learning systems across diverse applications. From traditional methods like Anonymization to advanced approaches such as Blockchain, Homomorphic Encryption, and Differential Privacy, each technique offers unique benefits and challenges. Recent innovations, including Federated Transfer Learning and Knowledge Distillation, further enhance adaptability and efficiency, especially in cross-domain and resource-constrained environments. Architectures like hierarchical and decentralized FL address scalability and resilience issues, ensuring robust performance in distributed settings. The comprehensive analysis in Section 5 highlights the necessity of selecting privacy techniques based on system constraints, computational resources, and specific privacy requirements, laying a strong foundation for addressing the challenges discussed in subsequent sections.

6 Feature engineering techniques

Scholars have debated the theory of feature engineering, its connotations, and its applicability, emphasizing its role in addressing the curse of dimensionality caused by exponential data growth (Singh et al. 2022b). Feature engineering is crucial for optimizing the performance of neural networks by enhancing the model’s accuracy, robustness, and learning efficiency. Features are essential for detecting anomalies and malicious activities in networks, as illustrated in Fig. 5. The objective is to fine-tune the ML model to produce a feature space that fully represents the problem for predictive models (Neto et al. 2022b; Tran et al. 2022). Feature engineering comprises three key components: feature construction, selection, and extraction. Although feature selection and extraction aim to reduce dimensionality and eliminate irrelevant or noisy features, feature construction adds valuable features to the prediction model. Raw data often contains redundancies, noise, and missing values, making feature engineering necessary to address these issues. This process involves generating new features, filtering the original ones, or mapping data to solve these problems. By refining raw data and enhancing the relationship between the input and predicted variables, feature engineering identifies characteristics that yield precise and efficient results (Neto et al. 2022b; Tran et al. 2022). Table 12 lists the different feature engineering methods.

Fig. 5

This figure shows the workflow of data preparation with NN models and feature engineering techniques

6.1 Feature selection

Feature selection involves identifying the most relevant features for use in model training. This improves model performance and reduces computational complexity and training time, particularly in federated learning environments. Techniques such as mutual information, chi-square tests, and recursive feature elimination are vital for filtering irrelevant or redundant data. In Ahmad and Shah (2024), the MLP used Correlation-Based Feature Selection with the LASSO technique, which identifies the most relevant features for a model by evaluating their correlation with the response variable. This approach reduces the dimensionality of the data and enhances model interpretability and efficiency, which are critical in complex federated learning scenarios in which computational resources are at a premium. In Saura et al. (2022), an ANN model used an information-gain technique to select the anomaly features that yielded the most significant information in financial fraud detection; 80% accuracy was achieved by isolating the features that best differentiated between normal operation and potential threats. In Wang et al. (2023), a DNN model used the mutual information technique to select features with the highest information gain relative to the target variable, which is particularly valuable in environments where understanding the interaction between features and targets is crucial for detecting adversarial attacks. In Mahindru and Arora (2022), a DNN model used the chi-square test, which helps determine the relevance of categorical features in datasets such as Drebin. In Bin et al. (2023), a CNN model with automatic feature extraction implements gated mechanisms for feature selection, streamlining the feature space to focus on the most informative attributes, which helps achieve an accuracy of 98% on the NSL-KDD dataset.
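The mutual-information scoring used in several of the studies above can be sketched for discrete features with a few lines of standard-library Python; the toy traffic records below are illustrative, not taken from any cited dataset, and production pipelines would use a library estimator instead.

```python
from collections import Counter
from math import log2

def mutual_information(feature, target):
    """Estimate MI(feature; target) for discrete values:
    sum over (x, y) of p(x, y) * log2(p(x, y) / (p(x) * p(y)))."""
    n = len(feature)
    joint = Counter(zip(feature, target))
    px = Counter(feature)
    py = Counter(target)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

# Toy traffic records: protocol perfectly predicts the label,
# while port is independent of it -- MI ranks protocol first.
protocol = ["tcp", "tcp", "udp", "udp"]
port = ["80", "443", "80", "443"]
label = ["attack", "attack", "normal", "normal"]

scores = {"protocol": mutual_information(protocol, label),
          "port": mutual_information(port, label)}
# scores["protocol"] == 1.0 bit, scores["port"] == 0.0
```

Ranking features by such scores and keeping only the top-k is the essence of filter-style selection: the uninformative `port` column would be dropped before training.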

Table 12 Feature engineering methods in federated learning

6.2 Feature extraction

Feature extraction is a powerful method for reducing dimensionality by replacing multiple features with fewer new ones and excluding unnecessary or noisy information. In contrast to feature selection, which eliminates attributes for optimization, feature extraction constructs new features from the original data. This improves the interpretability of the input data for machine learning development. Notable techniques include Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), which are beneficial in federated learning for improving the representation of important information.
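PCA's core idea can be sketched without any library: center the data, form the covariance matrix, and find its dominant eigenvector. The power-iteration sketch below recovers only the first principal component and uses made-up 2-D points; real systems would compute all components with a numerical library.

```python
from math import sqrt

def first_principal_component(rows, iters=200):
    """Return the top PCA direction of `rows` (equal-length feature
    vectors) via power iteration on the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Covariance matrix C[j][k] = (1/n) * sum_i x_ij * x_ik (centered).
    cov = [[sum(r[j] * r[k] for r in centered) / n for k in range(d)]
           for j in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[j][k] * v[k] for k in range(d)) for j in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # repeated multiply-and-normalize
    return v

# Points lying almost on the line y = x: the top component is ~(0.71, 0.71).
pc = first_principal_component([[0, 0], [1, 1], [2, 2], [3, 3.1]])
```

Projecting each sample onto the leading components then yields the lower-dimensional representation that replaces the original correlated features.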

In Neto et al. (2022c) and Shen et al. (2024), MLP models were used with the feature extraction tool CICFlowMeter.V3, which is essential for transforming network traffic data into structured features that can be easily analyzed by machine learning models. These techniques allow the precise capture of patterns indicative of network attacks, thereby enhancing the detection capabilities of MLP architectures in federated settings. In Campos et al. (2024), an MLP utilizing sophisticated feature extraction methods significantly enhanced the model's ability to discern complex attack patterns in network traffic. This study shows how advanced algorithms can refine feature input to achieve high accuracy, underscoring the importance of tailored feature engineering strategies in IDS.

Automatic feature extraction has been utilized by CNN models, underscoring the utility of extracting salient features that contribute significantly to model accuracy. Hybrid feature extraction has been reported in studies such as Chen et al. (2023), where a combination of CNN and GRU models leveraged a hybrid feature extraction scheme to handle complex data structures effectively, resulting in high model accuracy. Studies such as Bin et al. (2023), Thein et al. (2024), Jia et al. (2024), Li et al. (2023), and Gao et al. (2022) achieved accuracies greater than 95% with automatic feature extraction by CNNs. Onsu et al. (2023) achieved an accuracy of 85% with a deep CNN by detecting attacks on the MNIST dataset.

In Regan et al. (2022), an autoencoder with the Fisher Feature Extraction method was pivotal in isolating features that are most discriminative between classes of normal behavior and Gafgyt attacks in energy usage datasets. This technique optimizes the input feature space, enabling the model to focus on the most impactful data points, thus enhancing detection precision in federated learning environments.

In Toldinas et al. (2022), a DNN was used with the Network Traffic Feature (NTF) Records Framing technique, which packages network data into structured forms, making it easier for the neural network to analyze traffic flow and detect DDoS attacks efficiently. The study by Zhang et al. (2023) with graph neural networks used graph feature extraction based on the READ (Recursive Embedding and Aggregation of Documents) method to derive meaningful features from graph data. This approach helps the model learn complex data patterns and their interconnections, which are important factors for intrusion detection; it has therefore been applied to horizontal and decentralized federated learning architectures. For example, in Quyen et al. (2022), GAN-based models often utilized feature extraction to preprocess data, ensuring that the model captured relevant patterns. This technique is critical for identifying complicated attacks, particularly in horizontally centralized federated learning architectures. In contrast, CNN + LSTM and transformer-based NNs (Ntizikira et al. 2023; Luo et al. 2024) utilize PCA to reduce the dimensionality of the data, retaining the most informative features while removing noise. This technique is beneficial in horizontally decentralized federated learning environments.

6.3 Feature transformation techniques

Feature transformation is the general process of altering the type or structure of data in a way that improves a feature's capacity to reveal patterns and support accurate predictions by the algorithm. Typical preprocessing methods include scaling and normalization, because feature values are usually not on similar scales. This consistency prevents high-magnitude features from dominating the model's learning, which is critical in algorithms that compute distances between data points, such as k-nearest neighbors (KNN), and in gradient-descent-based optimization of neural networks. Min-max scaling brings a feature into a given range, normally between 0 and 1, which is helpful when values must remain positive. Standardization scales the values to a mean of 0 and a standard deviation of 1, which can accelerate the learning of stochastic gradient descent, often utilized by neural networks, especially when a large number of training epochs are required.
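The three transformations just described (min-max scaling, standardization, and the log transform discussed below) can be sketched directly; the packet-size values are illustrative and a real pipeline would fit the scaling parameters on training data only.

```python
from math import log1p, sqrt

def min_max_scale(values):
    """Rescale to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale to zero mean and unit standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

# Illustrative packet sizes spanning several orders of magnitude:
sizes = [40, 1500, 9000, 64]
scaled = min_max_scale(sizes)         # all values now in [0, 1]
zscores = standardize(sizes)          # mean 0, standard deviation 1
log_sizes = [log1p(v) for v in sizes] # log1p tames right-skewed values
```

In a federated setting the min/max or mean/std statistics must either be computed per client or agreed globally, since clients hold different data ranges.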

Studies such as Thi et al. (2023) use a graph neural network model with a normalization technique to standardize data ranges, ensuring that each feature contributes equally to the model's learning process. Its application across various datasets, such as NF-ToN-IoT, DARPA TCE3, VirusShare, and IoTPOT, has consistently improved the detection performance of federated learning models. In Aouedi and Piamrat (2023), neural networks involve feature mapping, transforming data into a format that neural networks can process more effectively. This technique, often combined with normalization, is used in various federated learning models to standardize the input data and enhance model performance and accuracy. The log-transformation method is particularly useful for handling skewed data: it can stabilize the variance and normalize distributions, making statistical analysis more robust. Log transformation is widely used in financial modelling and other areas where data scalability is an important factor.

6.4 Feature encoding techniques

Encoding transforms categorical variables into numbers, which is critical because many machine learning models, primarily those relying on mathematical computations, work with numbers only. Appropriate encoding improves not only model accuracy but also the efficiency of the learning algorithm. One-hot encoding creates a new binary column for each category of a feature and is ideal for nominal categorical data with no ordinal relationship; the encoded features ensure that the model interprets the data correctly and can significantly affect the performance of linear models and neural networks. The study by Lin et al. (2023) uses one-hot encoding in an MLP to transform categorical data into numerical format, handling nominal data without any inherent ordering. This ensures that the neural network interprets the input data without any implicit hierarchy among categories, which is particularly important in the context of IoT security; the technique is employed on the Mqttset, MQTT-IOT-IDS2020, and IoT datasets. Liang et al. (2022) transform the categorical variables of the NSL-KDD dataset, which is critical for IDS models analyzing protocol types or other categorical network traffic attributes. He et al. (2022) successfully used this technique in the hybrid CGAN + LSTM model to preprocess data, enhancing its ability to detect bot infiltration and web attacks. Label encoding converts each value in a categorical column into a unique integer. Although straightforward, it implies an ordinal relationship among categories and may lead to poor performance or unexpected results in models that assume a natural ordering of categories, such as linear regression models.
Target encoding replaces a categorical value with a blend of the posterior probability of the target given that categorical value and the prior probability of the target over all training data. It helps handle high-cardinality features and can be particularly useful in improving the performance of decision-tree-based algorithms (Micci-Barreca 2001). Studies such as Sudhina Kumar et al. (2022), Huong et al. (2022), Aouedi et al. (2022a), and Aouedi et al. (2022b) employed standard AE and VAE encoding techniques to compress and reconstruct the input data effectively, enhancing the neural network's ability to identify subtle anomalies in datasets ranging from N-BaIoT to SCADA systems. Such encoding not only facilitates efficient data compression but also helps maintain data privacy, a core aspect of federated learning.
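The three encodings above can be sketched side by side. The protocol values and binary targets below are illustrative, and the `smoothing` parameter of the target encoder is an assumed simplification of the blending weight described by Micci-Barreca (2001).

```python
def one_hot(values):
    """One binary column per category; no implied ordering."""
    cats = sorted(set(values))
    return [[1 if v == c else 0 for c in cats] for v in values]

def label_encode(values):
    """Map each category to an integer (implies an ordering!)."""
    mapping = {c: i for i, c in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values]

def target_encode(values, targets, smoothing=1.0):
    """Replace a category with a smoothed per-category target mean,
    blended with the global prior to tame rare categories."""
    prior = sum(targets) / len(targets)
    stats = {}
    for v, t in zip(values, targets):
        s = stats.setdefault(v, [0.0, 0])
        s[0] += t
        s[1] += 1
    enc = {v: (s[0] + smoothing * prior) / (s[1] + smoothing)
           for v, s in stats.items()}
    return [enc[v] for v in values]

protos = ["tcp", "udp", "icmp", "tcp"]
onehot = one_hot(protos)       # categories sorted: icmp, tcp, udp
labels = label_encode(protos)  # [1, 2, 0, 1]
tenc = target_encode(protos, [1, 0, 0, 1])
```

One-hot widens the feature space by one column per category, label encoding keeps a single column at the cost of an artificial order, and target encoding keeps a single column while injecting label information, so it must be fitted on training folds only to avoid leakage.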

6.5 Hybrid techniques

Hybrid techniques have been used to enhance model performance. For example, Doriguzzi-Corin and Siracusa (2024) showed that an ANN model can handle temporal features to capture patterns over time, which is crucial for detecting progressive attack strategies such as U2R and R2L probing in complex network environments such as CIC-IDS2018 and IoT Bot. This approach has proven highly effective, reaching an accuracy of 99.9% by leveraging time-based patterns that are often indicative of intrusive activities (Gayathri and Surendran 2024). An ANN combining mutual information with temporal analysis refines the detection of synchronous attacks, such as ARP poisoning and DNS floods, on real-time datasets such as TON-IOT and CICDDOS; this yields a high accuracy of 99.5%, underscoring the value of integrating information theory with time-series analysis to pinpoint malicious activities. Another study (Raza et al. 2023) utilized a hybrid encoding approach combining AE and VAE techniques to address diverse attack vectors, such as Byzantine attacks and model poisoning, leveraging the strengths of both encoding strategies to enhance detection capabilities and achieve an impressive accuracy of 98.8% on the PhysioNet dataset. Furthermore, Khan et al. (2022b) used time-series and one-hot encoding with an LSTM model to capture time-dependent patterns in data, which are crucial for identifying spoofing and DoS attacks in CAN datasets. Studies (Farea et al. 2024; Rashid et al. 2023) with hybrid models (CNN + DNN and CNN + RNN) used a genetic algorithm (GA) to improve the ability to detect sophisticated cyber-attacks by iteratively selecting the best feature subsets.
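The GA-based feature-subset search mentioned last can be sketched as bitmask evolution. This is a generic illustration, not the cited studies' algorithm: the population sizes, rates, and the `toy_fitness` function (standing in for cross-validated detection accuracy) are all assumptions.

```python
import random

def ga_feature_selection(n_features, fitness, pop_size=20,
                         generations=30, mutation_rate=0.1, rng=random):
    """Evolve bitmasks over feature subsets: elitism, selection from
    the fittest half, one-point crossover, bit-flip mutation."""
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        next_pop = scored[:2]                    # keep the two best masks
        while len(next_pop) < pop_size:
            p1, p2 = rng.sample(scored[:10], 2)  # parents from the top half
            cut = rng.randrange(1, n_features)
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

# Hypothetical fitness standing in for validation accuracy: features
# 0 and 3 are assumed informative; each selected feature costs 0.5.
def toy_fitness(mask):
    return 2 * mask[0] + 2 * mask[3] - 0.5 * sum(mask)

random.seed(1)
best = ga_feature_selection(6, toy_fitness)
# best tends to converge toward the mask [1, 0, 0, 1, 0, 0]
```

In the cited hybrid pipelines the fitness would be the model's accuracy on the candidate subset, making the search far more expensive but directly tied to detection performance.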

6.6 Feature construction

The use of deep feature synthesis and data fusion in Salim et al. (2024) exemplifies how combining multiple data sources and advanced synthesis techniques can lead to high detection accuracies of 99.5% against complex attacks such as SSL and DNS floods. Similarly, the dynamic adjustment of features and client reliability scoring in Si-Ahmed et al. (2024) illustrate the adaptability required in federated environments, achieving accuracies of up to 90%.

7 Evaluation and comparison of feature engineering techniques

In federated environments, feature engineering must address the challenge of extracting meaningful insights from data, while ensuring alignment with the privacy-preserving and decentralized nature of federated learning. Neural network architectures, particularly deep learning models, significantly enhance the detection of malicious clients by learning complex patterns and relationships in the data. When combined with advanced feature engineering techniques, their effectiveness is further amplified.

7.1 Diverse implementation approaches

Research on feature engineering-based IDS has explored diverse approaches to applying feature engineering. This diversity is evident from the use of multiple methods for engineering features. Feature construction does not exclude feature selection; thus, researchers can freely employ any of the three methods in their feature-engineering processes. Consequently, various combinations of these methods have been applied in practice.

Table 13 Evaluation and comparison of feature engineering techniques

7.1.1 Feature engineering approaches

For example, 64 of the 88 studies used a single method, whereas 30 used hybrid methods. Among the analyzed studies, 12 of the 88 used CNNs as an automatic feature extraction technique. In the context of feature construction, researchers (Radjaa et al. 2023) have employed diverse data aggregation and decomposition techniques to design new features. Various correlation-based (NR et al. 2024; Ahmad and Shah 2024) and search-based methods (Saura et al. 2022; Wang et al. 2023; Mahindru and Arora 2022; Chander and Upendra Kumar 2023; Toldinas et al. 2022; Rashid et al. 2023; Kwon et al. 2022) have been used for feature selection. Similarly, 29 of the 88 studies used different feature extraction algorithms for transforming features.

7.1.2 Handling categorical and high-dimensional data

Feature extraction and normalization are universally applicable for improving the accuracy and robustness of a model across different datasets and attack types. One-hot encoding and PCA are particularly effective for handling categorical and high-dimensional data, respectively, ensuring that the models learn the most relevant features without being overwhelmed by noise. Advanced techniques such as genetic algorithms and TF-IDF vectorization cater to specific data types, such as optimizing feature selection or processing textual input. These methods provide additional layers of sophistication, enabling models to address more complex intrusion-detection scenarios.

7.1.3 Generalizability challenges

Previous work has shown that, with the help of feature engineering, a new IDS can achieve better results in the particular contexts in which it was evaluated. However, the proposed feature engineering methods are generally not generalizable across cases, and most earlier work in this field is case-based. Some authors have discussed frameworks for the generic application of feature engineering, but there are not yet enough applications to properly assess such generalization. The limitation of generalizability is most pronounced for expert-dependent feature construction and knowledge-based feature selection, where decision-making relies on an expert's knowledge. Although such methods can create highly valuable features for the predicted object, they cannot readily be transferred to other cases, because the defining feature sets are selected for a particular problem or object of study. In short, the success of feature engineering in a given case does not imply that the same methods will succeed in another.

Section 6 examines the key feature engineering techniques that are crucial for optimizing neural network performance in federated learning-based intrusion detection systems. Feature selection emphasizes retaining the most relevant data, reducing dimensionality, and improving performance, although it often requires domain expertise. Feature extraction leverages existing data to create new features that highlight intricate patterns but may introduce computational overhead. Feature transformation ensures compatibility across models by standardizing or normalizing data, particularly for heterogeneous client datasets, although sometimes at the cost of original data interpretability. Feature encoding enables categorical data to be represented numerically, preserving relationships but potentially increasing dimensionality. Finally, feature construction enriches datasets by deriving new features and enhancing predictions, but it poses risks of redundancy and added complexity. Together, these techniques significantly enhance the accuracy and utility of federated learning models in malicious client detection.

8 Discussion

The ‘gold standard’ for robust federated learning usually relies on an Intrusion Detection System (IDS) that integrates neural networks, feature engineering, and privacy techniques. In our systematic literature review, we reviewed a range of well-established models, such as ANN, MLP, CNN, AE, RNN, DNN, GNN, and hybrid models, as summarized in Tables 4, 5, 6, 7, 8, 9, 10, and 11. All of the neural network models adopted feature engineering and data privacy techniques to provide greater security and robustness to the federated learning framework.

Table 14 Empirical analysis of advance IDS models in FL environment

Table 14 provides significant findings for models applied to critical infrastructure, such as IoT, IIoT, IoMT, and cybersecurity. As shown in the table, models such as a CNN with hybrid sampling and a GNN with textual + IP feature extraction achieved high accuracy in IIoT and malicious domain detection, respectively. Moreover, integrating advanced feature engineering techniques, such as hybrid encoding (AE + VAE) and WHOIS extraction, improves detection rates. A novel approach combining neural networks and geometric medians was also introduced, demonstrating improved accuracy and convergence rates in various network settings (Indrasiri et al. 2023). These methods have shown effectiveness against Byzantine and targeted model-poisoning attacks. As illustrated in Fig. 6, advanced feature-engineering techniques play a critical role in mitigating such attacks and improve the performance of these security models. However, securing federated learning against malicious clients remains an active area of research.

Fig. 6

Bar chart representing the advanced feature engineering techniques used between 2021 and 2024

In total, 88 studies were reported in Section 5. Hybrid models are the most commonly proposed, followed by convolutional neural networks (CNNs), as shown in Fig. 7. The 30 hybrid models in the literature review exemplify the effectiveness of combining different neural networks, providing higher accuracy when detecting malicious clients. Table 11 and Fig. 7 provide preliminary evidence that hybrid models can significantly contribute to security. For example, FedGAN-IDS (Tabassum et al. 2022) has been used to address non-IID data challenges in IIoT networks. In comparison, 19 studies used a CNN model to improve the detection accuracy of malicious clients. These studies (Liu et al. 2023b; Bin et al. 2023; Jia et al. 2024; Andreina et al. 2021; Man et al. 2021) excel in feature extraction and have demonstrated significant improvements, as shown in Table 6. Man et al. (2021) showed that CNNs can efficiently handle high-dimensional and complex network traffic data. Figure 7 and Section 5 show that other neural network models also perform well in securing federated learning. This is not to say that any one NN model is superior to the others: the suitability of a model is determined by the type of dataset and the features used to train the classifier, whereas the practicality of an NN model is determined by a well-structured architecture in which the model is tested in real-world settings with direct exposure to attacks. Furthermore, we noticed that the use of GANs for generating attack data, which could help improve the effectiveness of the developed approaches, has barely been considered so far. Owing to recent advancements in generative models, the use of these techniques for developing IDS will likely expand in the coming years.

Fig. 7

Bar chart representation of the number of studies with NN models published across the years from 2021 to 2024. The line graph represents the average model accuracy of studies across the years

As noted in the literature, different validation approaches were used to evaluate the models. Figure 8 shows that three major types of validation techniques are employed in AI studies: K-fold cross-validation (used in almost 25 studies), other forms of cross-validation (23 studies), and hold-out validation (14 studies). Hold-out validation is suitable for large datasets and involves partitioning a dataset into training, validation, and test data. The training data are used to train the model, the validation data are used to fine-tune it, and the test data are used to assess its performance. K-fold cross-validation (CV) is a more robust method that divides a dataset into k folds: one fold is used to assess model performance, while the remaining folds are used for training. The process is repeated k times so that each of the k folds serves once as the test data. Leave-one-out cross-validation (LOOCV) is a special case of k-fold CV in which k equals the number of samples in the dataset. As a result, it can only be applied to datasets with a small number of samples, because its computational requirements are very high. Nevertheless, we would like to highlight the application of LOOCV, which has been observed in several studies. Specifically, LOOCV is useful because each case in the dataset is used exactly once as the test set, while the remaining cases form the training set. Therefore, we suggest employing LOOCV to evaluate neural network models on small datasets, as it establishes a strong correlation between the training and test instances, thereby enhancing the model's reliability and applicability in practical scenarios.
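The three validation schemes can be sketched with plain index splitting; the helper names below are illustrative (not from any reviewed study), and any ML framework can consume the resulting index lists:

```python
import random

def holdout_split(n, train=0.7, val=0.15, seed=0):
    """Partition sample indices into train/validation/test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_tr = int(n * train)
    n_val = int(n * val)
    return idx[:n_tr], idx[n_tr:n_tr + n_val], idx[n_tr + n_val:]

def kfold_splits(n, k=5):
    """Yield (train_idx, test_idx) pairs; each fold is the test set once."""
    idx = list(range(n))
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

def loocv_splits(n):
    """LOOCV is simply k-fold CV with k == n: one sample held out per round."""
    return kfold_splits(n, k=n)
```

The computational cost noted above is visible here: `loocv_splits` trains the model n times, which is why it only suits small datasets.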

Fig. 8
figure 8

Pie chart showing the distribution of various validation techniques used in research studies between 2021 and 2024

Figure 9 shows that approximately 54 datasets were used to evaluate the proposed research studies. The literature indicates that many studies evaluated outdated datasets that do not reflect the modern networking environment. The imbalanced class distributions in these datasets reduce the impact and efficiency of the models. Future studies should focus on updating and enriching these datasets, balancing class distributions, reducing feature redundancy, and ensuring generalizability across diverse network environments. By integrating real-world data from various sources, researchers can enhance the robustness and generalizability of IDS models, ensuring that they are better equipped to handle the complexities of current and emerging attack vectors while preserving data privacy.

Fig. 9
figure 9

Pie chart illustrating the distribution of datasets across different domains

Due to recent advancements in generative models, the use of advanced privacy techniques will likely expand in the coming years to secure federated learning, as shown in Fig. 10. The figure emphasizes the dominance of federated learning as a privacy-preserving technique, with the other techniques occupying smaller portions: Federated Learning, 60%; Differential Privacy, 10%; Blockchain, 5%; Encryption Techniques, 5%; Graph-based Techniques, 3%; Local Training, 3%; Hybrid Techniques, 1%; and others, 13%. Another comprehensive observation from Table 14 and Sect. 5 is the trade-off between privacy protection and model performance. Techniques such as federated learning ensure data privacy by keeping data local; however, models such as MLP (Shen et al. 2024) and CNN (Liu et al. 2023b) show that resource-intensive tasks and ensemble methods may hinder scalability in IoT and IIoT environments. Furthermore, cryptographic techniques such as those used with ANN (Doriguzzi-Corin and Siracusa 2024) provide robust security, but their high time complexity limits their practical application in real-time systems. Recent studies have explored trade-offs between these factors and proposed various approaches to enhance privacy while maintaining performance. Differential privacy (DP) is widely used to protect individual data, with local and global DP showing positive effects on fairness (Gu et al. 2022a). However, stricter privacy can intensify discrimination, which necessitates careful parameter selection (Gu et al. 2022a). The No-Free-Lunch theorem suggests that simultaneously achieving excellent privacy, utility, and efficiency in certain scenarios is unrealistic (Gu et al. 2022b). Gaussian mechanisms with local DP can preserve user privacy, but utility decreases and transmission rates increase with stronger privacy. Sketching algorithms offer a promising alternative that potentially provides both privacy and performance benefits without sacrificing accuracy (Liu et al. 2019). These findings highlight the complex interplay between privacy, utility, and efficiency in FL systems.
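The privacy-utility trade-off of the Gaussian mechanism can be illustrated with a minimal local-DP sketch: the noise scale grows as epsilon shrinks, directly degrading the utility of each clipped client update. The function and parameter names are our own, and the calibration uses the classic analytic bound rather than a modern privacy accountant:

```python
import numpy as np

def gaussian_mechanism(update, clip_norm, epsilon, delta, rng=None):
    """Clip a client's model update and add Gaussian noise (local DP sketch).

    Noise scale follows the classic analytic bound
    sigma = clip_norm * sqrt(2 * ln(1.25 / delta)) / epsilon,
    which holds for epsilon <= 1; tighter accountants exist in practice.
    """
    rng = rng or np.random.default_rng()
    update = np.asarray(update, dtype=float)
    # Scale the update down so its L2 norm is at most clip_norm
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    # Stronger privacy (smaller epsilon) means larger sigma, hence lower utility
    sigma = clip_norm * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return clipped + rng.normal(0.0, sigma, size=update.shape)
```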

Fig. 10
figure 10

Bar chart representing the advanced data privacy techniques used between 2021 and 2024

Figure 11 shows the diverse domains of federated learning. The literature focuses predominantly on the Internet of Things (55%), Cyber Security (20%), and Industrial IoT (10%), whereas real-world applications of federated learning account for approximately 5%. Other domains, such as Healthcare, Transportation, and Vehicular Networks, are underrepresented, highlighting research gaps. Future work should explore these areas and emphasize practical, real-world applications. Interdisciplinary studies integrating neural networks with emerging technologies can drive broader and more impactful innovations. Another critical gap is the lack of empirical studies comparing the performance of GNNs, DNNs, ANNs, and AEs in real-world federated learning scenarios. Future research should conduct comparative studies to evaluate the strengths and weaknesses of these models under various conditions.

Fig. 11
figure 11

Research studies in the literature categorized by domain between 2021 and 2024

9 Challenges and future work

This section summarizes the challenges encountered in the literature. Based on the information collected, we identified potential future work and open challenges that necessitate further research, many of which have been mentioned in previous studies. Future research should explore promising directions identified by these gaps.

9.1 User experience and human-computer interaction in federated learning

Due to the rapid growth in mobile phone use, wearable sensor-based human activity recognition (HAR) has emerged as one of the most popular topics in the Internet of Things (IoT) (Xiao et al. 2021). However, it is difficult for traditional approaches to simultaneously achieve high recognition accuracy and user privacy. To solve this issue, federated learning has been deployed on client-side devices such as smartphones, smartwatches, and IoT devices. However, several key areas require attention to provide clients with more secure federated learning. For example, optimizing resource consumption through energy-efficient algorithms, minimizing computational overhead, and balancing performance with device constraints are critical for a trustworthy user experience. Additionally, future research should investigate highly flexible and dynamic human-computer interfaces suitable for federated learning frameworks. Such frameworks would accommodate heterogeneous devices, increase user awareness of privacy issues through informative interfaces, and provide structures for offloading federated learning tasks.

9.2 Network attacks

The security of federated learning presents critical challenges, for example, in the emerging domain of multimodal FL, with adversarial attacks, data poisoning, and Byzantine attacks. Multimodal FL integrates heterogeneous data, including traffic statistics and raw sensor data, to improve model performance, but this exposes models to vulnerabilities. Shi et al. (2023) proposed a multimodal hybrid network for intrusion detection. Such models with advanced techniques improve anomaly detection; however, they remain susceptible to adversarial and model poisoning attacks. Model poisoning attacks deliberately reduce the global model's accuracy, whereas adversarial attacks cause misclassification, as discussed by He et al. (2023). Future work should explore robust feature extraction techniques that specifically target these attacks, employing methods such as moving target defense (MTD) and randomized feature extraction to counter adversarial transfer across client features. In addition, standardized evaluation frameworks should be prioritized to test defenses against adaptive attacks and to explore packet-level adversarial threats. Yu et al. (2022) combined autoencoder and variational autoencoder techniques to address Byzantine attacks; however, future work should focus on more advanced Byzantine-resilient algorithms and real-time monitoring for enhanced system resilience. ChandraUmakantham et al. (2024) and Ahmad et al. (2024) proposed integrating sequence models such as BERT for improved intrusion detection in vehicular networks. In conclusion, addressing the vulnerabilities of adversarial attacks, model poisoning, and multimodal data integration is essential for advancing FL security research.

9.3 Non-independent and identically distributed data

One of the primary challenges in federated learning is dealing with non-independent and identically distributed (non-IID) data across clients, which can lead to biased model training and poor generalization. Clustering-based aggregation methods are used to manage non-IID data; these methods dynamically adjust features, and client reliability scores illustrate the adaptability required in an FL environment. Future research could focus on federated multi-task learning, personalized models, and transfer learning techniques to handle non-IID data effectively. In multi-task learning, the learning problem is decomposed so that each client treats its local objective as a separate task. Liu et al. (2023a) highlighted the challenge of unstable communication, which leads to stragglers in the federated learning process and impacts model aggregation and convergence. Future work could address the impact of stragglers on model convergence and explore methods to mitigate the influence of pathological models trained on non-IID data.
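In experiments, label-skewed non-IID client data of this kind is commonly simulated with a Dirichlet partition; the sketch below is illustrative (function name and defaults are our own, not taken from any cited study):

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha=0.5, seed=0):
    """Split sample indices across clients with label skew.

    Smaller alpha -> more heterogeneous (non-IID) client label
    distributions; a large alpha approaches an IID split.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_idx = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Proportion of this class assigned to each client
        props = rng.dirichlet([alpha] * n_clients)
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_idx[client].extend(part.tolist())
    return client_idx
```

Sweeping `alpha` lets a study report how an aggregation method degrades as client heterogeneity increases.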

9.4 Challenges in communication efficiency and overhead

The primary limitation of any FL application is the communication cost per training round. In federated training, hyperparameters such as weights, learning rate, and batch size must be communicated from the central server to multiple clients, and the trained models from each client must be transmitted back to the server. Server traffic, packet loss, and communication time can vary significantly depending on the network bandwidth. Training complex neural networks in a decentralized environment incurs high computational expense at each node, as highlighted in Suresh et al. (2024), emphasizing the need for better algorithms that reduce communication overhead without affecting the output. Federated compression techniques, model pruning, and quantization strategies are essential for optimizing communication bandwidth and reducing the computational load on edge devices. For example, the study in Friha et al. (2023) (Edge-IIoT) reported F1 scores as low as 12%, indicating high false-positive and false-negative rates and highlighting the importance of improving DNN architectures to enhance both precision and recall. Future research should explore advanced communication optimization techniques, including bandwidth optimization methods and decentralized federated learning architectures, to minimize computational complexity and improve the scalability of FL systems.
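Two of the compression strategies mentioned, top-k sparsification and 8-bit quantization of client updates, can be sketched as follows (a simplified illustration under our own naming, not a specific scheme from the reviewed studies):

```python
import numpy as np

def topk_sparsify(update, k):
    """Keep only the k largest-magnitude entries; send (indices, values)."""
    update = np.asarray(update, dtype=float)
    idx = np.argsort(np.abs(update))[-k:]
    return idx, update[idx]

def desparsify(idx, vals, size):
    """Server side: rebuild a dense update from the sparse message."""
    out = np.zeros(size)
    out[idx] = vals
    return out

def quantize8(update):
    """Uniform 8-bit quantization of a float update (scale + int8 codes)."""
    update = np.asarray(update, dtype=float)
    scale = max(np.max(np.abs(update)) / 127.0, 1e-12)
    codes = np.clip(np.round(update / scale), -127, 127).astype(np.int8)
    return scale, codes

def dequantize8(scale, codes):
    return codes.astype(float) * scale
```

Sending k index-value pairs or int8 codes instead of float32 weights cuts per-round traffic roughly by the sparsity ratio or 4x, respectively, at the cost of a bounded reconstruction error.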

9.5 Data heterogeneity

One of the key challenges in federated learning (FL) is non-IID (non-independent and identically distributed) data: each client may have a different data distribution, resulting in variations in the statistical properties of the data across clients. This causes problems for FL, because the learning process assumes that each client's data are representative of the overall data distribution. In FL-enabled IDS, such as Salim et al. (2024) and Saura et al. (2022), this heterogeneity can lead to significant performance degradation, as the global model may not generalize well across different clients. Although much research has focused on dividing datasets into client-specific partitions, only a few studies (Zhang et al. 2024) have explored vertical federated learning (VFL) scenarios, which are more aligned with real-world use cases in which clients possess different types of data. To address this, future work should focus on techniques such as federated domain adaptation and federated multi-task learning, which can help models adapt to client-specific data while improving the global model's accuracy. Furthermore, data normalization and personalized models could mitigate the impact of data skew by aligning features across heterogeneous devices. These solutions must be validated on large-scale, real-world datasets, particularly in smart environments. For example, in a smart home with various IoT devices such as cameras and door locks, each device has different specifications and protocols, requiring an IDS to account for this diversity to detect and prevent attacks across the network effectively.

9.6 Data privacy

Federated learning offers a privacy solution through a distributed learning process, providing privacy to local clients. Nonetheless, an extensive discussion in the literature raises serious privacy concerns, and more feasible solutions and studies are needed to address security against cyberattacks and enhance the privacy of local clients. For instance, one study (Sater and Hamza 2021) suggested that incorporating new technologies, such as modern communication protocols, encryption standards, blockchain, and lightweight deep learning, could provide potential privacy solutions. Additionally, exploring advanced privacy-preserving techniques, such as differential privacy and homomorphic encryption, is essential for enhancing security without compromising data privacy. Adopting hierarchical federated learning frameworks, as discussed in Ahsan et al. (2024), can enhance the robustness and efficiency of IDS. Integrating additional privacy techniques, such as CKKS encryption (Zhang et al. 2022b) and knowledge distillation, further enhances performance, suggesting that hybrid approaches can effectively improve IDS.

9.7 Optimizing feature engineering and federated learning

Relative to feature engineering techniques such as information gain and temporal learning, federated learning helps protect data privacy by applying local feature selection before the aggregation process. Future research could work towards adaptive encoding strategies that meet new security challenges in network settings and continue refining feature extraction to improve user privacy. High-dimensional data in IDS models increase the computational cost and lower the detection accuracy. Future work should therefore explore advanced feature selection and dimensionality reduction techniques, such as PCA and RFE, combined with machine learning approaches. Hybrid models using PCA with deep learning methods, such as variational autoencoders (VAEs), can improve efficiency and accuracy, while federated reduction approaches minimize data transmission. These methods should be tested on large real-world datasets for practical improvement.
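A minimal SVD-based PCA sketch of the dimensionality reduction step discussed above, with a simple variance ranking standing in for RFE-style selection (both helpers are illustrative, not from the reviewed studies):

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project features onto the top principal components (SVD-based PCA)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # scores on the top components

def variance_rank(X):
    """Rank features by variance -- a crude stand-in for RFE-style selection."""
    return np.argsort(np.asarray(X).var(axis=0))[::-1]
```

In a federated setting, a client could apply such a reduction locally before transmission, shrinking each update from the raw feature dimension to `n_components` values.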

9.8 Scalability

Scalability is a major concern when training complex and advanced models. With a large number of clients involved in the learning process, managing these clients during model updates becomes challenging and resource-consuming. Future research should optimize communication protocols, minimize data exchange, and improve synchronization mechanisms to ensure efficient collaboration. The adaptability of FL in dynamic environments is another critical aspect, as IDS operate in constantly evolving settings with new threats and attack patterns. Research should aim to develop FL techniques that adapt quickly to changes, involving algorithms and mechanisms for real-time feedback and evolving threat intelligence. Techniques such as online learning, transfer learning, and continual learning can help IDS models learn from new data without discarding previously learned information. Further work on scalability should aim at designing algorithms that scale gracefully to large numbers of participating clients and large datasets. Minimizing the latency and bandwidth required for efficient aggregation of model updates should be a research focus. Furthermore, optimal client selection methods and dynamic resource control mechanisms across various contexts must be explored. New advancements can prove these concepts in various real-life applications, and significant scalability research can evaluate the functionality and efficiency of such approaches.
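Client selection with size-weighted aggregation, one of the scalability levers noted above, can be sketched as a single FedAvg-style round (a hypothetical helper under our own naming, not a method from a cited study):

```python
import numpy as np

def fedavg_round(client_updates, client_sizes, frac=0.1, rng=None):
    """Sample a fraction of clients and size-weight their model updates.

    client_updates: one weight vector per available client;
    client_sizes:   local sample counts, used as aggregation weights.
    """
    rng = rng or np.random.default_rng()
    n = len(client_updates)
    m = max(1, int(frac * n))                      # clients selected this round
    chosen = rng.choice(n, size=m, replace=False)  # random client selection
    sizes = np.array([client_sizes[i] for i in chosen], dtype=float)
    weights = sizes / sizes.sum()                  # weight by local data size
    return sum(w * np.asarray(client_updates[i], dtype=float)
               for w, i in zip(weights, chosen))
```

Sampling only a fraction of clients per round bounds the server's per-round communication regardless of the total client population, which is the core scalability argument for partial participation.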

9.9 Dataset enrichment and diversity

Overall, based on previous analyses of the datasets used, future research should prioritize developing and utilizing more diverse and contemporary datasets that accurately reflect modern network traffic and the evolving landscape of cyber threats. By incorporating real-world data from various sources, researchers can enhance the robustness and generalizability of IDS models, ensuring that they are better equipped to handle the complexities of current and emerging attack vectors.

9.10 Real-world implementations

Encouraging real-world implementations is important for validating the practical applicability of federated learning (FL) approaches, offering valuable insights into operational challenges, and facilitating model refinement. Our analysis revealed that only 5 out of 88 studies were tested in real-world scenarios. For instance, the research gap identified in Awan et al. (2023) highlights the need to enhance the FedTrust approach in complex IoT environments, proposing future work on trust parameters, dataset diversity, and real-world validation. Additionally, Si-Ahmed et al. (2024) pointed to the lack of real-time evaluation of FL-based intrusion detection systems in IoT and IIoT environments, with challenges including model interpretability and feature engineering complexity in heterogeneous datasets.

Beyond intrusion detection and healthcare, FL has growing potential in various sectors such as power systems, telecommunications, and agriculture. In the power sector, FL has been applied to improve the security of smart grids via distributed anomaly detection. For example, Sun et al. (2022) proposed a Transformer-based Intrusion Detection Model (Transformer-IDM) integrated with Hierarchical Federated Learning (HFed-IDS) to secure Advanced Metering Infrastructure in 5G-enabled smart grids. Future studies should focus on addressing resource heterogeneity and evaluating diverse datasets to improve federated resource management.

In telecommunications, FL has been used to optimize 5G networks and enhance intrusion detection. For instance, Yakan et al. (2023) leveraged LSTM models to detect misbehavior in V2X communication within 5G edge networks, thereby improving the integrity of Cooperative Intelligent Transport Systems (C-ITS). This demonstrates the ability of FL to enhance both network security and privacy in dynamic communication environments. In agriculture, FL has shown promise in securing Agri-IoT systems. Wang et al. (2023) proposed a federated model that withstands adversarial attacks such as Krum and Bulyan in unsupervised scenarios, highlighting the potential of FL in protecting IoT-based agricultural systems from cyber threats.

These examples illustrate the growing role of FL in various sectors. While FL facilitates collaborative model training while ensuring privacy and reducing communication costs, future research must address sector-specific challenges, such as device heterogeneity, resource management, and adversarial threats, to strengthen its applicability and robustness in real-world settings.

9.11 Legal and ethical considerations in federated learning

Federated learning presents unique legal and ethical challenges, particularly regarding data privacy and security. Regulations such as the General Data Protection Regulation (GDPR) Satariano (2019) and the Health Insurance Portability and Accountability Act (HIPAA) Edemekong et al. (2018) impose strict requirements on data usage and protection. Although FL enhances privacy by keeping data localized, challenges remain in ensuring that system security is not compromised. Future research can focus on developing privacy-preserving techniques that comply with these regulations, such as differential privacy and secure multiparty computation while enhancing transparency in data governance. Additionally, ethical concerns regarding bias in data and accountability of FL models require further exploration, especially in sectors such as healthcare and finance.

10 Conclusion

This comprehensive review provides a detailed analysis of the critical role of neural network-based Intrusion Detection Systems (IDS) in enhancing the security of federated learning environments. The decentralized nature of FL and its potential applications in IDS, neural network models, feature engineering methods, and advanced privacy techniques were explored. These advancements have significantly enhanced the security of FL, making it more robust against cyberattacks. Exploring novel neural network architectures, privacy-preserving methods, and deployment scenarios in real-world sectors, such as power, finance, and telecommunications, could bridge the existing gaps between theoretical advancements and practical implementation. However, several challenges remain. The practical challenges of deploying IDS in real-world federated learning environments, such as resource constraints, device heterogeneity, and dynamic network conditions, must be addressed, and future research must improve scalability to handle larger real-world systems. This review outlines promising future directions for the research and development of robust FL. Integration with technologies such as blockchain and homomorphic encryption has been identified as a potential avenue for enhancing security and privacy, and advancements in algorithms and optimization techniques have been emphasized to improve the efficiency and effectiveness of FL in IDS applications. While acknowledging the progress made in FL for IDS, this survey highlights the need to address communication efficiency challenges and overcome remaining limitations. By doing so, more robust and scalable FL approaches can be developed, contributing to the advancement of IDS technology.