1 Introduction
Cyber resilience is receiving attention from
Information Technology (IT) experts due to the surge in cyber-attacks compromising the existing infrastructure [
81]. Cybersecurity mainly protects IT assets such as data. Cyber resilience, by contrast, is the ability of a system to withstand successful cyber-attacks and revert to a normal state when cybersecurity fails to protect it [
71]. Cyber resilience enables organisations to resume operations when cyber-attacks evade the deployed cybersecurity solutions [
42]. Cyber resilience is not only about resisting potential breaches but also about learning from those attempts and continuously adapting the system to changing conditions to dampen their impact on service survivability. In other words, it aims to sustain system operations while ensuring mission execution [
14].
Consider the cyber-attacks on Estonia in 2007 and the Stuxnet attack on the Iranian nuclear program in 2010 [
65]. After the 2007 Estonian cyber-attack, many technologically advanced governments reinforced their national cyber resilience [
134]. Cyber resilience has been implemented in many applications, such as Information Technology (IT) security research [
6]. Even though it is now widely utilised among practitioners in many countries, understanding cyber resilience remains critical, especially from the information security perspective within political, industrial, and business domains [
145]. Cyber resilience is increasingly an explicit concern for programs, systems, and missions. Therefore, cyber resilience architects and system engineers investigate ways to implement cyber resilience concepts by integrating and enhancing technologies into designs and architectures of cyber resilience [
23].
Cyber resilience combines best practices from business continuity, IT infrastructure security, and other disciplines to create a business strategy that addresses today’s needs and goals. With such a strategy, an enterprise can prepare for, prevent, detect, respond to, and recover from cyber-attacks efficiently. If an enterprise can at least partially continue its business operations during a cyber-attack, it is called a
cyber resilience enterprise [
68]. The relationship between cyber resilience and business is formed by enterprise connection, which becomes essential when assessing business resilience.
A cyber-attack that manages to breach the organisation’s systems or networks could have a significant impact on its overall operation. That is why cyber resilience becomes paramount for those responsible for risk management, business continuity, and cybersecurity [
45]. Protecting against cyber-attacks has become more complex due to numerous vulnerabilities and sophisticated threats. Cyber resilience attempts to redress this imbalance by designing systems that continue working under cyber-attack [
145].
Cyber resilience efforts and strategies have traditionally been considered as enabling governments and businesses to deliver the intended outcome despite disruptions to information and communication systems [
37]. Additionally, most professionals understand the importance of cybersecurity, but fewer IT security specialists adequately understand the significance of cyber resilience, and top management might not be fully aware of it [
54]. Cyber resilience recognises that cyber systems contain components across the physical, information, cognitive, and social environments in which they exist [
38]. Recent efforts based on this idea have generated a set of cyber resilience metrics that organisations can integrate with decision-analytic frameworks to compare cyber system designs or prioritise cyber system upgrades and maintenance [
90].
1.1 Contributions
The main contributions of this survey are as follows. It (1) focuses on cyber resilience and its critical domains, which have received more attention from researchers; (2) identifies the significant domains of cyber resilience, including frameworks, strategies, applications, tools, and technologies, and outlines the requirements for each domain; (3) discusses each of these domains in detail and groups them into five domains based on the critical area discussion, as shown in Figure
1; (4) explores solutions related to each domain and compares and analyses different studies to find ways of enhancing cyber resilience; (5) compares
Cyber Resilience Frameworks (CRFs) and strategies based on technical requirements for various applications, helping researchers, practitioners, and organisations choose best practices for enhancing cyber resilience; and (6) presents key findings, limitations, problems, and future directions in the field of cyber resilience. Overall, it provides a comprehensive overview of cyber resilience, identifies key strategies and research challenges, and offers insights into future directions for enhancing cyber resilience.
1.2 Survey’s Selected Domains
The selected domains of frameworks, strategies, recent advancements, applications, and tools and technologies are all critical in contributing to the overall cyber resilience of organisations in many ways, which can be summarised as follows.
Frameworks. They offer a roadmap for organisations to assess their current cyber resilience, identify vulnerabilities, and prioritise actions. By adhering to established frameworks, organisations can ensure that they cover all critical aspects of cyber resilience, making their efforts systematic and consistent. Moreover, using common frameworks facilitates communication and collaboration among different organisations and sectors, enhancing overall cyber resilience at a broader scale. Frameworks also provide a common language and set of standards for cyber resilience, which can help improve collaboration and information sharing between organisations.
Strategies. These help organisations define their objectives, allocate resources, and proactively protect against cyber threats. Strategies encompass both technical and non-technical aspects, emphasising the importance of employee training and awareness. Effective strategies ensure that organisations are prepared to efficiently prevent, detect, respond to, and recover from cyber incidents. Effective cyber resilience strategies are critical in mitigating the impact of cyber-attacks and ensuring that organisations can continue to operate in the face of cyber threats.
Recent Advancements. In today’s digital age, organisations need to stay up to date with the latest advancements in cyber resilience as well as with threat intelligence and industry trends. Cyber-attacks are becoming increasingly sophisticated and frequent, and continuous integration of new advancements is crucial for organisations to proactively mitigate emerging risks and vulnerabilities. By keeping abreast of these advancements, organisations can better protect themselves from potential attacks.
Applications. These applications include software solutions like Security Information and Event Management (SIEM) systems, Intrusion Detection Systems (IDS), and vulnerability scanners. They are critical in continuously monitoring the organisation’s digital environment, detecting anomalies, and providing real-time alerts. Integrating these applications into the cybersecurity ecosystem enables proactive threat detection and incident response, reducing the potential impact of cyber-attacks. These tools can help organisations identify potential threats and vulnerabilities and respond quickly and effectively to cyber-attacks.
Tools and Technologies. These tools and technologies include firewalls, encryption technologies, backup and recovery systems, and other security measures. These tools act as defensive mechanisms, safeguarding an organisation’s digital assets and data. They work with strategies and applications to create multiple layers of protection against evolving cyber threats. Practical cyber resilience tools and technologies are critical in protecting organisations from cyber threats and ensuring they can recover quickly during attacks.
These domains are interrelated and contribute to developing a comprehensive cyber resilience strategy. Frameworks provide a structured approach to identifying and prioritising cyber resilience efforts, whereas strategies, recent advancements, applications, tools and technologies provide the specific measures and tools needed to implement those efforts. By working together, these domains can help organisations build solid and effective cyber resilience to mitigate cyber-attack impact and ensure business continuity.
1.3 Requirements Classification
In this survey, we classify cyber resilience requirements into the groups shown in Table
1, which can be summarised as follows.
Framework Requirements. These requirements are related to CRFs and are used in Section
3. They cover whether a framework supports specific application domains, such as security systems, network systems, and
Cyber-Physical Systems (CPS), and whether it is similar to the
Cyber Resilience Engineering Framework (CREF), the most popular one. Further framework requirements are the development cost to design the CRF, the deployment cost to implement it, and the maintenance cost to continue utilising it. Support for multi-data sources makes the framework more flexible in incorporating data sources. Correspondingly, the framework should be open source so that it can be easily modified and customised, and it should utilise metrics for quantification.
Strategy Requirements. These requirements are related to cyber resilience strategies and are used in Section
4. They cover a strategy's support for specific application domains, effectiveness, strategic acceptability, cost, quality, and flexibility. These requirements will help select the best strategy for cyber resilience in a specific application. For example, the flexibility requirement reflects the ability to change the strategy quickly to improve cyber resilience.
Recent Advancement Requirements. These requirements are related to improving cyber resilience and are considered in Section
5. They refer to supporting a specific area or application domain, such as the supply chain, organisation, or cyber defence. They cover whether there is an enhancement at the organisational management level or an improvement at the operational management level. Further requirements are the use of international standards and technologies to improve cyber resilience, cost improvement, and performance after the advancement.
Applications Requirements. These requirements are related to cyber resilience applications and are used in Appendix
A. They cover compatibility with specific applications and validity for a particular domain. These requirements will help researchers understand the applications that have already implemented cyber resilience, and they are also used to compare those applications. The main applications and sectors using these requirements are transportation, financial, power system, supply chain,
Supervisory Control And Data Acquisition (SCADA) systems, smart grid, communications network, healthcare, and
Industrial Control Systems (ICS).
Tools Requirements. These requirements are related to cyber resilience tools used in Section
6. They cover whether a tool works at the organisational management level, extends to operational management, is easy to use and user friendly, is web based for fast access, is efficient, is software based, is open source if available, is cost-effective, performs well, monitors and tracks cyber resilience as a predictive measure for the future via the use of a database, and shows the improvement path of cyber resilience.
1.4 Survey Outline
The survey is divided into several sections. In Section
2, we define the term
cyber resilience and discuss the concept in detail. In Section
3, we review and compare existing CRFs. Then, in Section
4, we look at existing cyber resilience strategies and highlight their differences. Section
5 covers the recent techniques used to advance cyber resilience. In Section
6, we compare the tools and technologies used to evaluate and improve cyber resilience. In Section
7, we explore research studies on threat modelling related to cyber resilience. Section
8 discusses the key findings, limitations, and open problems related to cyber resilience. Finally, in Section
9, we focus on future research directions for cyber resilience. The survey concludes in Section
10. Due to strict page limits, we discuss implemented cyber resilience applications in Appendix
A1 and list acronyms in Appendix
B.
2 Defining Cyber Resilience
This section presents different definitions of cyber resilience and its meaning in various domains, as illustrated in Table
2. Subsequently, we discuss the conceptualisation of cyber resilience and clarify it relative to existing works. Moreover, we illustrate diagrammatically how cyber resilience works. Cyber resilience has many definitions depending on the application in which it is implemented. For instance, a small business may define cyber resilience as the ability to defend against cyber-attacks and roll back to a healthy functioning state. It can also be defined as ensuring that devices operate under any threat environment and are not affected by malicious activities such as phishing e-mails and spam distribution [
143]. Similarly, it is also defined as the ability to maintain the operation of a system when it is under cyber-attacks [
9]. Cyber resilience encompasses the capacity to withstand cyber-attacks and involves multiple dimensions for assessment [
138].
Most definitions of cyber resilience focus on an organisational level without considering the system level. However, there are some fundamental differences between those definitions. Studies have been conducted on cyber resilience at the organisation level [
6,
17,
111,
115]. Aoyama et al. [
6] described cyber resilience as the capability of organisations to defend against cyber-attacks based on three factors: prevention, detection, and response. Each factor is associated with specific resilience capabilities: prevention with anticipating, detection with monitoring, and response with learning and incident reporting. The main limitation of this definition is that it describes cyber resilience in terms of defending against cyber-attacks without addressing what happens after a successful attack.
Some authors define cyber resilience at the organisation level as the capacity to consider different groups, such as methods, challenges, and reasonable controls for cyber resilience. Björck et al. [
17] defined cyber resilience as the ability to continuously deliver the intended outcome despite adverse cyber events. Here, “continuously” implies the ability to change or modify delivery if needed in the face of risks. The intended outcome refers to achieving different goals through online services.
Likewise, Ayoub et al. [
115] described cyber resilience at the organisation level as the capability to sense, resist, and react to events while reshaping and adapting operations in environments with foreseeable and unforeseeable risks. These risks emerge when technological change is so rapid that it becomes more challenging to predict many of the risks arising in the digital space. In other words, cyber resilience encompasses both organisational and cybersecurity concerns and aims to defend against potential cyber-attacks and make survival possible after an attack. However, an essential issue with this definition is that it does not include the main cyber resilience stages of absorbing and recovering.
The Ponemon Institute [
111] defined cyber resilience as the alignment of an organisation’s prevention capabilities with its capabilities to detect, respond to, manage, and mitigate cyber-attacks. It refers to an enterprise’s capacity to maintain its core purpose and integrity in the face of cyber-attacks. A cyber resilience enterprise can prevent, detect, contain, and recover from a myriad of severe threats against data, applications, and IT infrastructure. According to the definition provided by Keleba et al. [
74], cyber resilience is characterised by the capability to effectively endure and swiftly recuperate from unanticipated and substantial interruptions caused by cyber-attacks. It pertains to an entity’s capacity to adjust to recognised and unrecognised emergencies, risks, obstacles, and adversities, ultimately ensuring the continuation of services or business operations despite cyber threats.
Several researchers have outlined cyber resilience only at the system level without considering the organisational level. Vugrin and Turgeon [
50] defined the resilience of a system, given the occurrence of a particular disruptive event or set of circumstances, as the ability to efficiently reduce the deviation from targeted system performance levels. The central gap of this study is that it considers only technical issues without considering human errors. To fill this gap, Todorovic et al. [
136] defined cyber resilience as the ability of a system to identify and target enhancements to its inherent capacity to respond throughout the inevitable process of change, over both short and long durations. A resilient infrastructure can anticipate, adapt to, and absorb a potentially disruptive event and recover from it rapidly, whether the event is human caused or naturally occurring.
A few studies in the broader literature have determined cyber resilience at organisational and system levels. Cyber resilience is defined in some works [
22,
84,
85,
108] as the ability to anticipate, withstand, recover from, and adapt to adverse conditions, stresses, attacks, and compromises on cyber resources. Cyber resilience can be a capability of an organisation, a business function, a mission, a system, a system-of-systems, or a cross-organisational mission; the term can also be applied to a nation, region, group, household, or an individual.
Hausken [
64] described cyber resilience as the ability of an actor to resist, respond to, and recover from cyber-attacks to ensure the actor’s operational continuity. Moreover, the author reviewed and assessed the emerging role of cyber resilience. Cyber resilience can involve various actors, classified into three groups: non-threat actors, threat actors, and hybrid actors. Threat actors can be hackers and criminals. Non-threat actors include governments, regulators, incident responders, insurers, organisations, and individuals. Hybrid actors can be companies that may sometimes, inadvertently or deliberately, compromise the cyber resilience of other actors. Actors operate at various levels, from individual, group, organisation, and regional to global. Each actor chooses strategies based on beliefs and preferences that impact cyber resilience.
Smith [
128] defined cyber resilience as the ability of an organisation or system to anticipate, withstand, recover from, and adapt to adverse conditions, stresses, attacks, or compromises on cyber resources. It encompasses the capacity to provide and maintain an acceptable level of service in the face of faults and challenges to normal operations.
We consider cyber resilience as the ability of systems or services to roll back, with the help of fast automation, and continue normal operations after an attack happens. The concept of cyber resilience involves three steps. First, it starts after a cyber-attack has caused network, system, or service failure. Second, cyber resilience comes into action to address the affected networks, systems, or services. Finally, it initiates a rollback process to restore the networks, systems, or services to their normal state as quickly as possible with the help of automation. The most popular areas and sectors that have implemented cyber resilience are transportation, finance, power systems, supply chain, Supervisory Control And Data Acquisition (SCADA) systems, smart grid, wireless communication networks, health care, and Industrial Control Systems (ICS), as shown in Figure
2.
3 Cyber Resilience Frameworks
The Cambridge Dictionary defines
framework as “the principles, ideas, and information that form a plan or the organisation structure.” CRFs provide organisations with a security approach that is cost-effective, flexible, and performance based [
122]. Many CRFs have been proposed, around 200 of which are assessment frameworks, highlighting the need for a simple approach for
Small and Medium-sized Enterprises (SMEs) to operate cyber resilience effectively [
31]. Various standards and frameworks have been developed for implementing and evaluating cyber resilience, including
International Organisation for Standardisation/International Electrotechnical Commission 27001 (ISO/IEC 27001),
National Institute of Standards and Technology (NIST) Cybersecurity Framework (NIST-CSF),
Cybersecurity Capability Maturity Model (C2M2),
Computer Emergency Readiness Team-Resilience Management Model (CERT-RMM),
Control Objectives for Information and Related Technologies (COBIT), and
Open Web Application Security Project (OWASP).
These frameworks help organisations understand their capabilities and guide them in implementing or refining resilience plans. Additionally, frameworks help to address the assessment of cyber resilience, taking into account factors such as the cost of implementation and the development needs of different types of organisations [
110]. Organisations must apply or develop these frameworks and standards to identify areas in their system that need improvements. Several CRFs are available, but the most popular is the CREF. The main reason for comparing these frameworks is to identify their strengths and weaknesses.
CRFs provide a structured approach to identifying and addressing potential cyber threats and vulnerabilities. They help organisations develop a comprehensive understanding of their cyber risk profile and prioritise their cyber resilience efforts. Frameworks also provide a common language and set of standards for cyber resilience, which can help improve collaboration and information sharing between organisations. The following criteria are used to contrast the frameworks: whether a framework pertains to a specific application domain, is similar to the Cyber Resilience Engineering Framework (CREF), supports multi-data sources, is open source, and uses metrics, together with its costs of development, deployment, and maintenance, as can be seen in Table
3.
Some researchers [
19,
20,
24,
48,
77,
81,
89] proposed frameworks similar to the CREF. Most previous and current frameworks do not support multi-data sources, but a few do (e.g., [
35,
45,
57,
121]). Bodeau and Graubart [
20] provided a framework that analyses cyber resilience goals, objectives, costs, structure discussions, and practices. It also serves to characterise cyber resilience metrics and motivation. The framework can evolve as the discipline of cyber resilience engineering matures. Cyber resilience engineering is a part of mission assurance engineering. Various subjects inform it, such as resilience engineering, information systems, security engineering, fault tolerance, dependability, survivability, business continuity, and contingency planning. The framework requires three components to enable cyber resilience engineering and related disciplines: cybersecurity, resilience engineering, and mission assurance.
Linkov et al. [
89] applied the resilience matrix framework proposed by Linkov et al. [
88] to cyber systems. They focused on developing a new framework that can inform the extent of the resiliency of cyber systems within the scope of
Executive Order 13636 (EO 13636) and
Presidential Policy Directive 21 (PPD 21). The resiliency matrix must be general and applicable across many approaches so that the resiliency of systems can be evaluated comparatively. Moreover, the metrics must be easy to monitor and to report to management as system operators and decision makers make changes. The importance of this resilience metrics framework lies in its ability to guide the allocation of resources to enhance resiliency within an organisation.
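To make the idea of a comparative resilience matrix concrete, the following minimal Python sketch scores a system on a grid of domains by stages. The four domains are the physical, information, cognitive, and social environments mentioned in the Introduction; the stage names, the score scale, and every score value are assumptions chosen for illustration and are not taken from Linkov et al.

```python
from statistics import mean

DOMAINS = ["physical", "information", "cognitive", "social"]
# Assumed stage names for the matrix columns (illustrative only).
STAGES = ["plan/prepare", "absorb", "recover", "adapt"]

# Hypothetical scores in [0, 1]: one row per domain, one column per stage.
scores = {
    "physical":    [0.8, 0.6, 0.5, 0.4],
    "information": [0.7, 0.5, 0.6, 0.3],
    "cognitive":   [0.6, 0.4, 0.5, 0.5],
    "social":      [0.5, 0.3, 0.4, 0.2],
}

def stage_averages(matrix):
    """Average score of each stage across all domains."""
    return {
        stage: mean(matrix[domain][i] for domain in DOMAINS)
        for i, stage in enumerate(STAGES)
    }

def weakest_cell(matrix):
    """Return the (domain, stage) pair with the lowest score."""
    cells = ((d, s) for d in DOMAINS for s in STAGES)
    return min(cells, key=lambda c: matrix[c[0]][STAGES.index(c[1])])

if __name__ == "__main__":
    for stage, avg in stage_averages(scores).items():
        print(f"{stage}: {avg:.2f}")
    print("weakest cell:", weakest_cell(scores))
```

Scoring two candidate designs with the same grid and comparing their stage averages or weakest cells is one simple way such a matrix could inform resource allocation.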
The frameworks presented in some works [
20,
21,
24] provided and discussed a background on the CREF, which can help structure the analysis through an assessment. The CREF is illustrated in Figure
3 and described in more detail in the work of Bodeau and Graubart [
20]. The CREF organises the cyber resilience domain into goals, objectives, and techniques. Goals are high-level statements of intended outcomes. Objectives are more specific statements of intended outcomes that enable assessment; an objective can be identified with a single goal but may sometimes support achieving multiple goals. Techniques are approaches to achieving one or more cyber resilience objectives that are applied to the design or architecture of business/mission functions and the cyber resources that support them.
Bodeau et al. [
19] described how resilience techniques apply to an acknowledged system. They extended the definitions of goals, objectives, and methods presented in another work [
20] on the CREF. The extensions (1) broaden the set of potential threat sources to include common errors, events, and adversarial actions, (2) broaden the collection of adversarial actions to include non-cyber attack vectors, and (3) add cyber-physical considerations to those for pure cyber systems.
Choudhury et al. [
35] focused on the problem of determining dynamic actions to achieve resilience concerning the failure of hardware, compromised systems, or services. They took the first steps towards developing a formal methodology to make a complex enterprise web resilient. Additionally, they presented a unifying graph-based model for representing the behaviour, infrastructure, and missions of the enterprise web and the dependencies among them. This approach’s benefit is that it consolidates multiple data sources, such as NetFlow, events, and logs, into one model, providing insight into the reasons for actions in the model space. They then transformed actions determined in the model space, such as deleting an edge in a graph, into real-world activities, such as blocking communication between a client and a server. The recommended algorithms are implemented, and the authors intend to release them as an open source software framework for simulating resilient cyber systems.
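The mapping from a model-space action to a real-world activity can be sketched in a few lines of Python. This is only an illustration of the general idea under simplifying assumptions, not the authors' model or algorithms: the graph is a plain adjacency map, and the node names and helper functions (`reachable`, `delete_edge`, `to_real_world`) are hypothetical.

```python
from collections import deque

def reachable(graph, start):
    """Return the set of nodes reachable from start (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def delete_edge(graph, src, dst):
    """Model-space action: return a copy of graph without the edge src -> dst."""
    new_graph = {node: set(nbrs) for node, nbrs in graph.items()}
    new_graph.get(src, set()).discard(dst)
    return new_graph

def to_real_world(src, dst):
    """Translate the edge deletion into a (hypothetical) operational activity."""
    return f"block communication from {src} to {dst}"

# Hypothetical enterprise fragment: who can talk to whom.
graph = {
    "client": {"web-server"},
    "web-server": {"database"},
    "database": set(),
}

if __name__ == "__main__":
    print(sorted(reachable(graph, "client")))
    hardened = delete_edge(graph, "web-server", "database")
    print(to_real_world("web-server", "database"))
    print(sorted(reachable(hardened, "client")))
```

Comparing reachability before and after the deletion shows the effect of the action in the model space before anything is changed in the real network.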
Aoyama et al. [
6] presented a framework that includes four categories: (1) effective control, (2) decision and implementation, (3) communication and coordination, and (4) information and management. The framework was developed in safety engineering and is applied to cyber incident handling. By considering each incident response task procedure as one project among similar projects to be managed, the situation under cyber-attack can be regarded as a dynamic resource allocation problem.
Khan and Al-shaer [
77] proposed an initial version of a formal framework called
CREF. Their proposed framework provides a comprehensive view of resilience to measure network resilience from various aspects. It covers all metrics from different levels, such as proactive, resistive, and reactive. The CREF is derived from the DREF (Dependability and Resilience Engineering Framework). It is a framework that explains the resilience quantification of communication and IT systems. The CREF is highly generic and can be used at various levels to measure the resilience of network systems. They applied their framework to firewall devices, part of cybersecurity devices, to show their approach’s usefulness and practicality.
The framework proposed by Yano et al. [
48] for cyber resilience is based on two elements: partitioning the system into different segments and using the kill-chain attack model to structure the defence and adoption of a lifecycle based on the goals that were described in the MITRE framework presented by Bodeau and Graubart [
20]. The assets are allocated and prioritised for different segments according to the assimilation strategy that increases situational awareness and emphasises the implementation of situational awareness elements. With these elements, defenders may have a ready view of events and logs in progress and of the actions necessary to contain the attacks while causing the least possible damage to the tasks in progress.
Friedberg et al. [
52] presented metrics and results for the CRF. The key contributions of their framework are threefold. First, it allows the evaluation of cyber resilience concerning different performance indicators of interest. Second, it allows the complexity related to the performance indicators of interest to be simplified deliberately. Finally, it supports identifying reasons for good or poor resilience to improve system design. The presented metric framework provides a scalable system model and a flexible way to measure system resilience numerically.
Nevertheless, it does not come without limitations and challenges. First, the approach aims to describe a single system, which is difficult to define in Cyber-Physical Systems (CPS). Another challenge is framework implementation and evaluation. The information required to build the interdependencies between different features and their respective performance is often unavailable for current CPS installations. The multi-dimensional performance concept makes the framework more generally applicable than the work by Rieger [
119], which focuses on control response as a performance measure.
Ayoub et al. [
115] outlined some recommendations and pointed out issues that governments might face in building a resilient cyber nation. The aim is to devise a range of tools and materials that can adapt to the rapid changes in the digital world. Most governments will probably arm IT with a robust framework to deal with various unforeseen challenges in the future. Such a framework must serve as a platform for organisations to share information about actions quickly and collaborate over threat intelligence. It will raise awareness of the need to enact a resilience plan and a response mechanism that is more proactive than cyber-monitoring.
Rose et al. [
121] presented a framework for estimating and analysing various types of resilience related to cyber itself and cyber-related sectors. They provided ranges of estimates of the cost and broad effectiveness of resilience tactics, synthesised from the academic literature and industry-specific information. Their analysis indicates that the set of cyber resilience tactics is relatively low in cost, potentially useful, diverse, and quite extensive.
Gisladottir et al. [
57] called for a framework for the systematic evaluation of risk, rules, and resilience of cyber systems that incorporates behavioural sciences. This need is partly due to the complexity of the problem and of the underlying system, including data vulnerabilities, event tracking, software patching, and the interdependence of stakeholders. The need to collect and systematically utilise data from existing systems and to establish best practices based on the goals and performance of the optimisation also contributes to the framework’s necessity. A select number of well-framed rules is the key to maximising cyber systems’ resilience and minimising human factor risks. They also mentioned two main steps to evaluate the effect of a new rule inside a particular order’s security. The first step to the practical application of any development involves the estimation of minimum decision latitude. The second step is to research the methodology to quantify the level of independence employees experience.
Maziku and Shetty [
96] discussed achieving cyber resilience in a smart grid network with a security score model using a
Software-Defined Networking (SDN) framework for IEC 61850-based substation communication networks. The framework incorporates SDN principles and a security risk score model to achieve cyber resilience. They demonstrated how SDN improves the timing performance of IEC 61850 type messages in their smart grid network, making them time compliant. The security score model also incorporates device criticality in the IEC 61850 network. By implementing the security score model in SDN, they provided the ability to reconfigure the IEC 61850 network in real time. They validated their approach in an experimental
Global Environment for Network Innovations (GENI) testbed over wide-area networks with realistic and dynamic traffic scenarios to address IEC 61850 network attacks.
Kott et al. [
81] introduced a CREF that was developed and offered by Bodeau and Graubart [
20], which provides an overview of how to structure cyber resilience capabilities by addressing goals, objectives, and practices aligned with the “adversary activities” that reflect the intent and potential actions each capability is intended to protect against. They discussed the cyber resilience goals and associated objectives from the framework [
20], which aligns closely with
North Atlantic Treaty Organization (NATO) cyber resilience goals.
Haque et al. [
62] proposed a CRF for the ICS by decomposing the resilience metric into several hierarchical sub-metrics. These metrics were presented as a tree structure that captures qualitative information on the system’s security posture with respect to resilience, giving a high-level framework for identifying where analysis and modelling are needed. Additionally, they formalise cyber resilience metrics by illustrating the calculation of resilience metrics using the
Analytical Hierarchy Process (AHP). This framework serves as a versatile platform for different criteria-based decision aids, which can help the technical experts identify gaps in the study of ICS resilience.
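To make the AHP step concrete, the following is a minimal Python sketch of how weights for sub-metrics could be derived from a pairwise comparison matrix and rolled up into a single score. It is an illustration under stated assumptions, not the calculation of Haque et al.: the three sub-metric names, the comparison values, and the leaf scores are hypothetical, and the weights are approximated with the row geometric-mean method rather than the principal eigenvector.

```python
import math

# Hypothetical sub-metrics at one level of the resilience tree (assumption).
SUB_METRICS = ["physical", "technical", "organisational"]

# Hypothetical pairwise comparisons: entry [i][j] states how much more
# important sub-metric i is than sub-metric j (reciprocal matrix).
PAIRWISE = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]

def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]

def consistency_ratio(pairwise, weights):
    """Saaty's consistency ratio; values below about 0.1 are usually accepted."""
    n = len(pairwise)
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(pairwise[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    random_index = {3: 0.58, 4: 0.90, 5: 1.12}  # Saaty's tabulated values
    return consistency_index / random_index[n]

weights = ahp_weights(PAIRWISE)
cr = consistency_ratio(PAIRWISE, weights)

# Hypothetical normalised scores (0 to 1) for each sub-metric.
leaf_scores = [0.6, 0.8, 0.4]
resilience = sum(w * s for w, s in zip(weights, leaf_scores))

for name, w in zip(SUB_METRICS, weights):
    print(f"{name}: weight {w:.3f}")
print(f"consistency ratio {cr:.3f}, aggregated score {resilience:.3f}")
```

In a deeper tree, the same weighting would be applied at every level, with each parent score computed as the weighted sum of its children.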
Dickson and Goodwin [
45] emphasised that organisations must build a cyber resilience capability by shortening the lifecycle stages: defence, detection, response, and recovery. They said cyber resilience is a framework designed to help organisations withstand attacks. It is not a single product or layer of protection but a way for organisations to structure their defences so that no one event is destructive. They presented the CRF with five components: identify, protect, detect, respond, and recover, as demonstrated in Figure
4. Similarly, these five components of cyber resilience are discussed by Blum [
18].
Intending to aid Small and Medium-sized Enterprises (SMEs) in operationalising cyber resilience, Carías et al. [
33] present a framework that SMEs could use to understand what domains and policies are implied in the cyber resilience building process. In addition, the framework has been presented in the form of an implementation order, based on experts’ experience, that SMEs can follow to operationalise cyber resilience. The main idea of the framework is not to be as specific and exhaustive as possible but to be synthesised and general, so that SMEs can understand what cyber resilience involves and start implementing it without being overwhelmed. Using the framework and the implementation order could help SME managers in the process of building cyber resilience by giving them a synthetic tool with the essential actions and an order in which to implement them.
Bejarano et al. [
13] review frameworks and standards to achieve cyber resilience in organisations, such as the NIST framework, ENISA (European Union Agency for Cybersecurity), and ISO/IEC 27032. The authors envision a new CRF that leverages
Machine Learning (ML) techniques to improve business continuity. The National Institute of Standards and Technology (NIST) framework supports five risk management functions: identify, protect, detect, respond, and recover. Machine Learning (ML) algorithms are increasingly used in cybersecurity to detect subtle patterns and handle large volumes of data. Organisations like
ENISA help countries better prepare for, detect, and respond to information security problems.
These standards raise quality, safety, reliability, efficiency, and interchangeability levels. Resilience is critical for preserving system functionalities and mitigating the consequences of cyber-attacks. Cybersecurity frameworks, standards, and good practices contribute to understanding different types of attacks and managing cyber-attacks. The NIST framework provides a simple, practical framework aligned with guidelines and recommended good practices. Advances in communication technologies and hyper-connectivity drive the need for cyber resilience [
13]. The NIST framework is practical and applicable to organisations but requires significant implementation effort. Companies’ existing cyber resilience mechanisms require adopting relevant standards, processes, and resources. The work proposes using ML models and techniques to predict and recover from attacks and protect systems promptly.
Hammad et al. [
61] propose a framework using
Artificial Intelligence (AI) based on a hierarchy for cyber resilience in interdependent critical infrastructure systems. The framework identifies, detects, and mitigates cyber and physical attacks through enhanced situational awareness. It focuses on developing an integrated cyber-defence solution to detect and respond to attacks targeting interdependent critical infrastructures. The proposed framework, called
deep defence, aims to improve system situational awareness through telemetry and events from different domains and layers of the systems. It utilises deep and adversarial ML elements to enhance attack anticipation and response. The framework also emphasises the need for coordinated adaptive-capacity resources on individual and interdependent systems’ cyber and physical layers to strengthen resilience. The authors aim to develop a comprehensive approach that can be applied to different interconnected critical infrastructure systems and adapt to the evolving threat landscape.
In today’s digital age, cyber threats are becoming more and more prevalent. That is why organisations must have a comprehensive cyber resilience program in place. The Australian Signals Directorate recognised this need and implemented a program that includes vulnerability scanning, patch management, and incident response planning [
130]. However, these measures alone are not sufficient; they must also be aligned with the organisation’s business objectives and priorities. This can often be challenging, but the Australian Signals Directorate has overcome it by creating a risk management framework. This framework helps prioritise resilience against cyber threats, ensuring the organisation is prepared to handle any potential attacks. Organisations can protect themselves against cyber threats and ensure business continuity by following these steps.
Al Maruf et al. [
3] proposed a framework that is a timing-based approach for designing cyber resilience in CPS under safety constraints. It aims to ensure the safety of CPS in the face of faults and cyber-attacks. The framework develops a common methodology for safety analysis and computation of control policies and design parameters in CPS employing various resilient architectures. It allows for the comparison of different resilient architectures and enables the extension of analysis and design from one architecture to another. The framework utilises a hybrid system model that captures CPS adopting any of the resilient architectures.
The framework in the work of Al Maruf et al. [
3] models the cyber subsystem as operating in a finite number of statuses. It formulates a problem of computing control policies and timing parameters jointly to satisfy a given safety constraint. The derived conditions from the hybrid system model are used to compute control policies and timing parameters relevant to the employed architecture. The solution provided by the framework can be applied to a wide class of CPS with polynomial dynamics and allows for the incorporation of new architectures. The proposed framework is verified through a case study on adaptive cruise control of vehicles, demonstrating its effectiveness in ensuring cyber resilience in CPS.
PHOENI2X [
53] is a project funded by the European Union, which aims to create a CRF for
Operators of Essential Services (OES) and EU Member State authorities. The framework will be designed to provide AI-assisted orchestration, automation, and response capabilities for business continuity and recovery, incident response, and information exchange. The main objective of the project is to enhance cyber crisis management and resilience by focusing on preparedness, shared situational awareness, and coordinated incident response. PHOENI2X aims to use serious games to raise awareness of social engineering and to improve the ability to detect attacks. The project will integrate different cognitive aspects to provide an effective learning experience. The framework will be tested through use cases in the energy, transport, and health care sectors, highlighting the importance of supply chain aspects and addressing specific threats identified in each domain.
At the state level, developing a shared understanding and terminology of cyber resilience in cyberspace should be a priority. The current lack of clarity is hindering research and policy-making efforts [
67]. The concept of cyber resilience gained attention in 2012, focusing on the ability of systems, actors, and functions to prepare for, absorb, recover from, and adapt to adverse effects, including cyber-attacks. However, state-level cyber resilience is still an emerging concept that requires further research and theoretical advancements to avoid vagueness and misuse.
Hubbard [
67] proposes a comprehensive conceptual framework for state-level cyber resilience that highlights the dynamic nature of resilience, the presence of resilience capacities at various levels and across actors within the state, and the need to confront and recover from specific types of cyber damage. The framework aims to establish a common terminology and promote a systematic, multi-dimensional approach to assessing and improving states’ capacity for resilience in cyberspace.
The identify stage recommends using security scanner tools. N-Stalker [
101] is one of the popular security scanner tools; it is a web security assessment tool. It scans web applications for vulnerabilities such as buffer overflows, cross-site scripting (XSS), and SQL injection flaws. N-Stalker is a helpful security tool for IT auditors, developers, system/security administrators, and IT experts. The detection stage involves security monitoring, and a powerful tool for this stage is OSSEC [
109]. OSSEC is open source, free, and multi-platform. It can be tailored to an organisation’s security needs through its extensive configuration options, including custom alert rules and scripts that take action when alerts occur.
The protection stage, which covers access control, data security, and information protection, needs a dedicated tool. A popular tool for this stage is GnuPG [
131]. GnuPG is a complete and free implementation of the OpenPGP standard as defined by RFC 4880 (also known as
PGP). GnuPG encrypts and signs data and communications; it features a versatile key management system along with access modules for all kinds of public key directories. GnuPG, or GPG, is a command-line tool with features for easy integration with other applications. A wealth of front-end applications and libraries is available. GnuPG also provides support for
Secure/Multipurpose Internet Mail Extensions (S/MIME) and
Secure Shell (SSH).
The response stage covers planning and the analysis of events and logs; a well-known and suitable tool for this stage is Apache Metron [
7]. Apache Metron provides a scalable, advanced security analytics framework built with the Hadoop community and evolved from the Cisco OpenSOC project. As a cybersecurity application framework, it allows organisations to detect cyber anomalies and to respond rapidly to the anomalies identified.
The recovery stage returns the organisation to normal operations. Many tools are available for backup and recovery, but a popular choice is Bacula [
120]. Bacula is a set of open source computer programs that allow the system administrator to manage backup, recovery, and verification of computer data across a network of computers of different kinds. Bacula’s free backup software is relatively easy to use and efficient, while offering many advanced storage management features that make it easy to find and recover lost or damaged files.
Limited research has addressed the existing challenges faced by CRFs. Addressing these challenges will help the cybersecurity community collaborate on improving current CRFs. Furthermore, it will assist the cybersecurity community in identifying organisations, universities, and people working on designing and developing CRFs [
125].
4 Cyber Resilience Strategies
In this section, we compare and discuss different cyber resilience strategies. Cyber resilience strategies refer to the actions and measures taken by organisations to prepare for, respond to, and recover from cyber-attacks. Strategies can include technical measures, such as implementing firewalls and encryption, as well as non-technical measures, such as employee training and awareness programs. Effective cyber resilience strategies are critical in mitigating the impact of cyber-attacks and ensuring that organisations can continue to operate in the face of cyber threats. We present comparisons within these strategies, as seen in Table
4. Cyber resilience can be achieved by applying strategies based on principles [
39,
132] and investment [
34].
A great deal of previous research into cyber resilience strategy in the supply chain has focused on management strategies to improve cyber resilience, pointing out how the strategy can be automated using innovative
Information and Communications Technology (ICT) systems [
139]. ICT has already been shown to play a significant role in managing and controlling complex value networks. However, additional ICT capabilities aimed mainly at improving cyber resilience may be exploited in supply chains to ensure a quick response to disruptions and risks.
These capabilities support the joint development of repository-based IT ecosystems in which B2B (business-to-business) and B2G (business-to-government) exchanges both push and pull the web services created by supply chain actors and governmental agencies. Enabling B2B and B2G data sharing will give companies access to a vast amount of data and services that can improve the whole supply chain’s cyber resilience. For example, organisations will be able to control and manage suppliers and portfolios online quickly, making more accurate Estimated Time of Arrival (ETA) estimations and monitoring the transport infrastructure capacity in real time. Likewise, it would be easier for organisations to rapidly learn about and apply any sudden changes in trading regulations while complying with regulatory frameworks.
One study by Efthymiopoulos [
49] examined the trend of cyber resilience strategy in cyber defence, including the importance of cyber resilience in NATO’s strategic evaluation. The study aims to approach NATO collective defence methodologically and to integrate cyber resilience into it. Additionally, it discusses NATO’s technological assessment, which implies strategic and operational changes for the whole alliance as it operates amid different challenges and threats. NATO views cyber resilience as a trend for building capabilities in fields that include, but are not limited to, training, awareness, and education; network protection infrastructure; systems configuration; and infrastructure protection.
The first systematic study of cyber resilience strategy as principles was reported by Conklin [
39] in 2017 as a cost-efficient approach that is sufficient to protect the critical systems that power our way of life. They offer a staged approach to implementing cyber resilience policies, along with a pedagogy and a general curriculum for disseminating it. Under this view, a cyber resilience strategy maintains critical functionality at all costs, without attempting to defend peripheral or less critical elements. They organised the cyber resilience strategy into seven principles: classify, risk, rank, deploy, test, recover, and evolve. Implementing a cyber resilience strategy gives an organisation a greater ability to withstand and recover rapidly from disruptive events.
In 2019, Tehrani [
132] presented another cyber resilience strategy as principles that discussed and illuminated the underlying national critical infrastructure defence principles integrated with cyber warfare. The discussion showed how to establish cyber resilience policies to face growing and new threats. Likewise, they demonstrated how states might use the attribution concept and its applicability to deal with actors behind malicious cyber activities. In other words, it examined the issue of the applicability of international rules and attribution regulations to state and non-state actors for malicious cyber activities in the attribution context.
A detailed examination of cyber resilience strategy by Carías et al. [
34] showed a road map for building cyber resilience using an efficient investment strategy. To achieve this, they followed the system dynamics methodology, drawing on experts’ opinions on the best approach to supporting cyber resilience. Cyber resilience requires investment in both technology and personnel training, and neither should be overlooked in an investment strategy. This strategy helps factories efficiently minimise the probability of a successful cyber-attack. Factory managers can use the model as a decision-making tool for developing strategies to enhance cyber resilience, because it shows in simple graphs the behaviour of main variables that are not easily quantifiable.
An excellent strategy is to enable cyber resilience in SMEs based on a few simple steps as part of the new digital world. Those few steps can be summarised into seven steps [
91] that can pave the way for SMEs to cyber resilience. These seven steps are (1) invest in effective antivirus, anti-malware, and firewall solutions; (2) ensure the critical data of the business is protected; (3) have clear and simple policies in place; (4) have awareness training regularly; (5) review policies and contracts with suppliers; (6) have an up-to-date plan for incident response; and (7) consider investing in cyber insurance for covering the disclosure of security and data privacy incidents.
One of the strategic decision-making frameworks for assessing the cyber resilience of additive manufacturing supply chains was proposed by Rahman et al. [
116]. The strategy framework utilises a data fusion technique called the
hierarchical evidential reasoning-based approach, which handles the incomplete, uncertain, and subjective nature of the data. The strategy is based on Dempster-Shafer theory and incorporates Yager’s recursive rule of combination for validation.
In the assessment process, Rahman et al. [
116] aggregate the essential criteria (factors) to obtain a Cyber Resilience Index using the Dempster-Shafer combination rule. Experts evaluated the cyber resilience attributes with subjective data based on their experience, knowledge, and education. The strategy allows for a holistic assessment of the cyber resilience of additive manufacturing supply chains, considering both cyber structures and organisation-wide operations. Practitioners can adopt the proposed methodology to assess the state of an organisation’s cyber resilience and to compare the cyber resilience of multiple organisations.
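As an illustration of the combination step, the sketch below implements the classical Dempster rule for fusing two expert assessments expressed as mass functions. It is a simplified stand-in rather than the hierarchical evidential reasoning procedure or Yager’s recursive rule used by Rahman et al.; the two grades ("high", "low") and all mass values are hypothetical.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory hypotheses is treated as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: the two sources cannot be combined")
    # Renormalise by the non-conflicting mass.
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}

# Hypothetical frame of discernment for one cyber resilience attribute.
HIGH = frozenset({"high"})
LOW = frozenset({"low"})
EITHER = HIGH | LOW  # mass left on the whole frame expresses ignorance

# Hypothetical expert judgements (assumed values, each summing to 1).
expert_a = {HIGH: 0.6, LOW: 0.1, EITHER: 0.3}
expert_b = {HIGH: 0.5, LOW: 0.2, EITHER: 0.3}

fused = dempster_combine(expert_a, expert_b)
for focal, mass in fused.items():
    print(sorted(focal), round(mass, 3))
```

Leaving part of the mass on the whole frame is what lets the rule represent incomplete or uncertain expert input, which is the property the evidential reasoning approach relies on.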
A cyber-resilient control strategy that enhances the cyber resilience of
Microgrid (MG) systems and restores cyber connectivity after
Denial of Service (DoS) and latency attacks was presented by Yao et al. [
146]. The strategy consists of two control modes. The first mode, an adaptive-gain resilient controller, is designed to sustain the fast stabilisation of MG systems under non-uniform time-varying latency attacks. Its stability is proved by stochastic stability analysis using the Lyapunov-Krasovskii functional method.
The second mode, the ETTR (Event-Trigger Topology Reconfiguration) controller [
146], is designed to mitigate excessive latency and connectivity issues resulting from DoS attacks. The
ETTR controller optimally re-establishes the damaged cyber topology and restores the control objectives disrupted by DoS attacks, such as accurate power sharing. A switching mechanism is also designed to coordinate the two control modes to guarantee the secondary control functions of MG systems. The proposed control strategy provides a systematic control framework for the complicated MG scenario under both attacks with sufficient stability and optimal cyber performance.
The U.S.
Department of Defense (DoD) has been taking proactive measures to ensure its cyber resilience systems are robust against potential cyber-attacks [
40]. One of the main aspects of this effort involves implementing a cyber resilience strategy program that includes network segmentation, multi-factor authentication, and continuous monitoring. However, integrating these measures with the existing systems and processes has proven challenging.
A step-by-step implementation strategy was developed by De Cristofaro et al. [
44] to address this issue, which involves rigorous testing and validation before full deployment. This approach is crucial to ensuring that the cyber resilience measures effectively protect against potential threats. The DoD’s commitment to cybersecurity is commendable, and its efforts serve as a model for other organisations looking to strengthen their cybersecurity systems. By prioritising cyber resilience, the DoD is taking a proactive stance against threats that could compromise national security and the safety of its personnel.
5 Recent Advancements in Cyber Resilience
In this section, we introduce techniques to improve and increase cyber resilience. Additionally, we compare different cyber resilience improvements, as seen in Table
5. Recent advancements in cyber resilience take multiple forms: some are based on recommendations [
58], some on best practices and standards [
129], some on the use of technologies [
111], and others on multiple factors [
90].
Several studies have examined advancements in cyber resilience. An early one, by Partridge and Young [
76], presented the Computer Emergency Response Team Resilience Management Model (CERT-RMM) as applicable in organisations. The model allows adopters to continue using their preferred codes and standards of practice at a tactical level while improving the management of operational cyber resilience at the process level. This technique shows the areas of overlap and redundancy between CERT-RMM process areas and the NIST guidance discussed in the work of Mylrea et al. [
100], and it also identifies the gaps that may affect the maturity of practice. It aligns the tactical practices suggested in the NIST publications to the process areas that represent operational resilience management at a process level.
One study on improving cyber resilience in the supply chain was conducted by Goldman et al. [
58], which presented several approaches to improving cyber resilience and described an application scenario. They noted that not every technique applies to all systems; to begin building cyber resilience into existing or emerging systems, designers must analyse which strategies are most suitable for their environments and missions. Furthermore, they focused on actionable operational and architectural recommendations to enable mission assurance and address advanced threats to critical services. These recommendations can create improvements leading to a transformation with minimal impact on essential functions, acting as a deterrent, reversing adversary advantage, and increasing adversary cost and uncertainty.
Bodeau and Graubart [
21] discussed a general approach to assessing cyber resilience and to recommending architectural evolution and process improvements that make more effective use of cyber resilience practices. They focused on resilience assessment for families of systems, systems-of-systems, mission/business segments, or common infrastructures. An advantage of their approach is that it can also be applied to components, services, and individual systems. Moreover, the method can be applied with an architectural or an operational emphasis, focusing either on “low-hanging fruit” or on opportunities for high-leverage improvements while using a small number of cyber resilience techniques.
An organisation can take many steps to improve cyber resilience, but we highlight five main phases. First, when initiating a discussion about cyber resilience, it is critical to make executive management aware of it. Second, finding the right balance between preventive, detective, and corrective controls is vital. Third, the right balance between technical rules, processes, and people is required. Fourth, best practices and standards, such as
ISO/IEC 27001 and the AXELOS cyber resilience best practice guide, must be implemented in the organisation. Finally, testing and keeping the organisation up to date with new cyber-attacks will ensure cyber resilience is under control and working properly [
129].
A manageable implementation for improving the cyber resilience and risk management processes of SMEs was proposed by Nykänen and Kärkkäinen [
106]. They presented the semantic wiki as a platform for information security knowledge. They used the traditional information security properties of
Confidentiality, Integrity, and Availability (CIA) to filter the control catalogue and select appropriate controls from the availability viewpoint. To focus on controls and resilience, they use the NIST SP 800-53 control catalogue, which includes 115 low controls, of which only 87 are at level one and expected to be implemented in all information systems in the first phase. Their classification can further reduce the number of controls in this first phase to only 50.
A recent study by Aguilera [
145] has shown improved cyber resilience to overcome cyber-attacks using the Flooid resilience platform. Their resilience platform is designed to manage and orchestrate the container lifecycle while applying cyber resilience techniques and enforcing security. Flooid allows for deploying an application, executing its security, and returning the system to a specific state in case of a cyber-attack. They presented Flooid’s strategy to decrease the number of threats through the most common vulnerabilities, such as new code, inner-container attacks, cross-container attacks, container escaping, and resource consumption. They proved that Flooid could perform stateful recovery with minimal overhead. The recovery strategy includes container rollback, cloning, or live migration. They found that the performance of their approach is up to four times faster due to less information transmitted relative to the traditional procedure to reinstate the steady state.
Galinec and Steingartner [
54] undertook the preliminary work on advancing cyber resilience in cyber defence. The study investigated how cyber defence and cybersecurity can be combined to increase cyber resilience while describing cybersecurity relations, IT security, operational technology security, information security, and other related disciplines and practices within cyber defence. Exploring new techniques and standards for achieving cyber resilience is necessary, particularly in light of emerging cyber-attacks.
Li et al. [
86] introduced a similarity metric that captures how similar the vulnerabilities of two different products are, and applied it in a statistical study of the
Common Vulnerabilities and Exposures (CVE)/National Vulnerability Database (NVD) databases. They showed that most vulnerabilities affect multiple products, even products from different vendors. The similarity metric can be used to estimate the probability that a zero-day exploit successfully propagates between two different products. Such propagation can be effectively reduced by assigning dissimilar products to each pair of connected hosts.
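The idea of a vulnerability similarity metric can be sketched as a set overlap between the vulnerability identifiers reported for two products. The snippet below uses the Jaccard index as an assumed stand-in; the exact definition used by Li et al. is not given here and may differ, and the product names and identifiers are placeholders rather than real CVE entries.

```python
def vulnerability_similarity(vulns_a, vulns_b):
    """Jaccard similarity between the vulnerability sets of two products."""
    a, b = set(vulns_a), set(vulns_b)
    union = a | b
    if not union:
        return 0.0  # no known vulnerabilities on either side
    return len(a & b) / len(union)

# Placeholder identifiers, not real CVE records (assumption for illustration).
product_x = {"CVE-A", "CVE-B", "CVE-C", "CVE-D"}
product_y = {"CVE-C", "CVE-D", "CVE-E"}
product_z = {"CVE-F"}

sim_xy = vulnerability_similarity(product_x, product_y)
sim_xz = vulnerability_similarity(product_x, product_z)
print(f"x vs y: {sim_xy:.2f}, x vs z: {sim_xz:.2f}")
```

Under this reading, a pair of connected hosts running products x and z shares no known vulnerabilities, so a single exploit is less likely to propagate between them than between hosts running x and y.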
After implementation, achieving high-performance cyber resilience in any organisation requires five steps. The first step builds on a solid foundation of protecting and hardening core assets. The second step performs a pressure test of resilience through coached incident simulation. The third step applies automated defence technologies such as automated orchestration capabilities and advanced identity access management. The fourth step uses data and intelligence for proactive threat hunting, drawing on strategic and tactical knowledge of the threat. Last but not least, the evolving role of the chief information security officer in business leadership means that the next generation of such officers should be business adept and tech-savvy [
75].
Linkov and Kott [
90] discussed the resilience of a system, an organisation, and a network, considering several factors that enhance stability and improve cyber resilience, often in a complex and contradictory manner. These factors are: manage complexity, choose the topology, add resources, design for reversibility, control propagation, provide buffering, prepare active agents, build agent capabilities, consider the adversary, and conduct analysis.
The Ponemon Institute [
69] presented the importance of improving cyber resilience to ensure a strong security posture. They highlighted the importance of automation for cyber resilience. Automation refers to enabling security technologies that augment or replace human intervention in identifying and containing breaches or cyber exploits. Such technologies depend on Artificial Intelligence (AI), orchestration, analytics, and ML. They offered recommendations for achieving a stronger level of cyber resilience, such as investing in automation, hiring a skilled workforce, participating in threat intelligence, considering a valuable and integral, aligning privacy and cybersecurity, and using key metrics for measuring cyber resilience.
Baykara and Das [
12] propose a honeypot-based approach for
Intrusion Detection and Prevention Systems (ID/PS) that can detect zero-day attacks and reduce false positives in Intrusion Detection Systems (IDS). This system can help improve organisations’ cyber resilience by providing an additional security layer to their information systems. Using virtualisation technologies can also reduce the cost of configuration, maintenance, and management. Therefore, the study’s proposed system is a potential solution for enhancing organisations’ cyber resilience.
In the study conducted by Ahmed et al. [
2], a notable advancement in cyber resilience pertains to evaluating cyber resilience within field hospitals, aligning with the burgeoning trends in the field hospital domain and the broader health sector. This evolving landscape introduces potential vulnerabilities that malicious actors could exploit, underscoring the need to enhance response strategies to achieve robust cyber resilience. The assessments conducted in this context serve a dual purpose: they inform users and stakeholders about the extent of risks surrounding the hospital’s cyber assets and shed light on the avenues through which threat vectors could manifest. This approach starkly contrasts prevailing practices that assess the cyber assets of mobile field hospitals, illustrating a shift towards recognising and mitigating potential vulnerabilities.
To bolster the cyber resilience of Phasor Measurement Unit networks against malicious assaults and system anomalies, Qu et al. [
114] devised an optimisation-centered approach for network management. This approach capitalises on the SDN communication framework to facilitate the reinstatement of the Phasor Measurement Unit connectivity and reestablish observability within power systems. Their scheme facilitates swift network recovery by optimising the path generation and installation procedure while streamlining the SDN rule implementation on switches. This effort has culminated in creating a functional prototype system through which the authors gauged power system observability, recovery speed, and the efficiency of rule compression. Their evaluation hinged on the IEEE 30-bus system and the IEEE 118-bus system.
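The recovery idea described above can be illustrated with a much simpler stand-in. The sketch below is not the optimisation scheme of Qu et al.; it only recomputes a shortest path with breadth-first search from each Phasor Measurement Unit to a collection point once failed links are removed. The topology, the node names (`pmu1`, `s1`, `pdc`), and the function names are all hypothetical.

```python
from collections import deque


def shortest_path(links, src, dst):
    """Breadth-first search over undirected links; returns a node list or None."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    parents = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk back through the parent pointers to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in sorted(adjacency.get(node, ())):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def recover_paths(links, failed_links, pmus, collector):
    """Recompute a path for every PMU after removing the failed links."""
    failed = {frozenset(link) for link in failed_links}
    healthy = [link for link in links if frozenset(link) not in failed]
    return {pmu: shortest_path(healthy, pmu, collector) for pmu in pmus}


# Hypothetical topology: two PMUs, three switches, one data collector.
links = [
    ("pmu1", "s1"), ("pmu2", "s2"),
    ("s1", "s2"), ("s1", "s3"), ("s2", "s3"),
    ("s3", "pdc"),
]
recovered = recover_paths(links, [("s1", "s3")], ["pmu1", "pmu2"], "pdc")
reconnected = sum(path is not None for path in recovered.values())
print(recovered, reconnected)
```

In this toy case both units stay reachable after the `s1`-`s3` link fails, because traffic from `pmu1` can be rerouted through `s2`. A real scheme would also have to account for rule installation cost and power system observability, which this sketch ignores.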
A new algorithm for modelling heavy-tailed data to understand cyber risks better and improve cyber resilience was proposed by Dacorogna et al. [
41]. Using this algorithm, the authors analyse a database of cyber complaints filed at the Gendarmerie Nationale and obtain a reasonable estimate of the whole distribution, including the tail. The study confirms the finiteness of the loss expectation, a necessary condition for insurability. The authors draw the consequences of this model for risk management, compare its results with other standard Extreme Value Theory (EVT) models, and lay the ground for classifying attacks based on the fatness of the tail. The study aims to contribute to understanding cyber risks and improving cyber resilience in modern economies.
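To make the link between tail fatness and a finite loss expectation concrete, the following sketch applies the classical Hill estimator to synthetic data. This is a standard EVT tool, not the algorithm proposed by Dacorogna et al., and the Pareto sample, the choice of `k`, and all function names are assumptions made for illustration. For a Pareto-type tail with index alpha, the expected loss is finite only when alpha is greater than 1.

```python
import math
import random


def hill_tail_index(losses, k):
    """Hill estimate of the tail index alpha from the k largest observations."""
    if not 0 < k < len(losses):
        raise ValueError("k must satisfy 0 < k < len(losses)")
    ordered = sorted(losses, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest value
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / mean_log_excess


def has_finite_mean(alpha):
    """A Pareto-type tail has a finite expectation only if alpha > 1."""
    return alpha > 1.0


def sample_pareto(alpha, n, seed=0):
    """Synthetic Pareto losses (minimum 1) via inverse transform sampling."""
    rng = random.Random(seed)
    return [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


# Hypothetical data: 20,000 synthetic losses with a true tail index of 1.5.
synthetic_losses = sample_pareto(alpha=1.5, n=20000, seed=42)
alpha_hat = hill_tail_index(synthetic_losses, k=2000)
print(round(alpha_hat, 2), has_finite_mean(alpha_hat))
```

The estimate depends on how many upper order statistics `k` are used. In practice it is usual to inspect the estimate over a range of `k` values before drawing conclusions about insurability.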
Kim and Kim [
78] propose a blockchain-based
Non-Stop Customs Clearance (NSCC) system for cross-border trains. The proposed system aims to reduce the delays and resource consumption associated with traditional customs clearance processes. It uses a blockchain network to connect various trade and customs clearance agreements, an integration intended to ensure integrity while keeping the system’s resource consumption minimal.
The system proposed in the work of Kim and Kim [
78] includes various participants, such as railroads, freight vehicles, transit stations, and the existing customs clearance system. The proposed system utilises sequence diagrams and blockchain technology to protect the confidentiality and integrity of customs clearance data. The article demonstrates the structural attack resilience of the proposed system using a blockchain, a consensus algorithm, and an attack sequence diagram created with MITRE ATT&CK. This approach strengthens the system’s ability to withstand attacks. The results show that the blockchain-based NSCC system is time- and cost-efficient compared to the current customs clearance system. The proposed system offers improved resilience against cyber-attacks, making it more secure and reliable.
7 Threat Modelling for Cyber Resilience
Threat modelling is one of the approaches for identifying security requirements to design the systems correctly and securely [
117]. Threat modelling makes it possible to identify all potential threats to the systems and therefore assists system designers in considering the mitigation and making their design more secure and reliable. A threat model covers policies against various security threats and possible mitigation strategies [
60]. The primary purpose of a threat model is to facilitate awareness and identification of all possible threat scenarios that may be applicable in a specific context. Threat modelling can help identify, classify, and describe threats [
98].
Threat modelling finds application in two main ways: first, as an assessment tool to evaluate the existing state of a system, and second, as a security-by-design instrument during the development of novel approaches [
141]. These models can be employed as inputs for running attack simulations, a technique that delves into the actions of potential attackers within the system. By leveraging the outcomes of these simulations, stakeholders can delve into security scenarios, enabling them to identify and implement measures more efficiently to fortify the security of their systems.
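As a rough illustration of how a threat model can feed an attack simulation, the sketch below treats the model as a directed graph whose edges are steps an attacker could take, and enumerates every simple path from an entry point to a target asset. The graph, the node names, and the `attack_paths` function are hypothetical. Real attack simulation tools also model probabilities, time, and defences, which are omitted here.

```python
def attack_paths(edges, entry, target):
    """Enumerate all simple (cycle-free) paths from entry to target."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    paths = []
    stack = [(entry, [entry])]
    while stack:
        node, path = stack.pop()
        if node == target:
            paths.append(path)
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:  # never revisit a node on the current path
                stack.append((nxt, path + [nxt]))
    return sorted(paths)


# Hypothetical threat model: each edge is a step an attacker could exploit.
model = [
    ("internet", "web_server"),
    ("web_server", "app_server"),
    ("app_server", "database"),
    ("internet", "vpn_gateway"),
    ("vpn_gateway", "database"),
]
before = attack_paths(model, "internet", "database")

# Simulate a mitigation by removing one exploitable step, then re-run.
mitigated = [edge for edge in model if edge != ("vpn_gateway", "database")]
after = attack_paths(mitigated, "internet", "database")
print(len(before), len(after))
```

Comparing the paths before and after a candidate mitigation gives stakeholders a simple way to see which measures close off the most routes to a critical asset.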
Several popular threat modelling methodologies summarised in Table
7 are classified based on the volume of data. Some of these methodologies are suitable for cyber resilience, such as
Spoofing, Tampering, Repudiation, Information disclosure, Denial of Service, and Elevation of privilege (STRIDE) modelling [
133];
Process for Attack Simulation and Threat Analysis (PASTA) modelling [
137]; and
Damage, Reproducibility, Exploitability, Affected users, and Discoverability (DREAD) modelling [
105]. However, some of them are not suitable for cyber resilience, such as
Visual, Agile, and Simple Threat (VAST) modelling [
1];
Operationally Critical Threat, Asset, and Vulnerability Evaluation (OCTAVE) modelling [
4]; and Trike modelling [
124]. These threat modelling methodologies are illustrated in Figure
6.
Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, and Elevation of privilege (STRIDE) defines both a threat model and a stepwise threat modelling process. STRIDE is widely applied to analyse the security of systems since it provides a precise classification of threats [
140]. STRIDE primarily helps software developers consider security during the design phase [
79]. The implementation of PASTA begins at the system level, using a high-level architecture. This initial round enables threat modellers to define all inputs and outputs for each system component [
82]. Damage, Reproducibility, Exploitability, Affected users, and Discoverability (DREAD) is an asset-centric threat modelling approach developed by Microsoft. DREAD considers the traditional qualitative risk rating (low, medium, and high). In general, the DREAD threat modelling approach utilises a scoring system to calculate the probability of occurrence for each identified area of the asset being threat modelled [
105]. DREAD acts as a classification scheme for comparing, quantifying, and prioritising the amount of risk presented by each threat [
73].
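A DREAD-style scoring scheme can be sketched as follows. The 1 to 10 scale per category, the use of a plain average, and the cut-offs for low, medium, and high are common conventions assumed here for illustration; they are not taken from the cited works, and the example threats and ratings are invented.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class DreadRating:
    """One rating per DREAD category, each on an assumed 1-10 scale."""
    damage: int
    reproducibility: int
    exploitability: int
    affected_users: int
    discoverability: int

    def score(self) -> float:
        values = astuple(self)
        if any(not 1 <= v <= 10 for v in values):
            raise ValueError("each DREAD rating must be between 1 and 10")
        return sum(values) / len(values)


def risk_level(score: float) -> str:
    # Hypothetical cut-offs mapping the average to a qualitative rating.
    if score >= 7.0:
        return "high"
    if score >= 4.0:
        return "medium"
    return "low"


def prioritise(threats):
    """Return (name, score, level) tuples, highest score first."""
    scored = [(name, rating.score()) for name, rating in threats.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(name, score, risk_level(score)) for name, score in scored]


# Invented example threats and ratings.
example_threats = {
    "Verbose error messages": DreadRating(3, 8, 5, 4, 6),
    "Session fixation on legacy portal": DreadRating(2, 3, 3, 2, 3),
    "SQL injection on login form": DreadRating(8, 9, 7, 9, 7),
}
ranking = prioritise(example_threats)
for name, score, level in ranking:
    print(f"{score:4.1f}  {level:6}  {name}")
```

Sorting by the averaged score gives the comparison and prioritisation role described above, while the qualitative label keeps the result readable for non-specialists.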
The Visual, Agile, and Simple Threat (VAST) methodology is designed for performing an in-depth analysis of the process and application-level threats that focus on enterprise business [
1]. It incorporates three necessary pillars for supporting a scalable solution: automation, integration, and collaboration [
1]. The Operationally Critical Threat, Asset, and Vulnerability Evaluation (OCTAVE) methodology is a risk-based strategic assessment and planning method for cybersecurity. OCTAVE focuses on assessing organisational risks and does not address technological risks. Its main aspects are operational risk, technology, and security practices [
126]. The Trike methodology is an open source security audit framework that uses threat modelling. Trike was introduced in 2006 as a stand-alone desktop application and later evolved into a spreadsheet [
117]. Trike modelling focuses on satisfying security auditing processes for cyber risk management.
Microsoft presents five significant threat modelling phases [
94] that are illustrated in Figure
7. These five phases are (1) defining security requirements, (2) creating an application diagram, (3) identifying threats, (4) mitigating threats, and (5) validating that the threats have been mitigated. Microsoft's popular
Threat Modelling Tool (TMT) helps software developers identify and mitigate security issues early in the
Software Development Life Cycle (SDLC). The tool was first released in 2008 under the Microsoft
Security Development Lifecycle (SDL) name and was later replaced by the Microsoft Threat Modelling Tool (TMT) in 2011, with the latest version released in 2018 [
99]. Microsoft TMT is designed for all developers, including those who are not experts in software security.
One threat model relevant to cyber resilience is the malware rebirthing botnet. Such a botnet collects and modifies existing malware and can be used in different ways, including inserting known malware signatures into the code of non-malicious programs and into network traffic in order to overload sensors and undermine confidence in detection. It can also use the signatures of known malware to trigger malware detection systems so that the targeted system is taken offline for further analysis [
29].
8 Discussion
This section summarises findings, limitations, open problems, and future directions related to cyber resilience. Several works have shown that cyber resilience is necessary for academic and industrial environments. As mentioned in the previous sections, most cyber resilience areas discussed by researchers address frameworks, strategies, improvements, applications, and tools for cyber resilience. Other areas, such as principles, metrics, life-cycle management, assessment methods, and organisational cyber resilience, have received comparatively little attention. Current studies on cyber resilience have highlighted the importance of these areas and how they will improve cyber resilience at different organisational levels.
8.1 Research Challenges
In our survey on cyber resilience, we discovered various research challenges in the field. These include the need for standardisation and consistency in CRFs, strategies, recent advancements, and tools. The existing frameworks for cyber resilience have implementation complexity and cannot properly measure and quantify cyber resilience. The strategies and approaches discussed in the literature need to be more compatible with specific applications. Recent advancements in cyber resilience studies require multiple and complicated configurations for implementation.
While there are many cyber resilience tools available, they can often be limited in terms of performance, features, and accessibility. Furthermore, these tools can be quite costly. Measuring and evaluating cyber resilience can also be a complex task, and it is important to understand further how human factors impact it. This is especially challenging in systems and networks with autonomous agents. Many assessment approaches and tools need to be more effective in measuring cyber resilience during cyber-attacks.
To address these challenges, more comprehensive and integrated approaches to cyber resilience are needed. This highlights the importance of developing better frameworks, strategies, tools, and techniques to measure, enhance, and quantify cyber resilience in various domains. It is recommended that investigation and development efforts be directed towards various areas to enhance cyber resilience capabilities. There is a need to establish standardised and consistent CRFs and strategies that can be implemented in different fields and industries.
Furthermore, it is important to develop metrics and tools that can measure and quantify cyber resilience, conduct research on the influence of human factors such as employee behaviour and decision making on cyber resilience, and create comprehensive and integrated approaches to cyber resilience that incorporate both technical and non-technical strategies. Addressing these research challenges and developing new approaches and techniques is critical to enhancing cyber resilience capabilities and mitigating the impact of cyber-attacks.
8.2 Findings
Many existing works and surveys focused on the fundamental frameworks for attaining cyber resilience. Most CRFs involve very high developmental costs but low maintenance costs. In terms of developmental cost and complexity, the implementation of most frameworks reported in this survey proceeds reasonably well. There are limited studies on CRFs that support multiple data sources to analyse complex infrastructure efficiently.
Very few frameworks are open source. The most important finding of the current study was that few previous frameworks used metrics for quantifying cyber resilience. Most of the findings in this survey demonstrate that the current cyber resilience strategies are technical. The existing strategies and approaches are of high quality and will enhance cyber resilience. In general, most current studies on cyber resilience strategies involve high flexibility.
The recent studies on cyber resilience advancements found that most of them apply to developing cyber resilience at an organisational level instead of the system level. However, few studies enhance cyber resilience in supply chain systems, cyber systems, cyber-attacks, and ICS. Most of these enhancement works are discussed at the organisational level. The recent advancement studies found and discussed in this survey use international standards such as ISO/IEC 27001 to improve cyber resilience.
Many applications and areas have implemented cyber resilience, such as the transportation sector, financial sector, power systems, supply chain, SCADA systems, smart grid, wireless communication networks, healthcare, and ICS. For example, communication networks favour applications executing cyber resilience, particularly in the smart grid network. Few tools and technologies are available for cyber resilience, and those that exist have some limitations.
One unanticipated finding is that no cyber resilience tools could simultaneously work with organisational and operational management. However, when comparing these tools, we found most of them have helpful features such as being easy to use, efficient, and software based. In addition, most of the cyber resilience tools generate detailed reports. In general, the performance of current cyber resilience tools is quite reasonable, given that most are efficient.
8.3 Limitations
The major limitation of the existing frameworks is their implementation complexity. The frameworks discussed in this survey cannot properly measure and quantify cyber resilience. The principal limitation of the existing strategies and approaches in the literature is their low compatibility with specific applications. Recent advancements in cyber resilience studies require multiple and complicated configurations for implementation. The existing cyber resilience tools are limited in both features and performance. The fundamental issue with these tools and technologies is that they are not open source. Moreover, the cost of most cyber resilience tools is extremely high.
Systems and networks enabled with autonomous agents can respond to cyber-attacks with speed and scale that are unachievable with purely human defenders. Still, the mere presence of autonomous agents in the system adds vulnerabilities and can reduce cyber resilience. Most assessment approaches presented in this survey on frameworks, strategies, improvements, and tools have limitations for quantifying cyber resilience, especially in systems and networks with autonomous agents that can enable cyber resilience with new technologies such as the Internet of Things (IoT) and AI.
The main limitation of most of these studies is that they do not thoroughly discuss cyber resilience. Understanding the cyber resilience concept is critical before implementation. A well-established systematic literature review can provide an in-depth understanding of the cyber resilience concept, strategies, applications, tools, and limitations. It is necessary to have a systematic literature review that provides a systematic approach to the domains discussed in this survey.
8.4 Open Problems
There are still many unanswered questions about cyber resilience at the organisational and operational levels. Organisational strategies for cyber resilience often overlook who is in charge of their implementation and management, and who will be technically responsible for measuring cyber resilience at the operational level. One open problem is achieving consensus control of complex networks and systems with cyber resilience for resisting distributed DoS attacks on the communication infrastructure.
The main reason is that complex systems and networks may create additional difficulties in analysing and quantifying cyber resilience. We discuss and analyse cyber resilience assessment studies in this survey in different domains, such as frameworks and recent advancements. However, most assessment studies and tools cannot measure cyber resilience under cyber-attacks. Measurement and benchmarking of cyber resilience using assessment tools is one of the leading open problems.
Most existing research trust and importance systems utilise various defence mechanisms against specific cyber-attacks. Although researchers have proposed and implemented several such defence techniques, current systems typically address only a small set of cyber-attacks and rarely provide a comprehensive solution. We believe that designing a comprehensive, stable system that is resilient to an entire collection of cyber-attacks is an open problem and a major challenge.
Many types of cyber-attacks affect systems and networks, but most current works on cyber resilience considered only distributed DoS attacks. Different types of cyber-attacks need to be considered when implementing cyber resilience. Unfortunately, the literature reviewed in this survey only concerns a specific type of cyber-attack. Cyber resilience needs more investigation and consideration of multiple cyber-attack types, which can be mounted simultaneously.