1 Introduction
The
Security Operations Centre (SOC) is a crucial component in safeguarding the confidentiality, integrity, and availability of the modern digital enterprise against evolving cyber attacks. As a dedicated facility staffed with highly skilled security professionals and equipped with advanced technologies and processes, the SOC serves as the central hub for detecting, analysing, responding to, and recovering from cyber threats. The increasing financial impact of cyber incidents, stemming from both cyber crime activities and state-sponsored attacks, has significantly driven the demand for SOCs in recent times. The estimated cost of cyber crime worldwide was $7.08 trillion in 2022 and is projected to increase to $13.82 trillion by 2028 [
46]. This surge, amounting to a 600% increase since the onset of the COVID-19 pandemic [
50], has led to a corresponding increase in cybersecurity spending, with the size of the cybersecurity market set to reach $657.02 billion by 2030 [
16]. The SANS 2023 SOC Survey, which covers major sectors such as technology, banking and finance, cybersecurity, and government, reveals that around 80% of surveyed organisations operate their SOC 24/7, either in-house, outsourced, or in mixed mode [
13]. Even small- and medium-sized businesses are recognising the need for SOCs, highlighting the growing demand for SOC support.
The SOC typically offers a range of essential security functions across three tiers, as shown in Figure
1. Tiers 1 and 2 focus on real-time detection and response to cybersecurity incidents, sometimes referred to as hot path activities. At Tier 1, the focus is on
high-speed remediation, which involves handling initial incident response, swiftly addressing security events through alert triage, basic analysis, and prompt resolution of straightforward issues. Advanced incidents are escalated to Tier 2, where analysts undertake
advanced analysis and remediation. They conduct in-depth investigations, collaborate with other teams, and employ advanced techniques for remediation. In Tier 3, senior analysts and security engineers actively conduct proactive hunting, advanced forensics, complex investigations, and incident response strategy development. They also provide guidance for lower-tier analysts. These activities are referred to as cold path activities. Operating across these three tiers, the SOC delivers a unified and comprehensive cybersecurity strategy that aligns with the NIST Cybersecurity Framework’s core functions of detection, response, and recovery [
43].
The modern SOC leverages a plethora of advanced technologies, many of which are AI enabled, to enhance its capabilities in detecting, analysing, and responding to cyber threats. These technologies include (i)
Security Information and Event Management (SIEM) systems [
23], which collect, correlate, and analyse security event data from diverse sources in real time; (ii) Security Orchestration, Automation, and Response (SOAR) platforms [
6], which automate and streamline security processes, facilitating efficient incident response and workflow management; (iii) Endpoint Detection and Response (EDR) solutions [
25], which are deployed to detect and respond to advanced threats targeting endpoints; (iv) Extended Detection and Response (XDR) platforms [
4], which integrate multiple security controls and use advanced analytics to detect threats across the organisation’s infrastructure; and (v) User and Entity Behaviour Analytics (UEBA) solutions [
32], which serve to identify anomalous behaviour and detect insider threats. These technologies empower the SOC to effectively monitor, detect, and respond to cyber threats, strengthening an organisation’s overall cybersecurity posture.
However, despite the increasing use of AI-enabled tools and technologies, recent studies such as the SANS Institute’s annual SOC Survey [
13] and Trend Micro’s Global Study on Security Operations [
39] indicate that SOCs face a range of challenges across the people, process, and technology domains, hindering their efficiency and effectiveness in addressing cyber threats. On the people front, high staffing requirements and a scarcity of skilled personnel create difficulties in adequately supporting SOC operations. In terms of processes, the lack of automation and orchestration, limited enterprise-wide visibility, and the absence of well-defined processes or playbooks contribute to inefficiencies and hinder incident response. From a technology perspective, the overwhelming volume of alerts without proper correlation, the presence of numerous unintegrated tools, and the lack of context surrounding observed incidents further impede effective decision making and response. Collectively, these challenges contribute to the growing problem of
alert fatigue, which refers to the state of mental weariness and decreased responsiveness among security analysts due to the overwhelming volume of security alerts to be triaged and investigated [
2,
39]. Alert fatigue poses significant ramifications for SOCs, impacting their ability to effectively detect, prioritise, respond to, and recover from security threats. This phenomenon can have a detrimental impact on an organisation’s overall cybersecurity posture.
The rapid advances in AI and the growing integration of AI-enabled tools and technologies within SOCs give rise to a compelling argument for the implementation of
human-AI teaming within the SOC environment. Effective human-AI teams can leverage the distinct capabilities of both human analysts and AI systems, while overcoming the known challenges and limitations of each team member. AI-powered automation and orchestration can effectively handle well-known and routine scenarios without requiring human intervention, freeing up SOC analysts to focus on higher-level tasks requiring critical thinking and domain expertise. Moreover, AI-driven augmentation can empower SOC analysts with advanced insights through the correlation and analysis of extensive datasets, thereby expediting decision making and enhancing incident response capabilities. Considering the highly dynamic cybersecurity landscape, even seasoned domain experts might encounter novel and open-ended scenarios that present unique challenges devoid of clear-cut solutions. In such scenarios,
human-AI collaboration emerges as a potential avenue, combining human intuition, contextual understanding, and expertise with the computational capabilities and data processing power of AI, to navigate complex situations and explore prospective solutions to open-ended problems. Human-AI teaming can be applied across all three tiers to enhance the entire spectrum of security operations as depicted in Figure
1.
This article presents our vision and proposal for harnessing the power of human-AI teaming to enhance the efficiency and effectiveness of SOC operations. Our proposal emphasises strategic utilisation of human-AI teaming, particularly human-AI collaboration, with the overarching goal of enhancing SOC functions, specifically by strengthening detection and response capabilities. Central to our proposal is the foundational concept of
Situation Awareness (SA) and
shared SA, often referred to as “common ground” [
19,
21], which establishes a pivotal connection between the human analysts and AI systems. SA and shared SA are crucial factors that can significantly impact the overall effectiveness and efficiency of a human-AI team. Furthermore, we advocate for the implementation of
flexible autonomy wherein the level of automation for a given task or function is dynamically adjusted over time. This adjustment can be managed either by the human expert, termed
adaptable automation, or by the system itself, known as
adaptive automation [
40]. We believe this adaptability is essential to maintain optimal human-AI team performance in dynamic security environments. We also introduce our initial work on designing a conceptual framework for human-AI teaming, which we term the
\(\mathcal {A}^2\mathcal {C}\) Framework. The framework supports three distinct modes of decision making:
automated,
augmented, and
collaborative. In the automated mode, AI systems autonomously handle decision making without human intervention. In the augmented mode, the AI system defers decision making to the human expert but provides relevant information, insights, and recommendations. Finally, in the collaborative mode, termed
collaborative exploration, the human expert addresses complex and uncertain problems by harnessing the capabilities of AI systems to engage in joint exploration and investigation. By strategically integrating these three modes of decision making, our aim is to support flexible autonomy in human-AI teams. This initial conceptualisation lays the groundwork for future development, providing a clear and compelling roadmap to transform this vision into a practical and impactful solution for addressing the persistent issue of alert fatigue in SOCs.
The rest of the article is organised as follows. Section
2 sets the stage with an overview of the SOC’s incident triage, analysis, and response function. Section
3 delves into the problem of alert fatigue, examining its underlying causes and potential impacts. Section
4 lays the groundwork for human-AI teaming by exploring foundational concepts crucial for its implementation. Section
5 introduces the
\(\mathcal {A}^2\mathcal {C}\) Framework, wherein shared SA enables flexible autonomy in human-AI teams through automated, augmented, and collaborative decision making. This section also discusses how the framework can contribute to mitigating the problem of alert fatigue in the SOC. Finally, Section
8 concludes the article with an outlook on future work.
2 Incident Triage, Analysis, and Response
One of the primary functions of the SOC is
incident triage, analysis, and response—a comprehensive process comprising several crucial stages, including real-time alert monitoring and triage, in-depth analysis and investigation, incident containment and recovery, and coordination of incident response activities [
33]. This process enables the SOC to swiftly identify, assess, and address security incidents, minimising their impact and protecting critical assets.
A modern SOC employs a range of tools to continuously monitor the organisation’s network, systems, and applications for security events. A security event is any observable occurrence, such as a firewall blocking a connection attempt, or the detection of a malware infection within a system. These events are pushed out to a central system such as the SIEM, which enriches, analyses, and correlates the collected event data to generate alerts. An alert is a technical notification that a particular event, or series of events, has occurred [
33], such as multiple failed login attempts from a specific IP address within a short period of time.
Intrusion Detection Systems (IDS), such as the Network IDS and the Host IDS, are additional sources of alerts.
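A correlation rule of the kind described above, flagging multiple failed logins from one source within a short window, can be sketched as follows. The event format, threshold, and window are illustrative assumptions rather than the behaviour of any particular SIEM:

```python
from collections import defaultdict

def failed_login_alerts(events, threshold=5, window=60):
    """Flag source IPs with >= threshold failed logins within `window` seconds.

    Hypothetical event format: (timestamp_seconds, source_ip, outcome).
    """
    failures = defaultdict(list)
    alerts = set()
    for ts, ip, outcome in sorted(events):
        if outcome != "failure":
            continue
        bucket = failures[ip]
        bucket.append(ts)
        # Keep only failures inside the sliding window ending at `ts`.
        while bucket and ts - bucket[0] > window:
            bucket.pop(0)
        if len(bucket) >= threshold:
            alerts.add(ip)
    return alerts

events = [(t, "203.0.113.7", "failure") for t in range(0, 50, 10)]
events.append((55, "198.51.100.2", "success"))
print(failed_login_alerts(events))  # → {'203.0.113.7'}
```

Real SIEM correlation engines express such rules declaratively and enrich the resulting alert with context, but the underlying windowed aggregation is essentially this.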
Triage analysis provides a swift assessment of these alerts to distinguish genuine security incidents from false positives.
In-depth analysis of validated alerts is undertaken to determine the root cause of the incident, identify affected systems, and assess the potential impacts of the incident. Based on the findings of the investigation, an appropriate response is developed and implemented. This may involve containment measures, such as isolating infected systems or restricting network access, to prevent further damage or loss. Remediation steps, such as patching vulnerabilities or implementing security controls, are undertaken to address the underlying causes of the incident. Once the incident has been resolved, a comprehensive incident report is documented, detailing the incident’s timeline, findings, actions taken, and lessons learned. This report serves as a valuable resource for future incident prevention and response. Finally, based on the findings from the incident analysis, security rules and configurations may be refined to improve detection accuracy and reduce false positives. Figure
2 shows a simplified representation of the main steps involved.
In this section, we provide a succinct overview of the triage analysis and in-depth analysis steps as they are closely linked with the problem of alert fatigue.
2.1 Triage Analysis
Triage analysis is a fundamental Tier 1 process that serves as the initial step in the incident response process [
34,
61]. Its primary goal is to quickly determine the criticality and legitimacy of each alert. As alerts are generated by the SIEM, or an equivalent system, they are prioritised and presented to the Tier 1 analysts for validation. Alert prioritisation is accomplished by assessing various criteria such as alert type, alert severity, the affected systems, the potential organisational impact, or specific contextual details linked with the alert. While the SIEM performs the initial prioritisation, analysts can either validate the alerts in the order predetermined by the system or exercise their own judgement to establish the validation sequence. Alert validation involves determining the legitimacy of alerts and distinguishing genuine security incidents from false positives. This entails actions such as cross-referencing alerts with other relevant events, reviewing system logs, analysing network traffic, and consulting threat intelligence sources to assess the validity and seriousness of the alerts. Certain alerts may exhibit complex patterns or indicators of more sophisticated threats that require further scrutiny. These “interesting” alerts are escalated to higher-tier analysts or specialised incident response teams for more in-depth investigations [
30]. Conversely, swiftly identified false positives or inconsequential alerts undergo routine examination and closure.
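As a rough sketch of the prioritisation criteria discussed above, the following toy scoring function combines alert severity, asset criticality, and a threat-intelligence match. The weights and field names are hypothetical, not drawn from any specific SOC product:

```python
# Hypothetical scoring scheme: weights and fields are illustrative.
SEVERITY = {"low": 1, "medium": 2, "high": 3, "critical": 4}

def priority_score(alert):
    score = 10 * SEVERITY[alert["severity"]]
    score += 5 * alert.get("asset_criticality", 1)  # 1 (lab host) .. 5 (crown jewels)
    if alert.get("threat_intel_match", False):      # known-bad indicator observed
        score += 20
    return score

def triage_queue(alerts):
    # Highest-priority alerts are presented to Tier 1 analysts first.
    return sorted(alerts, key=priority_score, reverse=True)

alerts = [
    {"id": "A1", "severity": "low", "asset_criticality": 5},
    {"id": "A2", "severity": "critical", "asset_criticality": 2, "threat_intel_match": True},
    {"id": "A3", "severity": "medium", "asset_criticality": 1},
]
print([a["id"] for a in triage_queue(alerts)])  # → ['A2', 'A1', 'A3']
```

As noted above, analysts may override this predetermined order using their own judgement; the score only establishes a default validation sequence.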
2.2 In-depth Analysis
Incidents that are more complex and critical are escalated to Tier 2 analysts who conduct in-depth analysis and investigation [
34,
61]. This stage involves two steps: alert prioritisation and alert investigation. The primary goal of alert prioritisation remains consistent: ensuring that the most critical alerts are investigated first. Unlike Tier 1, where some incidents can be promptly resolved through automation processes, Tier 2 requires a higher level of human involvement due to the intricate nature of incidents and the need for comprehensive analysis. This stage relies on human intervention and expertise, harnessing the distinct capabilities of skilled analysts. Although some structured processes might continue to guide their approach, analysts must apply critical thinking, creative problem solving, and lateral reasoning to investigate and respond to these incidents effectively [
51]. Moreover, the potential for collaboration between human analysts and AI becomes more pragmatic and advantageous within this tier. Here, the human expert can harness the capabilities of AI systems for collaborative exploration and investigation of novel security incidents [
22,
26,
36].
While similarities exist between triage analysis at Tier 1 and more in-depth analysis at Tier 2, important distinctions set these tiers apart. At Tier 1, the emphasis is on swift validation and categorisation of alerts, enabling the SOC to efficiently handle a significant volume of low-to-medium complexity alerts. Resolutions at Tier 1 typically occur within seconds or minutes. In contrast, Tier 2 involves a comprehensive investigation process that can span from hours to even months [
33]. This can be attributed to the escalating complexity of the digital landscape, coupled with the evolving threat landscape. In this context, even highly experienced security analysts encounter novel and open-ended situations that pose unique challenges without a clear and definitive solution. It should be noted that the volume and velocity of alerts at Tier 2 are relatively lower than those at Tier 1.
5 \(\mathcal {A}^2\mathcal {C}\) Framework: Conceptualisation of Human-AI Teaming
Building upon the foundational concepts of incident triage, analysis, and response in Section
2, the exploration of key factors contributing to alert fatigue in Section
3, and human-AI teaming concepts in Section
4, this section introduces the
\(\mathcal {A}^2\mathcal {C}\) Framework as a conceptual model for human-AI teaming and collaboration. As illustrated in Figure
3, the framework incorporates SA and shared SA to support flexible autonomy through various decision-making modes. These modes can be applied to different domain-specific tasks such as incident triage, analysis, and response in the SOC.
5.1 SA and Shared SA
For human-AI teams to excel, both individual SA and shared SA play crucial roles: the former ensuring effective individual performance, and the latter fuelling superior team coordination [
21]. Development of shared SA requires several key situation models, as depicted in Figure
3. These models include self-awareness, teammate awareness, and world awareness.
5.1.1 Self-Awareness.
Humans must possess meta-awareness of their own capabilities, including awareness of how fatigue, excessive workload, and training gaps impact their abilities. Similarly, AI needs to develop self-awareness of its own strengths and limitations, and use this to determine proactive handovers to human experts when confidence is low or the risk is too high. When humans are cognisant of their own limitations, they can proactively seek assistance from colleagues or AI systems and avoid making decisions independently. Conversely, when AI systems are aware of their own limitations, they can appropriately flag alerts that necessitate further human scrutiny. This approach ensures that critical decisions are not solely reliant on automated processes, emphasising the importance of human involvement in nuanced and complex situations.
5.1.2 Teammate Awareness.
Effective teaming between humans and AI systems hinges on a deep understanding of the capabilities and limitations of both human analysts and AI systems. While self-awareness is crucial for each entity to recognise its own strengths and weaknesses, mutual awareness takes this concept a step further, fostering a collaborative environment where both parties are cognisant of each other’s expertise and limitations. For humans, teammate awareness empowers them to effectively delegate tasks, seek assistance from the AI when appropriate, and interpret the AI’s recommendations with a nuanced understanding of its underlying rationale. Conversely, AI systems equipped with an awareness of the human’s state, including their level of expertise, cognitive load, emotional state, and potential biases, can dynamically adapt their interactions and recommendations to the individual’s current capacity and perspective, ensuring that human judgement is appropriately considered in decision-making processes.
5.1.3 World Awareness.
Just as humans and AI require high levels of self-awareness and teammate awareness to facilitate effective decision making as a team, they must also develop and maintain a comprehensive situation model of the world to underpin their individual decision-making processes. This situation model should encompass current goals, functional assignments, plans, task statuses, and the states and modes of human and AI teammates involved in the work. As roles and responsibilities dynamically transition between human and AI teammates, actively maintaining this model becomes crucial for ensuring the efficacy of human-AI collaboration.
5.2 Flexible Autonomy
Human-AI teams can tackle cognitive tasks in a number of different ways, some of which are illustrated in Figure
3 and discussed briefly in the following. Flexible autonomy empowers both humans and AI to dynamically switch between these decision-making modes by adjusting their control and decision authority based on the context, situation, and specific operational requirements. Shared SA provides the foundation for these dynamic adjustments, playing a crucial role in optimising human-AI team performance across diverse and dynamic scenarios. In doing so, it serves as a linchpin in mitigating alert fatigue.
5.2.1 Full Automation.
In fully automated decision making, decisions are exclusively determined by the AI system without direct human intervention. This approach presents several advantages, including speed, efficiency, and consistency, particularly for routine or well-understood scenarios characterised by clear patterns and well-established decision-making criteria. In such settings, the AI can make accurate decisions with a high degree of confidence, minimising the consequences of potential errors. However, it is important to consider the appropriateness of this approach, as its effectiveness is contingent on the predictability and clarity of the decision-making environment.
5.2.2 Selective Deferral.
Selective deferral is a type of decision-making process employed by AI systems to dynamically hand over decision-making responsibility to human experts, contingent upon the level of uncertainty or complexity within a given situation. In this approach, the AI system assesses the available information and evaluates its confidence in rendering a decision. Should the AI system possess a high degree of confidence, it autonomously proceeds with decision making. Conversely, in instances of low confidence or when confronted with complexities surpassing its capabilities, the AI defers the decision to a human expert. In doing so, the AI may provide recommendations or contextual information to aid the expert’s decision making (augmented deferral). Alternatively, the AI may defer without recommendations or additional context, aiming to avoid undue influence on the expert’s decision (passive deferral). This dual deferral strategy enables adaptive and collaborative decision making tailored to the intricacies of diverse scenarios.
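A minimal sketch of selective deferral might look as follows. The confidence thresholds, the `high_risk` flag, and the contents of the returned context are illustrative assumptions, not part of the framework's specification:

```python
def route_alert(alert, confidence, high_risk=False,
                auto_threshold=0.9, defer_threshold=0.6):
    """Return (mode, context) for a single alert."""
    if confidence >= auto_threshold and not high_risk:
        # High confidence, low risk: the AI decides autonomously.
        return ("automated", None)
    if confidence >= defer_threshold:
        # Moderate confidence: defer, but attach rationale and context
        # to aid the expert (augmented deferral).
        context = {"confidence": confidence,
                   "similar_cases": alert.get("similar", [])}
        return ("augmented_deferral", context)
    # Low confidence: hand over without recommendations to avoid
    # anchoring the expert's judgement (passive deferral).
    return ("passive_deferral", None)

print(route_alert({"id": "A1"}, 0.95))                     # → ('automated', None)
print(route_alert({"id": "A2"}, 0.75)[0])                  # → augmented_deferral
print(route_alert({"id": "A3"}, 0.40))                     # → ('passive_deferral', None)
print(route_alert({"id": "A4"}, 0.95, high_risk=True)[0])  # → augmented_deferral
```

Note that a high-risk alert is never fully automated here, even at high confidence, reflecting the self-awareness principle that critical decisions should not rest solely on automated processes.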
5.2.3 Collaborative Exploration.
Collaborative exploration is a dynamic decision-making paradigm designed for scenarios requiring joint efforts from both humans and AI systems to investigate and resolve complex problems. This approach recognises that even experienced professionals may encounter novel or previously unseen situations, necessitating additional information and expertise. By harnessing the strengths of both humans and AI, collaborative exploration seeks to enhance the accuracy and efficiency of decision-making processes.
This approach typically unfolds as follows. When an expert encounters a situation characterised by uncertainty or lacking sufficient context, they initiate collaborative exploration with the AI system. With the collective knowledge acquired through collaborative exploration, the expert formulates well-informed decisions and, when deemed necessary, determines an appropriate course of action to effectively resolve the problem. We can also envisage a scenario where the AI system triggers collaborative exploration based on its analysis, proactively engaging human expertise to navigate unforeseen challenges.
A defining feature of collaborative exploration is the nuanced interaction dynamics between the human and the AI, setting it apart from both automation and selective deferral. While automation involves no interaction, and selective deferral involves a one-time transfer of responsibility, collaborative exploration fosters a more interactive dialogue.
5.2.4 Ensemble Decision Making.
In scenarios where neither humans nor AI systems hold a distinct advantage, ensemble decision making combines outcomes from disparate decision makers tackling the same problem, enhancing overall accuracy and robustness. Unlike selective deferral’s reliance on handoffs and collaborative exploration’s emphasis on interactive dialogue, ensemble decision making harnesses independent analyses from multiple team members. This aggregation of diverse perspectives, often achieved through methods like weighted averaging, results in more resilient and precise final decisions.
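A weighted-averaging ensemble of the kind described can be sketched as follows. The weights, which in practice would reflect each team member's historical accuracy, are illustrative assumptions:

```python
def ensemble_decision(opinions, threshold=0.5):
    """Combine (probability, weight) opinions via weighted averaging."""
    total_weight = sum(w for _, w in opinions)
    combined = sum(p * w for p, w in opinions) / total_weight
    # The alert is treated as a genuine incident if the combined
    # probability clears the decision threshold.
    return combined, combined >= threshold

# An AI model (weight 0.5), a senior analyst (0.3), and a junior analyst (0.2)
# independently estimate the probability that an alert is a true positive:
score, is_incident = ensemble_decision([(0.8, 0.5), (0.6, 0.3), (0.3, 0.2)])
print(round(score, 2), is_incident)  # → 0.64 True
```

Because each opinion is formed independently before aggregation, a single member's error is dampened rather than propagated, which is the source of the robustness noted above.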
5.3 Domain-Specific Tasks
In principle, the \(\mathcal {A}^2\mathcal {C}\) Framework can support collaborative decision-making processes involving humans and AI across diverse domains. This section, however, delves into its specific application in the SOC domain, focusing on incident triage, analysis, and response. We discuss how the framework holds promise in alleviating analyst workload and addressing the persistent challenge of alert fatigue through enhanced decision-making processes.
Figure
4 illustrates how the
\(\mathcal {A}^2\mathcal {C}\) Framework can be applied to triage analysis and in-depth analysis. The process begins with the generation and queuing of alerts by the SIEM or equivalent system. The AI conducts a preliminary assessment, swiftly categorising routine alerts that do not require human input and forwarding them to the automated validation queue. Within this queue, the AI autonomously determines whether alerts are false positives or genuine alerts demanding in-depth investigation. In cases where the AI encounters less routine and more intricate alerts and lacks confidence in decision making, it defers to human experts, directing the alert to the expert validation queue. SOC analysts analyse alerts in this queue, deciding whether to close an alert as a false positive or benign event, or to conduct a thorough investigation. Novel or previously unexplored alerts may prompt collaborative exploration, where the expert and AI work together to find a resolution. Similarly, in the in-depth analysis stage, the expert and AI engage in collaborative exploration to investigate complex incidents.
The effectiveness of this workflow relies on SA and shared SA to facilitate dynamic role transitions between the AI and SOC analysts to maintain optimal human-AI performance. The self-awareness of the AI system is crucial, enabling it to defer to the expert when uncertain. During deferral and collaborative exploration, the AI must determine the relevant information to provide to an analyst, aiding efficient alert validation. Furthermore, it must ensure the contextual relevance of the information being shared with the analyst. The AI could do this by leveraging teammate awareness, such as by considering the analyst’s existing knowledge, informational gaps, habitual sources of information consulted by the analyst or the SOC team, and the specific relevance of new information to validating an alert. By seamlessly transitioning between automated AI validation, expert validation, and collaborative validation, the human-AI team can optimally balance decision-making responsibilities, which in turn helps reduce the cognitive load on SOC analysts.
In addition to the segregation of alerts into the automated validation queue and the expert validation queue, human-AI collaboration plays a crucial role in enhancing the prioritisation of alerts within both queues, thereby contributing to the reduction of alert fatigue. By leveraging shared SA, the AI can factor in real-time information about the overall security landscape, ongoing incidents, the current workload of SOC analysts, and their expertise. This intelligent integration of information enables the AI to prioritise and allocate alerts to SOC analysts more effectively, ensuring that the most critical issues are addressed promptly and in alignment with the team’s capabilities.
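The workload- and expertise-aware allocation described above can be sketched as follows. The analyst attributes and the shortest-queue heuristic are illustrative assumptions, standing in for the richer shared-SA signals the framework envisions:

```python
def assign_alert(alert, analysts):
    """Assign a deferred alert to a suitable analyst.

    Prefers analysts whose skills cover the alert's category, then picks
    the one with the shortest current queue (a stand-in for workload).
    """
    candidates = [a for a in analysts if alert["category"] in a["skills"]]
    pool = candidates or analysts  # fall back to anyone if no skill match
    chosen = min(pool, key=lambda a: a["queue_len"])
    chosen["queue_len"] += 1
    return chosen["name"]

analysts = [
    {"name": "dana", "skills": {"phishing", "malware"}, "queue_len": 3},
    {"name": "lee",  "skills": {"phishing"},            "queue_len": 1},
    {"name": "sam",  "skills": {"network"},             "queue_len": 0},
]
print(assign_alert({"id": "A9", "category": "phishing"}, analysts))  # → lee
```

A production system would replace the queue length with richer state, such as analyst expertise level, cognitive load, and ongoing incident context, as discussed in Section 5.1.2.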
5.3.1 Illustrative Example: A Financial Company under Phishing Attack.
As a concrete illustrative example, consider the scenario where a financial services company faces a multi-layered phishing attack targeting its employees. Their SIEM detects suspicious activity and generates alerts categorised based on their complexity, which are queued for preliminary assessment by the AI:
•
Automated validation: When the AI encounters e-mails containing malicious URLs, it leverages pre-defined threat intelligence and real-time analysis to automatically quarantine these e-mails and generate low-complexity alerts. This automated action neutralises the immediate threat without requiring analyst intervention, freeing analysts to focus on more complex investigations.
•
Augmented deferral: Some e-mails have subtle inconsistencies compared to legitimate sources, such as typos or grammatical errors, slight variations in the domain name, or unusual subdomains. Recognising its limitations in interpreting such inconsistencies, the AI highlights them and defers to the analyst for further investigation. The analyst then meticulously examines the inconsistencies and verifies legitimacy through internal directories or direct contact with the supposed sender before reaching a decision. By highlighting inconsistencies and providing relevant context, augmented deferral empowers analysts to make faster and more informed decisions, ultimately reducing alert fatigue.
•
Collaborative exploration: Some e-mails are highly targeted and employ sophisticated social engineering techniques. These e-mails pose a significant challenge, as they leverage specific details like internal information, projects, or colleagues to appear trustworthy. They might also utilise manipulative tactics like urgency, personalisation, or deceptive attachments to bypass caution. Due to the complexity and potential for manipulation, both the analyst and the AI have limitations in definitively judging the e-mail’s authenticity. To address these complexities, the analyst and the AI collaborate by combining their strengths to explore the situation and develop effective responses as follows:
—
Analyse attack sophistication: Examine e-mail content, communication patterns, and potential information leaks, to understand the attacker’s method and level of effort.
—
Identify potential targets and weaknesses: Analyse targeted executives and potential information leaks to identify vulnerable individuals and areas within the organisation requiring attention.
—
Develop comprehensive response: Create an effective response plan based on the findings, including adjustments to internal security protocols, enhanced monitoring of targeted individuals, and security/user awareness campaigns.
This simple example illustrates how the \(\mathcal {A}^2\mathcal {C}\) Framework can empower SOC teams to combat alert fatigue by leveraging the combined strength of human-AI collaboration. Through its three decision-making modes—automation, selective deferral, and collaborative exploration—the framework effectively prioritises and streamlines the alert handling process.
8 Conclusion
This article envisions a future in which human-AI teaming and collaboration in SOCs not only optimises operational efficiency but also significantly alleviates the cognitive load on analysts. As a result, this collaborative approach is expected to effectively mitigate alert fatigue, ensuring a more streamlined and responsive security environment. The proposed \(\mathcal {A}^2\mathcal {C}\) Framework empowers flexible autonomy in human-AI teams through shared SA, enabling seamless navigation between three key decision-making modes: automated, augmented (through selective deferral), and collaborative exploration. Automation handles routine tasks, selective deferral allows for expert analysis of complex cases, and collaborative exploration tackles novel threats. Collectively, these three decision-making modes, and potentially others, can help optimise incident handling. By doing so, they effectively reduce the cognitive burden on analysts, alleviating alert fatigue and ultimately enhancing overall security.
Looking ahead, we aim to develop and evaluate a prototype system built upon the \(\mathcal {A}^2\mathcal {C}\) Framework. This practical implementation will allow us to delve into the intricate dynamics of human-AI teaming within the SOC environment. Through this, we can effectively address existing challenges, refine the framework’s functionalities, and optimise its potential for incident handling. While our primary focus lies in incident triage, analysis, and response, the transformative potential of human-AI collaboration extends beyond this scope. By leveraging human-AI collaboration across other SOC functions and broader security operations, this approach has the potential to significantly enhance cyber defence capabilities, including proactive threat detection and investigation.