
Towards Human-AI Teaming to Mitigate Alert Fatigue in Security Operations Centres

Published: 15 July 2024

Abstract

Security Operations Centres (SOCs) play a pivotal role in defending organisations against evolving cyber threats. They function as central hubs for detecting, analysing, and responding promptly to cyber incidents with the primary objective of ensuring the confidentiality, integrity, and availability of digital assets. However, they struggle against the growing problem of alert fatigue, where the sheer volume of alerts overwhelms SOC analysts and raises the risk of overlooking critical threats. In recent times, there has been a growing call for human-AI teaming, wherein humans and AI collaborate with each other, leveraging their complementary strengths and compensating for their weaknesses. The rapid advances in AI and the growing integration of AI-enabled tools and technologies within SOCs give rise to a compelling argument for the implementation of human-AI teaming within the SOC environment. Therefore, in this article, we present our vision for human-AI teaming to address the problem of alert fatigue in the SOC. We propose the \(\mathcal {A}^2\mathcal {C}\) Framework, which enables flexible and dynamic decision making by allowing seamless transitions between automated, augmented, and collaborative modes of operation. Our framework allows AI-powered automation for routine alerts, AI-driven augmentation for expedited expert decision making, and collaborative exploration for tackling complex, novel threats. By implementing and operationalising \(\mathcal {A}^2\mathcal {C}\), SOCs can significantly reduce alert fatigue while empowering analysts to efficiently and effectively respond to security incidents.

1 Introduction

The Security Operations Centre (SOC) is a crucial component in safeguarding the confidentiality, integrity, and availability of the modern digital enterprise against evolving cyber attacks. As a dedicated facility staffed with highly skilled security professionals and equipped with advanced technologies and processes, the SOC serves as the central hub for detecting, analysing, responding to, and recovering from cyber threats. The increasing financial impact of cyber incidents, stemming from both cyber crime activities and state-sponsored attacks, has significantly driven the demand for SOCs in recent times. The estimated cost of cyber crime worldwide was $7.08 trillion in 2022 and is projected to increase to $13.82 trillion by 2028 [46]. Cyber crime has surged by 600% since the onset of the COVID-19 pandemic [50], leading to a corresponding increase in cybersecurity spending, with the size of the cybersecurity market set to reach $657.02 billion by 2030 [16]. The SANS 2023 SOC Survey, which encompasses major sectors such as technology, banking and finance, cybersecurity, and government, reveals that around 80% of surveyed organisations operate their SOC 24/7, either in-house, outsourced, or in mixed mode [13]. Even small- and medium-sized businesses are recognising the need for SOCs, highlighting the growing demand for SOC support.
The SOC typically offers a range of essential security functions across three tiers, as shown in Figure 1. Tiers 1 and 2 focus on real-time detection and response to cybersecurity incidents, sometimes referred to as hot path activities. At Tier 1, the focus is on high-speed remediation, which involves handling initial incident response, swiftly addressing security events through alert triage, basic analysis, and prompt resolution of straightforward issues. Advanced incidents are escalated to Tier 2, where analysts undertake advanced analysis and remediation. They conduct in-depth investigations, collaborate with other teams, and employ advanced techniques for remediation. In Tier 3, senior analysts and security engineers actively conduct proactive hunting, advanced forensics, complex investigations, and incident response strategy development. They also provide guidance for lower-tier analysts. These activities are referred to as cold path activities. Operating across these three tiers, the SOC delivers a unified and comprehensive cybersecurity strategy that aligns with the NIST Cybersecurity Framework’s core functions of detection, response, and recovery [43].
Fig. 1. SOC analyst tiers and functions.
The modern SOC leverages a plethora of advanced technologies, many of which are AI enabled, to enhance its capabilities in detecting, analysing, and responding to cyber threats. These technologies include (i) Security Information and Event Management (SIEM) systems [23], which collect, correlate, and analyse security event data from diverse sources in real time; (ii) Security Orchestration, Automation, and Response (SOAR) platforms [6], which automate and streamline security processes, facilitating efficient incident response and workflow management; (iii) Endpoint Detection and Response (EDR) solutions [25], which are deployed to detect and respond to advanced threats targeting endpoints; (iv) Extended Detection and Response (XDR) platforms [4], which integrate multiple security controls and use advanced analytics to detect threats across the organisation’s infrastructure; and (v) User and Entity Behaviour Analytics (UEBA) solutions [32], which serve to identify anomalous behaviour and detect insider threats. These technologies empower the SOC to effectively monitor, detect, and respond to cyber threats, strengthening an organisation’s overall cybersecurity posture.
However, despite the increasing use of AI-enabled tools and technologies, recent studies such as the SANS Institute’s annual SOC Survey [13] and Trend Micro’s Global Study on Security Operations [39] indicate that SOCs face a range of challenges across the people, process, and technology domains, hindering their efficiency and effectiveness in addressing cyber threats. On the people front, high staffing requirements and a scarcity of skilled personnel create difficulties in adequately supporting SOC operations. In terms of processes, the lack of automation and orchestration, the absence of enterprise-wide visibility, and the absence of well-defined processes or playbooks contribute to inefficiencies and hinder incident response. From a technology perspective, the overwhelming volume of alerts without proper correlation, the presence of numerous unintegrated tools, and the lack of context surrounding observed incidents further impede effective decision making and response. Collectively, these challenges contribute to the growing problem of alert fatigue, which refers to the state of mental weariness and decreased responsiveness among security analysts due to the overwhelming volume of security alerts to be triaged and investigated [2, 39]. Alert fatigue poses significant ramifications for SOCs, impacting their ability to effectively detect, prioritise, respond to, and recover from security threats. This phenomenon can have a detrimental impact on an organisation’s overall cybersecurity posture.
The rapid advances in AI and the growing integration of AI-enabled tools and technologies within SOCs give rise to a compelling argument for the implementation of human-AI teaming within the SOC environment. Effective human-AI teams can leverage the distinct capabilities of both human analysts and AI systems, while overcoming the known challenges and limitations of each team member. AI-powered automation and orchestration can effectively handle well-known and routine scenarios without requiring human intervention, freeing up SOC analysts to focus on higher-level tasks requiring critical thinking and domain expertise. Moreover, AI-driven augmentation can empower SOC analysts with advanced insights through the correlation and analysis of extensive datasets, thereby expediting decision making and enhancing incident response capabilities. Considering the highly dynamic cybersecurity landscape, even seasoned domain experts might encounter novel and open-ended scenarios that present unique challenges devoid of clear-cut solutions. In such scenarios, human-AI collaboration emerges as a potential avenue, combining human intuition, contextual understanding, and expertise with the computational capabilities and data processing power of AI, to navigate complex situations and explore prospective solutions to open-ended problems. Human-AI teaming can be applied across all three tiers to enhance the entire spectrum of security operations as depicted in Figure 1.
This article presents our vision and proposal for harnessing the power of human-AI teaming to enhance the efficiency and effectiveness of SOC operations. Our proposal emphasises strategic utilisation of human-AI teaming, particularly human-AI collaboration, with the overarching goal of enhancing SOC functions, specifically by strengthening detection and response capabilities. Central to our proposal is the foundational concept of Situation Awareness (SA) and shared SA, often referred to as “common ground” [19, 21], which establishes a pivotal connection between the human analysts and AI systems. SA and shared SA are crucial factors that can significantly impact the overall effectiveness and efficiency of a human-AI team. Furthermore, we advocate for the implementation of flexible autonomy wherein the level of automation for a given task or function is dynamically adjusted over time. This adjustment can be managed either by the human expert, termed adaptable automation, or by the system itself, known as adaptive automation [40]. We believe this adaptability is essential to maintain optimal human-AI team performance in dynamic security environments. We also introduce our initial work on designing a conceptual framework for human-AI teaming, which we term the \(\mathcal {A}^2\mathcal {C}\) Framework. The framework supports three distinct modes of decision making: automated, augmented, and collaborative. In the automated mode, AI systems autonomously handle decision making without human intervention. In the augmented mode, the AI system defers decision making to the human expert but provides relevant information, insights, and recommendations. Finally, in the collaborative mode, termed collaborative exploration, the human expert addresses complex and uncertain problems by harnessing the capabilities of AI systems to engage in joint exploration and investigation. By strategically integrating these three modes of decision making, our aim is to support flexible autonomy in human-AI teams. This initial conceptualisation lays the groundwork for future development, providing a clear and compelling roadmap to transform this vision into a practical and impactful solution for addressing the persistent issue of alert fatigue in SOCs.
The rest of the article is organised as follows. Section 2 sets the stage with an overview of the SOC’s incident triage, analysis, and response function. Section 3 delves into the problem of alert fatigue, examining its underlying causes and potential impacts. Section 4 lays the groundwork for human-AI teaming by exploring foundational concepts crucial for its implementation. Section 5 introduces the \(\mathcal {A}^2\mathcal {C}\) Framework, wherein shared SA enables flexible autonomy in human-AI teams through automated, augmented, and collaborative decision making. This section also discusses how the framework can contribute to mitigating the problem of alert fatigue in the SOC. Section 6 outlines a research roadmap towards realising the framework, and Section 7 discusses additional considerations such as trust and bias in human-AI teams. Finally, Section 8 concludes the article with an outlook on future work.

2 Incident Triage, Analysis, and Response

One of the primary functions of the SOC is incident triage, analysis, and response—a comprehensive process comprising several crucial stages, including real-time alert monitoring and triage, in-depth analysis and investigation, incident containment and recovery, and coordination of incident response activities [33]. This process enables the SOC to swiftly identify, assess, and address security incidents, minimising their impact and protecting critical assets.
A modern SOC employs a range of tools to continuously monitor the organisation’s network, systems, and applications for security events. A security event is any observable occurrence, such as a firewall blocking a connection attempt, or the detection of a malware infection within a system. These events are pushed out to a central system such as the SIEM, which enriches, analyses, and correlates the collected event data to generate alerts. An alert is a technical notification that a particular event, or series of events, has occurred [33], such as multiple failed login attempts from a specific IP address within a short period of time. Intrusion Detection Systems (IDS), such as the Network IDS and the Host IDS, are additional sources of alerts. Triage analysis allows for swift assessment and analysis of these alerts to filter out genuine security incidents from false positives. In-depth analysis of validated alerts is undertaken to determine the root cause of the incident, identify affected systems, and assess the potential impacts of the incident. Based on the findings of the investigation, an appropriate response is developed and implemented. This may involve containment measures, such as isolating infected systems or restricting network access, to prevent further damage or loss. Remediation steps, such as patching vulnerabilities or implementing security controls, are undertaken to address the underlying causes of the incident. Once the incident has been resolved, a comprehensive incident report is documented, detailing the incident’s timeline, findings, actions taken, and lessons learned. This report serves as a valuable resource for future incident prevention and response. Finally, based on the findings from the incident analysis, security rules and configurations may be refined to improve detection accuracy and reduce false positives. Figure 2 shows a simplified representation of the main steps involved.
Fig. 2. Simplified representation of incident triage, analysis, and response.
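To make the correlation step concrete, the following is a minimal Python sketch of a SIEM-style correlation rule for the failed-login example above. It is illustrative only: the event fields, threshold, and time window are assumptions rather than any particular vendor’s configuration.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Illustrative correlation rule: raise an alert when one source IP produces
# several failed logins within a short window (fields/thresholds are assumptions).
WINDOW = timedelta(minutes=5)
THRESHOLD = 5

failed_logins = defaultdict(deque)  # source IP -> timestamps of recent failures

def ingest_event(event: dict) -> dict | None:
    """Consume one security event; return an alert dict if the rule fires."""
    if event.get("type") != "failed_login":
        return None
    ip, ts = event["source_ip"], event["timestamp"]
    window = failed_logins[ip]
    window.append(ts)
    # Drop failures that fall outside the correlation window.
    while window and ts - window[0] > WINDOW:
        window.popleft()
    if len(window) >= THRESHOLD:
        return {"alert": "multiple_failed_logins", "source_ip": ip,
                "count": len(window), "severity": "medium",
                "first_seen": window[0], "last_seen": ts}
    return None

# Example: the fifth failure from 10.0.0.7 inside five minutes raises an alert.
base = datetime(2024, 7, 15, 9, 0, 0)
alert = None
for i in range(5):
    alert = ingest_event({"type": "failed_login", "source_ip": "10.0.0.7",
                          "timestamp": base + timedelta(seconds=30 * i)})
print(alert)
```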
In this section, we provide a succinct overview of the triage analysis and in-depth analysis steps as they are closely linked with the problem of alert fatigue.

2.1 Triage Analysis

Triage analysis is a fundamental Tier 1 process that serves as the initial step in the incident response process [34, 61]. Its primary goal is to quickly determine the criticality and legitimacy of each alert. As alerts get generated by the SIEM, or an equivalent system, they are prioritised and presented to the Tier 1 analysts for validation. Alert prioritisation is accomplished by assessing various criteria such as alert type, alert severity, the affected systems, the potential organisational impact, or specific contextual details linked with the alert. While the SIEM performs the initial prioritisation, analysts can either validate the alerts in the order predetermined by the system or exercise their own judgement to establish the validation sequence. Alert validation involves determining the legitimacy of alerts and distinguishing genuine security incidents from false positives. This involves actions such as cross-referencing alerts with other relevant events, reviewing system logs, analysing network traffic, and consulting threat intelligence sources to assess the validity and seriousness of the alerts. Certain alerts may exhibit complex patterns or indicators of more sophisticated threats that require further scrutiny. These “interesting” alerts are escalated to higher-tier analysts or specialised incident response teams for more in-depth investigations [30]. Conversely, swiftly identified false positives or inconsequential alerts undergo routine examination and closure.
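As a rough illustration of how such prioritisation criteria might be combined, the following Python sketch scores alerts using severity, asset criticality, threat-intelligence corroboration, and a penalty for known-noisy rules. The weights and field names are assumptions introduced for illustration, not a prescribed scheme.

```python
# Minimal sketch of rule-based alert prioritisation for Tier 1 triage
# (criteria weights and field names are illustrative assumptions).
SEVERITY_SCORE = {"low": 1, "medium": 2, "high": 3, "critical": 4}

def priority_score(alert: dict) -> float:
    """Combine severity, asset criticality, and context into a single score."""
    score = SEVERITY_SCORE.get(alert.get("severity", "low"), 1)
    # Alerts touching business-critical systems are bumped up.
    if alert.get("asset_criticality") == "high":
        score += 2
    # Corroborating threat intelligence raises confidence in the alert.
    if alert.get("threat_intel_match"):
        score += 1.5
    # Known-noisy rules are down-weighted to curb false positives.
    score -= alert.get("rule_false_positive_rate", 0.0) * 2
    return score

queue = [
    {"id": "A1", "severity": "high", "asset_criticality": "high", "threat_intel_match": True},
    {"id": "A2", "severity": "medium", "rule_false_positive_rate": 0.6},
]
# Present the highest-scoring alerts to Tier 1 analysts first.
for alert in sorted(queue, key=priority_score, reverse=True):
    print(alert["id"], round(priority_score(alert), 2))
```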

2.2 In-depth Analysis

Incidents that are more complex and critical are escalated to Tier 2 analysts who conduct in-depth analysis and investigation [34, 61]. This stage involves two steps: alert prioritisation and alert investigation. The primary goal of alert prioritisation remains consistent: aiming to ensure that the most critical alerts are investigated first. Unlike Tier 1, where some incidents can be promptly resolved through automation processes, Tier 2 requires a higher level of human involvement due to the intricate nature of incidents and the need for comprehensive analysis. This stage relies on human intervention and expertise, harnessing the distinct capabilities of skilled analysts. Although some structured processes might continue to guide their approach, analysts must apply critical thinking, creative problem solving, and lateral reasoning to investigate and respond to these incidents effectively [51]. Moreover, the potential for collaboration between human analysts and AI becomes more pragmatic and advantageous within this tier. Here, the human expert can harness the capabilities of AI systems for collaborative exploration and investigation of novel security incidents [22, 26, 36].
While similarities exist between triage analysis at Tier 1 and more in-depth analysis at Tier 2, important distinctions set these tiers apart. At Tier 1, the emphasis is on swift validation and categorisation of alerts, enabling the SOC to efficiently handle a significant volume of low-to-medium complexity alerts. Resolutions at Tier 1 typically occur within seconds or minutes. In contrast, Tier 2 involves a comprehensive investigation process that can span from hours to even months [33]. This can be attributed to the escalating complexity of the digital environment, coupled with the evolving threat landscape. In this context, even highly experienced security analysts encounter novel and open-ended situations that pose unique challenges without a clear and definitive solution. It should be noted that the volume and velocity of alerts at Tier 2 are relatively lower than those at Tier 1.

3 Alert Fatigue: Causes, Factors, and Impact

Alert fatigue, a growing challenge in the cybersecurity domain, occurs when SOC analysts become overwhelmed by the sheer volume of security alerts, leading to a diminished capacity to effectively identify and respond to genuine threats. In this section, we draw upon academic and industry research [2, 13, 34, 39] to discuss the primary causes of alert fatigue, the key factors that exacerbate it, and its impact on SOC analysts.

3.1 Primary Causes of Alert Fatigue

One of the key factors leading to alert fatigue and burnout is the escalating workload resulting from the high volume of security events generated by diverse SOC tools and systems [13, 57]. The rapid proliferation of interconnected devices, cloud-based applications, and the Internet of Things (IoT), along with the corresponding expansion of the attack surface, have led to a surge in cybersecurity events and a concomitant increase in alerts that SOCs must investigate. This incessant influx of alerts can lead to desensitisation, decreased attention to detail, and an increased risk of overlooking genuine threats.
Another significant factor contributing to alert fatigue is the high rate of false positives [2, 34, 39, 52]. False positives occur when an alert is triggered for an event that is not actually indicative of a security incident. Table 1 provides some concrete examples of SOC false alarm scenarios. When analysts are inundated with a large number of false-positive alerts, their time and attention get diverted from genuine security incidents, leading to frustration and reduced confidence in the alerting system. Constantly having to investigate and dismiss false positives not only hampers productivity but also increases the risk of overlooking true threats amidst the noise.
Misconfigured Security Rule: A firewall rule intended to block malicious traffic from a specific region accidentally blocks all traffic from that region, impacting legitimate business partners (High).
IDS False Positive: An IDS flags a large file transfer between two trusted servers as a potential data exfiltration attempt due to the size of the data, even though it is a scheduled software update (Medium).
Malware Detection False Positive: A malware detection tool identifies a legitimate system utility as malware due to a shared code snippet with a known malware sample (Low).
Phishing E-mail False Positive: An e-mail security tool identifies an internal training e-mail containing a simulated phishing link as a real phishing attempt (Low).
Denial-of-Service (DoS) Attack False Alarm: A network monitoring tool misinterprets a sudden surge in legitimate traffic (e.g., flash sale on a company website) as a DoS attack, triggering an unnecessary investigation (Medium).
Anomaly Detection False Positive: An anomaly detection system flags a user who is downloading a large dataset for a data analysis project as suspicious due to their deviation from usual activity (Low).
Table 1. Examples of SOC False Alarm Scenarios (each scenario is paired with an example and its indicative severity)

3.2 Factors Contributing to Alert Fatigue

Several contributing factors exacerbate the problem of alert fatigue, as discussed next.

3.2.1 People-Related Factors.

The escalating volume and sophistication of cyber threats necessitate around-the-clock monitoring, driving the demand for a larger cybersecurity workforce. However, understaffing is a prevalent issue [13, 57], leaving existing team members burdened with increased workloads and mounting pressure. This results in extended work hours, skipped breaks, and heightened fatigue, compromising their ability to effectively manage alerts.
A skills shortage [13, 52, 57] in the cybersecurity industry compounds the issue. While experienced security professionals can effectively prioritise, investigate, and respond to alerts, inexperienced analysts may struggle to differentiate between critical and non-critical alerts, resulting in a higher volume of irrelevant or false-positive alerts. Furthermore, strained staffing levels hinder continuous training and development, leaving analysts ill equipped to handle the ever-evolving cybersecurity landscape. High turnover rates further exacerbate this challenge, making it difficult to maintain consistent processes, implement effective automation, and foster a culture of collaboration.

3.2.2 Process-Related Factors.

Ineffective alert prioritisation contributes to the problem of alert fatigue. When the prioritisation process fails to effectively distinguish between false positives and genuine security incidents, analysts are burdened with an overwhelming influx of low-priority or irrelevant alerts [2, 52]. This flood of unimportant notifications consumes valuable time and attention, diverting resources away from addressing actual threats and increasing frustration among analysts. Poor alert prioritisation may result from a lack of clear prioritisation criteria and the poor use of automation.
Another contributing factor is the lack of automation and orchestration in the SOC [13, 57, 62]. Many SOCs still rely heavily on manual processes, which are time consuming and error prone. Without automation, SOC analysts may have to manually investigate and respond to a large number of alerts, a tedious burden that leads to burnout and alert fatigue. Without orchestration, different security tools and systems may generate conflicting or redundant alerts, making it difficult to identify the truly critical alerts and further compounding alert fatigue.
SOCs often lack well-defined processes or playbooks for responding to cyber incidents [13, 52]. This can lead to confusion, delays, and ineffective responses. Without clear procedures for responding to alerts, SOC analysts may spend more time trying to figure out what to do than actually resolving the issue. This can lead to longer response times and increased frustration, which can contribute to alert fatigue.
Finally, there is often a silo mentality within SOCs, where different teams, such as security analysts, incident responders, and network operations, operate in isolation [13, 57]. This lack of collaboration can lead to communication problems and delays in responding to incidents. Security teams may not be aware of what incident response teams are doing, and vice versa. This can lead to duplication of effort and missed opportunities to prevent or mitigate an incident, contributing to alert fatigue.

3.2.3 Technological Factors.

A key technological factor contributing to alert fatigue is overly sensitive detection rules, which can generate a large number of false positives [52, 57]. As noted in Section 3.1, this diverts analysts’ time and attention from genuine security incidents, erodes confidence in the alerting system, and increases the risk of overlooking true threats amidst the noise.
Lack of context contributes to alert fatigue in several ways. When analysts are not provided sufficient contextual information, they may not be able to understand the significance of an alert. Without proper context, it becomes difficult to assess the severity, relevance, and potential impact of an alert [13]. Analysts may struggle to determine whether an alert represents a genuine security incident or a benign event, leading to increased investigation time and effort. The absence of contextual details also hinders the ability to prioritise alerts accurately. Analysts may be inundated with a large volume of alerts, making it challenging to discern the most critical threats and allocate resources accordingly. Additionally, the lack of context impedes root-cause analysis and incident correlation.
Lack of correlation between alerts is another contributing factor to alert fatigue in the SOC [13]. SOC analysts are often inundated with a high volume of alerts from various sources, each representing a potential security issue. Without proper correlation, these alerts remain isolated events, making it difficult to identify patterns or connections that might indicate a genuine threat. At the same time, the sheer number of seemingly unrelated alerts can overwhelm analysts, leading to alert fatigue.
SOC teams use a variety of security tools that generate alerts independently. Poor integration of these tools [52] can lead to alert fatigue in several ways. Firstly, when multiple tools are not integrated, they can generate redundant alerts, which can overwhelm SOC analysts [13]. Secondly, the lack of integration can make it difficult to correlate alerts from different tools, hindering the identification of genuine threats. Finally, the lack of integration can make information sharing among analysts tedious [57], leading to duplicated efforts and inefficiencies.

3.3 Impact of Alert Fatigue

Several studies from academia and industry have investigated the impacts of alert fatigue on SOC analysts [2, 34, 39, 56]. Most notably, the recent global study conducted by Trend Micro [39] has revealed concerning statistics regarding SOC teams’ challenges with managing alerts. The study found that 54% of SOC teams feel overwhelmed by the sheer volume of alerts they receive, leading to a state of alert fatigue. Additionally, 55% lack confidence in their ability to prioritise and respond to these alerts effectively. The study also found that security experts are spending a significant portion (27%) of their time dealing with false positives, which is a considerable drain on their resources. Participants in the study made several admissions that highlight the struggle faced by SOC teams. Approximately 40% of the alerts were entirely disregarded, indicating a significant challenge in addressing all incoming notifications. Additionally, alerts were turned off 43% of the time, suggesting a level of frustration or inability to handle the alert load. Almost half of the alerts (49%) were presumed to be false positives, potentially resulting in missed genuine security incidents. Furthermore, 50% of the time, team members relied on others to handle alerts, indicating a lack of ownership or accountability within the team. The findings from other industry surveys such as the 2022 Devo SOC Performance Report [52], the 2023 SANS SOC Survey [13], and the 2023 Voice of the SOC Report [57] align with these findings, reporting similar trends.

4 Human-AI Teaming: Overview and Foundational Concepts

In recent times, there has been a growing emphasis on human-AI teaming [22, 26, 45], also referred to as hybrid intelligence [1], particularly in the cybersecurity domain [22, 36]. This approach involves humans and AI working together, leveraging their complementary strengths and compensating for their weaknesses. A human-AI team is defined as “one or more people and one or more AI systems requiring collaboration and coordination to achieve successful task completion” [14]. By combining the unique capabilities of both humans and AI, this collaborative approach aims to achieve goals that may be beyond the reach of either party individually. Various methods exist for integrating humans and AI into teams, including humans overseeing an AI system that functions as a helper, humans collaborating with AI systems as equal teammates, and AI systems overseeing and acting as a limiter of human performance [20]. Irrespective of the type of teaming, the success of human-AI teams depends on humans’ ability to comprehend and anticipate AI system behaviours, establish suitable trust relationships with AI systems, make precise decisions based on AI system input, and exercise timely and appropriate control over the system [40]. This underscores the essential requirement for human oversight of AI systems. Likewise, AI systems should have the capability to maintain goal and task alignment with human team members, share updates regarding functional assignments and task progress, and anticipate human needs and offer support as required. In this section, we provide a concise overview of essential concepts related to human-AI teaming.

4.1 Core Functions in the Decision-Making Process

Human information processing involves four essential functions: information acquisition, information analysis, decision and action selection, and action implementation [44]. Pirolli and Card’s popular model for intelligence analysis [47] organises it into two key activities: (i) foraging, which involves identifying information sources, searching and filtering, and extracting information, and (ii) sensemaking, which involves understanding, reasoning, and making inferences from the information gathered. Foraging is the initial step in information acquisition, whereas sensemaking plays a crucial role during information analysis. The insights gained from sensemaking inform the decision-making process. Within each of these functions, AI can play a role in supporting human capabilities, with varying degrees of involvement and impact. In the SOC context, these functions guide the process of understanding potential threats, evaluating their significance, deciding on countermeasures, and executing responses to strengthen security, thereby contributing directly to incident triage, analysis, and response.

4.2 Types of Decision-Making Tasks in Human-AI Teams

There are three distinct types of decision-making tasks as defined by Puranam [49]. The first type is tasks where AI equals or outperforms humans. An example is the automated identification of malicious patterns in network traffic, which is a challenging task for human perception. The second type is tasks where humans outperform AI. For instance, assessing the intent behind a sophisticated phishing e-mail or evaluating the ethical implications of a security breach requires the nuanced judgement and contextual understanding of human experts. The third type is tasks where there is no clear superiority of humans or AI individually, but the combination can outperform either alone. For instance, aggregating insights from AI-driven threat detection systems and human analysts’ contextual understanding can lead to more accurate and comprehensive threat assessments, showcasing the unique advantage of collaborative decision making in cybersecurity’s dynamic and evolving landscape. In the context of human-AI collaboration, a fourth type of decision-making task can be considered: tasks that neither humans nor AI can independently complete, highlighting the need for their combined capabilities.

4.3 Human-AI Complementarity in Decision Making

The complementarity of humans and AI is most pronounced in decision-making situations characterised by uncertainty, complexity, and equivocality [28]. Cybersecurity is one such domain. Uncertainty is characterised by the lack of information, whereas complexity is characterised by an abundance of variables demanding the processing of volumes of data that is beyond the cognitive capabilities of even the smartest human decision makers. Equivocality refers to the presence of several simultaneous but divergent interpretations within a decision domain. In such situations, the analytical decision making of AI complements the intuitive decision making of humans. For instance, in moments of uncertainty, humans can make swift, intuitive decisions, whereas AI offers the advantage of delivering real-time, data-driven insights. Furthermore, humans can identify key data sources, guiding AI’s data collection efforts, after which AI can collect, curate, process, and analyse the collected data.

4.4 Individual and Shared Mental Models

For humans to engage effectively with the world, they need an internal representation of the world. Mental models are cognitive constructs that facilitate this by providing organised structures for generating descriptions of a system’s purpose and form, explaining its functioning and observed states, and predicting its future states [3, 53]. Within a team context, the mental models revolve around the team itself, with team members directing their cognitive effort towards taskwork and teamwork [8, 54]. Taskwork refers to the specific set of activities, skills, and knowledge that are essential for successfully completing the task and responsibilities within a particular job or domain. This includes understanding the operational procedures, capabilities, and limitations of tools and technology, as well as task-specific procedures, strategies, and constraints. It also involves being prepared for potential contingencies and scenarios that might arise, ensuring that individuals are equipped to handle a range of situations and challenges related to their role [8]. Conversely, teamwork refers to an interrelated set of knowledge, skills, and attitudes that facilitates the collaborative functioning of teams in a synchronised and adaptable manner. It includes an understanding of team members’ roles, responsibilities, mutual dependencies, interaction dynamics, communication modes, and information flows [8]. When team members’ individual mental models align (i.e., when they have a similar understanding of their shared tasks and their individual roles in it), the resulting shared mental model enables improved team performance by facilitating more precise prediction of teammates’ needs and behaviours [3].

4.5 Individual, Team, and Shared SA

SA refers to an individual’s understanding of their current environment, including the elements, events, and dynamics within it. It involves “the perception of the elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future” [18]. Team SA is defined as “the degree to which every team member has the SA required for his or her responsibilities” [18]. Shared SA denotes the “degree to which team members possess the same SA on shared SA requirements” [18], where the SA requirements refer to the information needed to support SA. Shared SA is critical for team performance, and shared mental models contribute to the development of shared SA. To achieve successful collaboration, both humans and AI systems must continuously maintain and update their awareness of the ever-changing world around them, their own capabilities, and the actions of their teammates. This ongoing process of updating and aligning individual and shared SA allows the human-AI team to adapt swiftly to dynamic situations and make well-informed decisions collectively.

4.6 Flexible Autonomy

The way humans and AI interact in a team significantly impacts their collective performance. Flexible autonomy is a key feature in human-AI teams [40] that empowers both humans and AI to dynamically adjust their control and decision authority based on the context, situation, and specific operational requirements. This adaptability can be realised in two ways: adaptive automation and adaptable automation [40]. Adaptive automation empowers the AI system to autonomously adjust its level of control and decision authority based on factors such as time, human performance, team state, or other pre-defined criteria of team effectiveness. In contrast, adaptable automation allows humans to adjust the AI system’s involvement on the fly.

5 \(\mathcal {A}^2\mathcal {C}\) Framework: Conceptualisation of Human-AI Teaming

Building upon the foundational concepts of incident triage, analysis, and response in Section 2, the exploration of key factors contributing to alert fatigue in Section 3, and human-AI teaming concepts in Section 4, this section introduces the \(\mathcal {A}^2\mathcal {C}\) Framework as a conceptual model for human-AI teaming and collaboration. As illustrated in Figure 3, the framework incorporates SA and shared SA to support flexible autonomy through various decision-making modes. These modes can be applied to different domain-specific tasks such as incident triage, analysis, and response in the SOC.
Fig. 3. \(\mathcal {A}^2\mathcal {C}\) Framework: conceptual model for human-AI teaming and collaboration.

5.1 SA and Shared SA

For human-AI teams to excel, both individual SA and shared SA play crucial roles: the former ensuring effective individual performance, and the latter fuelling superior team coordination [21]. Development of shared SA requires several key situation models, as depicted in Figure 3. These models include self-awareness, teammate awareness, and world awareness.

5.1.1 Self-Awareness.

Humans must possess meta-awareness of their own capabilities, including awareness of how fatigue, excessive workload, and training gaps impact their abilities. Similarly, AI needs to develop self-awareness of its own strengths and limitations, and use this to determine proactive handovers to human experts when confidence is low or the risk is too high. When humans are cognisant of their own limitations, they can proactively seek assistance from colleagues or AI systems and avoid making decisions independently. Conversely, when AI systems are aware of their own limitations, they can appropriately flag alerts that necessitate further human scrutiny. This approach ensures that critical decisions are not solely reliant on automated processes, emphasising the importance of human involvement in nuanced and complex situations.

5.1.2 Teammate Awareness.

Effective teaming between humans and AI systems hinges on a deep understanding of the capabilities and limitations of both human analysts and AI systems. While self-awareness is crucial for each entity to recognise its own strengths and weaknesses, mutual awareness takes this concept a step further, fostering a collaborative environment where both parties are cognisant of each other’s expertise and limitations. For humans, teammate awareness empowers them to effectively delegate tasks, seek assistance from the AI when appropriate, and interpret the AI’s recommendations with a nuanced understanding of its underlying rationale. Conversely, AI systems equipped with an awareness of the human’s state, including their level of expertise, cognitive load, emotional state, and potential biases, can dynamically adapt their interactions and recommendations to the individual’s current capacity and perspective, ensuring that human judgement is appropriately considered in decision-making processes.

5.1.3 World Awareness.

Just as humans and AI require high levels of self-awareness and teammate awareness to facilitate effective decision making as a team, they must also develop and maintain a comprehensive situation model of the world to underpin their individual decision-making processes. This situation model should encompass current goals, functional assignments, plans, task statuses, and the states and modes of human and AI teammates involved in the work. As roles and responsibilities dynamically transition between human and AI teammates, actively maintaining this model becomes crucial for ensuring the efficacy of human-AI collaboration.

5.2 Flexible Autonomy

Human-AI teams can tackle cognitive tasks in a number of different ways, some of which are illustrated in Figure 3 and discussed briefly in the following. Flexible autonomy empowers both humans and AI to dynamically switch between these decision-making modes by adjusting their control and decision authority based on the context, situation, and specific operational requirements. Shared SA provides the foundation for these dynamic adjustments, playing a crucial role in optimising human-AI team performance across diverse and dynamic scenarios. In doing so, it serves as a linchpin in mitigating alert fatigue.

5.2.1 Full Automation.

In fully automated decision making, decisions are exclusively determined by the AI system without direct human intervention. This approach presents several advantages, including speed, efficiency, and consistency, particularly for routine or well-understood scenarios characterised by clear patterns and well-established decision-making criteria. This ensures that the AI can make accurate decisions with a high degree of confidence, minimising the consequences of potential errors. However, it is important to consider the appropriateness of this approach, as its effectiveness is contingent on the predictability and clarity of the decision-making environment.

5.2.2 Selective Deferral.

Selective deferral is a type of decision-making process employed by AI systems to dynamically hand over decision-making responsibility to human experts, contingent upon the level of uncertainty or complexity within a given situation. In this approach, the AI system assesses the available information and evaluates its confidence in rendering a decision. Should the AI system possess a high degree of confidence, it autonomously proceeds with decision making. Conversely, in instances of low confidence or when confronted with complexities surpassing its capabilities, the AI defers the decision to a human expert. In doing so, the AI may provide recommendations or contextual information to aid the expert’s decision making (augmented deferral). Alternatively, the AI may defer without recommendations or additional context, aiming to avoid undue influence on the expert’s decision (passive deferral). This dual deferral strategy enables adaptive and collaborative decision making tailored to the intricacies of diverse scenarios.
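To make the deferral logic concrete, here is a minimal Python sketch of confidence-based selective deferral under illustrative assumptions: the threshold, alert fields, and the `model.predict` interface are hypothetical placeholders rather than a specific tool’s API.

```python
from dataclasses import dataclass, field

# Minimal sketch of selective deferral; the threshold and the model interface
# (predict returning a label and a confidence) are illustrative assumptions.
CONFIDENCE_THRESHOLD = 0.9

@dataclass
class Decision:
    route: str                        # "automated", "augmented_deferral", or "passive_deferral"
    label: str | None = None          # the AI's verdict when it decides autonomously
    context: dict = field(default_factory=dict)  # supporting material for the analyst

def decide(alert: dict, model, passive: bool = False) -> Decision:
    """Decide autonomously when confident; otherwise defer to the analyst."""
    label, confidence = model.predict(alert)  # hypothetical classifier interface
    if confidence >= CONFIDENCE_THRESHOLD:
        return Decision(route="automated", label=label)
    if passive:
        # Passive deferral: hand over without recommendations, to avoid anchoring the expert.
        return Decision(route="passive_deferral")
    # Augmented deferral: hand over together with the AI's working context.
    return Decision(route="augmented_deferral",
                    context={"suggested_label": label, "confidence": confidence})
```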

5.2.3 Collaborative Exploration.

Collaborative exploration is a dynamic decision-making paradigm designed for scenarios requiring joint efforts from both humans and AI systems to investigate and resolve complex problems. This approach recognises that even experienced professionals may encounter novel or previously unseen situations, necessitating additional information and expertise. By harnessing the strengths of both humans and AI, collaborative exploration seeks to enhance the accuracy and efficiency of decision-making processes.
The typical unfolding of this approach is as follows. When an expert encounters a situation characterised by uncertainty or lacking sufficient context, they initiate collaborative exploration with the AI system. With the collective knowledge acquired through collaborative exploration, the expert formulates well-informed decisions and, when deemed necessary, determines an appropriate course of action to effectively resolve the problem. Additionally, we can also envisage a scenario where the AI system triggers collaborative exploration based on its analysis, proactively engaging human expertise to navigate unforeseen challenges.
A defining feature of collaborative exploration is the nuanced interaction dynamics between the human and the AI, setting it apart from both automation and selective deferral. While automation involves no interaction, and selective deferral involves a one-time transfer of responsibility, collaborative exploration fosters a more interactive dialogue.

5.2.4 Ensemble Decision Making.

In scenarios where neither humans nor AI systems hold a distinct advantage, ensemble decision making combines outcomes from disparate decision makers tackling the same problem, enhancing overall accuracy and robustness. Unlike selective deferral’s reliance on handoffs and collaborative exploration’s emphasis on interactive dialogue, ensemble decision making harnesses independent analyses from multiple team members. This aggregation of diverse perspectives, often achieved through methods like weighted averaging, results in more resilient and precise final decisions.
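A minimal sketch of the weighted-averaging idea is shown below. The weights, member names, and probability estimates are illustrative assumptions; in practice, weights would typically reflect each team member’s track record.

```python
# Minimal sketch of ensemble decision making: weighted averaging of independent
# probability estimates from an AI model and one or more analysts.
def ensemble_verdict(estimates: dict[str, float], weights: dict[str, float]) -> float:
    """Return the weighted probability that an alert is a genuine incident."""
    total_weight = sum(weights[name] for name in estimates)
    return sum(estimates[name] * weights[name] for name in estimates) / total_weight

estimates = {"ai_model": 0.35, "analyst_1": 0.80, "analyst_2": 0.70}
weights = {"ai_model": 1.0, "analyst_1": 1.5, "analyst_2": 1.2}
p_incident = ensemble_verdict(estimates, weights)
print(f"Ensemble probability of a genuine incident: {p_incident:.2f}")  # ~0.65
```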

5.3 Domain-Specific Tasks

In principle, the \(\mathcal {A}^2\mathcal {C}\) Framework can support collaborative decision-making processes involving humans and AI across diverse domains. This section, however, delves into its specific application in the SOC domain, focusing on incident triage, analysis, and response. We discuss how the framework holds promise in alleviating analyst workload and addressing the persistent challenge of alert fatigue through enhanced decision-making processes.
Figure 4 illustrates how the \(\mathcal {A}^2\mathcal {C}\) Framework can be applied to triage analysis and in-depth analysis. The process begins with the generation and queuing of alerts by the SIEM or equivalent system. The AI conducts a preliminary assessment, swiftly categorising routine alerts that do not require human input and forwarding them to the automated validation queue. Within this queue, the AI autonomously determines whether alerts are false positives or genuine alerts demanding in-depth investigation. In cases where the AI encounters less routine and more intricate alerts and lacks confidence in decision making, it defers to human experts, directing the alert to the expert validation queue. SOC analysts analyse alerts in this queue, deciding whether to take no further action because the alert is a false-positive or benign alert, or to conduct a thorough investigation. Novel or previously unexplored alerts may prompt collaborative exploration, where the expert and AI work together to find a resolution. Similarly, in the in-depth analysis stage, the expert and AI engage in collaborative exploration to find a resolution.
Fig. 4. An application of the \(\mathcal {A}^2\mathcal {C}\) Framework for alert prioritisation and alert validation in the SOC.
The effectiveness of this workflow relies on SA and shared SA to facilitate dynamic role transitions between the AI and SOC analysts to maintain optimal human-AI performance. The self-awareness of the AI system is crucial, enabling it to defer to the expert when uncertain. During deferral and collaborative exploration, the AI must determine the relevant information to provide to an analyst, aiding efficient alert validation. Furthermore, it must ensure the contextual relevance of the information being shared with the analyst. The AI could do this by leveraging teammate awareness, such as by considering the analyst’s existing knowledge, informational gaps, habitual sources of information consulted by the analyst or the SOC team, and the specific relevance of new information to validating an alert. By seamlessly transitioning between automated AI validation, expert validation, and collaborative validation, the human-AI team can optimally balance decision-making responsibilities, which in turn helps reduce the cognitive load on SOC analysts.
In addition to the segregation of alerts into the automated validation queue and the expert validation queue, human-AI collaboration plays a crucial role in enhancing the prioritisation of alerts within both queues, thereby contributing to the reduction of alert fatigue. By leveraging shared SA, the AI can factor in real-time information about the overall security landscape, ongoing incidents, the current workload of SOC analysts, and their expertise. This intelligent integration of information enables the AI to prioritise and allocate alerts to SOC analysts more effectively, ensuring that the most critical issues are addressed promptly and in alignment with the team’s capabilities.
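As a rough sketch of how shared SA about workload and expertise could inform alert allocation, the following Python fragment prefers analysts with relevant expertise while penalising heavy queues. The field names and weighting heuristic are illustrative assumptions, not a recommended policy.

```python
# Minimal sketch of shared-SA-informed alert allocation to analysts.
analysts = [
    {"name": "analyst_a", "open_alerts": 7, "expertise": {"phishing": 0.9, "malware": 0.4}},
    {"name": "analyst_b", "open_alerts": 3, "expertise": {"phishing": 0.5, "malware": 0.8}},
]

def assign(alert: dict) -> str:
    """Pick the analyst whose expertise best matches the alert, adjusted for workload."""
    def suitability(analyst: dict) -> float:
        skill = analyst["expertise"].get(alert["category"], 0.1)
        load_penalty = 0.05 * analyst["open_alerts"]  # lighter queues are preferred
        return skill - load_penalty
    return max(analysts, key=suitability)["name"]

print(assign({"id": "A42", "category": "phishing"}))  # analyst_a despite a heavier queue
```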

5.3.1 Illustrative Example: A Financial Company under Phishing Attack.

As a concrete illustrative example, consider the scenario where a financial services company faces a multi-layered phishing attack targeting its employees. Their SIEM detects suspicious activity and generates alerts categorised based on their complexity, which are queued for preliminary assessment by the AI:
Automated validation: When the AI encounters e-mails containing malicious URLs, it leverages pre-defined threat intelligence and real-time analysis to automatically quarantine these e-mails and generate low-complexity alerts. This immediate action neutralises the immediate threat without requiring analyst intervention, freeing them to focus on more complex investigations.
Augmented deferral: Some e-mails have subtle inconsistencies compared to legitimate sources, such as typos or grammatical errors, slight variations in the domain name, or unusual subdomains. Recognising its limitations in interpreting such inconsistencies, the AI highlights them and defers to the analyst for further investigation. The analyst then meticulously examines the inconsistencies and verifies legitimacy through internal directories or direct contact with the supposed sender before reaching a decision. By highlighting inconsistencies and providing relevant context, augmented deferral empowers analysts to make faster and more informed decisions, ultimately reducing alert fatigue.
Collaborative exploration: Some e-mails are highly targeted and employ sophisticated social engineering techniques. These e-mails pose a significant challenge, as they leverage specific details like internal information, projects, or colleagues to appear trustworthy. They might also utilise manipulative tactics like urgency, personalisation, or deceptive attachments to bypass caution. Due to the complexity and potential for manipulation, both the analyst and the AI have limitations in definitively judging the e-mail’s authenticity. To address these complexities, the analyst and the AI collaborate by combining their strengths to explore the situation and develop effective responses as follows:
Analyse attack sophistication: Examine e-mail content, communication patterns, and potential information leaks, to understand the attacker’s method and level of effort.
Identify potential targets and weaknesses: Analyse targeted executives and potential information leaks to identify vulnerable individuals and areas within the organisation requiring attention.
Develop comprehensive response: Create an effective response plan based on the findings, including adjustments to internal security protocols, enhanced monitoring of targeted individuals, and security/user awareness campaigns.
This simple example illustrates how the \(\mathcal {A}^2\mathcal {C}\) Framework can empower SOC teams to combat alert fatigue by leveraging the combined strength of human-AI collaboration. Through its three decision-making modes—automation, selective deferral, and collaborative exploration—the framework effectively prioritises and streamlines the alert handling process.

6 \(\mathcal {A}^2\mathcal {C}\) Framework: Concept Realisation

In the journey towards realisation of the \(\mathcal {A}^2\mathcal {C}\) Framework, this section serves as a research roadmap providing concrete directions that pave the way for the practical implementation of the framework’s foundational principles.

6.1 Realising SA and Shared SA

To achieve SA and shared SA, and to support flexible autonomy, it is important to identify the information needed to support these cognitive processes [21]. In the context of SOC, such information enables the AI to play a pivotal role in helping the human-AI team to collectively understand and respond to alerts. For example, when performing augmented deferral or collaborative exploration, the AI must determine the relevant information to provide to an analyst to improve the team’s world awareness and help the team validate alerts effectively and efficiently. Furthermore, the AI must ensure the information is contextually relevant to the analyst, for example, by utilising teammate awareness. The AI could do this by considering the analyst’s existing knowledge, informational gaps, habitual sources of information consulted by the analyst or the SOC team, and the specific relevance of new information to validating an alert.
Imitation learning [68], a subset of Machine Learning (ML) where algorithms learn to mimic complex human behaviours, presents an opportunity in this context. Analysts consult numerous data sources, including logs, user behaviour, and asset information, to validate alerts. Techniques such as Inverse Reinforcement Learning (IRL) [5] could be used to learn these expert patterns. The focus here is not just on identifying relevant information but on ensuring that the AI and the human analyst are aligned regarding their significance and implications of an alert. For example, when engaging in collaborative exploration, the AI could leverage its understanding of which pieces of information are consulted by other analysts in similar situations [17]. These proposals from the AI could guide the analysts and enable them in turn to guide the AI by selecting which information to consult from the proposed set. Reinforcement Learning with Human Feedback (RLHF) [24] could then be used to refine the AI’s understanding and recommendations, leading to even more effective collaboration. This iterative process would enable the human-AI team to narrow down on the required information and to establish a shared SA around the alert, and validate it.
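The following Python sketch illustrates the general idea in a deliberately simplified form: a frequency model over logged triage sessions stands in for the imitation-learning and IRL techniques discussed above, and the session fields are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Simplified stand-in for learning analysts' information-foraging patterns:
# count which data sources experts consult per alert type, then propose the
# most common ones during augmented deferral or collaborative exploration.
sessions = [
    {"alert_type": "phishing", "sources_consulted": ["email_gateway_logs", "user_directory"]},
    {"alert_type": "phishing", "sources_consulted": ["email_gateway_logs", "threat_intel"]},
    {"alert_type": "exfiltration", "sources_consulted": ["netflow", "asset_inventory"]},
]

source_counts: dict[str, Counter] = defaultdict(Counter)
for s in sessions:
    source_counts[s["alert_type"]].update(s["sources_consulted"])

def suggest_sources(alert_type: str, k: int = 2) -> list[str]:
    """Propose the data sources experts most often consult for this alert type."""
    return [src for src, _ in source_counts[alert_type].most_common(k)]

print(suggest_sources("phishing"))  # ['email_gateway_logs', ...]
```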

6.2 Realising Flexible Autonomy

Flexible autonomy can be realised within the \(\mathcal {A}^2\mathcal {C}\) Framework by strategically leveraging insights from concepts such as Learning to Reject (L2R) [27] and Learning to Defer (L2D) [37]. L2R, also known as rejection learning, is a concept proposed to address a key limitation in ML-based AI systems, wherein the ML models are unable to recognise their knowledge boundaries, resulting in overconfident errors. To address this limitation, L2R equips ML models with a rejector option, which enables them to withhold predictions when faced with uncertain inputs or those outside the boundaries of their training data. L2D builds on the L2R concept by pairing an ML model with an external (human) expert, allowing it to defer complex or uncertain decisions downstream. Adaptive L2D further extends the L2D concept, enabling the AI model to make a decision even when uncertain, if the human expert’s judgement is biased or highly inaccurate [38]. Keswani et al. [31] extend L2D to settings where multiple experts are available. Learning to Complement (L2C) [65] is a variation of adaptive L2D where the goal is to focus ML-based AI on problem instances that are difficult for humans while seeking human input for instances that are difficult for the AI.
L2R, L2D, and L2C, and their subsequent adaptations represent a step towards instilling self-awareness and teammate awareness in ML-based AI systems, and the insights gained from them can be leveraged towards advancing the realisation of flexible autonomy within the \(\mathcal {A}^2\mathcal {C}\) Framework, ultimately enhancing task allocation between humans and AI while mitigating alert fatigue.
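As a minimal illustration of the reject option that underpins L2R and L2D, the sketch below withholds a classifier’s prediction whenever its top-class probability falls below a threshold, routing those instances to an expert. The threshold and probability inputs are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a reject option (L2R-style) on top of a probabilistic classifier.
REJECT_THRESHOLD = 0.85

def predict_or_reject(probabilities: np.ndarray) -> list[int | None]:
    """Return a class index per sample, or None where the model abstains (defers)."""
    decisions = []
    for p in probabilities:
        top = int(np.argmax(p))
        decisions.append(top if p[top] >= REJECT_THRESHOLD else None)  # None => defer
    return decisions

probs = np.array([[0.97, 0.03],   # confident: automated decision
                  [0.55, 0.45]])  # uncertain: rejected, deferred to the analyst
print(predict_or_reject(probs))   # [0, None]
```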

6.3 Realising Collaborative Exploration

The seminal paper on intelligence analysis by Pirolli and Card [47] organises it into two major activities: foraging and sensemaking. Foraging is a process that involves data gathering from the external environment; sensemaking is a process wherein schematised evidence is connected to hypotheses to inform decision making. Large Language Models (LLMs) are increasingly being used for tackling complex information analysis in diverse domains, ranging from scientific research to creative writing [55], often through conversational interactions between human experts and LLM-powered agents [48]. A recent study by Microsoft’s AI and Productivity research team [7] indicates that LLM-powered productivity tools substantially increase productivity on some of the common tasks performed by enterprise workers, including in enterprise security operations.
There is considerable promise in exploring the integration of LLM-powered agents for collaborative exploration. Their capability to engage in intelligent conversations, adapt to specific tasks, and demonstrate proficiency in solving complex challenges when broken into simpler subtasks positions them as valuable contributors to dynamic collaborations [67]. Emerging open source frameworks such as AutoGen [66] enable seamless collaboration between LLMs, tools, humans, and any combination thereof, making it easy to rapidly prototype and evaluate innovative approaches.
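As an illustration, the following sketch shows how an LLM-powered investigation assistant and a human analyst might be paired for collaborative exploration using AutoGen [66]. The agent names, system message, model choice, and alert text are illustrative assumptions, and the exact constructor parameters may vary between AutoGen versions.

```python
# Hedged sketch of collaborative exploration with the open source AutoGen
# framework [66]. All names, messages, and configuration values are illustrative.
import autogen

config_list = [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]  # placeholder credentials

# LLM-powered agent acting as an investigation assistant for the analyst.
soc_assistant = autogen.AssistantAgent(
    name="soc_assistant",
    system_message=(
        "You help a SOC analyst investigate an alert. Propose which logs or "
        "telemetry to examine next, summarise findings, and flag open questions."
    ),
    llm_config={"config_list": config_list},
)

# Proxy for the human analyst; human_input_mode="ALWAYS" keeps the analyst in
# the loop at every turn, matching the collaborative exploration mode.
analyst = autogen.UserProxyAgent(
    name="analyst",
    human_input_mode="ALWAYS",
    code_execution_config=False,
)

analyst.initiate_chat(
    soc_assistant,
    message="Alert: unusual outbound traffic from host FIN-SRV-03 at 02:14 UTC. "
            "Where should we start the investigation?",
)
```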

7 \(\mathcal {A}^2\mathcal {C}\) Framework: Additional Considerations

While the \(\mathcal {A}^2\mathcal {C}\) Framework establishes a solid foundation for human-AI teaming by addressing key factors like SA and shared SA (self-awareness, teammate awareness, and world awareness), and human-AI team interaction (including levels of automation, flexible autonomy, and granularity of control), achieving optimal collaboration requires further consideration of additional factors beyond the scope of this work. Next, we briefly discuss the importance of trust, bias, and verification in this context.

7.1 Trust in Human-AI Teams

Trust is the foundation of effective collaboration in human-AI teams. It is built upon factors such as transparency, explainability, and responsivity. Transparency ensures that the human teammate has a clear understanding of the AI system’s current state, behaviour, and recommendations, and of the predictability of its planned actions, outcomes, and associated uncertainties [10]. However, trust goes beyond knowing the ‘what’; it also requires understanding the ‘why’ behind it. This is where explainability comes into play. The AI system must be able to provide clear, accurate, and efficient explanations for its recommendations, decisions, and actions, tailored specifically to the human teammate. These explanations should consider the user’s level of experience and expertise, the specific context of the situation, and the complexity of the task at hand [9]. By presenting explanations that align with human cognitive limitations and preferences, the AI fosters a deeper understanding and strengthens the foundations of trust. Finally, responsivity [11] signifies the AI system’s ability to adapt effectively to its human teammate and the dynamic environment they navigate together. A responsive AI can adjust its behaviour based on cues from the human, such as requests for explanations or signals of trust or doubt. This two-way interaction fosters a sense of partnership and further strengthens trust between human and AI.

7.2 Bias in Human-AI Teams

Another crucial consideration in human-AI teams is decision bias: a systematic tendency to favour information or decisions that may not necessarily be the most rational [40]. Several well-known human decision biases can impact effective human-AI teaming, including anchoring bias [59], availability bias [59], confirmation bias [41], framing bias [60], knowledge illusion [35], and surrogation [12]. Anchoring bias leads us to rely heavily on the first piece of information received when making a decision, whereas availability bias causes us to overestimate the likelihood of easily recalled events. Confirmation bias is the tendency to seek out information that confirms one’s preconceptions, views, and expectations. Framing bias refers to how decisions about potential rewards or risks are influenced by the way information is presented. Knowledge illusion leads individuals to overestimate their own understanding and competence. Finally, surrogation refers to focusing too much on the means rather than the end goal. Biases such as anchoring and confirmation bias can significantly hinder effective human-AI teaming by influencing how analysts interpret information and make decisions based on AI outputs.
While humans are susceptible to decision bias, AI systems can also exhibit biased outputs. AI models are trained on data, and if that data is limited [63], contains labelling errors [42], or reflects underlying social biases, the resulting AI system can perpetuate those biases in its decision making. Additionally, the subjective choices made during algorithm selection and parameter tuning can introduce bias into the AI system [15], as can the incorrect interpretation of the results these systems produce [59].
The true challenge lies in the interconnected nature of human and AI bias within human-AI teams. Human biases can influence AI development, and AI biases, in turn, can reinforce human biases, creating a negative feedback loop that hinders performance. To address this challenge, it is crucial to develop and implement mitigating strategies throughout the AI development lifecycle and human-AI interaction.

7.3 Verification and Validation in Human-AI Teams

Addressing the interconnected nature of human and AI bias within human-AI teams necessitates robust verification techniques that enable human teammates to critically assess the AI’s outputs and thereby foster trust. Without such critical evaluation, a cascade of problems follows, including reduced trust in the AI, perpetuation of existing biases, and, ultimately, a decline in the effectiveness of human-AI collaboration. Tackling this problem calls for a paradigm shift towards assured autonomy [58]: moving away from the traditional linear approach of “design, verify, and deploy” towards an iterative one, in which design and initial verification before deployment are followed by continuous verification and adaptation over time.
While a paradigm shift towards assured autonomy is crucial for human-AI teams, re-evaluating the verification techniques themselves is equally important. The non-deterministic nature of AI systems, where slight variations in the input can yield different outputs, challenges conventional techniques for testing traditional software and gives rise to the oracle problem [64]: the difficulty of establishing a definitive ground truth against which an AI system can be verified. Current verification approaches (benchmarking,1 expert panels,2 metamorphic testing3), as outlined in the work of Kaur et al. [29], offer valuable insights but may not be sufficient for the complexities of human-AI collaboration. They struggle to capture the nuances of real-world situations and the ongoing evolution of AI behaviour. What is needed is more comprehensive verification that can continuously assess AI performance in real-world contexts and account for its dynamic nature.
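To illustrate one of these techniques, the sketch below applies metamorphic testing (see footnote 3) to a hypothetical alert classifier: a perturbation that should not change the verdict (here, changing the case of a hostname) is applied, and any alert whose classification flips is flagged. The classifier, the metamorphic relation, and the sample alerts are all hypothetical stand-ins, not a prescribed verification procedure.

```python
# Minimal sketch of metamorphic testing for a hypothetical alert classifier.
import copy

def classify_alert(alert: dict) -> str:
    """Hypothetical stand-in for the AI system under test."""
    return "malicious" if alert["failed_logins"] > 10 else "benign"

def hostname_case_relation(alert: dict) -> dict:
    """Metamorphic relation: changing hostname case should not change the verdict."""
    follow_up = copy.deepcopy(alert)
    follow_up["hostname"] = follow_up["hostname"].upper()
    return follow_up

def run_metamorphic_tests(alerts: list) -> list:
    """Return alerts whose verdict changes under the verdict-preserving relation."""
    return [
        alert for alert in alerts
        if classify_alert(alert) != classify_alert(hostname_case_relation(alert))
    ]

sample_alerts = [
    {"hostname": "fin-srv-03", "failed_logins": 2},
    {"hostname": "hr-wks-17", "failed_logins": 15},
    {"hostname": "dev-box-09", "failed_logins": 40},
]
print("Metamorphic violations:", run_metamorphic_tests(sample_alerts))
```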

8 Conclusion

This article envisions a future in which human-AI teaming and collaboration in SOCs not only optimises operational efficiency but also significantly alleviates the cognitive load on analysts. As a result, this collaborative approach is expected to effectively mitigate alert fatigue, ensuring a more streamlined and responsive security environment. The proposed \(\mathcal {A}^2\mathcal {C}\) Framework enables flexible autonomy in human-AI teams through shared SA, allowing seamless navigation between three key decision-making modes: automated, augmented (through selective deferral), and collaborative exploration. Automation handles routine tasks, selective deferral allows for expert analysis of complex cases, and collaborative exploration tackles novel threats. Collectively, these three decision-making modes, and potentially others, can help optimise incident handling, reducing the cognitive burden on analysts, alleviating alert fatigue, and ultimately enhancing overall security.
Looking ahead, we aim to develop and evaluate a prototype system built upon the \(\mathcal {A}^2\mathcal {C}\) Framework. This practical implementation will allow us to delve into the intricate dynamics of human-AI teaming within the SOC environment. Through this, we can effectively address existing challenges, refine the framework’s functionalities, and optimise its potential for incident handling. While our primary focus lies in incident triage, analysis, and response, the transformative potential of human-AI collaboration extends beyond this scope. By leveraging human-AI collaboration across other SOC functions and broader security operations, this approach has the potential to significantly enhance cyber defence capabilities, including proactive threat detection and investigation.

Footnotes

1. Benchmarking involves evaluating the AI’s performance against established datasets to assess its accuracy and generalisability.
2. Expert panel comparison involves comparing the AI’s decisions on carefully designed test cases to those of human experts, to assess alignment with human judgement.
3. Metamorphic testing involves analysing the AI’s output when the input is slightly altered, to assess its robustness to variations in real-world data.

References

[1]
Zeynep Akata, Dan Balliet, Maarten de Rijke, Frank Dignum, Virginia Dignum, Guszti Eiben, Antske Fokkens, Davide Grossi, Koen Hindriks, Holger Hoos, Hayley Hung, Catholijn Jonker, Christof Monz, Mark Neerincx, Frans Oliehoek, Henry Prakken, Stefan Schlobach, Linda van der Gaag, Frank van Harmelen, Herke van Hoof, Birna van Riemsdijk, Aimee van Wynsberghe, Rineke Verbrugge, Bart Verheij, Piek Vossen, and Max Welling. 2020. A research agenda for hybrid intelligence: Augmenting human intellect with collaborative, adaptive, responsible, and explainable artificial intelligence. Computer 53, 8 (2020), 18–28.
[2]
Bushra A. Alahmadi, Louise Axon, and Ivan Martinovic. 2022. 99% false positives: A qualitative study of SOC analysts’ perspectives on security alarms. In Proceedings of the 31st USENIX Security Symposium (USENIX Security’22). 2783–2800.
[3]
Robert W. Andrews, J. Mason Lilly, Divya Srivastava, and Karen M. Feigh. 2023. The role of shared mental models in human-AI teams: A theoretical review. Theoretical Issues in Ergonomics Science 24, 2 (2023), 129–175.
[4]
Asad Arfeen, Saad Ahmed, Muhammad Asim Khan, and Syed Faraz Ali Jafri. 2021. Endpoint detection & response: A malware identification solution. In Proceedings of the 2021 International Conference on Cyber Warfare and Security (ICCWS’21). IEEE, 1–8.
[5]
Saurabh Arora and Prashant Doshi. 2021. A survey of inverse reinforcement learning: Challenges, methods and progress. Artificial Intelligence 297 (2021), 103500.
[6]
Robert A. Bridges, Ashley E. Rice, Sean Oesch, Jeffrey A. Nichols, Cory Watson, Kevin Spakes, Savannah Norem, Mike Huettel, Brian Jewell, Brian Weber, Connor Gannon, Olivia Bizovi, Samuel C. Hollifield, and Samantha Erwin. 2023. Testing SOAR tools in use. Computers & Security 129 (2023), 103201.
[7]
Alexia Cambon, Brent Hecht, Ben Edelman, Donald Ngwe, Sonia Jaffe, Amy Heger, Mihaela Vorvoreanu, Sida Peng, Jake Hofman, Alex Farach, Margarita Bermejo-Cano, Eric Knudsen, James Bono, Hardik Sanghavi, Sofia Spatharioti, David Rosthschild, Daniel G. Goldstein, Eirini Kalliamvakou, Peter Cihon, Mert Demirer, Michael Schwarz, and Jaime Teevan. 2023. Early LLM-Based Tools for Enterprise Information Workers Likely Provide Meaningful Boosts to Productivity. Microsoft.
[8]
Janis A. Cannon-Bowers, Eduardo Salas, and Sharolyn Converse. 1993. Shared mental models in expert team decision making. In Individual and Group Decision Making: Current Issues, N. M. Castellan, Jr. (Ed.). Lawrence Erlbaum Associates, 221–246.
[9]
Tathagata Chakraborti, Sarath Sreedharan, Yu Zhang, and Subbarao Kambhampati. 2017. Plan explanations as model reconciliation: Moving beyond explanation as soliloquy. arXiv preprint arXiv:1701.08317 (2017).
[10]
Jessie Y. Chen, Katelyn Procci, Michael Boyce, Julia Wright, Andre Garcia, and Michael Barnes. 2014. Situation Awareness-Based Agent Transparency. ARL-TR-6905. U.S. Army Research Laboratory.
[11]
Erin K. Chiou and John D. Lee. 2023. Trusting automation: Designing for responsivity and resilience. Human Factors 65, 1 (2023), 137–165.
[12]
Jongwoon Choi, Gary W. Hecht, and William B. Tayler. 2012. Lost in translation: The effects of incentive compensation on strategy surrogation. Accounting Review 87, 4 (2012), 1135–1163.
[13]
Chris Crowley, Barbara Filkins, and John Pescatore. 2023. SANS 2023 SOC Survey. White Paper. Escal Institute of Advanced Technologies (SANS Institute). https://www.sans.org/white-papers/2023-sans-soc-survey/
[14]
Haydee M. Cuevas, Stephen M. Fiore, Barrett S. Caldwell, and Laura Strater. 2007. Augmenting team cognition in human-automation teams performing in complex operational environments. Aviation, Space, and Environmental Medicine 78, 5 (2007), B63–B70.
[15]
Mary L. Cummings and Songpo Li. 2021. Subjectivity in the creation of machine learning models. ACM Journal of Data and Information Quality 13, 2 (2021), 1–19.
[16]
Statista Research Department. 2023. Size of Cyber Security Market Worldwide from 2019 to 2030. Retrieved July 23, 2023 from https://www.statista.com/statistics/1256346/worldwide-cyber-security-market-revenues/
[17]
Upol Ehsan, Q. Vera Liao, Michael Muller, Mark O. Riedl, and Justin D. Weisz. 2021. Expanding explainability: Towards social transparency in AI systems. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. 1–19.
[18]
Mica R. Endsley. 1988. Design and evaluation for situation awareness enhancement. In Proceedings of the Human Factors Society Annual Meeting, Vol. 32. SAGE Publications, Los Angeles, CA, 97–101.
[19]
Mica R. Endsley. 1995. Toward a theory of situation awareness in dynamic systems. Human Factors 37, 1 (1995), 32–64.
[20]
Mica R. Endsley. 2017. From here to autonomy: Lessons learned from human–automation research. Human Factors 59, 1 (2017), 5–27.
[21]
Mica R. Endsley. 2023. Supporting human-AI teams: Transparency, explainability, and situation awareness. Computers in Human Behavior 140 (March 2023), 107574.
[22]
Steven R. Gomez, Vincent Mancuso, and Diane Staheli. 2019. Considerations for human-machine teaming in cybersecurity. In Augmented Cognition. Lecture Notes in Computer Science, Vol. 11580. Springer, 153–168.
[23]
Gustavo González-Granadillo, Susana González-Zarzosa, and Rodrigo Diaz. 2021. Security Information and Event Management (SIEM): Analysis, trends, and usage in critical infrastructures. Sensors 21, 14 (2021), 4759.
[24]
Shane Griffith, Kaushik Subramanian, Jonathan Scholz, Charles L. Isbell, and Andrea L. Thomaz. 2013. Policy shaping: Integrating human feedback with reinforcement learning. Advances in Neural Information Processing Systems 26 (2013), 1–9.
[25]
Wajih Ul Hassan, Adam Bates, and Daniel Marino. 2020. Tactical provenance analysis for endpoint detection and response systems. In Proceedings of the 2020 IEEE Symposium on Security and Privacy (SP’20). IEEE, 1172–1189.
[26]
Allyson I. Hauptman, Beau G. Schelble, Nathan J. McNeese, and Kapil Chalil Madathil. 2023. Adapt and overcome: Perceptions of adaptive autonomous agents for human-AI teaming. Computers in Human Behavior 138, C (Jan. 2023), 107451.
[27]
Kilian Hendrickx, Lorenzo Perini, Dries Van der Plas, Wannes Meert, and Jesse Davis. 2021. Machine learning with a reject option: A survey. arXiv abs/2107.11277 (2021).
[28]
Mohammad Hossein Jarrahi. 2018. Artificial intelligence and the future of work: Human-AI symbiosis in organizational decision making. Business Horizons 61, 4 (2018), 577–586.
[29]
Davinder Kaur, Suleyman Uslu, Kaley J. Rittichier, and Arjan Durresi. 2022. Trustworthy artificial intelligence: A review. ACM Computing Surveys 55, 2 (2022), 1–38.
[30]
Leon Kersten, Tom Mulders, Emmanuele Zambon, Chris Snijders, and Luca Allodi. 2023. ‘Give Me Structure’: Synthesis and evaluation of a (network) threat analysis process supporting Tier 1 investigations in a security operation center. In Proceedings of the 19th Symposium on Usable Privacy and Security (SOUPS’23). 97–111.
[31]
Vijay Keswani, Matthew Lease, and Krishnaram Kenthapadi. 2021. Towards unbiased and accurate deferral to multiple experts. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society. 154–165.
[32]
Salman Khaliq, Zain Ul Abideen Tariq, and Ammar Masood. 2020. Role of user and entity behavior analytics in detecting insider attacks. In Proceedings of the 2020 International Conference on Cyber Warfare and Security (ICCWS’20). IEEE, 1–6.
[33]
K. Knerler, I. Parker, and C. Zimmerman. 2022. Eleven Strategies of a World-Class Cybersecurity Operations Center. MITRE.
[34]
Faris Bugra Kokulu, Ananta Soneji, Tiffany Bao, Yan Shoshitaishvili, Ziming Zhao, Adam Doupé, and Gail-Joon Ahn. 2019. Matched and mismatched SOCs: A qualitative study on security operations center issues. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. 1955–1970.
[35]
Justin Kruger and David Dunning. 1999. Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology 77, 6 (1999), 1121.
[36]
Celeste Lyn Paul, Leslie M. Blaha, Corey K. Fallon, Cleotilde Gonzalez, and Robert S. Gutzwiller. 2019. Opportunities and challenges for human-machine teaming in cybersecurity operations. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, Vol. 63. SAGE Publications, Los Angeles, CA, 442–446.
[37]
David Madras, Toni Pitassi, and Richard Zemel. 2018. Predict responsibly: Improving fairness and accuracy by learning to defer. Advances in Neural Information Processing Systems 31 (2018), 1–11.
[38]
David Madras, Toni Pitassi, and Richard Zemel. 2018. Predict responsibly: Improving fairness and accuracy by learning to defer. Advances in Neural Information Processing Systems 31 (2018), 1–11.
[39]
Trend Micro. 2021. A Global Study: Security Operations on the Backfoot. Retrieved August 16, 2022 from https://www.multivu.com/players/English/8967351-trend-micro-cybersecurity-tool-sprawl-drives-plans-outsource-detection-response
[40]
National Academies of Sciences, Engineering, and Medicine and others. 2022. Human-AI Teaming: State-of-the-Art and Research Needs. Consensus Study Report. National Academies of Sciences, Engineering, and Medicine.
[41]
Raymond S. Nickerson. 1998. Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology 2, 2 (1998), 175–220.
[42]
Curtis G. Northcutt, Anish Athalye, and Jonas Mueller. 2021. Pervasive label errors in test sets destabilize machine learning benchmarks. arXiv preprint arXiv:2103.14749 (2021).
[43]
National Institute of Standards and Technology. 2023. NIST Cybersecurity Framework 2.0. Technical Report. U.S. Department of Commerce, Washington, DC.
[44]
Raja Parasuraman, Thomas B. Sheridan, and Christopher D. Wickens. 2000. A model for types and levels of human interaction with automation. IEEE Transactions on Systems, Man, and Cybernetics—Part A: Systems and Humans 30, 3 (2000), 286–297.
[45]
Cecile Paris and Andrew Reeson. 2021. What’s the secret to making sure AI doesn’t steal your job? Work with it, not against it. The Conversation. Retrieved May 31, 2024 from https://theconversation.com/whats-the-secret-to-making-sure-ai-doesnt-steal-your-job-work-with-it-not-against-it-172691
[46]
Ani Petrosyan. 2023. Estimated Cost of Cybercrime Worldwide 2017-2028. Retrieved July 23, 2023 from https://www.statista.com/forecasts/1280009/cost-cybercrime-worldwide
[47]
Peter Pirolli and Stuart Card. 2005. The sensemaking process and leverage points for analyst technology as identified through cognitive task analysis. In Proceedings of International Conference on Intelligence Analysis, Vol. 5. 2–4.
[48]
Alun Preece, Will Webberley, and Dave Braines. 2015. Conversational sensemaking. In Next-Generation Analyst III, Vol. 9499. SPIE, 121–129.
[49]
Phanish Puranam. 2021. Human–AI collaborative decision-making as an organization design problem. Journal of Organization Design 10, 2 (2021), 75–80.
[50]
PurpleSec. 2023. Cyber Security Statistics: The Ultimate List of Stats Data, & Trends for 2023. Retrieved April 9, 2023 from https://purplesec.us/resources/cyber-security-statistics
[51]
Andreas Reisser, Manfred Vielberth, Sofia Fohringer, and Günther Pernul. 2022. Security operations center roles and skills: A comparison of theory and practice. In Data and Applications Security and Privacy XXXVI. Lecture Notes in Computer Science, Vol. 13383. Springer, 316–327.
[52]
Wakefield Research. 2022. 2022 Devo SOC Performance Report. White Paper. Wakefield Research. https://www.devo.com/resources/analyst-research/2022-devo-soc-performance-report/
[53]
William B. Rouse and Nancy M. Morris. 1986. On looking into the black box: Prospects and limits in the search for mental models. Psychological Bulletin 100, 3 (1986), 349.
[54]
Matthias Scheutz, Scott A. DeLoach, and Julie A. Adams. 2017. A framework for developing and using shared mental models in human-agent teams. Journal of Cognitive Engineering and Decision Making 11, 3 (2017), 203–224.
[55]
Sangho Suh, Bryan Min, Srishti Palani, and Haijun Xia. 2023. Sensecape: Enabling multilevel exploration and sensemaking with large language models. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST’23). ACM, Article 1, 18 pages.
[56]
Sathya Chandran Sundaramurthy, Alexandru G. Bardas, Jacob Case, Xinming Ou, Michael Wesch, John McHugh, and S. Raj Rajagopalan. 2015. A human capital model for mitigating security analyst burnout. In Proceedings of the 11th Symposium on Usable Privacy and Security (SOUPS’15). 347–359.
[57]
Tines. 2023. Voice of the SOC 2023 Report. White Paper. Tines. https://www.tines.com/reports/voice-of-the-soc-2023
[58]
Ufuk Topcu, Nadya Bliss, Nancy Cooke, Missy Cummings, Ashley Llorens, Howard Shrobe, and Lenore Zuck. 2020. Assured autonomy: Path toward living with autonomous systems we can trust. arXiv preprint arXiv:2010.14443 (2020).
[59]
Amos Tversky and Daniel Kahneman. 1974. Judgment under uncertainty: Heuristics and biases: Biases in judgments reveal some heuristics of thinking under uncertainty. Science 185, 4157 (1974), 1124–1131.
[60]
Amos Tversky and Daniel Kahneman. 1981. The framing of decisions and the psychology of choice. Science 211, 4481 (1981), 453–458.
[61]
Manfred Vielberth, Fabian Böhm, Ines Fichtinger, and Günther Pernul. 2020. Security operations center: A systematic study and open challenges. IEEE Access 8 (2020), 227756–227779.
[62]
Virtual Intelligence Briefing. 2021. The State of Security Automation. Technical Report. Palo Alto Networks. https://start.paloaltonetworks.com/The-State-of-SOAR-Automation
[63]
Sarah Myers West, Meredith Whittaker, and Kate Crawford. 2019. Discriminating Systems: Gender, Race, and Power in AI—Report. AI Now Institute.
[64]
Elaine J. Weyuker. 1982. On testing non-testable programs. Computer Journal 25, 4 (1982), 465–470.
[65]
Bryan Wilder, Eric Horvitz, and Ece Kamar. 2021. Learning to complement humans. In Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI’20). Article 212, 8 pages.
[66]
Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Shaokun Zhang, Erkang Zhu, Beibin Li, Li Jiang, Xiaoyun Zhang, and Chi Wang. 2023. AutoGen: Enabling next-gen LLM applications via multi-agent conversation framework. arXiv preprint arXiv:2308.08155 (2023).
[67]
Tongshuang Wu, Michael Terry, and Carrie Jun Cai. 2022. AI chains: Transparent and controllable human-AI interaction by chaining large language model prompts. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. 1–22.
[68]
Boyuan Zheng, Sunny Verma, Jianlong Zhou, Ivor W. Tsang, and Fang Chen. 2022. Imitation learning: Progress, taxonomies and challenges. IEEE Transactions on Neural Networks and Learning Systems. Published Online, October 25, 2022.
