Log Analysis of Cyber Security Training Exercises
Procedia Manufacturing, July 2015
6th International Conference on Applied Human Factors and Ergonomics (AHFE 2015) and the
Affiliated Conferences, AHFE 2015
Abstract
Cyber security is a pervasive issue that impacts public and private organizations. While several published accounts describe the
task demands of cyber security analysts, it is only recently that research has begun to investigate the cognitive and performance
factors that distinguish novice from expert cyber security analysts. Research in this area is motivated by the need to understand
how to better structure the education and training of cyber security professionals, a desire to identify selection factors that are predictive of professional success in cyber security, and questions related to the development of software tools to augment human
performance of cyber security tasks. However, a common hurdle faced by researchers involves gaining access to cyber security
professionals for data collection activities, whether controlled experiments or semi-naturalistic observations. An often readily
available and potentially valuable source of data may be found in the records generated through cyber security training exercises.
These events frequently entail semi-realistic challenges that may be modeled on real-world occurrences, and occur outside
normal operational settings, freeing participants from the sensitivities regarding information disclosure within operational
environments. This paper describes an infrastructure tailored for the collection of human performance data within the context of
cyber security training exercises. Techniques are described for mining the resulting data logs for relevant human performance
variables. The results provide insights that go beyond current descriptive accounts of the cognitive processes and demands
associated with cyber security job performance, providing quantitative characterizations of the activities undertaken in solving
problems within this domain.
1. Introduction
Cyber security professionals have become essential within organizations providing defense against various
criminal, adversarial and malicious threats. However, the available pool of qualified personnel is insufficient to
meet current demands. Furthermore, as the reliance upon information technologies continues to grow, the demand
for cyber professionals will also grow. There has been substantial research, as well as monetary expenditures for
commercial products, focused on software solutions to augment the performance of cyber security professionals.
However, it is difficult to imagine closing the gap between the availability of cyber professionals and the demand for
their services primarily through technology. The human remains a vital, inescapable element in the cyber defense of
organizations (Forsythe et al., 2013). The human component in cyber defense was illustrated in research assessing
the utility of intrusion detection software (Sommestad & Hunstad, 2013). Intrusion detection software monitors
network activity and generates alerts in response to suspicious patterns of activity. It was found that human operators enhanced the overall effectiveness of these products by increasing the proportion of alarms that corresponded to legitimate threats, without decreasing the likelihood that actual attacks were detected. It has been
recognized that there is a critical interplay between technology solutions and human operators (Haack et al., 2009).
A second mechanism for improving cyber defense is through the education and training of cyber professionals.
Recent research has focused on identifying the knowledge and skills that underlie the progression from novice to
competent to expert to elite cyber defenders. Analysis has been reported characterizing the tasks, decisions,
workflow and demands associated with cyber security operations (D’Amico et al., 2005; Erbacher et al., 2010; Reed
et al., 2013). Paul and Whitley (2013) described the process followed by cyber professionals in assessing and
responding to alerts concerning suspicious network activity and the questions that arise at different steps in the
process, with consideration of the domains of knowledge prompting specific questions. Two distinct forms of
knowledge believed to be essential to cyber operations have been described (Goodall, Lutters & Komlodi, 2009).
First, there is knowledge of networking and security. Second, there is situated knowledge reflected in an
understanding of what is normal for a given information network. It was noted that the former lends itself to transfer between organizations, whereas the latter places cyber professionals at a disadvantage when they move from one organization to another (Goodall, Lutters & Komlodi, 2004).
A common challenge in conducting research involving cyber security professionals has been gaining access to these individuals within the operational settings in which they work. This is partly attributable to the heavy
workload typical of cyber operations, but also a product of sensitivities regarding capabilities and vulnerabilities.
An alternative to studying actual operations can be found with cyber security training exercises (Sommestad &
Hallberg, 2012; Reed, Nauer & Silva, 2013). These environments may be instrumented to provide detailed data
concerning the activities of participants, use of software tools and success in accomplishing exercise objectives.
This creates the opportunity for observational research. For example, Jariwala et al. (2012) described the importance
of team communication, structure and leadership in effective performance within the context of competitive cyber
exercises. Other research has sought to characterize the factors contributing to effective performance. For example, it was found that individuals who integrated the use of specialized cyber security software tools with the use of generalized software tools (e.g., Microsoft Excel, Cygwin) performed better than those who more exclusively utilized the specialized tools (Silva et al., 2014). Similarly, it was found that participants whose training emphasized
adversary tactics and techniques surpassed the performance of participants with training that emphasized the features
and functions of cyber security software tools (Stevens-Adams et al., 2013).
With competitive cyber security exercises, doubt exists regarding the appropriate measures of performance (Stevens-Adams et al., 2013); this doubt is a product of broader uncertainty regarding the appropriate metrics for assessing cyber security skills in general (Forsythe et al., 2013). The current paper identifies measures that are attainable
within the context of a competitive cyber security exercise. This assessment is based on the Tracer FIRE platform.
Tracer FIRE was developed by the United States Department of Energy as a training environment that provides
operational personnel an opportunity to exercise their skills within a semi-realistic environment. Research
undertaken at Sandia National Laboratories has focused on instrumenting this environment to provide a range of
measures regarding human-machine transactions and performance.
Through the instrumentation of training environments, opportunities exist for collecting real-time data concerning
participant performance (Stevens et al., 2009). Such data may provide the input to automated student performance
assessment. It has been demonstrated that superior training outcomes may be achieved by supplementing human
instructors with software tools that provide automated assessments (Stevens-Adams et al., 2010). Benefits derive
from lessening the workload on instructors by automating detection of mundane facets of performance, allowing
instructors to devote time to more complex, higher-level considerations. Furthermore, automated measures provide
a degree of standardization that is sometimes difficult to achieve otherwise. The current paper lays the groundwork
and provides an initial quantitative evaluation of techniques for automated assessment of student performance within
cyber security training exercises.
2. Methods
2.1 Subjects
Subjects consisted of a total of 26 individuals who consented to data collection during two separate Tracer FIRE cyber security training exercises. There were 11 subjects from the first event, which occurred during the spring of 2014, and 15 subjects from the second event, which occurred in the summer of 2014.
2.2 Procedure
The Tracer FIRE exercise consisted of a multi-day event that combined classroom instruction in the use of cyber
security software tools, forensic analysis techniques, and adversary tactics and techniques with a team competition
exercise. At the beginning of the competition, there was an announcement concerning the study and those willing to
consent to data collection underwent the informed consent process. Data collection regarding human-machine
transactions occurred non-intrusively through automated data logging as subjects participated in the exercise.
The exercise presented teams with a multi-level challenge. At a low level, there was a series of puzzles that allowed
participants to exercise their cyber forensic analysis skills, as well as the cyber security software tools. At a higher
level, there was a complex scenario partially based on real-world events that involved multiple adversaries with
differing objectives operating individually and in collaboration with one another. As participants solved the
individual puzzles they received points that were tallied on a scoreboard and unlocked more puzzles. Additionally,
by solving individual puzzles, participants obtained clues to the overall scenario that would be helpful in solving
subsequent puzzles. At the conclusion, each team presented their interpretation of the overall scenario and the
ultimate outcome hinged upon how closely the team interpretations corresponded with the ground truth of the actual
events.
A Sandia National Laboratories software tool known as Hyperion was used to capture human-machine
transactions. This included the use of software applications, Internet accesses, Windows events, keystrokes, and mouse clicks. The data collected from Hyperion was combined and synchronized with the game server logs and logs from the news server to provide a combined record encompassing the activities of each individual
participant.
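As a rough illustration of this merging step, the sketch below combines records from the different sources into a single stream ordered by participant and time. The file names, CSV layout, and timestamp format are assumptions for illustration; the paper does not specify the actual log formats.

```python
import csv
from datetime import datetime

# Hypothetical file names and CSV layout; the actual Hyperion, game server,
# and news server log formats are not described in the paper.
SOURCES = {
    "hyperion": "hyperion_events.csv",
    "game_server": "game_server.csv",
    "news_server": "news_server.csv",
}

def load_events(path, source):
    """Read one log file and tag each record with its source."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            row["source"] = source
            # Assumes ISO-8601 timestamps in a "timestamp" column.
            row["timestamp"] = datetime.fromisoformat(row["timestamp"])
            yield row

def combined_log():
    """Merge all sources into a single record stream per participant, sorted by time."""
    events = []
    for source, path in SOURCES.items():
        events.extend(load_events(path, source))
    events.sort(key=lambda e: (e["user_id"], e["timestamp"]))
    return events
```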
The combined logs were parsed into blocks of activity, with boundaries placed at periods during which there is no activity associated with a challenge. Ultimately, the mechanisms for parsing log entries into blocks of time during which participants are focused on specific high-level objectives would be applicable to contexts extending beyond post-event analysis of Tracer FIRE exercises, and would be generalizable to operational settings.
The logs generated from the Tracer FIRE exercise consisted of a time synchronized record combining multiple
sources of data. For each human-machine transaction, the data included:
- Participant UserID
- Timestamp
- Interval since previous transaction (i.e., duration)
- Challenge ID, for transactions involving the game server
- Event Type, for transactions involving the game server
- Submission, the answer submitted, for transactions involving submission of an answer to the game server
- Points Awarded, for transactions involving submitting answers to the game server
- Software Tool, for transactions involving software tools
- Class of Event (Windows, Game Server or News Server)
- Article ID, for transactions involving the News Server
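Expressed as a data structure, one such transaction might be represented as in the following sketch; the field names paraphrase the list above, and which optional fields are populated depends on the class of event. This is an illustrative layout, not the schema actually used.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Transaction:
    """One human-machine transaction from the combined, time-synchronized log.

    Field names paraphrase the list above; optional fields apply only to
    the relevant event class (Windows, Game Server, or News Server).
    """
    user_id: str
    timestamp: datetime
    duration_s: float                     # interval since the previous transaction
    event_class: str                      # "Windows", "GameServer", or "NewsServer"
    challenge_id: Optional[str] = None    # game server transactions
    event_type: Optional[str] = None      # e.g., "Set", "Submission", "Abandon"
    submission: Optional[str] = None      # answer text submitted to the game server
    points_awarded: Optional[int] = None  # scored submissions
    software_tool: Optional[str] = None   # Windows transactions involving a tool
    article_id: Optional[str] = None      # news server transactions
```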
In parsing logs into blocks of activity, the first condition involved periods of inactivity. It was assumed that a
period of 15 minutes or more with no activity represented a boundary between two blocks. The one exception to
this rule addressed situations in which no activities were logged because the participant was reading material accessed by searching the Internet. Accordingly, when periods of inactivity of up to 30 minutes were observed and the inactivity was immediately preceded by actions consistent with the participant accessing reading material (e.g., Firefox followed by Adobe Reader, consistent with downloading and reading a PDF document), the period of
inactivity did not serve as a partition between blocks.
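A minimal sketch of this inactivity rule is given below, assuming each transaction carries the tool name and that the gap to the next transaction has already been computed; the executable names used to recognize reading-related activity are illustrative guesses, not names taken from the paper.

```python
# Boundary rule sketch: a gap of 15 minutes or more splits two blocks,
# unless the gap (up to 30 minutes) immediately follows actions consistent
# with downloading and reading material (e.g., Firefox then Adobe Reader).
READING_TOOLS = {"firefox.exe", "chrome.exe", "acrord32.exe"}  # assumed names

def is_block_boundary(gap_minutes, preceding_tools):
    """Return True if this period of inactivity should split two blocks."""
    if gap_minutes < 15:
        return False
    reading = any(t.lower() in READING_TOOLS for t in preceding_tools[-2:])
    if reading and gap_minutes <= 30:
        return False  # participant was likely reading downloaded material
    return True
```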
As described previously, challenges were accessed via a game server. When a participant opened a challenge,
the action appeared in the log as a “Set” event. Likewise, when a participant submitted an answer, the action was
recorded in the log as a “Submission” and when they abandoned a challenge, the log recorded an “Abandon.”
Activities occurring prior to a Set event were not included in the block of activities containing that Set event, as it was generally assumed that a Set event (i.e., opening a challenge) represented the beginning of a sequence of related activities.
However, there were three exceptions to this rule. First, if a participant had previously worked on a challenge or
another member of their team had worked on a challenge, the participant could know and work toward the solution
to a challenge without actually opening the challenge. Within the logs, this situation was reflected by instances in
which there was a Set event immediately followed by a Submission. In these cases, the block of activity could
extend to include activities prior to the Set event. Second, in solving a challenge, the answer could be recorded in an
application such as Notepad or WordPad, or require the participant to access another software application (e.g., copy
and paste a URL from Firefox). Consequently, a Set and Submit event would be separated by other activities. To
address these situations, a rule was adopted that if Set and Submit events were separated by 3 or fewer actions, the
block of activity could begin prior to the Set event. Third, participants would often make an incorrect submission
for a specific challenge and soon thereafter, make another submission. Sometimes, this involved making minor
modifications to their answer (e.g., changing the syntax) and other times, additional work was done. In the logs,
these situations appeared as Set and Submit events involving the same challenge that were either successive or
separated by other activities. For this case, when there were multiple Set and Submit events involving the same
challenge, it was assumed that each Set event corresponded to a continuation of preceding work on the challenge,
resulting in blocks of activity that included multiple Set events.
Submission of a correct answer was considered the end of a block of activities. Likewise, abandonment of a
challenge followed by opening a different challenge was considered the end of a block of activities.
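The sketch below illustrates a simplified version of these game-server rules: a Set event normally opens a block, a correct Submission or an Abandon closes it, and a block may begin before a Set event when a Submission follows within three actions. The team-knowledge and repeated-submission cases described above are omitted for brevity, and the use of awarded points as the marker of a correct answer is an assumption.

```python
def segment_blocks(events):
    """Split one participant's time-ordered events into blocks of activity.

    Simplified rules: a "Set" event normally starts a block, a correct
    "Submission" ends it, and an "Abandon" ends it.  If a Set and a
    Submission are separated by 3 or fewer actions, the preceding activity
    is kept in the same block rather than starting a new one.
    """
    blocks, current = [], []
    for i, ev in enumerate(events):
        etype = ev.get("event_type")
        if etype == "Set":
            # Look ahead up to 3 intervening actions for a Submission.
            lookahead = [e.get("event_type") for e in events[i + 1:i + 5]]
            if "Submission" not in lookahead and current:
                blocks.append(current)
                current = []
            current.append(ev)
        elif etype == "Submission":
            current.append(ev)
            # Awarded points are treated here as indicating a correct answer.
            if (ev.get("points_awarded") or 0) > 0:
                blocks.append(current)
                current = []
        elif etype == "Abandon":
            current.append(ev)
            blocks.append(current)
            current = []
        else:
            current.append(ev)
    if current:
        blocks.append(current)
    return blocks
```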
While News items provided the context framing the individual challenges, they generally did not directly address
the challenges. News events were pushed to participants, with participants free to access the News server to retrieve
the news items at their discretion. It is unlikely that a participant would go to the News server to look for information to use in solving a specific challenge; instead, participants would periodically check news items to see if there was
anything of interest. Events associated with accessing the News server were not included within blocks of activity.
Sessions began with a series of activities associated with configuring the laptop computers used by participants and verifying their operation, with these activities recorded in the logs. These activities generally involved command line activities (i.e., cmd.exe) and use of Windows Explorer, as well as Internet browsers to download software or other
files. Activities at the beginning of sessions were not included in the analysis if they involved use of the command
line accompanied by Windows Explorer or an Internet browser.
Activities involving Hyperion, which is the software that supports the collection of data logs, were excluded.
Likewise, instances in which participants engaged in activities that clearly did not relate to solving the challenges
(e.g., game play with Minesweeper), along with adjacent and potentially related activities, were excluded from the analysis of the data logs.
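These exclusions could be applied with a simple filter such as the sketch below; the executable names and the notion of a session-start window are assumptions, since the paper does not list the exact process names that were matched.

```python
EXCLUDED_TOOLS = {"hyperion.exe", "winmine.exe"}          # logging tool, Minesweeper (assumed names)
SETUP_TOOLS = {"cmd.exe", "explorer.exe", "firefox.exe"}  # assumed setup-phase tools

def keep_event(ev, session_start_window=False):
    """Return False for events excluded from the analysis."""
    tool = (ev.get("software_tool") or "").lower()
    if tool in EXCLUDED_TOOLS:
        return False
    # Command-line, file-manager, and browser use at the start of a session
    # is treated as laptop configuration rather than challenge work.
    if session_start_window and tool in SETUP_TOOLS:
        return False
    return True
```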
3. Results
Through the automated parsing of the data logs, a total of 379 blocks of activity were identified. As shown in
Figure 1, the vast majority of blocks were less than 25 minutes in duration, with some extending much longer. It
should be noted that the number of blocks of activity varied significantly across subjects. On average, there were
14.5 blocks for each subject (sd=9.0). Table 1 provides descriptive statistics for several key variables concerning
the automatically derived blocks of activity. On average, a block of activity extended for approximately 17-18 minutes and involved approximately 45 distinct actions. Within a block of activity, participants used, on average, 4 to 5 different software tools, with approximately 22 transitions between software tools and 19 instances in which a participant returned to a tool that had been previously used within the block of activity.
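The two-stage averaging used for these statistics (averaging across blocks within each subject, then across subjects) can be written directly, as in the sketch below; the summary record layout and field names are assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev

def subject_then_group_mean(block_summaries, key):
    """Average a block-level variable within each subject, then across subjects.

    block_summaries: iterable of dicts with "user_id" plus numeric fields
    such as block duration or number of distinct tools (names are assumed).
    Returns the grand mean and standard deviation of the per-subject means.
    """
    per_subject = defaultdict(list)
    for b in block_summaries:
        per_subject[b["user_id"]].append(b[key])
    subject_means = [mean(values) for values in per_subject.values()]
    return mean(subject_means), stdev(subject_means)
```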
A consideration of software applications found that participants employed 62 distinct software tools. Figure 2 shows the nine software tools that were used by the most participants. The most frequently used software application was Explorer; however, it should be noted that the game server required the use of Explorer to access the exercise content. Nevertheless, the utility of an Internet browser is evidenced by Firefox being the software tool used by the second most participants, with almost half of the participants additionally using Chrome. This observation is further evidenced in Figure 3, which shows the total number of instances, summed across subjects, that each software application was used.
Table 1. Descriptive statistics for automatically derived blocks of activity, based on averaging the results across blocks for each subject and then averaging these means across subjects.
Figure 3. Total number of instances software applications were used summed across subjects.
Further analysis considered the transitions between software applications. For this analysis, the twelve most
frequently utilized software applications were considered. The transition diagram is shown in Figure 4.
Figure 4. The size of nodes corresponds to the frequency of transitions to/from a software application. The links between nodes represent transitions from one software application to another, with links weighted more heavily for more frequent transitions.
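The node and link weights underlying a diagram such as Figure 4 can be tabulated from consecutive tool uses within each block, as in the sketch below; restricting the analysis to the twelve most frequently used applications follows the text, while the record layout is an assumption and the drawing itself is left to a graph visualization tool.

```python
from collections import Counter

def transition_weights(blocks, top_n=12):
    """Count tool-to-tool transitions within blocks of activity.

    Returns (node_weights, edge_weights): a node's weight is how often that
    tool participates in a transition; an edge's weight is the count for each
    ordered (from_tool, to_tool) pair, restricted to the top_n most used tools.
    """
    tool_use = Counter(ev["software_tool"] for blk in blocks for ev in blk
                       if ev.get("software_tool"))
    top_tools = {t for t, _ in tool_use.most_common(top_n)}

    nodes, edges = Counter(), Counter()
    for blk in blocks:
        tools = [ev["software_tool"] for ev in blk
                 if ev.get("software_tool") in top_tools]
        for a, b in zip(tools, tools[1:]):
            if a != b:  # only count actual switches between tools
                edges[(a, b)] += 1
                nodes[a] += 1
                nodes[b] += 1
    return nodes, edges
```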
4. Discussion
The emphasis with the current paper has been to demonstrate the instrumentation of a cyber security exercise and
the use of automated techniques to parse the resulting data logs into meaningful units that may provide the basis for
further analysis and assessment of human performance. Previous studies have cast doubt upon the utility of
performance measures derived from the scores obtained within the context of competitive exercises (Stevens-
Adams et al., 2013). Instead, more meaningful insights may be gained from the work processes and the use of
software tools to facilitate these work processes. For instance, it has been shown that the more effective performers
tend to utilize general purpose tools in support of their use of specialized cyber security tools (Silva et al., 2014).
Automated parsing of logs is an essential step in development of techniques for automated performance assessment.
However, at present, uncertainty exists concerning the appropriate metrics for assessing performance within cyber
security exercises (Forsythe et al., 2013). An accompanying paper (McClain et al., 2015) addresses this topic through further analysis of the current data set to compare the behavioral characteristics of expert and novice participants.
Acknowledgements
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a
wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear
Security Administration under contract DE-AC04-94AL85000. (SAND2014-2123 C)
References
D'Amico, A., Whitley, K., Tesone, D., O'Brien, B., & Roth, E. (2005). Achieving cyber defense situational awareness: A cognitive task analysis
of information assurance analysts. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, SAGE
Publications, 229-233.
Erbacher, R. F., Frincke, D. A., Wong, P. C., Moody, S., & Fink, G. (2010). A multi-phase network situational awareness cognitive task analysis.
Information Visualization, 9(3), 204-219.
Forsythe, C., Silva, A., Stevens-Adams, S. & Bradshaw, J. (2013). Human Dimension in Cyber Operations Research and Development Priorities.
Proceedings of the Human-Computer Interaction International Conference, Las Vegas, NV.
Goodall, J. R., Lutters, W. G., & Komlodi, A. (2004, November). I know my network: Collaboration and expertise in intrusion detection. In
Proceedings of the 2004 ACM Conference on Computer Supported Cooperative Work, ACM, pp. 342-345.
Goodall, J. R., Lutters, W. G., & Komlodi, A. (2009). Developing expertise for network intrusion detection. Information Technology & People,
22(2), 92-108.
Haack, J. N., Fink, G. A., Maiden, W. M., McKinnon, D., & Fulp, E. W. (2009, May). Mixed-Initiative Cyber Security: Putting humans in the
right loop. In The First International Workshop on Mixed-Initiative Multiagent Systems (MIMS) at AAMAS.
Jariwala, S., Champion, M., Rajivan, P., & Cooke, N. J. (2012, September). Influence of Team Communication and Coordination on the
Performance of Teams at the iCTF Competition. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting,
SAGE Publications, pp. 458-462.
McClain, J.T., Silva, A., Emmanuel, G., Anderson, B., Nauer, K. & Forsythe, C. (2015). Human Performance Factors in Cyber Security Forensic
Analysis. Proceedings of the Applied Human Factors and Ergonomics Conference, Las Vegas, NV.
Paul, C. L., & Whitley, K. (2013). A taxonomy of cyber awareness questions for the user-centered design of cyber situation awareness. In Human
Aspects of Information Security, Privacy, and Trust, Springer Berlin Heidelberg, pp. 145-154.
Reed, T., Abbott, R., Anderson, B., Nauer, K. & Forsythe, C. (2014). Simulation of workflow and threat characteristics for cyber security
incident response teams. Proceedings of the 2014 International Annual Meeting of the Human Factors and Ergonomics Society,
Chicago, IL.
Reed, T., Nauer, K., & Silva, A. (2013). Instrumenting competition-based exercises to evaluate cyber defender situation awareness. In
Foundations of Augmented Cognition, Springer Berlin Heidelberg, pp. 80-89.
Silva, A., McClain, J., Reed, T., Anderson, B., Nauer, K., Abbott, R. & Forsythe, C. (2014). Factors impacting performance in competitive cyber
exercises. Proceedings of the Interservice/Interagency Training, Simulation and Education Conference, Orlando, FL.
Sommestad, T., & Hallberg, J. (2012). Cyber security exercises and competitions as a platform for cyber security experiments. In Secure IT
Systems, Springer Berlin Heidelberg, pp. 47-60.
Sommestad, T., & Hunstad, A. (2013). Intrusion detection and the role of the system administrator. Information Management & Computer
Security, 21(1), 30-40.
Stevens, S., Forsythe, C., Abbott, R. & Gieseler, C. (2009). Experimental assessment of automated knowledge capture. Proceedings of the Interservice/Interagency Training, Simulation and Education Conference, Orlando, FL.
Stevens-Adams, S., Basilico, J., Abbott, R.A., Gieseler, C. & Forsythe, C. (2010). Using after-action review based on automated performance
assessment to enhance training effectiveness. Proceedings of the Human Factors and Ergonomics Society, San Francisco, CA.
Stevens-Adams, S., Carbajal, A., Silva, A., Nauer, K., Anderson, B., Reed, T. & Forsythe, C. (2013). Enhanced training for cyber situational awareness. Proceedings of the Human-Computer Interaction International Conference, Las Vegas, NV.