Invited Keynotes by Mark Scanlon
Given the ever-increasing prevalence of technology in modern life, there is a corresponding increase in the likelihood of digital devices being pertinent to a criminal investigation or civil litigation. As a direct consequence, the number of investigations requiring digital forensic expertise is resulting in huge digital evidence backlogs being encountered by law enforcement agencies throughout the world. It can be anticipated that the number of cases requiring digital forensic analysis will greatly increase in the future. It is also likely that each case will require the analysis of an increasing number of devices including computers, smartphones, tablets, cloud-based services, Internet of Things devices, wearables, etc. The variety of new digital evidence sources poses new and challenging problems for the digital investigator from an identification, acquisition, storage and analysis perspective. This talk explores the current challenges contributing to the backlog in digital forensics from a technical standpoint and outlines a number of future research topics that could greatly contribute to a more efficient digital forensic process.
Journal Articles by Mark Scanlon
Digital Investigation, 2018
Current malware detection and classification approaches generally rely on time-consuming and knowledge-intensive processes to extract patterns (signatures) and behaviors from malware, which are then used for identification. Moreover, these signatures are often limited to local, contiguous sequences within the data whilst ignoring their context in relation to each other and throughout the malware file as a whole. We present a deep learning-based malware classification approach that requires no expert domain knowledge and is based on a purely data-driven approach for complex pattern and feature identification.
Digital Investigation, 2018
Historically, radio equipment has solely been used as a two-way analogue communication device. Today, the use of radio communication equipment by numerous organisations and businesses is increasing. The functionality of these traditionally short-range devices has expanded to include private call, address book, call logs, text messages, lone worker, telemetry, data communication, and GPS. Many of these devices also integrate with smartphones, which deliver Push-To-Talk services that make it possible to set up connections between users using a two-way radio and a smartphone. In fact, these devices can be used to connect users using only smartphones. To date, there is little research on the digital traces in modern radio communication equipment. Increasing the knowledge base about these radio communication devices and services can be valuable to law enforcement in a police investigation. In this paper, we investigate what kind of radio communication equipment and services law enforcement digital investigators can encounter at a crime scene or in an investigation. Subsequent to the seizure of this radio communication equipment, we explore the traces that may be of forensic interest and how these traces can be acquired. Finally, we test our approach on sample radio communication equipment and services.
Organised crime, as well as individual criminals, is benefiting from the protection that private browsers provide to those who would carry out illegal activity, such as money laundering, drug trafficking, the online exchange of child-abuse material, etc. The protection afforded to users of the Epic Privacy Browser illustrates these benefits. This browser is currently in use in approximately 180 countries worldwide. This paper outlines the location and type of evidence available through live and post-mortem state analyses of the Epic Privacy Browser. This study identifies the manner in which the browser functions during use, where evidence can be recovered after use, as well as the tools and effective presentation of the recovered material.
Bytewise approximate matching algorithms have in recent years shown significant promise in detecting files that are similar at the byte level. This is very useful for digital forensic investigators, who are regularly faced with the problem of searching through a seized device for pertinent data. A common scenario is where an investigator is in possession of a collection of "known-illegal" files (e.g., a collection of child abuse material) and wishes to find whether copies of these are stored on the seized device. Approximate matching addresses shortcomings in traditional hashing, which can only find identical files, by also being able to deal with cases of merged files, embedded files, partial files, or if a file has been changed in any way. Most approximate matching algorithms work by comparing pairs of files, which is not a scalable approach when faced with large corpora. This paper demonstrates the effectiveness of using a "Hierarchical Bloom Filter Tree" (HBFT) data structure to reduce the running time of collection-against-collection matching, with a specific focus on the MRSH-v2 algorithm. Three experiments are discussed, which explore the effects of different configurations of HBFTs. The proposed approach dramatically reduces the number of pairwise comparisons required, and demonstrates substantial speed gains, while maintaining effectiveness.
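To illustrate why a tree of Bloom filters can cut down pairwise comparisons, the following is a minimal sketch and not the authors' implementation. It assumes a binary tree, naive fixed-size 64-byte chunks as features (standing in for MRSH-v2's own feature extraction, which is not reproduced here), and SHA-256-derived bit positions. The names `BloomFilter`, `HBFTNode`, `chunks`, and `candidates` are hypothetical.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter; the size and hash count are arbitrary choices."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest (assumption, not MRSH-v2).
        digest = hashlib.sha256(item).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def chunks(data, size=64):
    """Fixed-size chunking, a simplification used only for this sketch."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class HBFTNode:
    """Each node's filter covers every file stored in its subtree."""

    def __init__(self, files):
        # files: list of (name, bytes) pairs from the known collection
        self.names = [name for name, _ in files]
        self.bloom = BloomFilter()
        for _, data in files:
            for chunk in chunks(data):
                self.bloom.add(chunk)
        self.children = []
        if len(files) > 1:
            mid = len(files) // 2
            self.children = [HBFTNode(files[:mid]), HBFTNode(files[mid:])]

    def candidates(self, data, min_hits=1):
        """Return names of known files worth a full pairwise comparison."""
        hits = sum(1 for chunk in chunks(data) if chunk in self.bloom)
        if hits < min_hits:
            return []  # prune: nothing in this subtree shares enough chunks
        if not self.children:
            return list(self.names)
        found = []
        for child in self.children:
            found.extend(child.candidates(data, min_hits))
        return found
```

A file from the seized device is checked against the root first; subtrees whose filter reports too few matching chunks are skipped, so only the surviving leaves need a full comparison. Bloom filters can report false positives, so the returned names are candidates rather than confirmed matches.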
Education and training in digital forensics require a variety of suitable challenge corpora containing realistic features including regular wear-and-tear, background noise, and the actual digital traces to be discovered during investigation. Typically, the creation of these challenges requires overly arduous effort on the part of the educator to ensure their viability. Once created, the challenge image needs to be stored and distributed to a class for practical training. This storage and distribution step requires significant time and resources and may not even be possible in an online/distance learning scenario due to the data sizes involved. As part of this paper, we introduce a more capable methodology and system as an alternative to current approaches. EviPlant is a system designed for the efficient creation, manipulation, storage and distribution of challenges for digital forensics education and training. The system relies on the initial distribution of base disk images, i.e., images containing solely base operating systems. In order to create challenges for students, educators can boot the base system, emulate the desired activity and perform a "diffing" of the resultant image and the base image. This diffing process extracts the modified artefacts and associated metadata and stores them in an "evidence package". Evidence packages can be created for different personae, different wear-and-tear, different emulated crimes, etc., and multiple evidence packages can be distributed to students and integrated into the base images. A number of additional applications in digital forensic challenge creation for tool testing and validation, proficiency testing, and malware analysis are also discussed as a result of using EviPlant.
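The diffing and injection steps described above can be sketched as follows. This is a toy model under stated assumptions, not EviPlant's actual format: a disk image is represented as a dictionary mapping file paths to bytes, and the evidence package layout, `diff_images`, and `inject` are hypothetical names chosen for illustration.

```python
import hashlib


def diff_images(base, modified):
    """Build a toy 'evidence package' holding only what differs from the base image."""
    package = {"changed": {}, "deleted": []}
    for path, data in modified.items():
        if base.get(path) != data:  # new file, or content differs from the base
            package["changed"][path] = {
                "data": data,
                "sha256": hashlib.sha256(data).hexdigest(),  # stored metadata
            }
    package["deleted"] = sorted(p for p in base if p not in modified)
    return package


def inject(base, package):
    """Integrate an evidence package into a copy of the base image."""
    image = dict(base)
    for path in package["deleted"]:
        image.pop(path, None)
    for path, entry in package["changed"].items():
        # Verify the artefact against its recorded hash before planting it.
        if hashlib.sha256(entry["data"]).hexdigest() != entry["sha256"]:
            raise ValueError(f"hash mismatch for {path}")
        image[path] = entry["data"]
    return image
```

In this model only the package needs to be distributed to students who already hold the base image, which is the storage and bandwidth saving the abstract describes.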
The task of generating network-based evidence to support network forensic investigation is becoming increasingly prominent. Such evidence is imperative, as it can be used not only to diagnose and respond to various network-related issues (e.g., performance bottlenecks, routing issues, etc.) but, more importantly, to infer and further investigate network security intrusions and infections. In this context, this paper proposes a proactive approach that aims at generating accurate and actionable network-based evidence related to groups of compromised network machines (i.e., campaigns). The approach is envisioned to guide investigators to promptly pinpoint such malicious groups for possible immediate mitigation as well as empowering network and digital forensic specialists to further examine those machines using auxiliary collected data or extracted digital artifacts. On one hand, the promptness of the approach is achieved by monitoring and correlating perceived probing activities, which are typically the very first signs of an infection or misdemeanor. On the other hand, the generated evidence is accurate as it is based on an anomaly inference that fuses data behavioral analytics in conjunction with formal graph theoretic concepts. We evaluate the proposed approach in two deployment scenarios, namely, as an enterprise edge engine and as a global capability in a security operations center model. The empirical evaluation, which employs 10 GB of real botnet traffic and 80 GB of real darknet traffic, demonstrates the accuracy, effectiveness and simplicity of the generated network-based evidence.
Due to budgetary constraints and the high level of training required, digital forensic analysts are in short supply in police forces the world over. This inevitably leads to a prolonged time taken between an investigator sending the digital evidence for analysis and receiving the analytical report back. In an attempt to expedite this procedure, various process models have been created to place the forensic analyst in the field conducting a triage of the digital evidence. By conducting triage in the field, an investigator is able to act upon pertinent information more quickly, while waiting on the full report. The work presented as part of this paper focuses on the training of front-line personnel in the field triage process, without the need for a forensic analyst to attend the scene. The premise has been successfully implemented within regular/non-digital forensics, i.e., crime scene investigation. In that field, front-line members have been trained in specific tasks to supplement the trained specialists. The concept of front-line members conducting triage of digital evidence in the field is achieved through the development of a new process model providing guidance to these members. To prove the model's viability, an implementation of this new process model is presented and evaluated. The results outlined demonstrate how a tiered response involving digital evidence specialists and non-specialists can better deal with the increasing number of investigations involving digital evidence.
In April 2015, BitTorrent Inc. released their distributed peer-to-peer powered browser Project Maelstrom into public beta. The browser facilitates a new alternative website distribution paradigm to the traditional HTTP-based, client-server model. This decentralised web is powered by each of the users accessing each Maelstrom-hosted website. Each user shares their copy of the website with other new visitors to the website. As a result, a Maelstrom-hosted website cannot be taken offline by law enforcement or any other parties. Due to this open distribution model, a number of interesting censorship, security and privacy considerations are raised. This paper explores the application, its protocol, sharing Maelstrom content and its new visitor-powered "web-hosting" paradigm.
High availability is no longer just a business continuity concern. Users are increasingly dependent on devices that consume and produce data in ever-increasing volumes. A popular solution is to have a central repository which each device accesses after centrally managed authentication. This model of use is facilitated by cloud-based file synchronisation services such as Dropbox, OneDrive, Google Drive and Apple iCloud. Cloud architecture allows the provisioning of storage space with "always-on" access. Recent concerns over unauthorised access to third-party systems and large-scale exposure of private data have made an alternative solution desirable. These events have caused users to assess their own security practices and the level of trust placed in third-party storage services. One option is BitTorrent Sync, a cloudless synchronisation utility that provides data availability and redundancy. This utility replicates files stored in shares to remote peers with access controlled by keys and permissions. While lacking the economies brought about by scale, complete control over data access has made this a popular solution. The ability to replicate data without oversight introduces risk of abuse by users as well as difficulties for forensic investigators. This paper suggests a methodology for investigation and analysis of the protocol to assist in the control of data flow across security perimeters.
Digital Investigation, 2014
Conference Papers by Mark Scanlon
Electromagnetic noise emitted from running computer displays modulates information about the picture frames being displayed on screen. Attacks have been demonstrated on eavesdropping on computer displays by utilising these emissions as a side-channel vector. The accuracy of reconstructing a screen image depends on the emission sampling rate and bandwidth of the attacker's signal acquisition hardware. The cost of radio frequency acquisition hardware increases with increased supported frequency range and bandwidth. A number of enthusiast-level, affordable software-defined radio equipment solutions are currently available, facilitating a number of radio-focused attacks at a more reasonable price point. This work investigates three accuracy-influencing factors, other than the sample rate and bandwidth, namely noise removal, image blending, and image quality adjustments, that affect the accuracy of monitor image reconstruction through electromagnetic side-channel attacks.
The ever-growing backlog of digital evidence waiting for analysis has become a significant issue for law enforcement agencies throughout the world. This is due to an increase in the number of cases requiring digital forensic analysis coupled with the increasing volume of data to process per case. This has created a demand for a paradigm shift in the method by which evidence is acquired, stored, and analyzed. The ultimate goal of the research presented in this paper is to revolutionize the current digital forensic process by leveraging a centralized, deduplicated acquisition and processing approach. Focusing on the first step in digital evidence processing, acquisition, a system is presented enabling deduplicated evidence acquisition with the capability of automated, forensically-sound complete disk image reconstruction. As the number of cases acquired by the proposed system increases, more duplicate artifacts will be encountered, and the processing of each new case will become more efficient. This results in a time saving for digital investigators, and provides a platform to enable non-expert evidence processing, alongside the benefits of reduced storage and bandwidth requirements.
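The idea of deduplicated acquisition with later reconstruction can be sketched as below. This is an illustrative model only, assuming file-level deduplication keyed by SHA-256 and an in-memory central store; the paper's actual system, granularity, and storage format are not specified in this abstract, and `DedupStore`, `acquire`, and `reconstruct` are hypothetical names.

```python
import hashlib


class DedupStore:
    """Toy central store: each unique artefact is kept once, keyed by its hash."""

    def __init__(self):
        self.blobs = {}      # sha256 hex digest -> file content
        self.manifests = {}  # case id -> list of (path, digest)

    def acquire(self, case_id, files):
        """Record a case; return how many artefacts actually had to be stored."""
        manifest = []
        newly_stored = 0
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            if digest not in self.blobs:
                # Only previously unseen content is transferred and stored.
                self.blobs[digest] = data
                newly_stored += 1
            manifest.append((path, digest))
        self.manifests[case_id] = manifest
        return newly_stored

    def reconstruct(self, case_id):
        """Rebuild the acquired file set and verify every artefact's hash."""
        rebuilt = {}
        for path, digest in self.manifests[case_id]:
            data = self.blobs[digest]
            if hashlib.sha256(data).hexdigest() != digest:
                raise ValueError(f"integrity check failed for {path}")
            rebuilt[path] = data
        return rebuilt
```

As more cases pass through such a store, a larger share of each new case's artefacts is already present (operating system files, common applications), so less data has to be transferred and stored per case.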
Digital forensics is a fast-growing field involving the discovery and analysis of digital evidence acquired from electronic devices to assist investigations for law enforcement. Traditional digital forensic investigative approaches are often hampered by the data contained on these devices being encrypted. Furthermore, the increasing use of IoT devices with limited standardisation makes it difficult to analyse them with traditional techniques. This paper argues that electromagnetic side-channel analysis has significant potential to progress investigations obstructed by data encryption. Several potential avenues towards this goal are discussed.
In today's world, closed circuit television, cellphone photographs and videos, open-source intelligence (i.e., social media/web data mining), and other sources of photographic evidence are commonly used by police forces to identify suspects and victims of both online and offline crimes. Human characteristics, such as age, height, weight, gender, hair color, etc., are often used by police officers and witnesses in their description of unidentified suspects. In certain circumstances, the age of the victim can determine the crime's categorization, e.g., child abuse investigations. Various automated machine learning-based techniques have been implemented for the analysis of digital images to detect soft biometric traits, such as age and gender, and thus aid detectives and investigators in progressing their cases. This paper documents an evaluation of existing cognitive age prediction services. The evaluative and comparative analysis of the various services was conducted to identify trends and issues inherent to their performance. One significant contributing factor impeding the accurate development of the services investigated is the notable lack of sufficient sample images in specific age ranges, i.e., underage and elderly. To overcome this issue, a dataset generator was developed, which harnesses collections of several unbalanced datasets and forms a balanced, curated dataset of digital images annotated with their corresponding age and gender.
In criminal investigations, telecommunication wiretaps have become a common technique used by law enforcement. While phone-based wiretapping is well documented and the procedures for its execution are well known, the same cannot be said for Internet taps. Lawfully intercepted network traffic often contains a lot of encrypted traffic, making it increasingly difficult to find useful information inside the traffic captured. The advent of the Internet of Things further complicates the process for non-technical investigators. The current level of complexity of intercepted network traffic is close to a point where data cannot be analysed without the supervision of a digital investigator with advanced network knowledge. Current investigations focus on analysing all traffic in a chronological manner and are predominantly conducted on the data contents of the intercepted traffic. This approach often becomes overly arduous when the amount of data to be analysed becomes very large. In this paper, we propose a novel approach to analyse large amounts of intercepted network traffic based on network metadata. Our approach significantly reduces the duration of the analysis and also produces an insightful view of the analysis results for the non-technical investigator. We also test our approach with a large sample of network traffic data.
Perhaps the most common task encountered by digital forensic investigators consists of searching through a seized device for pertinent data. Frequently, an investigator will be in possession of a collection of "known-illegal" files (e.g. a collection of child pornographic images) and will seek to find whether copies of these are stored on the seized drive. Traditional hash matching techniques can efficiently find files that precisely match. However, these will fail in the case of merged files, embedded files, partial files, or if a file has been changed in any way. In recent years, approximate matching algorithms have shown significant promise in the detection of files that have a high bytewise similarity. This paper focuses on MRSH-v2. A number of experiments were conducted using Hierarchical Bloom Filter Trees to dramatically reduce the quantity of pairwise comparisons that must be made between known-illegal files and files on the seized disk. The experiments demonstrate substantial speed gains over the original MRSH-v2, while maintaining effectiveness.