Python Digital Forensics Tutorial
Audience
This tutorial will be useful for graduates, postgraduates, and research students who either have an interest in this subject or have this subject as a part of their curriculum. Any reader who is enthusiastic about gaining knowledge of digital forensics using the Python programming language can also pick up this tutorial.
Prerequisites
This tutorial is designed on the assumption that the reader has basic knowledge of operating systems and computer networks. You are also expected to have a basic knowledge of Python programming.
If you are a novice in any of these subjects, we strongly suggest that you go through tutorials on them before you start your journey with this tutorial.
Copyright & Disclaimer
All the content and graphics published in this e-book are the property of Tutorials Point (I) Pvt. Ltd. The user of this e-book is prohibited from reusing, retaining, copying, distributing or republishing any contents or a part of the contents of this e-book in any manner without the written consent of the publisher.
We strive to update the contents of our website and tutorials as timely and as precisely as possible; however, the contents may contain inaccuracies or errors. Tutorials Point (I) Pvt. Ltd. provides no guarantee regarding the accuracy, timeliness or completeness of our website or its contents including this tutorial. If you discover any errors on our website or in this tutorial, please notify us at contact@tutorialspoint.com.
1. Python Digital Forensics – Introduction
This chapter introduces what digital forensics is all about and gives a historical review of the field. You will also understand where you can apply digital forensics in real life, as well as its limitations.
For example, you can rely on digital forensics to extract evidence in case somebody steals data from an electronic device.
practices for Computer Forensics”. Another feather in the cap was “The Convention on Cybercrime”, a European-led international treaty that was signed by 43 nations and ratified by 16 nations. Even after such standards, there is still a need to resolve some issues identified by researchers.
A computer forensics investigation process involves three major phases as explained below:
Phase 2: Analysis
The input of this phase is the data acquired in the acquisition phase. Here, this data is examined to identify evidence. This phase yields three kinds of evidence, as follows:
Evidence of tampering: This evidence shows that the system was tampered with to avoid identification. It includes examining the files and directory content for recovering the deleted files.
Criminal Law
In criminal law, the evidence is collected to support or oppose a hypothesis in the court.
Forensics procedures are very much similar to those used in criminal investigations but with
different legal requirements and limitations.
Private Investigation
The corporate world mainly uses digital forensics for private investigation. It is used when companies suspect that employees may be performing an illegal activity on their computers that is against company policy. Digital forensics provides one of the best routes for a company or person to take when investigating someone for digital misconduct.
Computer Forensics
This branch of digital forensics deals with computers, embedded systems and static memory such as USB drives. A wide range of information, from logs to actual files on the drive, can be investigated in computer forensics.
Mobile Forensics
This deals with the investigation of data from mobile devices. This branch is different from computer forensics in the sense that mobile devices have an inbuilt communication system which provides useful information related to location.
Network Forensics
This deals with the monitoring and analysis of computer network traffic, both local and WAN (wide area network), for the purposes of information gathering, evidence collection, or intrusion detection.
Database Forensics
This branch of digital forensics deals with the forensic study of databases and their metadata.
Technical Skills
A digital forensics examiner must have good technological skills because this field requires knowledge of networks and of how digital systems interact.
Communication Skills
Good communication skills are a must to coordinate with various teams and to extract any
missing data or information.
Limitations
Digital forensic investigation is subject to certain limitations, as discussed here:
Investigating Tools
The effectiveness of a digital investigation lies entirely in the expertise of the digital forensics examiner and the selection of a proper investigation tool. If the tool used does not conform to the specified standards, then the evidence can be rejected by the judge in a court of law.
Cost
Producing and preserving digital evidence is very costly. Hence, this process may not be chosen by many people who cannot afford the cost.
2. Python Digital Forensics – Getting Started with Python
In the previous chapter, we learnt the basics of digital forensics, its advantages and
limitations. This chapter will make you comfortable with Python, the essential tool that we are
using in this digital forensics investigation.
Some of the unique features of the Python programming language that make it a good fit for digital forensics projects are given below:
Help and Support: Being an open source programming language, Python enjoys excellent support from the developers' and users' community.
Features of Python
Python, being a high-level, interpreted, interactive and object-oriented scripting language,
provides the following features:
Easy to Learn: Python is a developer-friendly and easy-to-learn language, because it has fewer keywords and a simpler structure.
Expressive and Easy to read: Python language is expressive in nature; hence its
code is more understandable and readable.
Provides Various Modules and Functions: Python has a large standard library which provides a rich set of modules and functions for our scripts.
Supports Dynamic Type Checking: Python supports dynamic type checking and
provides very high-level dynamic data types.
Installing Python
Python distributions are available for various platforms such as Windows, UNIX, Linux, and Mac. We only need to download the binary code as per our platform. If the binary code for a platform is not available, we need a C compiler so that the source code can be compiled manually.
This section will make you familiar with installation of Python on various platforms:
Step 4: If you wish to customize some options, you can edit the Modules/Setup file.
Once you have successfully completed the steps given above, Python will be installed at its
standard location /usr/local/bin and its libraries at /usr/local/lib/pythonXX where XX
is the version of Python.
Step 2: Download the Windows installer python-XYZ.msi file, where XYZ is the version we need to install.
Step 3: Now run that MSI file after saving the installer file to your local machine.
Step 4: Run the downloaded file, which will bring up the Python installation wizard.
You can use the following command to install Homebrew, in case you do not have it on your system:
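$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"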
If you need to update the package manager, it can be done with the help of the following command:
$ brew update
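Once Homebrew is ready, Python itself can be installed with:
$ brew install python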
Running Python
You can choose any of the following three methods to start the Python interpreter:
Step 2: Start coding right away in the interactive interpreter using the commands shown below:
$python # Unix/Linux
or
python% # Unix/Linux
or
C:> python # Windows/DOS
Windows IDE: Windows has PythonWin, the first Windows interface for Python, along with a GUI.
Macintosh IDE: Macintosh has the IDLE IDE, which is available from the main website, downloadable as either MacBinary or BinHex'd files.
3. Python Digital Forensics – Artifact Report
Now that you are comfortable with installation and running Python commands on your local
system, let us move into the concepts of forensics in detail. This chapter will explain various
concepts involved in dealing with artifacts in Python digital forensics.
It is the document in which the digital forensic examiner outlines the investigation process and its findings.
A good digital forensic report can be referenced by another examiner to achieve the same results when given the same repositories.
It is a technical and scientific document that contains facts found within the 1s and 0s of digital evidence.
Summary: The report must contain a brief summary of information so that the reader can ascertain the report's purpose.
Tools used: We must mention the tools which have been used for carrying out the process of digital forensics, including their purpose.
Recommendations for counsel: The report must include recommendations for counsel to continue or cease the investigation based on the findings in the report.
CSV Reports
One of the most common output formats of reports is a CSV spreadsheet report. You can
create a CSV to create a report of processed data using the Python code as shown below:
We are using the following global variable to represent sample data types:
Next, let us define the method that performs the write. We open the file in "w" mode and set the newline keyword argument to an empty string, as in the sketch below.
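A minimal sketch of the whole flow follows; the TEST_DATA_LIST sample rows and the write_csv() helper name are assumptions inferred from the output shown afterwards:

import csv

TEST_DATA_LIST = [["Name", "Age", "City", "Job Description"],
                  ["Ram", 32, "Bhopal", "IT"],
                  ["Mohan", 25, "Chandigarh", "HR"],
                  ["Parkash", 45, "Delhi", "IT"]]

def write_csv(data, output):
    # "w" mode with newline="" lets the csv module manage line endings itself
    with open(output, "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerows(data)

write_csv(TEST_DATA_LIST, "report1.csv")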
If you run the above script, you will get the following details stored in report1.csv file.
Mohan 25 Chandigarh HR
Parkash 45 Delhi IT
Excel Reports
Another common output format of reports is Excel (.xlsx) spreadsheet report. We can create
table and also plot the graph by using Excel. We can create report of processed data in Excel
format using Python code as shown below:
import xlsxwriter
Now, create a workbook object. For this, we need to use Workbook() constructor.
workbook = xlsxwriter.Workbook('report2.xlsx')
worksheet = workbook.add_worksheet()
row = 0
col = 0
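The data rows can then be written cell by cell with worksheet.write(); a minimal sketch, with the sample values taken from the output shown below:

data = [["Ram", 32, "Bhopal"],
        ["Mohan", 25, "Chandigarh"],
        ["Parkash", 45, "Delhi"]]
for name, age, city in data:
    worksheet.write(row, col, name)       # column A: name
    worksheet.write(row, col + 1, age)    # column B: age
    worksheet.write(row, col + 2, city)   # column C: city
    row += 1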
workbook.close()
The above script will create an Excel file named report2.xlsx having the following data:
Ram 32 Bhopal
Mohan 25 Chandigarh
Parkash 45 Delhi
We can capture a screenshot with the pyscreenshot library as follows:
import pyscreenshot as ImageGrab
image = ImageGrab.grab()
Use the following line of code to save the screenshot to the given location:
image.save('d:/image123.png')
Now, if you want to pop up the screenshot as a graph, you can use the following Python code:
import numpy as np
import matplotlib.pyplot as plt
import pyscreenshot as ImageGrab
imageg = ImageGrab.grab()
plt.imshow(imageg, cmap='gray', interpolation='bilinear')
plt.show()
4. Python Digital Forensics – Mobile Device Forensics
This chapter will explain Python digital forensics on mobile devices and the concepts involved.
Introduction
Mobile device forensics is that branch of digital forensics which deals with the acquisition and analysis of mobile devices to recover digital evidence of investigative interest. This branch is different from computer forensics because mobile devices have an inbuilt communication system which provides useful information related to location.
Messages: These are useful artifacts which can reveal the state of mind of the owner and can even give some previously unknown information to the investigator.
Location History: The location history data is a useful artifact which can be used by investigators to validate the particular location of a person.
The following Python code will open and read a PLIST file. Note that before proceeding with this, we must create our own Info.plist file.
First, install a third party library named biplist by the following command:
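pip install biplist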
import biplist
import os
import sys
Now, the following code under the main() method can be used to read the plist file into a variable:
def main(plist):
try:
data = biplist.readPlist(plist)
except (biplist.InvalidPlistException,biplist.NotBinaryPlistException) as e:
print("[-] Invalid PLIST file - unable to be opened by biplist")
sys.exit(1)
Now, we can either read the data on the console or directly print it, from this variable.
SQLite Databases
SQLite serves as the primary data repository on mobile devices. SQLite is an in-process library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. Being zero-configuration, you need not configure it on your system, unlike other databases.
If you are a novice or unfamiliar with SQLite databases, you can follow the link
https://www.tutorialspoint.com/sqlite/index.htm. Additionally, you can follow the link
https://www.tutorialspoint.com/sqlite/sqlite_python.htm in case you want to get into detail
of SQLite with Python.
During mobile forensics, we can interact with the sms.db file of a mobile device and can
extract valuable information from message table. Python has a built in library named sqlite3
for connecting with SQLite database. You can import the same with the following command:
import sqlite3
Now, with the help of following command, we can connect with the database, say sms.db in
case of mobile devices:
conn = sqlite3.connect('sms.db')
c = conn.cursor()
Here, c is the cursor object with the help of which we can interact with the database.
Now, suppose we want to execute a particular command, say to get the details from the abc table; it can be done with the help of the following command:
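c.execute("select * from abc")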
The result of the above command would be stored in the cursor object. Similarly we can use
fetchall() method to dump the result into a variable we can manipulate.
We can use the following command to get column names data of message table in sms.db:
c.execute("pragma table_info(message)")
table_data = c.fetchall()
columns = [x[1] for x in table_data]
Observe that here we are using the SQLite PRAGMA command, which is a special command used to control various environmental variables and state flags within the SQLite environment. In the above code, the fetchall() method returns a list of tuples, and each column's name is stored at index 1 of its tuple.
Now, with the help of following command we can query the table for all of its data and store
it in the variable named data_msg:
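c.execute("select * from message")
data_msg = c.fetchall()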
The above command will store the data in the variable and further we can also write the above
data in CSV file by using csv.writer() method.
iTunes Backups
iPhone mobile forensics can be performed on the backups made by iTunes. Forensic examiners
rely on analyzing the iPhone logical backups acquired through iTunes. AFC (Apple file
connection) protocol is used by iTunes to take the backup. Besides, the backup process does
not modify anything on the iPhone except the escrow key records.
Now, why is it important for a digital forensic expert to understand the techniques applied to iTunes backups? It is important in case we get access to the suspect's computer instead of the iPhone directly, because when a computer is used to sync with an iPhone, most of the information on the iPhone is likely to be backed up on that computer.
OS: Windows 7
Backup Location: C:\Users\[username]\AppData\Roaming\Apple Computer\MobileSync\Backup\
For processing the iTunes backup with Python, we need to first identify all the backups in
backup location as per our operating system. Then we will iterate through each backup and
read the database Manifest.db.
Now, with the help of following Python code we can do the same:
Now, provide two positional arguments, namely INPUT_DIR and OUTPUT_DIR, which represent the iTunes backup folder and the desired output folder respectively:
if __name__ == "__main__":
    # The ArgumentParser construction and the -v flag are restored here;
    # the description string is an assumption
    parser = argparse.ArgumentParser("iTunes Backup Processor")
    parser.add_argument("INPUT_DIR", help="Location of folder containing iOS "
                        "backups, e.g. ~\Library\Application Support\MobileSync\Backup folder")
    parser.add_argument("OUTPUT_DIR", help="Output Directory")
    parser.add_argument("-l", help="Log file path", default=__file__[:-2] + "log")
    parser.add_argument("-v", help="Increase verbosity", action="store_true")
    args = parser.parse_args()
if args.v:
logger.setLevel(logging.DEBUG)
else:
logger.setLevel(logging.INFO)
The following line of code will create necessary folders for the desired output directory by
using os.makedirs() function:
if not os.path.exists(args.OUTPUT_DIR):
os.makedirs(args.OUTPUT_DIR)
Now, pass the supplied input and output directories to the main() function as follows:
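main(args.INPUT_DIR, args.OUTPUT_DIR)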
Now, write main() function which will further call backup_summary() function to identify
all the backups present in input folder:
print("Backup Summary")
print("=" * 20)
if len(backups) > 0:
    for i, b in enumerate(backups):
        print("Backup No.: {}\n"
              "Backup Dev. Name: {}\n"
              "# Files: {}\n"
              "Backup Size (Bytes): {}\n".format(i, b, backups[b][1], backups[b][2]))
Now, dump the contents of the Manifest.db file to the variable named db_items.
try:
    db_items = process_manifest(backups[b][0])
except IOError:
    logger.warn("Non-iOS 10 backup encountered or invalid backup. "
                "Continuing to next backup.")
    continue
Now, let us define a function that will take the directory path of the backup:
def process_manifest(backup):
    manifest = os.path.join(backup, "Manifest.db")
    if not os.path.exists(manifest):
        logger.error("Manifest DB not found in {}".format(manifest))
        raise IOError
    # Connection line restored; Manifest.db is itself a SQLite database
    conn = sqlite3.connect(manifest)
    c = conn.cursor()
    items = {}
    for row in c.execute("SELECT * from Files;"):
        items[row[0]] = [row[2], row[1], row[3]]
    return items
create_files(in_dir, out_dir, b, db_items)
print("=" * 20)
else:
    logger.warning("No valid backups found. The input directory should be "
                   "the parent-directory immediately above the SHA-1 hash "
                   "iOS device backups")
    sys.exit(2)
try:
    copyfile(path, filepath)
except IOError:
    logger.debug("File not found in backup: {}".format(path))
    files_not_found += 1
if files_not_found > 0:
    logger.warning("{} files listed in the Manifest.db not "
                   "found in backup".format(files_not_found))
copyfile(os.path.join(in_dir, b, "Info.plist"),
         os.path.join(out_dir, b, "Info.plist"))
copyfile(os.path.join(in_dir, b, "Manifest.db"),
         os.path.join(out_dir, b, "Manifest.db"))
copyfile(os.path.join(in_dir, b, "Manifest.plist"),
         os.path.join(out_dir, b, "Manifest.plist"))
copyfile(os.path.join(in_dir, b, "Status.plist"),
         os.path.join(out_dir, b, "Status.plist"))
With the above Python script, we can get the backup file structure in our output folder. We can use the pycrypto Python library to decrypt encrypted backups.
Wi - Fi
Mobile devices can be used to connect to the outside world by connecting through Wi-Fi
networks which are available everywhere. Sometimes the device gets connected to these
open networks automatically.
In case of the iPhone, the list of open Wi-Fi connections to which the device has connected is stored in a PLIST file named com.apple.wifi.plist. This file contains the Wi-Fi SSID, BSSID and connection time.
We need to extract Wi-Fi details from standard Cellebrite XML report using Python. For this,
we need to use API from Wireless Geographic Logging Engine (WIGLE), a popular platform
which can be used for finding the location of a device using the names of Wi-Fi networks.
We can use Python library named requests to access the API from WIGLE. It can be installed
as follows:
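pip install requests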
Now, provide two positional arguments namely INPUT_FILE and OUTPUT_CSV which will
represent the input file with Wi-Fi MAC address and the desired output CSV file respectively:
if __name__ == "__main__":
    # ArgumentParser construction restored; the description string is an assumption
    parser = argparse.ArgumentParser("Wi-Fi GeoLocator")
    parser.add_argument("INPUT_FILE", help="INPUT FILE with MAC Addresses")
    parser.add_argument("OUTPUT_CSV", help="Output CSV File")
    parser.add_argument("-t", help="Input type: Cellebrite XML report or TXT file",
                        choices=('xml', 'txt'), default="xml")
    parser.add_argument('--api', help="Path to API key file",
                        default=os.path.expanduser("~/.wigle_api"),
                        type=argparse.FileType('r'))
    args = parser.parse_args()
Now, the following lines of code will check whether the input file exists and is a file; if not, the script exits:
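A minimal sketch of that validation, following the pattern used by the other scripts in this tutorial:

if not os.path.exists(args.INPUT_FILE) or not os.path.isfile(args.INPUT_FILE):
    print("[-] {} does not exist or is not a file".format(args.INPUT_FILE))
    sys.exit(1)

The API key is then read from the key file and split into its two components: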
api_key = args.api.readline().strip().split(":")
def parse_xml(xml_file):
wifi = {}
xmlns = "{http://pa.cellebrite.com/report/2.0}"
print("[+] Opening {} report".format(xml_file))
xml_tree = ET.parse(xml_file)
print("[+] Parsing report for all connected WiFi addresses")
root = xml_tree.getroot()
Now, we will check whether the "SSID" string is present in the value's text:
if "SSID" in value.text:
bssid, ssid = value.text.split("\t")
bssid = bssid[7:]
ssid = ssid[6:]
Now, we need to add BSSID, SSID and timestamp to the wifi dictionary as follows:
if bssid in wifi.keys():
    wifi[bssid]["Timestamps"].append(ts)
    wifi[bssid]["SSID"].append(ssid)
else:
    wifi[bssid] = {"Timestamps": [ts], "SSID": [ssid], "Wigle": {}}
return wifi
The text parser, which is much simpler than the XML parser, is shown below:
def parse_txt(txt_file):
    wifi = {}
    print("[+] Extracting MAC addresses from {}".format(txt_file))
    with open(txt_file) as mac_file:
        for line in mac_file:
            wifi[line.strip()] = {"Timestamps": ["N/A"], "SSID": ["N/A"],
                                  "Wigle": {}}
    return wifi
Now, let us use requests module to make WIGLE API calls and need to move on to the
query_wigle() method:
query_url = "https://api.wigle.net/api/v2/network/search?" \
            "onlymine=false&freenet=false&paynet=false" \
            "&netid={}".format(mac_addr)
req = requests.get(query_url, auth=(api_key[0], api_key[1]))
return req.json()
The WIGLE API enforces a daily limit on queries; if that limit is exceeded, the following code shows an error:
try:
    if wigle_results["resultCount"] == 0:
        wifi_dictionary[mac]["Wigle"]["results"] = []
        continue
    else:
        wifi_dictionary[mac]["Wigle"] = wigle_results
except KeyError:
    if wigle_results["error"] == "too many queries today":
        print("[-] Wigle daily query limit exceeded")
        wifi_dictionary[mac]["Wigle"]["results"] = []
        continue
    else:
        print("[-] Other error encountered for "
              "address {}: {}".format(mac, wigle_results['error']))
        wifi_dictionary[mac]["Wigle"]["results"] = []
        continue
prep_output(out_csv, wifi_dictionary)
Now, we will use the prep_output() method to flatten the dictionary into easily writable chunks:
g_map_url = "{}{},{}".format(
google_map, shortres["trilat"],
shortres["trilong"])
Now, we can write the output in CSV file as we have done in earlier scripts in this chapter by
using write_csv() function.
5. Python Digital Forensics – Investigating Embedded Metadata
In this chapter, we will learn in detail about investigating embedded metadata using Python
digital forensics.
Introduction
Embedded metadata is information about a digital asset that is stored within the same file that contains the object described by that data. In other words, it is information about a digital asset stored in the digital file itself. It is always associated with the file and can never be separated.
In digital forensics, we cannot always extract all the information about a particular file from external sources. Embedded metadata, on the other hand, can provide us with information critical to the investigation. For example, a text file's metadata may contain information about the author, its length, the date written and even a short summary of that document. A digital image may include metadata such as the dimensions of the image, the shutter speed, etc.
You can use the following Python script to extract common attributes or metadata from audio
or MP3 file and a video or a MP4 file.
Note that for this script, we need to install a third party python library named mutagen which
allows us to extract metadata from audio and video files. It can be installed with the help of
the following command:
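pip install mutagen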
Some of the useful libraries we need to import for this Python script are as follows:
The command line handler will take one argument, which represents the path to the MP3 or MP4 file. Then, we will use the mutagen.File() method to open a handle to the file as follows:
if __name__ == '__main__':
parser = argparse.ArgumentParser('Python Metadata Extractor')
parser.add_argument("AV_FILE", help="File to extract metadata from")
args = parser.parse_args()
av_file = mutagen.File(args.AV_FILE)
file_ext = args.AV_FILE.rsplit('.', 1)[-1]
if file_ext.lower() == 'mp3':
handle_id3(av_file)
elif file_ext.lower() == 'mp4':
handle_mp4(av_file)
Now, we need to use two handles, one to extract the data from MP3 and one to extract data
from MP4 file. We can define these handles as follows:
def handle_id3(id3_file):
    id3_frames = {'TIT2': 'Title', 'TPE1': 'Artist', 'TALB': 'Album',
                  'TXXX': 'Custom', 'TCON': 'Content Type',
                  'TDRL': 'Date released', 'COMM': 'Comments',
                  'TDRC': 'Recording Date'}
    print("{:15} | {:15} | {:38} | {}".format("Frame", "Description",
                                              "Text", "Value"))
    print("-" * 85)
    for frames in id3_file.tags.values():
        frame_name = id3_frames.get(frames.FrameID, frames.FrameID)
        desc = getattr(frames, 'desc', "N/A")
        text = getattr(frames, 'text', ["N/A"])[0]
        value = getattr(frames, 'value', "N/A")
        if "date" in frame_name.lower():
            text = str(text)
        print("{:15} | {:15} | {:38} | {}".format(frame_name, desc, text, value))
def handle_mp4(mp4_file):
cp_sym = u"\u00A9"
qt_tag = {
cp_sym + 'nam': 'Title', cp_sym + 'art': 'Artist',
cp_sym + 'alb': 'Album', cp_sym + 'gen': 'Genre',
The above script will give us additional information about MP3 as well as MP4 files.
Images
Images may contain different kinds of metadata depending upon the file format. However, many images embed GPS information. We can extract this GPS information by using third party Python libraries. The following Python script can be used to do the same:
First, download third party python library named Python Imaging Library (PIL) as follows:
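Note that PIL is distributed today as the Pillow package, so the install command is:

pip install pillow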
We can also write the GPS details embedded in images to KML file, but for this we need to
download third party Python library named simplekml as follows:
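pip install simplekml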
Now, the command line handler will accept one positional argument which basically represents
the file path of the photos.
Now, we need to specify the URLs that will populate the coordinate information. The URLs are
gmaps and open_maps. We also need a function to convert the degree minute seconds
(DMS) tuple coordinate, provided by PIL library, into decimal. It can be done as follows:
gmaps = "https://www.google.com/maps?q={},{}"
open_maps = "http://www.openstreetmap.org/?mlat={}&mlon={}"
def process_coords(coord):
coord_deg = 0
for count, values in enumerate(coord):
coord_deg += (float(values[0]) / values[1]) / 60**count
return coord_deg
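For example, a DMS coordinate of 41 degrees, 53 minutes and 24 seconds, which PIL would return as rational (numerator, denominator) pairs, converts as follows (a hypothetical illustration):

coord = [(41, 1), (53, 1), (2400, 100)]   # 41 deg, 53 min, 24 sec
print(process_coords(coord))              # 41 + 53/60 + 24/3600 = 41.89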
Now, we will use the Image.open() function to open the file as a PIL object.
img_file = Image.open(args.PICTURE_FILE)
exif_data = img_file._getexif()
if exif_data is None:
print("No EXIF data found")
sys.exit()
for name, value in exif_data.items():
    gps_tag = TAGS.get(name, name)
    if gps_tag != 'GPSInfo':
        continue
After finding the GPSInfo tag, we will store the GPS reference and process the coordinates
with the process_coords() method.
kml = simplekml.Kml()
kml.newpoint(name=args.PICTURE_FILE, coords=[(lon, lat)])
kml.save(args.PICTURE_FILE + ".kml")
PDF Documents
PDF documents have a wide variety of media including images, text, forms etc. When we
extract embedded metadata in PDF documents, we may get the resultant data in the format
called Extensible Metadata Platform (XMP). We can extract metadata with the help of the
following Python code:
First, install a third party Python library named PyPDF2 to read metadata stored in XMP
format. It can be installed as follows:
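pip install PyPDF2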
Now, import the following libraries for extracting the metadata from PDF files:
Now, the command line handler will accept one positional argument which basically represents
the file path of the PDF file.
Now we can use getXmpMetadata() method to provide an object containing the available
metadata as follows:
pdf_file = PdfFileReader(args.PDF_FILE)
xmpm = pdf_file.getXmpMetadata()
if xmpm is None:
print("No XMP metadata found in document.")
sys.exit()
We can use custom_print() method to extract and print the relevant values like title,
creator, contributor etc. as follows:
We can also extend the custom_print() method to handle the case where the PDF was created using multiple software programs, as follows:
We can also extract any other custom property saved by the software as follows:
if xmpm.custom_properties:
print("Custom Properties:")
for k, v in xmpm.custom_properties.items():
print("\t{}: {}".format(k, v))
The above script will read the PDF document and print the metadata stored in XMP format, including any custom properties stored by the software with which that PDF was made.
For this purpose, first install the third party Python library pefile. It can be done as follows:
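pip install pefile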
Once you successfully install this, import the following libraries as follows:
Now, the command line handler will accept one positional argument which basically represents
the file path of the executable file. You can also choose the style of output, whether you need
it in detailed and verbose way or in a simplified manner. For this you need to give an optional
argument as shown below:
Now, we will load the input executable file by using PE class. We will also dump the executable
data to a dictionary object by using dump_dict() method.
pe = PE(args.EXE_FILE)
ped = pe.dump_dict()
We can extract basic file metadata such as embedded authorship, version and compilation
time using the code shown below:
file_info = {}
for structure in pe.FileInfo:
if structure.Key == b'StringFileInfo':
for s_table in structure.StringTable:
for key, value in s_table.entries.items():
if value is None or len(value) == 0:
value = "Unknown"
file_info[key] = value
print("File Information: ")
print("==================")
for k, v in file_info.items():
if isinstance(k, bytes):
k = k.decode()
if isinstance(v, bytes):
v = v.decode()
print("{}: {}".format(k, v))
comp_time = ped['FILE_HEADER']['TimeDateStamp']['Value']
comp_time = comp_time.split("[")[-1].strip("]")
time_stamp, timezone = comp_time.rsplit(" ", 1)
comp_time = datetime.strptime(time_stamp, "%a %b %d %H:%M:%S %Y")
print("Compiled on {} {}".format(comp_time, timezone.strip()))
Now, extract the listing of imports and exports from executable files as shown below:
if hasattr(pe, 'DIRECTORY_ENTRY_IMPORT'):
print("\nImports: ")
print("=========")
for dir_entry in pe.DIRECTORY_ENTRY_IMPORT:
dll = dir_entry.dll
if not args.verbose:
print(dll.decode(), end=", ")
continue
name_list = []
for impts in dir_entry.imports:
if getattr(impts, "name", b"Unknown") is None:
name = b"Unknown"
else:
name = getattr(impts, "name", b"Unknown")
name_list.append([name.decode(), hex(impts.address)])
name_fmt = ["{} ({})".format(x[0], x[1]) for x in name_list]
print('- {}: {}'.format(dll.decode(), ", ".join(name_fmt)))
if not args.verbose:
print()
Now, print exports, names and addresses using the code as shown below:
if hasattr(pe, 'DIRECTORY_ENTRY_EXPORT'):
print("\nExports: ")
print("=========")
for sym in pe.DIRECTORY_ENTRY_EXPORT.symbols:
print('- {}: {}'.format(sym.name.decode(), hex(sym.address)))
The above script will extract the basic metadata and header information from Windows executable files.
Note that metadata from the Office 2007 formats of Word (.docx), Excel (.xlsx) and PowerPoint (.pptx) is stored in XML files. We can process these XML files in Python with the help of the following Python script:
Now, check whether the file is a ZIP file; if not, raise an error. Then, open the file and extract the key elements for processing using the following code:
zipfile.is_zipfile(args.Office_File)
zfile = zipfile.ZipFile(args.Office_File)
core_xml = etree.fromstring(zfile.read('docProps/core.xml'))
app_xml = etree.fromstring(zfile.read('docProps/app.xml'))
core_mapping = {
'title': 'Title',
'subject': 'Subject',
'creator': 'Author(s)',
'keywords': 'Keywords',
'description': 'Description',
'lastModifiedBy': 'Last Modified By',
'modified': 'Modified Date',
'created': 'Created Date',
'category': 'Category',
'contentStatus': 'Status',
'revision': 'Revision'
}
Use iterchildren() method to access each of the tags within the XML file:
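A sketch of that loop, mirroring the app.xml handling shown further below:

for element in core_xml.iterchildren():
    for key, title in core_mapping.items():
        if key in element.tag:
            if 'date' in title.lower():
                text = dt.strptime(element.text, "%Y-%m-%dT%H:%M:%SZ")
            else:
                text = element.text
            print("{}: {}".format(title, text))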
Similarly, do this for app.xml file which contains statistical information about the contents of
the document:
app_mapping = {
'TotalTime': 'Edit Time (minutes)',
'Pages': 'Page Count',
'Words': 'Word Count',
'Characters': 'Character Count',
'Lines': 'Line Count',
'Paragraphs': 'Paragraph Count',
'Company': 'Company',
'HyperlinkBase': 'Hyperlink Base',
'Slides': 'Slide count',
'Notes': 'Note Count',
'HiddenSlides': 'Hidden Slide Count',
}
for element in app_xml.getchildren():
for key, title in app_mapping.items():
if key in element.tag:
if 'date' in title.lower():
text = dt.strptime(element.text, "%Y-%m-%dT%H:%M:%SZ")
else:
text = element.text
print("{}: {}".format(title, text))
Now, after running the above script, we can get different details about the particular document. Note that we can apply this script only on Office 2007 or later version documents.
6. Python Digital Forensics – Network Forensics-I
This chapter will explain the fundamentals involved in performing network forensics using
Python.
Use of IEF
Due to its popularity, IEF is used by forensics professionals to a great extent. Some of the
uses of IEF are as follows:
Due to its powerful search capabilities, it is used to search multiple files or data media
simultaneously.
It is also used to recover deleted data from the unallocated space of RAM through new
carving techniques.
If investigators want to rebuild web pages in their original format on the date they
were opened, then they can use IEF.
First, generate IEF result database which will be a SQLite database file ending with .db
extension.
Python Code
Let us see how to use Python code for this purpose:
if __name__ == '__main__':
parser = argparse.ArgumentParser('IEF to CSV')
parser.add_argument("IEF_DATABASE", help="Input IEF database")
parser.add_argument("OUTPUT_DIR", help="Output DIR")
args = parser.parse_args()
if not os.path.exists(args.OUTPUT_DIR):
os.makedirs(args.OUTPUT_DIR)
if os.path.exists(args.IEF_DATABASE) and \
os.path.isfile(args.IEF_DATABASE):
main(args.IEF_DATABASE, args.OUTPUT_DIR)
else:
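    print("Supplied input file {} does not exist or is not a "
          "file".format(args.IEF_DATABASE))
    sys.exit(1)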
Now, as we did in earlier scripts, make the connection with SQLite database as follows to
execute the queries through cursor:
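Following the pattern of the earlier scripts (the database parameter name is an assumption):

conn = sqlite3.connect(database)
c = conn.cursor()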
The following lines of code will fetch the names of the tables from the database:
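One way to do that is to query the sqlite_master table:

c.execute("select name from sqlite_master where type='table'")
tables = [x[0] for x in c.fetchall()]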
Now, we will select all the data from the table and by using fetchall() method on the cursor
object we will store the list of tuples containing the table’s data in its entirety in a variable:
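A sketch of that step (the table loop variable is an assumption):

c.execute("select * from '{}'".format(table))
table_data = c.fetchall()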
Now, by using CSV_Writer() method we will write the content in CSV file:
The above script will fetch all the data from tables of IEF database and write the contents to
the CSV file of our choice.
The following is the Python script for accessing the cached data information from Yahoo Mail, as accessed via Google Chrome, by using the IEF database. Note that the steps would be more or less the same as those followed in the last Python script.
Now, provide the path to the IEF database file along with the two positional arguments accepted by the command-line handler, as done in the last script:
if __name__ == '__main__':
    parser = argparse.ArgumentParser('IEF to CSV')
    parser.add_argument("IEF_DATABASE", help="Input IEF database")
    # Renamed from OUTPUT_DIR to OUTPUT_CSV to match its use below
    parser.add_argument("OUTPUT_CSV", help="Output CSV File")
    args = parser.parse_args()
    directory = os.path.dirname(args.OUTPUT_CSV)
    if not os.path.exists(directory):
        os.makedirs(directory)
    if os.path.exists(args.IEF_DATABASE) and \
            os.path.isfile(args.IEF_DATABASE):
        main(args.IEF_DATABASE, args.OUTPUT_CSV)
    else:
        print("Supplied input file {} does not exist or is not a "
              "file".format(args.IEF_DATABASE))
        sys.exit(1)
Now, make the connection with SQLite database as follows to execute the queries through
cursor:
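As before (the database parameter name is an assumption):

conn = sqlite3.connect(database)
c = conn.cursor()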
You can use the following lines of code to fetch the instances of Yahoo Mail contact cache
record:
print("Querying IEF database for Yahoo Contact Fragments from " "the Chrome
Cache Records Table")
try:
c.execute("select * from 'Chrome Cache Records' where URL like "
"'https://data.mail.yahoo.com" "/classicab/v2/contacts/?format=json%'")
except sqlite3.OperationalError:
print("Received an error querying the database -- database may be"
"corrupt or not have a Chrome Cache Records table")
sys.exit(2)
Now, the list of tuples returned from the above query is saved into a variable as follows:
contact_cache = c.fetchall()
contact_data = process_contacts(contact_cache)
write_csv(contact_data, out_csv)
Note that here we will use two methods: process_contacts(), for setting up the result list and iterating through each contact cache record, and json.loads(), to store the JSON data extracted from the table into a variable for further manipulation:
def process_contacts(contact_cache):
    print("[+] Processing {} cache files matching Yahoo contact cache "
          "data".format(len(contact_cache)))
    results = []
for contact in contact_cache:
url = contact[0]
first_visit = contact[1]
last_visit = contact[2]
last_sync = contact[3]
loc = contact[8]
contact_json = json.loads(contact[7].decode())
total_contacts = contact_json["total"]
total_count = contact_json["count"]
if "contacts" not in contact_json:
continue
for c in contact_json["contacts"]:
name, anni, bday, emails, phones, links = ("", "", "", "", "", "")
if "name" in c:
name = c["name"]["givenName"] + " " + \
c["name"]["middleName"] + " " + c["name"]["familyName"]
if "anniversary" in c:
anni = c["anniversary"]["month"] + \"/" +
c["anniversary"]["day"] + "/" + \c["anniversary"]["year"]
if "birthday" in c:
bday = c["birthday"]["month"] + "/" + \c["birthday"]["day"] +
"/" + c["birthday"]["year"]
if "emails" in c:
emails = ', '.join([x["ep"] for x in c["emails"]])
if "phones" in c:
phones = ', '.join([x["ep"] for x in c["phones"]])
if "links" in c:
links = ', '.join([x["ep"] for x in c["links"]])
Now, for company, title and notes, the get() method is used as shown below:
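A hypothetical sketch of those lookups; the exact JSON key names are assumptions:

company = c.get("company", "")
title = c.get("jobTitle", "")
notes = c.get("notes", "")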
Now, let us append the list of metadata and extracted data elements to the result list as
follows:
Now, by using CSV_Writer() method, we will write the content in CSV file:
With the help of above script, we can process the cached data from Yahoo mail by using IEF
database.
7. Python Digital Forensics – Network Forensics-II
The previous chapter dealt with some of the concepts of network forensics using Python. In
this chapter, let us understand network forensics using Python at a deeper level.
Web page preservation, or web archiving, is the process of gathering data from the World Wide Web, ensuring that the data is preserved in an archive, and making it available for future researchers, historians and the public. Before proceeding further into web page preservation, let us discuss some important issues related to it, as given below:
Large Quantity of Resources: Another issue related to web page preservation is the
large quantity of resources which is to be preserved.
Dealing with multimedia data: While preserving web pages we need to deal with
multimedia data also, and these might cause issues while doing so.
Providing access: Besides preserving, the issue of providing access to web resources
and dealing with issues of ownership needs to be solved too.
In this chapter, we are going to use Python library named Beautiful Soup for web page
preservation.
Note that before using it, we must install a third party library using the following command:
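pip install bs4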
Next, using Anaconda package manager, we can install Beautiful Soup as follows:
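conda install beautifulsoup4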
Note that this script will take two positional arguments: one is the URL to be preserved and the other is the desired output directory, as shown below:
if __name__ == "__main__":
parser = argparse.ArgumentParser('Web Page preservation')
parser.add_argument("DOMAIN", help="Website Domain")
parser.add_argument("OUTPUT_DIR", help="Preservation Output Directory")
parser.add_argument("-l", help="Log file path",
default=__file__[:-3] + ".log")
args = parser.parse_args()
Now, set up logging for the script by specifying file and stream handlers to document the acquisition process, as shown:
logger.setLevel(logging.DEBUG)
msg_fmt = logging.Formatter("%(asctime)-15s %(funcName)-10s "
                            "%(levelname)-8s %(message)s")
strhndl = logging.StreamHandler(sys.stderr)
strhndl.setFormatter(fmt=msg_fmt)
fhndl = logging.FileHandler(args.l, mode='a')
fhndl.setFormatter(fmt=msg_fmt)
logger.addHandler(strhndl)
logger.addHandler(fhndl)
logger.info("Starting BS Preservation")
logger.debug("Supplied arguments: {}".format(sys.argv[1:]))
logger.debug("System " + sys.platform)
logger.debug("Version " + sys.version)
Now, let us do the input validation on the desired output directory as follows:
if not os.path.exists(args.OUTPUT_DIR):
os.makedirs(args.OUTPUT_DIR)
main(args.DOMAIN, args.OUTPUT_DIR)
Now, we will define the main() function which will extract the base name of the website by
removing the unnecessary elements before the actual name along with additional validation
on the input URL as follows:
Now, we need to open a connection with the URL by using urlopen() method. Let us use try-
except block as follows:
try:
    index = urlopen(website, context=context).read().decode("utf-8")
except urllib.error.HTTPError as e:
    logger.error("Exiting preservation - unable to access page: "
                 "{}".format(website))
    sys.exit(2)
logger.debug("Successfully accessed {}".format(website))
Next, the recurse_pages() function is used to iterate through and discover all the links on the web page.
We need to log some details about the web page and then we log the hash of the data by
using hash_data() method as follows:
Now, define hash_data() method with the help of which we read the UTF-8 encoded data
and then generate the SHA-256 hash of it as follows:
def hash_data(data):
sha256 = hashlib.sha256()
sha256.update(data.encode("utf-8"))
return sha256.hexdigest()
def hash_file(file):
sha256 = hashlib.sha256()
with open(file, "rb") as in_file:
sha256.update(in_file.read())
return sha256.hexdigest()
Now, let us create a BeautifulSoup object out of the web page data under the find_links() method, as follows:
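A minimal sketch of that function; the exact link-filtering logic is an assumption:

def find_links(website, page, queue):
    # Collect every same-domain anchor that has not been queued yet
    soup = BeautifulSoup(page, "html.parser")
    for link in soup.find_all("a", href=True):
        if link.get("href").startswith(website) and link.get("href") not in queue:
            queue.append(link.get("href"))
    return queue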
Now, we need to define recurse_pages() method by providing it the inputs of the website
URL, current link queue, the unverified SSL context and the output directory as follows:
processed.append(link)
try:
page = urlopen(link, context=context).read().decode("utf-8")
except urllib.error.HTTPError as e:
msg = "Error accessing webpage: {}".format(link)
logger.error(msg)
continue
Now, write the output of each web page accessed in a file by passing the link name, page
data, output directory and the counter as follows:
Now, when we run this script by providing the URL of the website, the output directory and a path to the log file, we will get an archive of that web page that can be used for future reference.
Virus Hunting
Have you ever wondered how forensic analysts, security researchers, and incident responders can understand the difference between useful software and malware? The answer lies in the question itself: without studying the malware that hackers rapidly generate, it is quite impossible for researchers and specialists to tell the difference between useful software and malware. In this section, let us discuss VirusShare, a tool that helps accomplish this task.
Understanding VirusShare
VirusShare is the largest privately owned collection of malware samples, providing security researchers, incident responders, and forensic analysts with samples of live malicious code. It contains over 30 million samples.
The benefit of VirusShare is the list of malware hashes that is freely available. Anybody can use these hashes to create a very comprehensive hash set and use that to identify potentially malicious files. But before using VirusShare, we suggest you visit https://virusshare.com for more details.
For this script, we need a third party Python library tqdm which can be downloaded as follows:
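pip install tqdm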
Note that in this script, first we will read the VirusShare hashes page and dynamically identify
the most recent hash list. Then we will initialize the progress bar and download the hash list
in the desired range.
This script will take one positional argument, which would be the desired path for the hash
set:
if __name__ == '__main__':
parser = argparse.ArgumentParser('Hash set from VirusShare')
parser.add_argument("OUTPUT_HASH", help="Output Hashset")
parser.add_argument("--start", type=int,
help="Optional starting location")
args = parser.parse_args()
directory = os.path.dirname(args.OUTPUT_HASH)
if not os.path.exists(directory):
os.makedirs(directory)
if args.start:
main(args.OUTPUT_HASH, start=args.start)
else:
main(args.OUTPUT_HASH)
Now, we need to define the main() function with **kwargs as an argument; this creates a dictionary through which we can refer to any supplied keyword arguments, as shown below:
try:
index = urlopen(url, context=context).read().decode("utf-8")
except urllib.error.HTTPError as e:
print("[-] Error accessing webpage - exiting..")
sys.exit(1)
Now, identify the latest hash list from the downloaded page. You can do this by finding the last instance of the HTML href tag pointing to a VirusShare hash list. It can be done with the following lines of code:
Now, we will use tqdm.trange() method to create a loop and progress bar as follows:
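A hedged sketch of that loop; the VirusShare_00000.md5-style URL pattern and the stop variable are assumptions:

for x in tqdm.trange(start, stop, unit_scale=True, desc="Progress"):
    url_hash = "https://virusshare.com/hashfiles/VirusShare_{:05d}.md5".format(x)
    try:
        hashes = urlopen(url_hash, context=context).read().decode("utf-8")
        hashes_list = hashes.split("\n")
    except urllib.error.HTTPError as e:
        print("[-] Error accessing webpage for hash list {} "
              "- continuing..".format(x))
        continue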
After performing the above steps successfully, we will open the hash set text file in a+ mode to append to the bottom of the text file.
After running the above script, you will get the latest hash list containing MD5 hash values in text format.
8. Python Digital Forensics – Investigation Using Emails
The previous chapters discussed about the importance and the process of network forensics
and the concepts involved. In this chapter, let us learn about the role of emails in digital
forensics and their investigation using Python.
The negative side of emails is that criminals may leak important information about their company through them. Hence, the role of emails in digital forensics has increased in recent years. In digital forensics, emails are considered crucial evidence, and email header analysis has become important for collecting evidence during the forensic process.
Fake Emails
The biggest challenge in email forensics is the use of fake emails that are created by manipulating and scripting headers, etc. In this category, criminals also use temporary email, a service that allows a registered user to receive email at a temporary address that expires after a certain time period.
Spoofing
Another challenge in email forensics is spoofing, in which criminals present an email as someone else's. In this case, the machine will receive both the fake as well as the original IP address.
Anonymous Re-emailing
Here, the email server strips identifying information from the email message before forwarding it further. This leads to another big challenge for email investigations.
Some of the common techniques which can be used for email forensic investigation are:
Header Analysis
Server investigation
In the following sections, we are going to learn how to fetch information using Python for the
purpose of email investigation.
An EML file stores email headers, body content, and attachment data as plain text. It uses base64 to encode binary data and Quoted-Printable (QP) encoding to store content information. The Python script that can be used to extract information from an EML file is given below:
In the above libraries, quopri is used to decode the QP encoded values from EML files. Any
base64 encoded data can be decoded with the help of base64 library.
Next, let us provide argument for command-line handler. Note that here it will accept only
one argument which would be the path to EML file as shown below:
if __name__ == '__main__':
parser = ArgumentParser('Extracting information from EML file')
parser.add_argument("EML_FILE",help="Path to EML File", type=FileType('r'))
args = parser.parse_args()
main(args.EML_FILE)
Now, we need to define main() function in which we will use the method named
message_from_file() from email library to read the file like object. Here we will access the
headers, body content, attachments and other payload information by using resulting variable
named emlfile as shown in the code given below:
def main(input_file):
    emlfile = message_from_file(input_file)
    for key, value in emlfile._headers:
        print("{}: {}".format(key, value))
    print("\nBody\n")
    if emlfile.is_multipart():
        for part in emlfile.get_payload():
            process_payload(part)
    else:
        process_payload(emlfile)
Now, we need to define process_payload() method in which we will extract message body
content by using get_payload() method. We will decode QP encoded data by using
quopri.decodestring() function. We will also check the content MIME type so that it can
handle the storage of the email properly. Observe the code given below:
def process_payload(payload):
    print(payload.get_content_type() + "\n" + "=" *
          len(payload.get_content_type()))
    body = quopri.decodestring(payload.get_payload())
    if payload.get_charset():
        body = body.decode(payload.get_charset())
    else:
        try:
            body = body.decode()
        except UnicodeDecodeError:
            body = body.decode('cp1252')
    if payload.get_content_type() == "text/html":
        outfile = os.path.basename(args.EML_FILE.name) + ".html"
        open(outfile, 'w').write(body)
    elif payload.get_content_type().startswith('application'):
        outfile = open(payload.get_filename(), 'wb')
        body = base64.b64decode(payload.get_payload())
        outfile.write(body)
        outfile.close()
        print("Exported: {}\n".format(outfile.name))
    else:
        print(body)
After executing the above script, we will get the header information along with various
payloads on the console.
In this section, we will learn how to extract information from MSG files using the Outlook API. Note that the following Python script will work only on Windows. For this, we need to install the third party Python library named pywin32 as follows:
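pip install pywin32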
Now, let us provide arguments for the command-line handler. Here, it will accept two arguments: one would be the path to the MSG file and the other would be the desired output folder, as follows:
if __name__ == '__main__':
    parser = ArgumentParser('Extracting information from MSG file')
    parser.add_argument("MSG_FILE", help="Path to MSG file")
    parser.add_argument("OUTPUT_DIR", help="Path to output folder")
    args = parser.parse_args()
    out_dir = args.OUTPUT_DIR
    if not os.path.exists(out_dir):
        os.makedirs(out_dir)
    main(args.MSG_FILE, args.OUTPUT_DIR)
Now, we need to define the main() function in which we will call the win32com library to set up the Outlook API, which further allows access to the MAPI namespace.
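A hedged sketch of that setup; OpenSharedItem() is the MAPI call that loads the MSG file, and the helpers are the functions defined next:

def main(msg_file, output_dir):
    mapi = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
    msg = mapi.OpenSharedItem(os.path.abspath(msg_file))
    display_msg_attribs(msg)
    display_msg_recipients(msg)
    extract_msg_body(msg, output_dir)
    extract_attachments(msg, output_dir)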
Now, define the different functions which we are using in this script. The code given below shows the definition of the display_msg_attribs() function, which allows us to display various attributes of a message, like Subject, To, BCC, CC, Size, SenderName, Sent, etc.
def display_msg_attribs(msg):
    attribs = [
        'Application', 'AutoForwarded', 'BCC', 'CC', 'Class',
        'ConversationID', 'ConversationTopic', 'CreationTime',
        'ExpiryTime', 'Importance', 'InternetCodePage', 'IsMarkedAsTask',
        'LastModificationTime', 'Links', 'ReceivedTime', 'ReminderSet',
        'ReminderTime', 'ReplyRecipientNames', 'Saved', 'Sender',
        'SenderEmailAddress', 'SenderEmailType', 'SenderName', 'Sent',
        'SentOn', 'SentOnBehalfOfName', 'Size', 'Subject',
        'TaskCompletedDate', 'TaskDueDate', 'To', 'UnRead'
    ]
    print("\nMessage Attributes")
    for entry in attribs:
        print("{}: {}".format(entry, getattr(msg, entry, 'N/A')))
Now, define the display_msg_recipients() function, which iterates through the messages and displays the recipient details.
def display_msg_recipients(msg):
recipient_attrib = [
'Address', 'AutoResponse', 'Name', 'Resolved', 'Sendable'
]
i = 1
while True:
try:
recipient = msg.Recipients(i)
except pywintypes.com_error:
break
print("\nRecipient {}".format(i))
print("=" * 15)
for entry in recipient_attrib:
print("{}: {}".format(entry, getattr(recipient, entry, 'N/A')))
i += 1
Next, we define the extract_msg_body() function, which extracts the body content, HTML as well as plain text, from the message.
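A hedged sketch, assuming the Outlook MailItem's HTMLBody and Body properties:

def extract_msg_body(msg, out_dir):
    html_data = msg.HTMLBody.encode('cp1252')
    outfile = os.path.join(out_dir, os.path.basename(args.MSG_FILE))
    open(outfile + ".body.html", 'wb').write(html_data)
    print("Exported: {}".format(outfile + ".body.html"))
    body_data = msg.Body.encode('cp1252')
    open(outfile + ".body.txt", 'wb').write(body_data)
    print("Exported: {}".format(outfile + ".body.txt"))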
Next, we shall define the extract_attachments() function, which exports attachment data into the desired output directory.
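A hedged sketch of its opening lines; the attribute list is an assumption modeled on the recipient loop above:

def extract_attachments(msg, out_dir):
    attachment_attribs = ['DisplayName', 'FileName', 'PathName', 'Position', 'Size']
    i = 1
    while True:
        try:
            attachment = msg.Attachments(i)
        except pywintypes.com_error:
            break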
Within this loop, we print each attachment's attributes to the console and save the attachment with the following lines of code:
print("\nAttachment {}".format(i))
print("=" * 15)
for entry in attachment_attribs:
print('{}: {}'.format(entry, getattr(attachment, entry,"N/A")))
outfile =
os.path.join(os.path.abspath(out_dir),os.path.split(args.MSG_FILE)[-1])
if not os.path.exists(outfile):
os.makedirs(outfile)
outfile = os.path.join(outfile, attachment.FileName)
attachment.SaveAsFile(outfile)
print("Exported: {}".format(outfile))
i += 1
After running the above script, we will get the attributes of message and its attachments in
the console window along with several files in the output directory.
In this section, you will see a Python script for structuring MBOX files obtained from Google Takeout. But before that, we must know how we can generate these MBOX files by using our Google account or Gmail account.
Python Code
Now, the MBOX file discussed above can be structured using Python as shown below:
All the libraries have been used and explained in earlier scripts, except the mailbox library
which is used to parse MBOX files.
Now, provide an argument for command-line handler. Here it will accept two arguments: one
would be the path to MBOX file, and the other would be the desired output folder.
if __name__ == '__main__':
    parser = ArgumentParser('Parsing MBOX files')
    parser.add_argument("MBOX", help="Path to mbox file")
    parser.add_argument("OUTPUT_DIR", help="Path to output directory to write "
                        "report and exported content")
    args = parser.parse_args()
    main(args.MBOX, args.OUTPUT_DIR)
Now, we will define the main() function and call the mbox class of the mailbox library, with the help of which we can parse an MBOX file by providing its path:
def custom_reader(data_stream):
    data = data_stream.read()
    try:
        content = data.decode("ascii")
    except (UnicodeDecodeError, UnicodeEncodeError) as e:
        content = data.decode("cp1252", errors="replace")
    return mailbox.mboxMessage(content)

    parsed_data = []
    attachments_dir = os.path.join(output_dir, "attachments")
    if not os.path.exists(attachments_dir):
        os.makedirs(attachments_dir)
    columns = ["Date", "From", "To", "Subject", "X-Gmail-Labels",
               "Return-Path", "Received", "Content-Type", "Message-ID",
               "X-GM-THRID", "num_attachments_exported", "export_path"]
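The call into the mailbox library itself is not shown in this excerpt; a minimal sketch of that step, reusing the custom_reader() defined above (the mbox_file name is illustrative), might be:

    # Open the MBOX archive with our custom reader so that non-ASCII
    # content does not abort parsing.
    archive = mailbox.mbox(mbox_file, factory=custom_reader, create=False)
    num_messages = len(archive)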
Next, use tqdm to generate a progress bar and track the iteration process as follows:
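A sketch of that loop, assuming the archive object created above; tqdm wraps any iterable and renders a progress bar:

    for message in tqdm(archive, total=num_messages, unit="messages"):
        # Collect the header columns for this message; headers that are
        # missing default to "N/A".
        msg_data = {}
        header_data = dict(message.items())
        for hdr in columns:
            msg_data[hdr] = header_data.get(hdr, "N/A")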
Now, check whether the message has payloads or not. If it does, we will call the write_payload() method, as follows:
        if len(message.get_payload()):
            export_path = write_payload(message, attachments_dir)
            msg_data['num_attachments_exported'] = len(export_path)
            msg_data['export_path'] = ", ".join(export_path)
Now, the data needs to be appended. Then, we will call the create_report() method as follows:
        parsed_data.append(msg_data)
    create_report(parsed_data, os.path.join(output_dir, "mbox_report.csv"), columns)
def write_payload(msg, out_dir):
    pyld = msg.get_payload()
    export_path = []
    if msg.is_multipart():
        for entry in pyld:
            export_path += write_payload(entry, out_dir)
    else:
        content_type = msg.get_content_type()
        if "application/" in content_type.lower():
            content = base64.b64decode(msg.get_payload())
            export_path.append(export_content(msg, out_dir, content))
        elif "image/" in content_type.lower():
            content = base64.b64decode(msg.get_payload())
            export_path.append(export_content(msg, out_dir, content))
        elif "video/" in content_type.lower():
            content = base64.b64decode(msg.get_payload())
            export_path.append(export_content(msg, out_dir, content))
        elif "audio/" in content_type.lower():
            content = base64.b64decode(msg.get_payload())
            export_path.append(export_content(msg, out_dir, content))
        elif "text/csv" in content_type.lower():
            content = base64.b64decode(msg.get_payload())
            export_path.append(export_content(msg, out_dir, content))
        elif "info/" in content_type.lower():
            export_path.append(export_content(msg, out_dir, msg.get_payload()))
        elif "text/calendar" in content_type.lower():
            export_path.append(export_content(msg, out_dir, msg.get_payload()))
        elif "text/rtf" in content_type.lower():
            export_path.append(export_content(msg, out_dir, msg.get_payload()))
        else:
            if "name=" in msg.get('Content-Disposition', "N/A"):
                content = base64.b64decode(msg.get_payload())
                export_path.append(export_content(msg, out_dir, content))
            elif "name=" in msg.get('Content-Type', "N/A"):
                content = base64.b64decode(msg.get_payload())
                export_path.append(export_content(msg, out_dir, content))
    return export_path
The if-else branches above are straightforward. Now, we need to define the export_content() method they call, which derives a file name from the msg object and writes the attachment data to disk.
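The opening of export_content() is not reproduced in this excerpt; a minimal sketch, reusing the get_filename() helper defined below, could be:

def export_content(msg, out_dir, content_data):
    # Build the output path from a sanitized file name; get_filename()
    # is defined later in this section.
    file_name = os.path.join(out_dir, get_filename(msg))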
Now, with the help of the following lines of code, you can actually export the file:
    if isinstance(content_data, str):
        open(file_name, 'w').write(content_data)
    else:
        open(file_name, 'wb').write(content_data)
    return file_name
Now, let us define a function to extract file names from the message so that the exported files are named accurately:
def get_filename(msg):
    if 'name=' in msg.get("Content-Disposition", "N/A"):
        fname_data = msg["Content-Disposition"].replace("\r\n", " ")
        fname = [x for x in fname_data.split("; ") if 'name=' in x]
        file_name = fname[0].split("=", 1)[-1]
    elif 'name=' in msg.get("Content-Type", "N/A"):
        fname_data = msg["Content-Type"].replace("\r\n", " ")
        fname = [x for x in fname_data.split("; ") if 'name=' in x]
        file_name = fname[0].split("=", 1)[-1]
    else:
        file_name = "NO_FILENAME"
    fchars = [x for x in file_name if x.isalnum() or x.isspace() or x == "."]
    return "".join(fchars)
Now, we can write a CSV file by defining the create_report() function as follows:
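Its exact definition is not reproduced in this excerpt; a minimal sketch using csv.DictWriter (the tutorial's other scripts use unicodecsv the same way) might be:

def create_report(output_data, output_file, columns):
    # Write one row per parsed message, restricted to the report columns.
    with open(output_file, 'w', newline="") as outfile:
        csv_writer = csv.DictWriter(outfile, fieldnames=columns, extrasaction='ignore')
        csv_writer.writeheader()
        csv_writer.writerows(output_data)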
Once you run the script given above, you will get the CSV report and a directory full of attachments.
9. Python Digital Forensics – Important Artifacts in Windows-I
This chapter will explain various concepts involved in Microsoft Windows forensics and the
important artifacts that an investigator can obtain from the investigation process.
Introduction
Artifacts are the objects or areas within a computer system that hold important information related to the activities performed by the computer user. The type and location of this information depend upon the operating system. During forensic analysis, these artifacts play a very important role in confirming or refuting the investigator's observations.
- Around 90% of the world's computer traffic comes from machines running Windows as their operating system, which is why Windows artifacts are essential for digital forensics examiners.
- The Windows operating system stores different types of evidence related to user activity on the computer system. This is another reason why Windows artifacts are important for digital forensics.
- Many times, an investigator revolves the investigation around old and traditional areas like user-created data. Windows artifacts can lead the investigation towards non-traditional areas like system-created data and artifacts.
- Windows provides a great abundance of artifacts, which are helpful for investigators as well as for companies and individuals performing informal investigations.
- The increase in cyber-crime in recent years is another reason why Windows artifacts are important.
Recycle Bin
It is one of the most important Windows artifacts for forensic investigation. The Windows Recycle Bin contains the files that have been deleted by the user but not yet physically removed by the system. Even if the user completely removes a file from the system, it serves as an important source of evidence, because the examiner can extract valuable information from the deleted files, such as the original file path as well as the time at which it was sent to the Recycle Bin.
Note that the storage of Recycle Bin evidence depends upon the version of Windows. In the following Python script, we are going to deal with Windows 7, where the Recycle Bin creates two files: a $R file that contains the actual content of the recycled file, and a $I file that contains the original file name, path and file size at the time the file was deleted.
For this Python script, we need to install the third-party modules pytsk3, pyewf and unicodecsv. We can use pip to install them. We can follow the steps below to extract information from the Recycle Bin:
- First, use a recursive method to scan through the $Recycle.bin folder and select all the files starting with $I.
- Next, read the contents of those files and parse the available metadata structures.
- Now, search for the associated $R file.
- At last, write the results into a CSV file for review.
Next, we need to provide arguments for the command-line handler. Note that it will accept three arguments – first is the path to the evidence file, second is the type of evidence file, and third is the desired output path to the CSV report, as shown below:
if __name__ == '__main__':
    parser = argparse.ArgumentParser('Recycle Bin evidences')
    parser.add_argument('EVIDENCE_FILE', help="Path to evidence file")
    parser.add_argument('IMAGE_TYPE', help="Evidence file format",
                        choices=('ewf', 'raw'))
    parser.add_argument('CSV_REPORT', help="Path to CSV report")
    args = parser.parse_args()
    main(args.EVIDENCE_FILE, args.IMAGE_TYPE, args.CSV_REPORT)
Now, define the main() function that will handle all the processing. It will search for $I files as follows:
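The search is not reproduced in this excerpt; a minimal sketch, assuming the TSKUtil helper class used throughout these chapters, might be:

def main(evidence, image_type, report_file):
    # Open the evidence image and recursively collect $I files from the
    # $Recycle.bin folder; TSKUtil and its methods are assumptions
    # carried over from the earlier chapters.
    tsk_util = TSKUtil(evidence, image_type)
    dollar_i_files = tsk_util.recurse_files("$I", path='/$Recycle.bin', logic="startswith")
    if dollar_i_files is not None:
        processed_files = process_dollar_i(tsk_util, dollar_i_files)
        write_csv(report_file,
                  ['file_path', 'file_size', 'deleted_time',
                   'dollar_i_file', 'dollar_r_file', 'is_directory'],
                  processed_files)
    else:
        print("No $I files found")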
Now, if $I files are found, they must be sent to the process_dollar_i() function, which accepts the tsk_util object as well as the list of $I files, as shown below:
        recycle_file_path = os.path.join('/$Recycle.bin', dollar_i[1].rsplit("/", 1)[0][1:])
        dollar_r_files = tsk_util.recurse_files("$R" + dollar_i[0][2:],
                                                path=recycle_file_path,
                                                logic="startswith")
        if dollar_r_files is None:
            dollar_r_dir = os.path.join(recycle_file_path, "$R" + dollar_i[0][2:])
            dollar_r_dirs = tsk_util.query_directory(dollar_r_dir)
            if dollar_r_dirs is None:
                file_attribs['dollar_r_file'] = "Not Found"
                file_attribs['is_directory'] = 'Unknown'
            else:
                file_attribs['dollar_r_file'] = dollar_r_dir
                file_attribs['is_directory'] = True
        else:
            dollar_r = [os.path.join(recycle_file_path, r[1][1:]) for r in dollar_r_files]
            file_attribs['dollar_r_file'] = ";".join(dollar_r)
            file_attribs['is_directory'] = False
        processed_files.append(file_attribs)
    return processed_files
Now, define the read_dollar_i() method to read the $I files, in other words, to parse the metadata. We will use the read_random() method to read the signature's first eight bytes. This returns None if the signature does not match. After that, we will read and unpack the values from the $I file if it is a valid file.
def read_dollar_i(file_obj):
    if file_obj.read_random(0, 8) != '\x01\x00\x00\x00\x00\x00\x00\x00':
        return None
    raw_file_size = struct.unpack('<q', file_obj.read_random(8, 8))
    raw_deleted_time = struct.unpack('<q', file_obj.read_random(16, 8))
    raw_file_path = file_obj.read_random(24, 520)
Now, after extracting these values, we need to interpret the integers as human-readable values by using the sizeof_fmt() function, as shown below:
    file_size = sizeof_fmt(raw_file_size[0])
    deleted_time = parse_windows_filetime(raw_deleted_time[0])
    file_path = raw_file_path.decode("utf16").strip("\x00")
    return {'file_size': file_size, 'file_path': file_path,
            'deleted_time': deleted_time}
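The sizeof_fmt() helper is referenced but not listed in this excerpt; a common implementation, adapted from the well-known recipe, might be:

def sizeof_fmt(num, suffix='B'):
    # Convert a raw byte count into a human-readable string, e.g. 4.0KB.
    for unit in ['', 'K', 'M', 'G', 'T', 'P', 'E', 'Z']:
        if abs(num) < 1024.0:
            return "{:.1f}{}{}".format(num, unit, suffix)
        num /= 1024.0
    return "{:.1f}{}{}".format(num, 'Y', suffix)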
Now, define a function to interpret the integers into a formatted date and time as follows:
def parse_windows_filetime(date_value):
    microseconds = float(date_value) / 10
    ts = datetime.datetime(1601, 1, 1) + datetime.timedelta(microseconds=microseconds)
    return ts.strftime('%Y-%m-%d %H:%M:%S.%f')
Lastly, we will define a write_csv() method, just as in the other scripts of this tutorial, to write the processed results into a CSV file. When you run the above script, you will get the data from the $I and $R files.
Sticky Notes
Windows Sticky Notes replaces the real-world habit of writing with pen and paper. These notes float on the desktop, with different options for colors, fonts, etc. In Windows 7, the Sticky Notes file is stored as an OLE file; hence, in the following Python script, we will investigate this OLE file to extract metadata from Sticky Notes.
For this Python script, we need to install the third-party modules olefile, pytsk3, pyewf and unicodecsv. We can use pip to install them.
We can follow the steps below to extract the information from the Sticky Notes file, namely StickyNotes.snt:
- Firstly, open the evidence file and find all the StickyNotes.snt files.
- Then, parse the metadata and content from the OLE stream and write the RTF content to files.
- Lastly, create a CSV report of this metadata.
Python Code
Let us see how to use Python code for this purpose:
Next, define a global variable that will be used across this script:
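The variable itself is not listed in this excerpt; judging from its later use as REPORT_COLS, a plausible definition (the column names are a reasonable assumption based on the functions below) is:

# The report columns reused by prep_note_report() and write_csv() below.
REPORT_COLS = ['note_id', 'created', 'modified', 'note_text', 'note_file']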
Next, we need to provide arguments for the command-line handler. Note that it will accept three arguments – first is the path to the evidence file, second is the type of evidence file, and third is the desired output path, as follows:
if __name__ == '__main__':
    parser = argparse.ArgumentParser('Evidence from Sticky Notes')
    parser.add_argument('EVIDENCE_FILE', help="Path to evidence file")
    parser.add_argument('IMAGE_TYPE', help="Evidence file format",
                        choices=('ewf', 'raw'))
    parser.add_argument('REPORT_FOLDER', help="Path to report folder")
    args = parser.parse_args()
    main(args.EVIDENCE_FILE, args.IMAGE_TYPE, args.REPORT_FOLDER)
Now, we will define the main() function, which will be similar to the previous script, as shown below:
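Its opening is not reproduced here; a minimal sketch, again assuming the TSKUtil helper class, might be:

def main(evidence, image_type, report_folder):
    # Find every StickyNotes.snt file beneath the /Users folder of the
    # mounted evidence; TSKUtil is an assumption from earlier chapters.
    tsk_util = TSKUtil(evidence, image_type)
    note_files = tsk_util.recurse_files('StickyNotes.snt', '/Users', 'equal')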
Now, let us iterate through the resulting files. We will call the parse_snt_file() function to process each file, and then write the RTF content with the write_note_rtf() method, as follows:
    report_details = []
    for note_file in note_files:
        user_dir = note_file[1].split("/")[1]
        file_like_obj = create_file_like_obj(note_file[2])
        note_data = parse_snt_file(file_like_obj)
        if note_data is None:
            continue
        write_note_rtf(note_data, os.path.join(report_folder, user_dir))
        report_details += prep_note_report(note_data, REPORT_COLS,
                                           "/Users" + note_file[1])
    write_csv(os.path.join(report_folder, 'sticky_notes.csv'),
              REPORT_COLS, report_details)
First of all, we will define the create_file_like_obj() function, which reads the size of the file from a pytsk file object. Then, we will define the parse_snt_file() function, which accepts the file-like object as its input and is used to read and interpret the Sticky Notes file.
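The first helper is not listed in this excerpt; a minimal sketch, mirroring the open_file_as_reg() pattern used later in this chapter, might be:

def create_file_like_obj(note_file):
    # Read the whole pytsk file object into memory and wrap it in a
    # StringIO object that olefile can consume (Python 2 style, matching
    # the rest of this tutorial).
    file_size = note_file.info.meta.size
    file_content = note_file.read_random(0, file_size)
    return StringIO.StringIO(file_content)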
def parse_snt_file(snt_file):
    if not olefile.isOleFile(snt_file):
        print("This is not an OLE file")
        return None
    ole = olefile.OleFileIO(snt_file)
    note = {}
    for stream in ole.listdir():
        if stream[0].count("-") == 3:
            if stream[0] not in note:
                note[stream[0]] = {
                    "created": ole.getctime(stream[0]),
                    "modified": ole.getmtime(stream[0])
                }
            content = None
            if stream[1] == '0':
                content = ole.openstream(stream).read()
            elif stream[1] == '3':
                content = ole.openstream(stream).read().decode("utf-16")
            if content:
                note[stream[0]][stream[1]] = content
    return note
Now, we will translate the nested dictionary into a flat list of dictionaries that is more appropriate for a CSV spreadsheet. This is done by defining the prep_note_report() function. Lastly, we will define the write_csv() function.
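Neither function is listed in this excerpt; a minimal sketch of the flattening step, under the assumptions above, might be:

def prep_note_report(note_data, report_cols, note_file):
    # Flatten {note_id: {...}} into one dict per note; stream '3' holds
    # the note's UTF-16 text.
    report_details = []
    for note_id, stream_data in note_data.items():
        report_details.append({
            'note_id': note_id,
            'created': stream_data['created'],
            'modified': stream_data['modified'],
            'note_text': stream_data.get('3', '').strip("\x00"),
            'note_file': note_file
        })
    return report_details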
After running the above script, we will get the metadata from the Sticky Notes file.
Registry Files
Windows Registry files contain many important details, which are like a treasure trove of information for a forensic analyst. The Registry is a hierarchical database that contains details related to operating system configuration, user activity, software installation, etc. In the following Python script, we are going to access common baseline information from the SYSTEM and SOFTWARE hives.
For this Python script, we need to install the third-party modules pytsk3, pyewf and Registry (installed as python-registry). We can use pip to install them. We can then extract the information from the Windows Registry as follows.
Python Code
Let us see how to use Python code for this purpose:
Now, provide the arguments for the command-line handler. Here, it will accept two arguments – first is the path to the evidence file, second is the type of evidence file, as shown below:
if __name__ == '__main__':
    parser = argparse.ArgumentParser('Evidence from Windows Registry')
    parser.add_argument('EVIDENCE_FILE', help="Path to evidence file")
    parser.add_argument('IMAGE_TYPE', help="Evidence file format",
                        choices=('ewf', 'raw'))
    args = parser.parse_args()
    main(args.EVIDENCE_FILE, args.IMAGE_TYPE)
Now, we will define the main() function, which searches for the SYSTEM and SOFTWARE hives within the /Windows/System32/config folder, as follows:
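The search is not reproduced here; a minimal sketch, assuming the TSKUtil helper class, might be:

def main(evidence, image_type):
    # Locate the SYSTEM and SOFTWARE hives inside /Windows/System32/config
    # and hand each one to its processor; TSKUtil is an assumption from
    # earlier chapters.
    tsk_util = TSKUtil(evidence, image_type)
    tsk_system_hive = tsk_util.recurse_files('system', '/Windows/system32/config', 'equal')
    tsk_software_hive = tsk_util.recurse_files('software', '/Windows/system32/config', 'equal')
    if tsk_system_hive is not None:
        process_system_hive(open_file_as_reg(tsk_system_hive[0][2]))
    if tsk_software_hive is not None:
        process_software_hive(open_file_as_reg(tsk_software_hive[0][2]))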
Now, define the function for opening the Registry file. For this purpose, we need to gather the size of the file from the pytsk metadata, as follows:
def open_file_as_reg(reg_file):
    file_size = reg_file.info.meta.size
    file_content = reg_file.read_random(0, file_size)
    file_like_obj = StringIO.StringIO(file_content)
    return Registry.Registry(file_like_obj)
Now, with the help of the following method, we can process the SYSTEM hive:
def process_system_hive(hive):
    root = hive.root()
    current_control_set = root.find_key("Select").value("Current").value()
    control_set = root.find_key("ControlSet{:03d}".format(current_control_set))
    raw_shutdown_time = struct.unpack(
        '<Q', control_set.find_key("Control").find_key("Windows").value("ShutdownTime").value())
    shutdown_time = parse_windows_filetime(raw_shutdown_time[0])
    print("Last Shutdown Time: {}".format(shutdown_time))
    time_zone = control_set.find_key("Control").find_key(
        "TimeZoneInformation").value("TimeZoneKeyName").value()
    print("Machine Time Zone: {}".format(time_zone))
    computer_name = control_set.find_key("Control").find_key(
        "ComputerName").find_key("ComputerName").value("ComputerName").value()
    print("Machine Name: {}".format(computer_name))
    last_access = control_set.find_key("Control").find_key(
        "FileSystem").value("NtfsDisableLastAccessUpdate").value()
    last_access = "Disabled" if last_access == 1 else "enabled"
    print("Last Access Updates: {}".format(last_access))
Now, we need to define functions to interpret the integers into formatted dates and times, as follows:
def parse_windows_filetime(date_value):
    microseconds = float(date_value) / 10
    ts = datetime.datetime(1601, 1, 1) + datetime.timedelta(microseconds=microseconds)
    return ts.strftime('%Y-%m-%d %H:%M:%S.%f')

def parse_unix_epoch(date_value):
    ts = datetime.datetime.fromtimestamp(date_value)
    return ts.strftime('%Y-%m-%d %H:%M:%S.%f')
Now, with the help of the following method, we can process the SOFTWARE hive:
def process_software_hive(hive):
    root = hive.root()
    nt_curr_ver = root.find_key("Microsoft").find_key("Windows NT").find_key("CurrentVersion")
    print("Product name: {}".format(nt_curr_ver.value("ProductName").value()))
    print("CSD Version: {}".format(nt_curr_ver.value("CSDVersion").value()))
    print("Current Build: {}".format(nt_curr_ver.value("CurrentBuild").value()))
    print("Registered Owner: {}".format(nt_curr_ver.value("RegisteredOwner").value()))
    print("Registered Org: {}".format(nt_curr_ver.value("RegisteredOrganization").value()))
    raw_install_date = nt_curr_ver.value("InstallDate").value()
    install_date = parse_unix_epoch(raw_install_date)
    print("Installation Date: {}".format(install_date))
After running the above script, we will get the metadata stored in Windows Registry files.
10. Python Digital Forensics – Important Artifacts in Windows-II
This chapter talks about some more important artifacts in Windows and their extraction
method using Python.
User Activities
Windows having NTUSER.DAT file for storing various user activities. Every user profile is
having hive like NTUSER.DAT, which stores the information and configurations related to
that user specifically. Hence, it is highly useful for the purpose of investigation by forensic
analysts.
The following Python script will parse some of the keys of NTUSER.DAT to explore a user's actions on the system. Before proceeding further, we need to install the third-party modules Registry, pytsk3, pyewf and Jinja2. We can use pip to install them.
We can follow these steps to extract information from the NTUSER.DAT files:
- First, search for all NTUSER.DAT files in the system.
- Then, parse the WordWheelQuery, TypedPaths and RunMRU keys for each NTUSER.DAT file.
- At last, write these processed artifacts to an HTML report by using the Jinja2 module.
Python Code
Let us see how to use Python code for this purpose:
Now, provide the arguments for the command-line handler. Here, it will accept three arguments – first is the path to the evidence file, second is the type of evidence file, and third is the desired output path to the HTML report, as shown below:
if __name__ == '__main__':
    parser = argparse.ArgumentParser('Information from user activities')
    parser.add_argument('EVIDENCE_FILE', help="Path to evidence file")
    parser.add_argument('IMAGE_TYPE', help="Evidence file format",
                        choices=('ewf', 'raw'))
    parser.add_argument('REPORT', help="Path to report file")
    args = parser.parse_args()
    main(args.EVIDENCE_FILE, args.IMAGE_TYPE, args.REPORT)
Now, let us define the main() function, which searches for all NTUSER.DAT files, as shown below:
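Its opening is not reproduced here; a minimal sketch, assuming the TSKUtil helper class, might be:

def main(evidence, image_type, report):
    # Collect every NTUSER.DAT hive under /Users and prepare the
    # dictionary that will hold the parsed artifacts; TSKUtil and the
    # section titles are assumptions.
    tsk_util = TSKUtil(evidence, image_type)
    tsk_ntuser_hives = tsk_util.recurse_files('ntuser.dat', '/Users', 'equal')
    nt_rec = {
        'wordwheel': {'data': [], 'title': 'WordWheel Query'},
        'typed_path': {'data': [], 'title': 'Typed Paths'},
        'run_mru': {'data': [], 'title': 'Run MRU'}
    }
    for ntuser in tsk_ntuser_hives:
        uname = ntuser[1].split("/")[1]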
Now, we will try to find the Explorer key in each NTUSER.DAT file; once found, we call the per-key processing functions, as shown below:
        open_ntuser = open_file_as_reg(ntuser[2])
        try:
            explorer_key = open_ntuser.root().find_key("Software").find_key(
                "Microsoft").find_key("Windows").find_key(
                "CurrentVersion").find_key("Explorer")
        except Registry.RegistryKeyNotFoundException:
            continue
        nt_rec['wordwheel']['data'] += parse_wordwheel(explorer_key, uname)
        nt_rec['typed_path']['data'] += parse_typed_paths(explorer_key, uname)
        nt_rec['run_mru']['data'] += parse_run_mru(explorer_key, uname)
    nt_rec['wordwheel']['headers'] = nt_rec['wordwheel']['data'][0].keys()
    nt_rec['typed_path']['headers'] = nt_rec['typed_path']['data'][0].keys()
    nt_rec['run_mru']['headers'] = nt_rec['run_mru']['data'][0].keys()
Now, pass the dictionary object and the report path to the write_html() method, as follows:
    write_html(report, nt_rec)
Now, define a method that takes a pytsk file handle and reads it into the Registry class via the StringIO class:
def open_file_as_reg(reg_file):
    file_size = reg_file.info.meta.size
    file_content = reg_file.read_random(0, file_size)
    file_like_obj = StringIO.StringIO(file_content)
    return Registry.Registry(file_like_obj)
Now, we will define the function that parses and handles the WordWheelQuery key from the NTUSER.DAT file, as follows:
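The opening of this function is not part of the excerpt; a sketch, assuming python-registry's API and the MRUListEx layout, could be:

def parse_wordwheel(explorer_key, username):
    # Read the WordWheelQuery key and its MRU order, which lists value
    # names from most to least recently used.
    try:
        wwq = explorer_key.find_key("WordWheelQuery")
    except Registry.RegistryKeyNotFoundException:
        return []
    mru_list = wwq.value("MRUListEx").value()
    mru_order = []
    for i in range(0, len(mru_list), 4):
        order_entry = struct.unpack('i', mru_list[i:i + 4])[0]
        if order_entry in (0xFFFFFFFF, -1):
            break
        mru_order.append(order_entry)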
    search_list = []
    for count, val in enumerate(mru_order):
        ts = "N/A"
        if count == 0:
            ts = wwq.timestamp()
        search_list.append({
            'timestamp': ts,
            'username': username,
            'order': count,
            'value_name': str(val),
            'search': wwq.value(str(val)).value().decode("UTF-16").strip("\x00")
        })
    return search_list
Now, we will define the function that parses and handles the TypedPaths key from the NTUSER.DAT file, as follows:
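The function itself is not listed in this excerpt; a minimal sketch under the same assumptions might be:

def parse_typed_paths(explorer_key, username):
    # Each value under TypedPaths is a path the user typed into the
    # Explorer address bar.
    try:
        typed_paths = explorer_key.find_key("TypedPaths")
    except Registry.RegistryKeyNotFoundException:
        return []
    typed_path_details = []
    for val in typed_paths.values():
        typed_path_details.append({
            'username': username,
            'value_name': val.name(),
            'path': val.value()
        })
    return typed_path_details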
Now, we will define the function that parses and handles the RunMRU key from the NTUSER.DAT file, as follows:
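Again, the listing is absent from this excerpt; a minimal sketch under the same assumptions might be:

def parse_run_mru(explorer_key, username):
    # Values under RunMRU record commands entered in the Windows Run
    # dialog.
    try:
        run_mru = explorer_key.find_key("RunMRU")
    except Registry.RegistryKeyNotFoundException:
        return []
    mru_details = []
    for val in run_mru.values():
        mru_details.append({
            'username': username,
            'value_name': val.name(),
            'command': str(val.value())
        })
    return mru_details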
Now, the following function will handle the creation of the HTML report:
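Its listing is not part of this excerpt; a minimal sketch, assuming a Jinja2 template file named user_activity.html in the working directory (the template name is hypothetical), might be:

def write_html(outfile, data_dict):
    # Render the parsed artifact dictionary through a Jinja2 template.
    from jinja2 import Environment, FileSystemLoader
    env = Environment(loader=FileSystemLoader('.'))
    template = env.get_template('user_activity.html')
    with open(outfile, 'w') as open_outfile:
        open_outfile.write(template.render(nt_data=data_dict))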
At last, we can write the HTML report. After running the above script, we will get the information from the NTUSER.DAT files as an HTML document.
LINK Files
Shortcut files are created when a user or the operating system creates shortcuts for files that are frequently used, double-clicked, or accessed from system drives such as attached storage. Such shortcut files are called link files. By accessing these link files, an investigator can trace Windows activity, such as the time and location from which these files were accessed.
Let us discuss the Python script that we can use to get the information from these Windows LINK files.
For this Python script, install the third-party modules pylnk, pytsk3 and pyewf. We can then extract information from the lnk files as follows.
Python Code
Let us see how to use Python code for this purpose:
Now, provide the arguments for the command-line handler. Here, it will accept three arguments – first is the path to the evidence file, second is the type of evidence file, and third is the desired output path to the CSV report, as shown below:
if __name__ == '__main__':
    parser = argparse.ArgumentParser('Parsing LNK files')
    parser.add_argument('EVIDENCE_FILE', help="Path to evidence file")
    parser.add_argument('IMAGE_TYPE', help="Evidence file format",
                        choices=('ewf', 'raw'))
    parser.add_argument('CSV_REPORT', help="Path to CSV report")
    args = parser.parse_args()
    main(args.EVIDENCE_FILE, args.IMAGE_TYPE, args.CSV_REPORT)
Now, interpret the evidence file by creating an object of TSKUtil and iterating through the file system to find files ending with lnk. This can be done by defining the main() function, as follows:
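The function's opening is not reproduced in this excerpt; a sketch, assuming TSKUtil and a representative subset of pylnk attribute names for the report columns, might be:

def main(evidence, image_type, report):
    # Walk the file system for names ending in 'lnk'; the columns list is
    # a representative subset of pylnk file attributes, not exhaustive.
    tsk_util = TSKUtil(evidence, image_type)
    lnk_files = tsk_util.recurse_files("lnk", path="/", logic="endswith")
    if lnk_files is None:
        print("No lnk files found")
        exit(0)
    columns = ['command_line_arguments', 'description', 'drive_serial_number',
               'drive_type', 'file_access_time', 'file_attribute_flags',
               'file_creation_time', 'file_modification_time', 'file_size',
               'local_path', 'machine_identifier', 'volume_label']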
Now, with the help of the following code, we will iterate through the lnk files:
    parsed_lnks = []
    for entry in lnk_files:
        lnk = open_file_as_lnk(entry[2])
        lnk_data = {'lnk_path': entry[1], 'lnk_name': entry[0]}
        for col in columns:
            lnk_data[col] = getattr(lnk, col, "N/A")
        lnk.close()
        parsed_lnks.append(lnk_data)
    write_csv(report, columns + ['lnk_path', 'lnk_name'], parsed_lnks)
Now, we need to define two functions: one will open the pytsk file object, and the other will write the CSV report, as shown below:
def open_file_as_lnk(lnk_file):
    file_size = lnk_file.info.meta.size
    file_content = lnk_file.read_random(0, file_size)
    file_like_obj = StringIO.StringIO(file_content)
    lnk = pylnk.file()
    lnk.open_file_object(file_like_obj)
    return lnk

def write_csv(outfile, fieldnames, data):
    with open(outfile, 'wb') as open_outfile:
        csvfile = csv.DictWriter(open_outfile, fieldnames)
        csvfile.writeheader()
        csvfile.writerows(data)
After running the above script, we will get the information from the discovered lnk files in a
CSV report.
Prefetch Files
Whenever an application runs for the first time from a specific location, Windows creates prefetch files. These are used to speed up the application startup process. The extension of these files is .PF, and they are stored in the "\Root\Windows\Prefetch" folder.
Digital forensic experts can reveal evidence of program execution from a specified location, along with details of the user. Prefetch files are useful artifacts for the examiner because their entries remain even after the program has been deleted or uninstalled.
Let us discuss the Python script that will fetch information from Windows prefetch files, as given below.
For this Python script, install the third-party modules pylnk, pytsk3 and unicodecsv. Recall that we have already worked with these libraries in the Python scripts discussed in the previous chapters. We can now extract the information from the prefetch files as follows.
Python Code
Let us see how to use Python code for this purpose:
Now, provide the arguments for the command-line handler. Here, it will accept three arguments – the path to the evidence file, the type of evidence file, and the path for the output CSV – plus an optional argument specifying the directory to scan for prefetch files:
if __name__ == "__main__":
parser = argparse.ArgumentParser('Parsing Prefetch files')
parser.add_argument("EVIDENCE_FILE", help="Evidence file path")
parser.add_argument("TYPE", help="Type of Evidence",choices=("raw", "ewf"))
parser.add_argument("OUTPUT_CSV", help="Path to write output csv")
parser.add_argument("-d", help="Prefetch directory to
scan",default="/WINDOWS/PREFETCH")
args = parser.parse_args()
if os.path.exists(args.EVIDENCE_FILE) and \
os.path.isfile(args.EVIDENCE_FILE):
main(args.EVIDENCE_FILE, args.TYPE, args.OUTPUT_CSV, args.d)
else:
print("[-] Supplied input file {} does not exist or is not a
""file".format(args.EVIDENCE_FILE))
sys.exit(1)
Now, interpret the evidence file by creating an object of TSKUtil and iterating through the file system to find files ending with .pf. This can be done by defining the main() function, as follows:
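The function's opening is not reproduced in this excerpt; a minimal sketch, assuming TSKUtil, might be:

def main(evidence, image_type, output_csv, path):
    # Collect every .pf file in the prefetch directory and validate each
    # one before parsing.
    tsk_util = TSKUtil(evidence, image_type)
    prefetch_dir = tsk_util.query_directory(path)
    prefetch_files = None
    if prefetch_dir is not None:
        prefetch_files = tsk_util.recurse_files(".pf", path=path, logic="endswith")
    if prefetch_files is None:
        print("[-] No .pf files found")
        sys.exit(2)
    print("[+] Identified {} potential prefetch files".format(len(prefetch_files)))
    prefetch_data = []
    for hit in prefetch_files:
        prefetch_file = hit[2]
        pf_version = check_signature(prefetch_file)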
Now, define a method that validates the prefetch file signature, as shown below. Back in main()'s loop, the version it returns then decides how each file is parsed:
def check_signature(prefetch_file):
    version, signature = struct.unpack("<2i", prefetch_file.read_random(0, 8))
    if signature == 1094927187:
        return version
    else:
        return None

        if pf_version is None:
            continue
        pf_name = hit[0]
        if pf_version == 17:
            parsed_data = parse_pf_17(prefetch_file, pf_name)
            parsed_data.append(os.path.join(path, hit[1].lstrip("//")))
            prefetch_data.append(parsed_data)
Now, let us start processing Windows prefetch files. Here, we take the example of version 17 prefetch files from Windows XP.
Now, extract the data embedded within the prefetch file by using struct, as follows:
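A sketch of that parser, assuming the documented version-17 layout (the exact return columns are an assumption, and parse_windows_filetime() is reused from the Recycle Bin section):

def parse_pf_17(prefetch_file, pf_name):
    # Unpack the fixed-offset fields of a version-17 prefetch file:
    # file size, UTF-16 executable name, volume metadata, last run
    # FILETIME and the run count.
    pf_size, name, vol_info, vol_entries, vol_size, filetime, count = struct.unpack(
        "<i60s32x3iq16xi", prefetch_file.read_random(12, 136))
    name = name.decode("utf-16", "ignore").strip("\x00").split("\x00")[0]
    last_run = parse_windows_filetime(filetime)
    return [pf_name, name, pf_size, count, last_run]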
We have only handled the prefetch version for Windows XP; if the script encounters prefetch versions from other Windows releases, it must display an error message, as follows:
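A sketch of that branch, back inside main()'s loop; the version numbers named here (e.g., 23 for Vista/7, 26 for 8.1, 30 for 10) are given for illustration:

        elif pf_version in (23, 26, 30):
            print("[-] Version {} prefetch files are not supported "
                  "by this script".format(pf_version))
        else:
            print("[-] Unknown prefetch version: {}".format(pf_version))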
Now, define the method for writing the results into a CSV report, as in the earlier scripts.
After running the above script, we will get the information from the Windows XP prefetch files in a spreadsheet.
11. Python Digital Forensics – Important Artifacts in Windows-III
This chapter will explain about further artifacts that an investigator can obtain during forensic
analysis on Windows.
Event Logs
Windows event log files, as the name suggests, are special files that store significant events, such as when a user logs on to the computer, when a program encounters an error, system changes, RDP access, application-specific events, etc. Cyber investigators are always interested in event log information because it provides a lot of useful historical information about access to the system. In the following Python script, we are going to process both legacy and current Windows event log formats.
For this Python script, we need to install the third-party modules pytsk3, pyewf, unicodecsv, pyevt and pyevtx. We can follow the steps below to extract information from event logs:
- First, search for all the event logs that match the input argument.
- Now, process each event log found with the appropriate library.
Python Code
Let us see how to use Python code for this purpose:
Now, provide the arguments for the command-line handler. Note that it will accept three arguments – first is the path to the evidence file, second is the type of evidence file, and third is the name of the event log to process.
if __name__ == "__main__":
parser = argparse.ArgumentParser('Information from Event Logs')
parser.add_argument("EVIDENCE_FILE", help="Evidence file path")
parser.add_argument("TYPE", help="Type of Evidence",choices=("raw", "ewf"))
parser.add_argument("LOG_NAME",help="Event Log Name (SecEvent.Evt,
SysEvent.Evt, ""etc.)")
parser.add_argument("-d", help="Event log directory to
scan",default="/WINDOWS/SYSTEM32/WINEVT")
parser.add_argument("-f", help="Enable fuzzy search for either evt or"" evtx
extension", action="store_true")
args = parser.parse_args()
if os.path.exists(args.EVIDENCE_FILE) and \
os.path.isfile(args.EVIDENCE_FILE):
main(args.EVIDENCE_FILE, args.TYPE, args.LOG_NAME, args.d, args.f)
else:
print("[-] Supplied input file {} does not exist or is not a
""file".format(args.EVIDENCE_FILE))
sys.exit(1)
Now, interact with the event logs to query the existence of the user-supplied path by creating our TSKUtil object. This can be done with the help of the main() method, as follows:
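The method's opening is not reproduced here; a minimal sketch, assuming TSKUtil, might be:

def main(evidence, image_type, log, win_event, fuzzy):
    # Confirm that the event log directory exists, then search it for the
    # requested log name (exact match, or fuzzy when -f was supplied).
    tsk_util = TSKUtil(evidence, image_type)
    event_dir = tsk_util.query_directory(win_event)
    if event_dir is not None:
        if fuzzy is True:
            event_log = tsk_util.recurse_files(log, path=win_event)
        else:
            event_log = tsk_util.recurse_files(log, path=win_event, logic="equal")
        if event_log is not None:
            event_data = []
            for hit in event_log: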
                event_file = hit[2]
                temp_evt = write_file(event_file)
Now, we need to perform signature verification, after first defining a method that writes each log's entire content to the current directory:
def write_file(event_file):
    with open(event_file.info.name.name, "w") as outfile:
        outfile.write(event_file.read_random(0, event_file.info.meta.size))
    return event_file.info.name.name

                if pyevt.check_file_signature(temp_evt):
                    evt_log = pyevt.open(temp_evt)
                    print("[+] Identified {} records in {}".format(
                        evt_log.number_of_records, temp_evt))
                    for i, record in enumerate(evt_log.records):
                        strings = ""
                        for s in record.strings:
                            if s is not None:
                                strings += s + "\n"
                        event_data.append([
                            i, hit[0], record.computer_name,
                            record.user_security_identifier,
                            record.creation_time, record.written_time,
                            record.event_category, record.source_name,
                            record.event_identifier, record.event_type,
                            strings, "",
                            os.path.join(win_event, hit[1].lstrip("//"))
                        ])
                elif pyevtx.check_file_signature(temp_evt):
                    evtx_log = pyevtx.open(temp_evt)
                    print("[+] Identified {} records in {}".format(
                        evtx_log.number_of_records, temp_evt))
                    for i, record in enumerate(evtx_log.records):
                        strings = ""
                        for s in record.strings:
                            if s is not None:
                                strings += s + "\n"
                        event_data.append([
                            i, hit[0], record.computer_name,
                            record.user_security_identifier, "",
                            record.written_time, record.event_level,
                            record.source_name, record.event_identifier,
                            "", strings, record.xml_string,
                            os.path.join(win_event, hit[1].lstrip("//"))
                        ])
                else:
                    print("[-] {} not a valid event log. Removing temp "
                          "file...".format(temp_evt))
                    os.remove(temp_evt)
                    continue
            write_output(event_data)
        else:
            print("[-] {} Event log not found in {} directory".format(log, win_event))
            sys.exit(3)
    else:
        print("[-] Win XP Event Log Directory {} not found".format(win_event))
        sys.exit(2)
def write_output(data):
    output_name = "parsed_event_logs.csv"
    print("[+] Writing {} to current working directory: {}".format(
        output_name, os.getcwd()))
    with open(output_name, "wb") as outfile:
        writer = csv.writer(outfile)
        writer.writerow([
            "Index", "File name", "Computer Name", "SID",
            "Event Create Date", "Event Written Date",
            "Event Category/Level", "Event Source", "Event ID",
            "Event Type", "Data", "XML Data", "File Path"
        ])
        writer.writerows(data)
Once you successfully run the above script, you will get the event log information in a spreadsheet.
Internet History
Internet history is very useful for forensic analysts, as most cyber-crimes happen over the internet. Let us see how to extract internet history from Internet Explorer, as we are discussing Windows forensics, and Internet Explorer comes by default with Windows.
In Internet Explorer, the internet history is saved in the index.dat file. Let us look into a Python script that will extract the information from the index.dat file.
For this Python script, we need to install the third-party modules pylnk, pytsk3, pymsiecf and unicodecsv.
We can follow the steps below to extract information from index.dat files:
- First, search for index.dat files in the system.
- Then, extract the information from those files by iterating through them.
Python Code
Let us see how to use Python code for this purpose:
Now, provide the arguments for the command-line handler. Note that it will accept two arguments – first is the path to the evidence file, and second is the type of evidence file.
if __name__ == "__main__":
parser = argparse.ArgumentParser('getting information from internet history')
parser.add_argument("EVIDENCE_FILE", help="Evidence file path")
parser.add_argument("TYPE", help="Type of Evidence",choices=("raw", "ewf"))
parser.add_argument("-d", help="Index.dat directory to
scan",default="/USERS")
args = parser.parse_args()
if os.path.exists(args.EVIDENCE_FILE) and os.path.isfile(args.EVIDENCE_FILE):
main(args.EVIDENCE_FILE, args.TYPE, args.d)
else:
print("[-] Supplied input file {} does not exist or is not a
""file".format(args.EVIDENCE_FILE))
sys.exit(1)
Now, interpret the evidence file by creating an object of TSKUtil and iterating through the file system to find index.dat files. This can be done by defining the main() function, as follows:
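The function's opening is not reproduced in this excerpt; a minimal sketch, assuming TSKUtil, might be:

def main(evidence, image_type, path):
    # index.dat files live under each user's profile, so recurse from the
    # supplied directory (by default /USERS) and stage each hit locally.
    tsk_util = TSKUtil(evidence, image_type)
    index_dir = tsk_util.query_directory(path)
    if index_dir is not None:
        index_files = tsk_util.recurse_files("index.dat", path=path, logic="equal")
        if index_files is not None:
            print("[+] Identified {} potential index.dat files".format(len(index_files)))
            index_data = []
            for hit in index_files:
                index_file = hit[2]
                temp_index = write_file(index_file)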
Now, define a function with the help of which we can copy the contents of each index.dat file to the current working directory, so that it can be processed by the third-party module:
def write_file(index_file):
    with open(index_file.info.name.name, "w") as outfile:
        outfile.write(index_file.read_random(0, index_file.info.meta.size))
    return index_file.info.name.name
Now, use the following code to perform signature validation with the help of pymsiecf's check_file_signature() function:
                if pymsiecf.check_file_signature(temp_index):
                    index_dat = pymsiecf.open(temp_index)
                    print("[+] Identified {} records in {}".format(
                        index_dat.number_of_items, temp_index))
                    for i, record in enumerate(index_dat.items):
                        try:
                            data = record.data
                            if data is not None:
                                data = data.rstrip("\x00")
                        except AttributeError:
                            if isinstance(record, pymsiecf.redirected):
                                index_data.append([
                                    i, temp_index, "", "", "", "", "",
                                    record.location, "", "", record.offset,
                                    os.path.join(path, hit[1].lstrip("//"))
                                ])
                            elif isinstance(record, pymsiecf.leak):
                                index_data.append([
                                    i, temp_index, record.filename, "",
                                    "", "", "", "", "", "", record.offset,
                                    os.path.join(path, hit[1].lstrip("//"))
                                ])
                            continue
                        index_data.append([
                            i, temp_index, record.filename,
                            record.type, record.primary_time,
                            record.secondary_time,
                            record.last_checked_time, record.location,
                            record.number_of_hits, data, record.offset,
                            os.path.join(path, hit[1].lstrip("//"))
                        ])
                else:
                    print("[-] {} not a valid index.dat file. Removing "
                          "temp file..".format(temp_index))
                    os.remove("index.dat")
                    continue
                os.remove("index.dat")
            write_output(index_data)
        else:
            print("[-] Index.dat files not found in {} directory".format(path))
            sys.exit(3)
    else:
        print("[-] Directory {} not found".format(path))
        sys.exit(2)
Now, define a method that will write the output to a CSV file, as shown below:
def write_output(data):
    output_name = "Internet_Indexdat_Summary_Report.csv"
    print("[+] Writing {} with {} parsed index.dat files to current "
          "working directory: {}".format(output_name, len(data), os.getcwd()))
    with open(output_name, "wb") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(["Index", "File Name", "Record Name", "Record Type",
                         "Primary Date", "Secondary Date", "Last Checked Date",
                         "Location", "No. of Hits", "Record Data",
                         "Record Offset", "File Path"])
        writer.writerows(data)
After running the above script, we will get the information from the index.dat files in a CSV file.
Volume Shadow Copies
With the help of these VSS files, forensic experts can obtain historical information about how the system changed over time and what files existed on the computer. Shadow copy technology requires the file system to be NTFS for creating and storing shadow copies.
In this section, we are going to see a Python script, which helps in accessing the volume shadow copies present in a forensic image.
For this Python script, we need to install the third-party modules pytsk3, pyewf, unicodecsv, pyvshadow and vss. We can follow the steps below to extract information from VSS files:
- First, access the volume of the raw image and identify all the NTFS partitions.
- Then, extract the information from those shadow copies by iterating through them.
- At last, create a file listing of the data within the snapshots.
Python Code
Let us see how to use Python code for this purpose:
Now, provide the arguments for the command-line handler. Here, it will accept two arguments – first is the path to the evidence file, and second is the output file.
if __name__ == "__main__":
parser = argparse.ArgumentParser('Parsing Shadow Copies')
parser.add_argument("EVIDENCE_FILE", help="Evidence file path")
parser.add_argument("OUTPUT_CSV",
help="Output CSV with VSS file listing")
args = parser.parse_args()
Now, validate the existence of the input file path, and also separate the directory from the output file path:
    directory = os.path.dirname(args.OUTPUT_CSV)
    if not os.path.exists(directory) and directory != "":
        os.makedirs(directory)
    if os.path.exists(args.EVIDENCE_FILE) and \
            os.path.isfile(args.EVIDENCE_FILE):
        main(args.EVIDENCE_FILE, args.OUTPUT_CSV)
    else:
        print("[-] Supplied input file {} does not exist or is not a "
              "file".format(args.EVIDENCE_FILE))
        sys.exit(1)
Now, interact with the evidence file's volume by creating a TSKUtil object. This can be done with the help of the main() method, as follows:
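The opening of main() is not part of the excerpt; a sketch, assuming TSKUtil exposes a volume object and an NTFS detection helper (both assumptions), might be:

def main(evidence, output):
    # Open the raw image, walk its partition table and hand any NTFS
    # partition over to explore_vss() at the correct byte offset.
    tsk_util = TSKUtil(evidence, 'raw')
    img_vol = tsk_util.return_vol()
    if img_vol is not None:
        for part in img_vol:
            if tsk_util.detect_ntfs(evidence, part):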
                explore_vss(evidence, part.start * img_vol.info.block_size, output)
    else:
        print("[-] Must be a physical preservation to be compatible "
              "with this script")
        sys.exit(2)
Now, define a method for exploring the parsed volume shadow file, as follows:
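A sketch of that method, assuming the pyvshadow bindings plus a separate vss helper module (its VShadowVolume, GetVssStoreCount and VShadowImgInfo helpers, and the openVSSFS() call, are assumptions here):

def explore_vss(evidence, part_offset, output):
    # Open the shadow volume at the partition offset and build a file
    # listing for every snapshot store it contains.
    vss_volume = pyvshadow.volume()
    vss_handle = vss.VShadowVolume(evidence, part_offset)
    vss_count = vss.GetVssStoreCount(evidence, part_offset)
    if vss_count:
        vss_volume.open_file_object(vss_handle)
        vss_data = []
        for x in range(vss_count):
            print("[+] Gathering data for VSS {} of {}".format(x + 1, vss_count))
            vss_store = vss_volume.get_store(x)
            image = vss.VShadowImgInfo(vss_store)
            vss_data.append(openVSSFS(image, x))
        write_csv(vss_data, output)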
Lastly, define the method for writing the results into a spreadsheet, as in the earlier scripts.
Once you successfully run this Python script, you will get the information residing in the VSS files in a spreadsheet.
12. Python Digital Forensics – Investigation of Log Based Artifacts
Till now, we have seen how to obtain artifacts in Windows using Python. In this chapter, let us learn about the investigation of log-based artifacts using Python.
Introduction
Log-based artifacts are a treasure trove of information that can be very useful for a digital forensic expert. Though we have various monitoring software for collecting such information, the main issue in parsing useful information out of them is that we need to handle a lot of data.
Timestamps
A timestamp conveys the date and time of the activity in the log. It is one of the most important elements of any log file. Note that these date and time values can come in various formats.
The Python script shown below takes a raw date-time value as input and provides a formatted timestamp as its output.
- First, set up the arguments that will take the raw date value along with the source of the date and its data type.
- Now, provide a class with a common interface for the date across different date formats.
Python Code
Let us see how to use Python code for this purpose:
Now, as usual, we need to provide the arguments for the command-line handler. Here, it will accept three arguments: first the date value to be processed, second the source of that date value, and third its type.
if __name__ == '__main__':
    parser = ArgumentParser('Timestamp Log-based artifact')
    parser.add_argument("date_value", help="Raw date value to parse")
    parser.add_argument("source", help="Source format of date",
                        choices=ParseDate.get_supported_formats())
    parser.add_argument("type", help="Data type of input value",
                        choices=('number', 'hex'), default='number')
    args = parser.parse_args()
    date_parser = ParseDate(args.date_value, args.source, args.type)
    date_parser.run()
    print(date_parser.timestamp)
Now, we need to define a class that will accept the arguments for the date value, the date source, and the value type:
class ParseDate(object):
    def __init__(self, date_value, source, data_type):
        self.date_value = date_value
        self.source = source
        self.data_type = data_type
        self.timestamp = None
Now, we will define a run() method that acts as a controller, much like a main() method:
    def run(self):
        if self.source == 'unix-epoch':
            self.parse_unix_epoch()
        elif self.source == 'unix-epoch-ms':
            self.parse_unix_epoch(True)
        elif self.source == 'windows-filetime':
            self.parse_windows_filetime()

    @classmethod
    def get_supported_formats(cls):
        return ['unix-epoch', 'unix-epoch-ms', 'windows-filetime']
Now, we need to define two methods, which will process Unix epoch time and FILETIME respectively:
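Their listings are not part of this excerpt; a minimal sketch, assuming datetime has been imported, might be:

    def parse_unix_epoch(self, milliseconds=False):
        # Accept decimal or hex input; divide by 1000 for
        # millisecond-precision epochs, then format with datetime.
        if self.data_type == 'hex':
            conv_value = int(self.date_value, 16)
        else:
            conv_value = float(self.date_value)
        if milliseconds:
            conv_value = conv_value / 1000.0
        ts = datetime.datetime.fromtimestamp(conv_value)
        self.timestamp = ts.strftime('%Y-%m-%d %H:%M:%S.%f')

    def parse_windows_filetime(self):
        # FILETIME counts 100-nanosecond intervals since 1601-01-01.
        if self.data_type == 'hex':
            microseconds = int(self.date_value, 16) / 10.0
        else:
            microseconds = float(self.date_value) / 10.0
        ts = datetime.datetime(1601, 1, 1) + datetime.timedelta(microseconds=microseconds)
        self.timestamp = ts.strftime('%Y-%m-%d %H:%M:%S.%f')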
After running the above script and providing a timestamp, we can get the converted value in an easy-to-read format.
Web Server Logs
For parsing IIS web server logs, we first need to define the column patterns that will be parsed from the logs:
iis_log_format = [
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("time", re.compile(r"\d\d:\d\d:\d\d")),
    ("s-ip", re.compile(r"((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4}")),
    ("cs-method", re.compile(r"(GET)|(POST)|(PUT)|(DELETE)|(OPTIONS)|(HEAD)|(CONNECT)")),
    ("cs-uri-stem", re.compile(r"([A-Za-z0-1/\.-]*)")),
    ("cs-uri-query", re.compile(r"([A-Za-z0-1/\.-]*)")),
    ("s-port", re.compile(r"\d*")),
    ("cs-username", re.compile(r"([A-Za-z0-1/\.-]*)")),
    ("c-ip", re.compile(r"((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4}")),
    ("cs(User-Agent)", re.compile(r".*")),
    ("sc-status", re.compile(r"\d*")),
    ("sc-substatus", re.compile(r"\d*")),
    ("sc-win32-status", re.compile(r"\d*")),
    ("time-taken", re.compile(r"\d*"))
]
Now, provide the arguments for the command-line handler. Here, it will accept two arguments: first the IIS log to be processed, and second the desired CSV file path.
if __name__ == '__main__':
    parser = ArgumentParser('Parsing Server Based Logs')
    parser.add_argument('iis_log', help="Path to IIS Log", type=FileType('r'))
    parser.add_argument('csv_report', help="Path to CSV report")
    parser.add_argument('-l', help="Path to processing log",
                        default=__name__ + '.log')
    args = parser.parse_args()
    logger.setLevel(logging.DEBUG)
    msg_fmt = logging.Formatter("%(asctime)-15s %(funcName)-10s "
                                "%(levelname)-8s %(message)s")
    strhndl = logging.StreamHandler(sys.stdout)
    strhndl.setFormatter(fmt=msg_fmt)
    fhndl = logging.FileHandler(args.l, mode='a')
    fhndl.setFormatter(fmt=msg_fmt)
    logger.addHandler(strhndl)
    logger.addHandler(fhndl)
    logger.info("Starting IIS Parsing ")
    logger.debug("Supplied arguments: {}".format(", ".join(sys.argv[1:])))
    logger.debug("System " + sys.platform)
    logger.debug("Version " + sys.version)
    main(args.iis_log, args.csv_report, logger)
    logger.info("IIS Parsing Complete")
Now, we need to define the main() method, which will handle the bulk of the log processing:
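The opening of its loop is not part of this excerpt; a minimal sketch, assuming the iis_log file object supplied by argparse, might be:

def main(iis_log, report_file, logger):
    # Read the log line by line, skipping the comment/header lines that
    # start with '#'; each record is then matched column by column.
    parsed_logs = []
    for raw_line in iis_log:
        line = raw_line.strip()
        log_entry = {}
        if line.startswith("#"):
            continue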
        else:
            line_iter = line.split(" ")
        for count, split_entry in enumerate(line_iter):
            col_name, col_pattern = iis_log_format[count]
            if col_pattern.match(split_entry):
                log_entry[col_name] = split_entry
            else:
                logger.error("Unknown column pattern discovered. "
                             "Line preserved in full below")
                logger.error("Unparsed Line: {}".format(line))
        parsed_logs.append(log_entry)
    logger.info("Parsed {} lines".format(len(parsed_logs)))
    cols = [x[0] for x in iis_log_format]
    logger.info("Creating report file: {}".format(report_file))
    write_csv(report_file, cols, parsed_logs)
    logger.info("Report created")
Lastly, we need to define a method that will write the output to a spreadsheet, as in the earlier scripts. After running the above script, we will get the web-server-based logs in a spreadsheet.
Scanning Files with YARA
We can use YARA rules to scan files for known patterns as follows.
Python Code
Let us see how to use Python code for this purpose:
Next, provide the arguments for the command-line handler. Note that it will accept two arguments – first is the path to the YARA rules, second is the file to be scanned.
if __name__ == '__main__':
    parser = ArgumentParser('Scanning files by YARA')
    parser.add_argument('yara_rules',
                        help="Path to Yara rule to scan with. May be file or folder path.")
    parser.add_argument('path_to_scan', help="Path to file or folder to scan")
    parser.add_argument('--output', help="Path to output a CSV report of scan results")
    args = parser.parse_args()
    main(args.yara_rules, args.path_to_scan, args.output)
Now, we will define the main() function, which accepts the path to the YARA rules and the file to be scanned:
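Its listing is not reproduced in this excerpt; a minimal sketch, assuming the yara-python package, might be:

def main(yara_rules, path_to_scan, output):
    # Compile a single rule file or a folder of rules, then scan a file
    # or walk an entire directory tree.
    if os.path.isdir(yara_rules):
        # Build the {namespace: path} mapping yara.compile() expects.
        rule_paths = {}
        for fname in os.listdir(yara_rules):
            if fname.endswith((".yar", ".yara")):
                rule_paths[fname] = os.path.join(yara_rules, fname)
        yrules = yara.compile(filepaths=rule_paths)
    else:
        yrules = yara.compile(filepath=yara_rules)
    if os.path.isdir(path_to_scan):
        match_info = process_directory(yrules, path_to_scan)
    else:
        match_info = process_file(yrules, path_to_scan)
    columns = ['rule_name', 'hit_value', 'hit_offset', 'file_name',
               'rule_string', 'rule_tag']
    if output is None:
        write_stdout(columns, match_info)
    else:
        write_csv(output, columns, match_info)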
109
Python Digital Forensics
Now, define a method that iterates through the directory and passes each result to another method for further processing:
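A minimal sketch of that walker, under the assumptions above, might be:

def process_directory(yrules, folder_path):
    # Walk the tree and scan every file, concatenating the match records.
    match_info = []
    for root, _, files in os.walk(folder_path):
        for entry in files:
            file_entry = os.path.join(root, entry)
            match_info += process_file(yrules, file_entry)
    return match_info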
Next, define two functions. The first applies the match() method of the yrules object; the second reports the match information to the console when the user does not specify an output file. Observe the code shown below:
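The opening of the first function is not part of this excerpt; a sketch, assuming the classic yara-python API where each match's .strings attribute holds (offset, identifier, data) tuples, might be:

def process_file(yrules, file_path):
    # Run the compiled rules against one file and collect every string hit.
    match = yrules.match(file_path)
    match_info = []
    for rule_set in match:
        for hit in rule_set.strings:
            match_info.append({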
                'file_name': file_path,
                'rule_name': rule_set.rule,
                'rule_tag': ",".join(rule_set.tags),
                'hit_offset': hit[0],
                'rule_string': hit[1],
                'hit_value': hit[2]
            })
    return match_info
def write_stdout(columns, match_info):
    for entry in match_info:
        for col in columns:
            print("{}: {}".format(col, entry[col]))
        print("=" * 30)
Lastly, we will define a method that writes the output to a CSV file, as in the earlier scripts. Once you run the above script successfully, you can provide the appropriate arguments at the command line and generate a CSV report.