- Firmware Slap is a tool that automates the discovery of exploitable vulnerabilities in firmware using concolic analysis and function clustering. It recovers function prototypes from firmware binaries, runs automated analysis on the functions in parallel to find bugs, and visualizes the results in JSON and Elasticsearch/Kibana.
- The document discusses challenges with concolic analysis like memory usage and underconstraining symbolic values. It proposes techniques like starting analysis after initialization, modeling functions individually, and tracking memory more precisely.
- Function clustering is used to find similar functions that may contain similar bugs. Features are extracted from functions and k-means clustering is applied to group similar functions.
11. CONCOLIC ANALYSIS
• Symbolic Analysis + Concrete Analysis
• Lots of talks already on this subject.
• Really good at finding specific inputs to trigger code paths
• For my work in Firmware Slap I used angr!
• Concolic analysis
• CFG analysis
• Used by the team that took 3rd place in the Cyber Grand Challenge!
12. BUILDING REAL INPUTS FROM SYMBOLIC DATA
• Source level protections
• LLVM’s Clang static analyzers
• Compile time protections
• Non-executable stack
• Stack canaries
• RELRO
• _FORTIFY_SOURCE
• Operating system protections
• ASLR
• Symbolic Variable Here
• get_user_input()
• To get our “You did it” output, angr will create several program states
• One has the constraints:
• x >= 200
• x < 250
• angr sends these constraints to its theorem prover to give:
• x = 231 or x = 217 or x = 249…
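The constraint set above is small enough to enumerate by hand. As a conceptual illustration only, here is a pure-Python stand-in for what the theorem prover is asked to do (angr actually hands the constraints to claripy/Z3, which solves them without brute force):

```python
# Constraints recovered along the "You did it" path:
#   x >= 200 and x < 250
constraints = [lambda x: x >= 200, lambda x: x < 250]

def concretize(constraints, bits=8):
    """Brute-force stand-in for the solver: enumerate every value
    of an unsigned `bits`-wide variable satisfying all constraints."""
    return [v for v in range(2 ** bits) if all(c(v) for c in constraints)]

solutions = concretize(constraints)
print(solutions[:3])   # [200, 201, 202]
print(len(solutions))  # 50 concrete inputs trigger this path
```

Any one of the 50 concrete values is a real input that drives the program down the same path the symbolic state describes.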
13. • Symbolically represent more of the program state.
• Registers, Call Stack, Files
• Query the analysis for more interesting conditions
• Does a network read influence or corrupt the program counter?
• Does network data get fed into sensitive system calls?
• Can we track all reads and writes required to trigger a vulnerability?
14. WHERE DOES CONCOLIC ANALYSIS FAIL?
Memory Usage
• Big code bases
• angr tries to map out every single potential path through a program. Programs of non-trivial size will eat all your resources.
• A compiled lighttpd binary might be ~200KB
• angr will run your computer out of memory before it can examine every potential program state in a webserver
• Embedded systems’ firmware can be a lot larger…
15. • Challenge:
• Model complicated binaries with limited resources
• Model unknown input
• Identify vulnerabilities in binaries
• Find binaries and functions that are similar to one another
17. • Underconstraining concolic analysis:
• Values from hardware peripherals and NVRAM are UNKNOWN
• Spin up and initialization consumes valuable time and resources
• Configs can be set up any number of ways
• Skip the hard stuff
• Make hardware peripherals and NVRAM return symbolic variables
• Start concolic analysis after the initialization steps
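In angr this “skip the hard stuff” step is done by hooking the peripheral/NVRAM accessors so every read returns a fresh unconstrained symbolic value. Below is a pure-Python mock of the idea only; the class and function names are illustrative, not Firmware Slap’s actual code (the real version uses angr SimProcedures and claripy bitvectors):

```python
import itertools

class SymbolicValue:
    """Stand-in for a symbolic bitvector (claripy.BVS in angr)."""
    _counter = itertools.count()

    def __init__(self, label, bits):
        self.name = f"{label}_{next(SymbolicValue._counter)}"
        self.bits = bits

def nvram_get(key):
    # Firmware would normally block here waiting on real hardware.
    raise RuntimeError("no hardware attached")

def hooked_nvram_get(key):
    # "Hook": never touch hardware; every read yields a fresh
    # unconstrained symbolic value the solver is free to pick.
    return SymbolicValue(f"nvram_{key}", bits=32)

nvram_get = hooked_nvram_get
val = nvram_get("wan_ip")
print(val.name)  # nvram_wan_ip_0
```

Because the returned value is unconstrained, the analysis explores every behavior the config value could cause, without spinning up the device.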
19. • angr can analyze code at this level, but it needs to know where to start.
• Ghidra can produce a function prototype that angr can use to analyze a function…
MODELING FUNCTIONS
20. • Finding bugs in binaries
• Recover every function prototype using Ghidra
• Build an angr program state with symbolic arguments taken from the recovered prototype
• Run each analysis job in parallel
22. • With less code to analyze we can introduce more heavy-weight analysis
• Tracking memory constraints imposed by all instructions
• Memory regions tainted by user supplied arguments
• Mapping memory loading actions to values in memory.
• Every step through a program
• Store any new constraints to user input
• Does user input influence a system() call or corrupt the program counter?
• Does user input taint a stack or heap variable?
28. FUNCTION SIMILARITY
• BinDiff and Diaphora are the standard for binary diffing.
• They help us find what code was actually patched when a CVE and a patch are published.
• They use a set of heuristics to build a signature for every function in a binary:
• Basic block count
• Basic block edges
• Function references
29. • Both of these tools are tied to IDA
• The workflow is built around one-off comparisons
30. CLUSTERING
• Helps us understand how similar two things are
• Extract features from each thing
• For dots on a grid the features can be:
• X location
• Y location
31. K-MEANS CLUSTERING
Extract features
Pick two random points as cluster centers
Categorize each point to one of those random points
• Use Euclidean or cosine distance to find which is closest
Pick a new cluster center by averaging each category by feature and using the closest point.
Recategorize all the points into categories.
• Rinse and repeat until the points don’t move!
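The loop above is what scikit-learn’s `KMeans` implements (it uses the per-feature mean directly as the new center). A minimal sketch on dots on a grid, assuming scikit-learn is available:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious groups of dots on a grid (features: x location, y location).
points = np.array([[1, 1], [1, 2], [2, 1],     # group A
                   [8, 8], [8, 9], [9, 8]])    # group B

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(km.labels_)           # each point's category, e.g. [1 1 1 0 0 0]
print(km.cluster_centers_)  # the averaged center of each group
```

With well-separated groups the iteration converges in a step or two; the hard part, as the next slide notes, is knowing how many clusters to ask for.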
32. CLUSTERING – WHY THIS WORKS
• Features don’t have to be numbers…
• They can be the existence (0 or 1) of:
• String references
• Data references
• Function arguments
• Basic block count
• All of these features can be extracted from reverse engineering tools like…
• Ghidra, Radare2, or Binary Ninja
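Turning a function’s references into a fixed-length 0/1 vector might look like the sketch below; the vocabulary and reference names are made up for illustration, not taken from the tool:

```python
# Vocabulary of every string/data/call reference seen across the binary
# (hypothetical entries).
VOCAB = ["strcpy", "system", "nvram_get", "Login failed", "index.html"]

def to_feature_vector(func_refs, vocab=VOCAB):
    """Existence feature: 1 if the function references the item, else 0."""
    return [1 if ref in func_refs else 0 for ref in vocab]

web_handler = {"system", "index.html", "nvram_get"}
print(to_feature_vector(web_handler))  # [0, 1, 1, 0, 1]
```

Every function in the binary then maps to a vector of the same length, which is exactly what k-means needs.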
33. IT ONLY WORKS IF YOU GUESS THE RIGHT NUMBER OF CLUSTERS
34. SUPERVISED CLUSTERING
• Supervised machine learning (of any kind) uses KNOWN values to cluster data
• We also know how many clusters there should be
• Our functions inside our binaries could be supervised if every function was known to be vulnerable or benign
• Embedded systems programming gives us no assurances.
35. SEMI-SUPERVISED CLUSTERING
• Semi-supervised clustering uses SOME KNOWN values to cluster data
• If we use public CVE information to find which functions in a binary are KNOWN vulnerable, we can guess that really similar functions might also be vulnerable.
• We can set our cluster count to the number of known vulnerable functions in a binary
36. • Finding features in binaries to cluster
• Wrote a Ghidra headless plugin to dump all function information
• Data/String/Call references are changed to binary (0/1): it exists or it doesn’t
• All numbers are normalized
• Being at offset 0x80000000 shouldn’t matter more than having 2 function arguments.
• Throw away useless information
• A chi-squared test is used to see how much a feature defines an item.
• If every function has the same calling convention, the chi-squared test will throw it away.
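With labels available (e.g. known vulnerable vs. benign), scikit-learn’s chi-squared scorer shows exactly this behavior: a feature that is identical for every function scores 0 and is dropped first. A sketch assuming scikit-learn, with an invented feature matrix:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Columns: [calls_system, calls_strcpy, same_calling_convention]
# The last column is identical for every function.
X = np.array([[1, 1, 1],
              [1, 0, 1],
              [0, 0, 1],
              [0, 0, 1]])
y = np.array([1, 1, 0, 0])  # 1 = known vulnerable, 0 = benign

selector = SelectKBest(chi2, k=2).fit(X, y)
print(selector.scores_)        # constant column scores exactly 0.0
print(selector.get_support())  # [ True  True False] -- constant dropped
```

Features that never vary carry no information about which cluster (or label) a function belongs to, so discarding them shrinks the vectors without losing signal.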
37. • Taking it further…
• Selecting a better number of clusters through cluster scoring
• The silhouette score ranks how similar the functions within each cluster are
• This separates functions into clusters of similar tasks
• String operation functions
• Destructors/Constructors
• File manipulation
• Web request handling
• etc..
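Picking the cluster count by silhouette score can be sketched as follows, assuming scikit-learn and using synthetic blobs as a stand-in for function feature vectors:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for function feature vectors: 3 well-separated groups.
X, _ = make_blobs(n_samples=60, centers=3, cluster_std=0.5, random_state=0)

# Score each candidate cluster count; higher silhouette = tighter,
# better-separated clusters.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)  # recovers the true number of groups
```

On real function features the peak is less sharp, but the same loop replaces guessing k with a measured choice.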
45. FIRMWARE SLAP
Extract firmware
Locate system root
Recover function prototypes from every binary
Build and run angr analysis jobs
Extract best function features using sklearn
Cluster functions according to best feature set
Export data to JSON and send into Elasticsearch
47. MITIGATIONS
• Use compile time protections
• Enable your operating system’s ASLR
• Buy a better router
48. • It’s time to bring more automation into checking our embedded systems
• Don’t blindly trust third-party embedded systems
• I’m giving you the tools to find the bugs yourself
49. RELEASING
• Firmware Slap – The tool behind the demos
• The Ghidra function dumping plugin
• The cleaned-up PoCs
• CVE-2019-13087 - CVE-2019-13092