FIRMWARE SLAP: AUTOMATING DISCOVERY OF EXPLOITABLE VULNERABILITIES IN FIRMWARE
CHRISTOPHER ROBERTS
WHO AM I
• Researcher at REDLattice Inc.
• Interested in finding bugs in embedded systems
• Interested in program analysis
• CTF Player
A QUICK BACKGROUND IN EXPLOITABLE BUGS
DARPA CYBER GRAND CHALLENGE
• Automated cyber reasoning systems:
  • Find vulnerabilities
  • Exploit vulnerabilities
  • Patch vulnerabilities
• Automatically generate full exploits and proofs of concept.
PREVENTING BUGS AUTOMATICALLY
• Source-level protections
  • LLVM’s Clang static analyzers
• Compile-time protections
  • Non-executable stack
  • Stack canaries
  • RELRO
  • _FORTIFY_SOURCE
• Operating system protections
  • ASLR
PREVENTING BUGS AUTOMATICALLY IN EMBEDDED DEVICES
• Source-level protections
  • LLVM’s Clang static analyzers - maybe
• Compile-time protections
  • Non-executable stack - maybe
  • Stack canaries
  • RELRO
  • _FORTIFY_SOURCE
• Operating system protections
  • ASLR
EXPLOIT MITIGATIONS
• There has to be an exploit to mitigate it, right?
[Figure: exploit-mitigation checklist for the ALMOND 3 router covering non-executable stack, stack canaries, RELRO, _FORTIFY_SOURCE and ASLR]
DEMO
• CVE-2019-13087
• CVE-2019-13088
• CVE-2019-13089
• CVE-2019-13090
• CVE-2019-13091
• CVE-2019-13092
CONCOLIC ANALYSIS
• Symbolic analysis + concrete analysis
• Lots of talks already on this subject.
• Really good at finding specific inputs that trigger particular code paths
• For my work in Firmware Slap I used angr!
  • Concolic analysis
  • CFG analysis
  • Used by the 3rd-place team in the Cyber Grand Challenge!
BUILDING REAL INPUTS FROM SYMBOLIC DATA
• Symbolic variable here:
  • get_user_input()
• To get our “You did it” output:
  • angr will create several program states
  • One has the constraints:
    • x >= 200
    • x < 250
  • angr sends these constraints to its theorem prover to get concrete values:
    • x=231 or x=217 or x=249…
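To illustrate what the theorem prover is being asked, here is a minimal sketch that treats the two path constraints as Python predicates and brute-forces every value of x that satisfies both. It assumes x is an unsigned 8-bit input, and the `solve` helper is hypothetical; angr itself hands constraints to its claripy solver (backed by Z3) rather than enumerating values.

```python
def solve(constraints, domain=range(256)):
    """Return every value in `domain` that satisfies all path constraints.

    Brute-force stand-in for an SMT solver; assumes x is an unsigned byte.
    """
    return [x for x in domain if all(check(x) for check in constraints)]


# Path constraints collected on the branch that prints "You did it"
you_did_it = [lambda x: x >= 200, lambda x: x < 250]

models = solve(you_did_it)
print(models[0], models[-1], len(models))  # 200 249 50
```

Any of those 50 values is a valid concrete input, which is why the solver may hand back 231, 217 or 249: it only has to produce one model that satisfies every constraint.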
• Symbolically represent more of the program state:
  • Registers, call stack, files
• Query the analysis for more interesting conditions:
  • Does a network read influence or corrupt the program counter?
  • Does network data get fed into sensitive system calls?
  • Can we track all reads and writes required to trigger a vulnerability?
WHERE DOES CONCOLIC ANALYSIS FAIL?
Memory Usage
• Big code bases
  • angr tries to map out every potential path through a program. Programs of non-trivial size will eat all your resources.
  • A compiled lighttpd binary might be ~200KB
  • angr will run your computer out of memory before it can examine every potential program state in a web server
  • Embedded systems’ firmware can be a lot larger…
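The memory problem comes from path explosion. As a toy illustration, assume a function that makes n sequential, independent branches on unconstrained input: a naive explorer keeps one state per combination of branch outcomes, so the number of states doubles at every branch. The `enumerate_paths` helper is hypothetical.

```python
def enumerate_paths(num_branches):
    """List every taken/not-taken combination for sequential, independent
    branches, the way a naive path explorer would fork its states."""
    paths = [()]
    for _ in range(num_branches):
        # every existing path forks into a taken and a not-taken successor
        paths = [path + (taken,) for path in paths for taken in (True, False)]
    return paths


print(len(enumerate_paths(10)))  # 1024
# The count is 2 ** n, so materializing it quickly becomes impossible:
print(2 ** 64)  # 18446744073709551616
```

Real symbolic executors prune infeasible paths and can merge states, so this is a worst case, but input-parsing loops in a web server behave a lot like long chains of such branches.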
• Challenge:
  • Model complicated binaries with limited resources
  • Model unknown input
  • Identify vulnerabilities in binaries
  • Find binaries and functions that are similar to one another
[Figure: program flow of a typical firmware service: Start → Parse config → Set up sockets → Parse user input → Actions 1 to 4 (nothing interesting) or Action 5 (vulnerable code)]
• Underconstraining concolic analysis:
  • Values from hardware peripherals and NVRAM are UNKNOWN
  • Spin-up and initialization consume valuable time and resources
  • Configs can be set up any number of ways
• Skip the hard stuff:
  • Make hardware peripherals and NVRAM return symbolic variables
  • Start concolic analysis after the initialization steps
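A minimal sketch of the “make NVRAM return symbolic variables” idea, written without angr so it runs on its own. `Symbol` stands in for a symbolic bitvector and `SymbolicNvram.nvram_get` for a firmware's NVRAM lookup routine; both names are hypothetical. In angr, the equivalent would typically be a hook (a `SimProcedure`) that returns a fresh `claripy.BVS` value instead of running the real lookup.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """A named, unconstrained value (stand-in for a symbolic bitvector)."""
    name: str


class SymbolicNvram:
    """Hypothetical NVRAM model: every key read returns a symbolic variable
    instead of a concrete configuration value."""

    def __init__(self):
        self._counter = itertools.count()
        self._values = {}

    def nvram_get(self, key):
        # Reuse one symbol per key so repeated reads of the same setting
        # refer to the same unknown value.
        if key not in self._values:
            self._values[key] = Symbol(f"nvram_{key}_{next(self._counter)}")
        return self._values[key]

    def unknowns(self):
        """All symbols handed out so far, keyed by NVRAM key."""
        return dict(self._values)
```

Caching one symbol per key is a deliberate choice: if the firmware reads the same setting twice, both reads must refer to the same unknown, otherwise the analysis could satisfy contradictory constraints on them.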
• angr can analyze code at the level of a single function (such as Action 5 above), but it needs to know where to start.
• Ghidra can recover a function prototype that angr can use to analyze that function.
MODELING FUNCTIONS
• Finding bugs in binaries:
  • Recover every function prototype using Ghidra
  • Build an angr program state with symbolic arguments derived from the prototype
  • Run each analysis job in parallel
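The per-function workflow can be sketched as below. Everything here is a stand-in under stated assumptions: prototypes arrive as C-style strings (as a decompiler such as Ghidra prints them), `build_symbolic_args` only records which arguments would become symbolic, and all helper names are hypothetical. In angr itself, this is roughly where one would create symbolic values (for example with `claripy.BVS`) and pass them to `project.factory.call_state` for the function's address.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

# return type, function name, raw parameter list
_PROTO = re.compile(r"^\s*(.*?[\s*])(\w+)\s*\((.*)\)\s*;?\s*$")
# parameter type, parameter name
_PARAM = re.compile(r"^(.*?[\s*])(\w+)$")


@dataclass
class Prototype:
    name: str
    return_type: str
    params: list = field(default_factory=list)  # [(type, name), ...]


def parse_prototype(text):
    """Parse a C-style prototype string such as a decompiler prints."""
    m = _PROTO.match(text)
    if m is None:
        raise ValueError(f"unrecognized prototype: {text!r}")
    ret, name, raw_params = m.groups()
    params = []
    if raw_params.strip() not in ("", "void"):
        for i, raw in enumerate(raw_params.split(",")):
            pm = _PARAM.match(raw.strip())
            # unnamed parameters such as "int" get a generated name
            params.append((pm.group(1).strip(), pm.group(2)) if pm
                          else (raw.strip(), f"param_{i + 1}"))
    return Prototype(name, ret.strip(), params)


def build_symbolic_args(proto):
    """Hypothetical stand-in for an angr call state: one symbolic value per
    argument; pointer arguments point at a symbolic buffer instead."""
    return {
        pname: ("symbolic_buffer_ptr" if "*" in ptype else "symbolic_value",
                f"{proto.name}.{pname}")
        for ptype, pname in proto.params
    }


def analyze_function(prototype_text):
    proto = parse_prototype(prototype_text)
    args = build_symbolic_args(proto)
    # A real job would start angr exploration from a state built from `args`.
    return proto.name, args


def run_jobs(prototypes, workers=4):
    # Threads keep the sketch self-contained; CPU-bound angr jobs would
    # normally go to a process pool (e.g. ProcessPoolExecutor) instead.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(analyze_function, prototypes))
```

Keeping each job independent (one prototype in, one result out) is what makes the fan-out trivial: the pool never needs to share state between functions.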
FINDING BUGS IN FUNCTIONS
• Demo
• With less code to analyze, we can introduce more heavyweight analysis:
  • Tracking the memory accesses performed by every instruction
  • Memory regions tainted by user-supplied arguments
  • Mapping memory load actions to values in memory
• At every step through a program:
  • Store any new constraints on user input
  • Does user input influence a system() call or corrupt the program counter?
  • Does user input taint a stack or heap variable?
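The taint questions above can be illustrated with a toy propagation pass. This sketch uses a made-up instruction format (not a real ISA and not angr's API): data copied from a user-controlled location stays tainted, overwriting a location with clean data clears it, and the pass reports when tainted data reaches `system()` or the program counter.

```python
def track_taint(instructions, tainted_inputs):
    """Propagate taint through a toy instruction list and report dangerous sinks.

    Hypothetical instruction forms (not a real ISA):
      ("mov" | "load" | "store", dst, src)   copy src into dst
      ("add", dst, a, b)                     dst = a + b
      ("call", target, arg)                  call target(arg)
      ("jmp", reg)                           set the program counter from reg
    """
    tainted = set(tainted_inputs)  # registers / memory cells holding user data
    findings = []
    for idx, (op, *args) in enumerate(instructions):
        if op in ("mov", "load", "store"):
            dst, src = args
            sources = (src,)
        elif op == "add":
            dst, *sources = args
        else:
            dst = None
        if dst is not None:
            # overwriting a location with clean data clears its taint
            if any(s in tainted for s in sources):
                tainted.add(dst)
            else:
                tainted.discard(dst)
        if op == "call" and args[0] == "system" and args[1] in tainted:
            findings.append((idx, "user input reaches system()"))
        elif op == "jmp" and args[0] in tainted:
            findings.append((idx, "user input controls the program counter"))
    return findings, tainted
```

Returning the final tainted set alongside the findings also answers the last question on the slide: any stack or heap location left in that set was written with user-controlled data.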
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
FUNCTION SIMILARITY
• BinDiff and Diaphora are the standard tools for binary diffing.
  • They help us find what code was actually patched when a CVE and a patch are published.
  • They use a set of heuristics to build a signature for every function in a binary:
    • Basic block count
    • Basic block edges
    • Function references
• Both of these tools are tied to IDA
• The workflow is built around one-off comparisons
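To make the heuristic idea concrete, here is a toy matcher built only from the three features listed above. It is not BinDiff's or Diaphora's actual algorithm; the dictionary layout and function names are assumptions. Functions whose signature has exactly one counterpart in the other version are paired, and everything else is left for manual review, which is where a patched function tends to land.

```python
from collections import defaultdict


def signature(func):
    """Heuristic signature: basic block count, edge count, sorted callees."""
    return (func["blocks"], func["edges"], tuple(sorted(func["calls"])))


def match_functions(old_funcs, new_funcs):
    """Pair functions whose signature has exactly one counterpart in the old
    binary; return the pairs and the new functions left unmatched."""
    old_by_sig = defaultdict(list)
    for name, func in old_funcs.items():
        old_by_sig[signature(func)].append(name)
    matches, unmatched = {}, []
    for name, func in new_funcs.items():
        candidates = old_by_sig.get(signature(func), [])
        if len(candidates) == 1:
            matches[name] = candidates[0]
        else:
            unmatched.append(name)  # changed, new, or ambiguous
    return matches, sorted(unmatched)
```

A function that gained a basic block and a call (for example, an added bounds check) no longer matches its old signature and shows up in the unmatched list.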
CLUSTERING
• Helps us answer: how similar are two things?
• Extract features from each thing
  • For dots on a grid, they can be:
    • X location
    • Y location
K-MEANS CLUSTERING
• Extract features
• Pick two random points as the starting cluster centers (k = 2 in this example)
• Categorize each point under the closest of those centers
  • Use Euclidean or cosine distance to find which is closest
• Move each cluster center to the average of its category, feature by feature
• Recategorize all the points into categories
• Rinse and repeat until the points don’t move!
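The loop described above fits in a small pure-Python sketch using Euclidean distance (`math.dist`, Python 3.8+). Initial centers are drawn at random with a fixed seed so runs are repeatable. The function name and parameters are illustrative; in practice a library implementation such as scikit-learn's `KMeans` would normally be used.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means with Euclidean distance; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k random starting points
    labels = None
    for _ in range(max_iter):
        # categorize each point under its closest center
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # no point moved: converged
        labels = new_labels
        # move each center to the per-feature average of its points
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```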
CLUSTERING – WHY THIS WORKS
• Features don’t have to be numbers…
• They can be the existence (0 or 1) of:
  • String references
  • Data references
  • Function arguments
  • Basic block count
• All of these features can be extracted from reverse engineering tools like…
  • Ghidra, Radare2, or Binary Ninja
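Presence/absence features are easy to build once a tool has dumped each function's references. A minimal sketch, assuming the references have already been exported as plain strings per function (the input layout and the `binary_features` name are hypothetical):

```python
def binary_features(functions):
    """Turn each function's set of references into a 0/1 vector.

    `functions` maps a function name to the set of strings, data addresses
    and callees it references.
    """
    vocab = sorted(set().union(*functions.values()))  # one column per reference
    column = {ref: i for i, ref in enumerate(vocab)}
    vectors = {}
    for name, refs in functions.items():
        vec = [0] * len(vocab)
        for ref in refs:
            vec[column[ref]] = 1  # the reference exists in this function
        vectors[name] = vec
    return vocab, vectors
```

Because every column is on the same 0/1 scale, the vectors can go straight into a distance-based clustering step.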
IT ONLY WORKS IF YOU GUESS THE RIGHT NUMBER OF CLUSTERS
SUPERVISED CLUSTERING
• Supervised machine learning of any kind uses KNOWN values (labels) to cluster data
• We also know how many clusters there should be
• The functions inside our binaries could be clustered in a supervised way if every function were known to be vulnerable or benign
• Embedded systems programming gives us no such assurances.
SEMI-SUPERVISED CLUSTERING
• Semi-supervised clustering uses SOME KNOWN values to cluster data
• If we use public CVE information to find which functions in a binary are KNOWN to be vulnerable, we can guess that really similar functions might also be vulnerable.
• We can set our cluster count to the number of known vulnerable functions in a binary
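One way to turn “set the cluster count to the number of known vulnerable functions” into code is seeded k-means: use the feature vectors of the known-vulnerable functions as the initial centers, then iterate as usual. This is a sketch under that assumption (the function and the example names are hypothetical), not necessarily how Firmware Slap implements it.

```python
import math


def seeded_kmeans(features, seed_names, iterations=10):
    """k-means whose k initial centers are the known-vulnerable functions.

    `features` maps function name -> feature vector. Returns, for each seed,
    the other functions that ended up in its cluster.
    """
    centers = {s: list(features[s]) for s in seed_names}
    assignment = {}
    for _ in range(iterations):
        # assign every function to the nearest known-vulnerable center
        new_assignment = {
            name: min(centers, key=lambda s: math.dist(vec, centers[s]))
            for name, vec in features.items()
        }
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # move each center to the mean of the functions assigned to it
        for s in centers:
            members = [features[n] for n, c in assignment.items() if c == s]
            if members:
                centers[s] = [sum(col) / len(members) for col in zip(*members)]
    return {s: sorted(n for n, c in assignment.items() if c == s and n != s)
            for s in seed_names}
```

The functions that land in a known-vulnerable function's cluster are the candidates worth handing to the concolic analysis first.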
• Finding features in binaries to cluster:
  • Wrote a Ghidra headless plugin to dump all function information
  • Data/string/call references are converted to binary (0/1) features: it exists or it doesn’t
  • All numbers are normalized
    • Being at offset 0x80000000 shouldn’t matter more than having 2 function arguments.
• Throw away useless information:
  • A chi-squared (χ²) test is used to measure how much a feature helps distinguish items.
  • If every function has the same calling convention, the chi-squared test will throw that feature away.
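A sketch of both steps, under a few assumptions: features are already numeric, and the chi-squared step needs a class label per function (for example known-vulnerable vs. not, as in the semi-supervised setup). The scoring compares per-class feature sums with what class sizes alone would predict, the same observed-vs-expected formulation scikit-learn's `chi2` uses; the helper names are hypothetical. A constant column, like a calling convention shared by every function, scores exactly 0 and is dropped.

```python
def min_max_normalize(rows):
    """Rescale every feature column to [0, 1] so a large raw value such as an
    address (0x80000000) cannot outweigh a small count of arguments."""
    cols = list(zip(*rows))
    lows = [min(col) for col in cols]
    spans = [(max(col) - min(col)) or 1 for col in cols]  # constant column -> zeros
    return [[(v - lo) / span for v, lo, span in zip(row, lows, spans)]
            for row in rows]


def chi2_scores(X, y):
    """Chi-squared score of each non-negative feature against class labels y,
    comparing per-class feature sums with what class sizes alone predict."""
    n = len(X)
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for label in set(y):
            rows = [row for row, lab in zip(X, y) if lab == label]
            observed = sum(row[j] for row in rows)
            expected = len(rows) / n * total
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_features(X, y, min_score=0.0):
    """Keep only the columns whose chi-squared score exceeds min_score."""
    scores = chi2_scores(X, y)
    keep = [j for j, s in enumerate(scores) if s > min_score]
    return keep, [[row[j] for j in keep] for row in X]
```

Normalizing first also keeps every value non-negative, which the chi-squared statistic requires.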
• Taking it further…
  • Select a better number of clusters through cluster scoring
  • The silhouette score measures how similar each function is to its own cluster compared with the nearest other cluster
  • This separates functions into clusters of similar tasks:
    • String operation functions
    • Destructors/constructors
    • File manipulation
    • Web request handling
    • etc.
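The silhouette score can be computed directly from its definition: for each function, a is its mean distance to the rest of its own cluster, b is its mean distance to the nearest other cluster, and its score is (b - a) / max(a, b), which ranges from -1 to 1. A pure-Python sketch using Euclidean distance, for illustration only (scikit-learn's `silhouette_score` would normally be used):

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, Euclidean distance."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    total = 0.0
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # the point's distance to itself is 0, so divide by len(own) - 1
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)
```

Scores near 1 mean tight, well-separated clusters; values near 0 or below suggest functions sitting between clusters. Computing it for each candidate cluster count yields one score per count.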
SILHOUETTE SCORE
DATA MINING + CONCOLIC ANALYSIS
• Demo
• CVE-2019-13087
• Findings
• Code clones
• Calling patterns
• Similar function calls
• Similar data references
• Similar file access
FINDING THE FUNCTION CLUSTER COUNT
• The trick is generally to find the biggest “drop” in score and choose the cluster count just before it
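The “biggest drop” rule is simple to automate once there is one score per candidate cluster count. A sketch, assuming `scores_by_k` maps each tried count to its score (for example the silhouette score); the function name is hypothetical and the example scores are made up:

```python
def pick_cluster_count(scores_by_k):
    """Return the cluster count just before the largest drop in score."""
    ks = sorted(scores_by_k)
    if len(ks) < 2:
        return ks[0]
    # drop from one tried count to the next; positive means the score fell
    worst = max(range(len(ks) - 1),
                key=lambda j: scores_by_k[ks[j]] - scores_by_k[ks[j + 1]])
    return ks[worst]


# made-up scores for illustration only
print(pick_cluster_count({2: 0.41, 3: 0.45, 4: 0.47, 5: 0.30, 6: 0.28}))  # 4
```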
FIRMWARE SLAP
• Extract firmware
• Locate system root
• Recover function prototypes from every binary
• Build and run angr analysis jobs
• Extract best function features using scikit-learn (sklearn)
• Cluster functions according to best feature set
• Export data to JSON and send it into Elasticsearch
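The “Locate system root” step can be approximated with a directory heuristic. This is a hypothetical sketch, not Firmware Slap's actual logic: after extraction (for example by a tool such as binwalk), it walks the output and picks the directory that contains the most of the usual top-level Linux directories. The marker list and threshold are assumptions.

```python
import os

ROOT_MARKERS = ("bin", "etc", "lib", "sbin", "usr")


def locate_system_root(extracted_dir, min_markers=3):
    """Return the directory under `extracted_dir` that looks most like a
    Linux root filesystem, or None if nothing has enough marker dirs."""
    best, best_hits = None, 0
    for dirpath, dirnames, _ in os.walk(extracted_dir):
        hits = sum(1 for marker in ROOT_MARKERS if marker in dirnames)
        if hits > best_hits:
            best, best_hits = dirpath, hits
    return best if best_hits >= min_markers else None
```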
VISUALIZING VULNERABILITY RESULTS
• All information from both the concolic and data-mining passes is generated as JSON
• Includes a script to load the information into Elasticsearch and Kibana
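Elasticsearch's `_bulk` endpoint takes newline-delimited JSON: an action line such as `{"index": {...}}` followed by the document itself, with a trailing newline at the end of the body. A minimal sketch of serializing result records that way; the index name and record fields are assumptions, and sending the request (a POST to `/_bulk` with `Content-Type: application/x-ndjson`) is left out.

```python
import json


def to_bulk_ndjson(results, index="firmware_slap"):
    """Serialize result dicts into an Elasticsearch _bulk request body."""
    lines = []
    for doc in results:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc, sort_keys=True))            # the document
    # every line, including the last, must end with a newline
    return "".join(line + "\n" for line in lines)
```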
MITIGATIONS
• Use compile-time protections
• Enable your operating system’s ASLR
• Buy a better router
• It’s time to bring more automation into checking our embedded systems
• Don’t blindly trust third-party embedded systems
• I’m giving you the tools to find the bugs yourself
RELEASING
• Firmware Slap – The tool behind the demos
• The Ghidra function dumping plugin
• The cleaned-up PoCs
• CVE-2019-13087 - CVE-2019-13092
• Code:
• https://github.com/ChrisTheCoolHut/Firmware_Slap
• Feedback? Questions?
• @0x01_chris
