Multi-Module Vulnerability Analysis of Web-based Applications

Davide Balzarotti, Marco Cova, Viktoria V. Felmetsger, and Giovanni Vigna


Computer Security Group
University of California, Santa Barbara
Santa Barbara, CA, USA
{balzarot, marco, rusvika, vigna}@cs.ucsb.edu

ABSTRACT

In recent years, web applications have become tremendously popular, and nowadays they are routinely used in security-critical environments, such as medical, financial, and military systems. As the use of web applications for critical services has increased, the number and sophistication of attacks against these applications have grown as well. Current approaches to securing web applications focus either on detecting and blocking web-based attacks using application-level firewalls, or on using vulnerability analysis techniques to identify security problems before deployment.

The vulnerability analysis of web applications is made difficult by a number of factors, such as the use of scripting languages, the structuring of the application logic into separate pages and code modules, and the interaction with back-end databases. So far, approaches to web application vulnerability analysis have focused on single application modules to identify insecure uses of information provided as input to the application. Unfortunately, these approaches are limited in scope, and, therefore, they cannot detect multi-step attacks that exploit the interaction among multiple modules of an application.

We have developed a novel vulnerability analysis approach that characterizes both the extended state and the intended workflow of a web application. By doing this, our analysis approach is able to take into account inter-module relationships as well as the interaction of an application’s modules with back-end databases. As a result, our vulnerability analysis technique is able to identify sophisticated multi-step attacks against the application’s workflow that were not addressed by previous approaches. We implemented our technique in a prototype tool, called MiMoSA, and tested it on several applications, identifying both known and new vulnerabilities.

Categories and Subject Descriptors: D.2.4 [Software Engineering]: Software/Program Verification

General Terms: Security

Keywords: Web Applications, Multi-step Attacks, Vulnerability Analysis, Static Analysis, Dynamic Analysis

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CCS’07, October 29–November 2, 2007, Alexandria, Virginia, USA.
Copyright 2007 ACM 978-1-59593-703-2/07/0011 ...$5.00.

1. INTRODUCTION

Web applications are growing in popularity. The introduction of sophisticated mechanisms for the handling of asynchronous events in web browsers and the availability of a number of frameworks for the rapid prototyping of server-side components have fostered the development of new applications and the transition of “traditional” applications (e.g., mail readers) to web-based platforms.

While new technologies have brought in significant advantages in terms of support to the development process, improved performance, and increased interoperability, little has been done to tackle security issues. Therefore, as the complexity of web applications increases, the possibility for abuse increases as well. For example, a simple analysis of the CVE vulnerability database [4] shows that the percentage of web-based attacks rose from 25% of the total number of entries in 2000 to 61% in 2006.

This situation is made worse by the fact that web applications are usually reachable through firewalls by design, and, in addition, the server-side logic is often developed under time-to-market pressure by developers with insufficient security skills. As a result, vulnerable web applications are deployed and made available to the whole Internet, creating easily-exploitable entry points for the compromise of entire networks.

To address the security problems associated with web applications, the research community has proposed a number of solutions. A first class of solutions focuses on detecting (and possibly blocking) web-based attacks. This can be done by analyzing the requests sent to web applications [13, 2, 21, 17, 18] or, in some cases, by analyzing the data delivered by the applications to the clients [11, 8]. These solutions have the advantage that they do not require any modification to the application being protected. However, they have a significant impact on the system’s performance, and, in case of false positives (i.e., wrong detections), they may block legitimate traffic.

A second class of solutions focuses on identifying flaws in the implementation of a web application before the application is deployed. These approaches utilize static and dynamic analysis techniques to identify vulnerabilities in web applications [7, 9, 14]. Most of these approaches are based on the assumption that vulnerabilities in web applications are the result of insecure data flow. Therefore, these techniques attempt to identify when data originating from outside the application (e.g., from user input) is used in security-critical operations without being first checked and sanitized.

Even though these approaches are effective at detecting suspicious uses of unsanitized data, they suffer from three main limitations. First, their scope is limited to a single web application module, such as a single PHP file or a single ASP component. Therefore, these techniques are not able to identify vulnerabilities that are caused by the interaction of multiple modules. Second, these
approaches are not able to correctly model the interactions among multiple technologies, such as the use of multiple languages in the same application, or the use of back-end databases to store persistent data. Third, and most important, these techniques do not take into account either the intended workflow of a web application or its extended state.

The intended workflow of a web application represents a model of the assumptions that the developer has made about how a user should navigate through the application. Web applications are often designed to guide the user through a specific sequence of steps. For example, an e-commerce site could be structured so that the user first logs in, then browses a catalog and chooses some goods, and eventually checks out and purchases the items. The constraints among operations (e.g., one has to select some goods before purchasing them) define the application’s intended workflow.

A number of mechanisms have been devised to track the progress of a user through the intended workflow of a web application. These mechanisms provide ways to store information that survives a single client-server interaction and define the extended state of the application. For example, in a LAMP application¹ the extended state could include the request variables used in each module and, in addition, the PHP session data and the database tables, which are shared between modules. The extended state can also include information that is sent back and forth between the client and the server to keep track of a user session, such as hidden form fields and application-specific cookies. Therefore, the extended state of an application is a distributed collection of session-related information, which is accessed and modified by the modules of a web application at different times during a user session.

¹ A LAMP application is a web application based on the composition of Linux, Apache, MySQL, and PHP.

Unfortunately, it is possible that different modules of an application have different assumptions on how the extended state is stored and handled, leading to vulnerabilities in the application. We call these vulnerabilities multi-module vulnerabilities to emphasize the fact that they originate from the interaction of multiple application modules, which communicate by reading and modifying the application’s extended state.

In this paper, we present a novel vulnerability analysis approach that combines several analysis techniques to identify sophisticated multi-module vulnerabilities in web applications. In our approach, we first leverage dynamic approaches to analyze block-level properties in the code of web application modules. We then use static analysis to extract properties at the module level. Finally, we use model checking techniques to identify possible paths in a web application’s workflow that could lead to an insecure state.

The contributions of our approach are the following:

• We introduce a novel model of web application extended state that characterizes permanent storage and is not limited to the variables and data structures defined in a single procedure or code module.

• We present a novel approach to analyze the interaction between the application’s code and back-end databases, which allows for the identification of sophisticated data-driven attacks.

• We introduce an approach to derive the intended workflow of a web application and an analysis technique to identify multi-step attacks that violate the expected inter-module workflow of a web application.

We implemented our approach in a prototype analysis tool, called MiMoSA (Multi-Module State Analyzer), for PHP-based web applications, and we evaluated it on a number of real-world applications, finding both known and new vulnerabilities. The results show that our approach is able to identify complex vulnerabilities that state-of-the-art techniques are not able to identify.

The rest of the paper is structured as follows. In Section 2, we present some examples of the vulnerabilities that are the focus of our approach. In Section 3, we introduce the web application model that is at the basis of our analysis. Sections 4 and 5 describe our approach to the identification of multi-module vulnerabilities in web applications. Then, Section 6 presents the results of applying our analysis to real-world applications. Finally, Section 7 presents related work, and Section 8 briefly concludes.

2. MULTI-MODULE ATTACKS

Multi-module attacks can be categorized into two classes: data-flow attacks and workflow attacks. Data-flow attacks exploit the insecure handling of user-provided information that is stored in the web application’s state and passed from one module to another. In workflow attacks, an attacker leverages errors in how the state is handled by the application’s modules in order to use the application in ways that violate its intended workflow.

Data-flow Attacks.

In multi-module data-flow attacks, the attacker uses a first module to inject some data into the web application’s extended state. Then, a second module uses the attacker-provided data in an insecure way². Examples of multi-module data-flow attacks include SQL injection [3] and persistent (or stored) Cross-Site Scripting attacks (XSS) [12].

² As will be clear later, this second module can be a second invocation of the module that performed the first step of the attack.

A web application is vulnerable to a SQL injection attack when it uses unsanitized user data to compose queries that are later passed to a database for evaluation. The exploitation of a SQL injection vulnerability can lead to the execution of arbitrary queries with the privileges of the vulnerable application and, consequently, to the leakage of sensitive information and/or unauthorized modification of data. In a typical multi-module SQL injection scenario, the attacker uses a first module to store an attack string containing malicious SQL directives in a location that is part of the application’s extended state (e.g., a session variable). Then, a second module reads the value of the same location from the extended state and uses it to build a query to the database. As a result, the malicious SQL directives are “injected” into the query.

In cross-site scripting attacks, an attacker forces a web browser to evaluate attacker-supplied code (typically JavaScript) in the context of a trusted web site. The goal of these attacks is to circumvent the same-origin policy, which prevents scripts or documents loaded from one site from getting or setting the properties of documents originating from different sites. In a multi-module XSS attack, a first module is leveraged to store the malicious code in a location that is part of the extended state of the application, e.g., in a field of a table in the back-end database. Then, at a later time, the malicious code is presented to a user by a different module. The user’s browser executes the code under the assumption that it originates from the vulnerable application rather than from the attacker, effectively circumventing the same-origin policy.

Workflow Attacks.

Most web applications have policies that restrict how they can be navigated to ensure that their functionality and data are accessed
in a well-defined and controlled way. Usually, to implement these restrictions a module stores in the web application’s extended state the current navigation state, e.g., whether or not the current user has logged in or has already visited a certain page. Other modules, then, use this portion of the state information to deny or authorize access to other parts of the application.

Workflow attacks attempt to circumvent these navigation restrictions. For example, a workflow attack could try to directly access a page that is not reachable through normal navigation mechanisms, such as hyper-textual links³. These attacks may allow one to bypass authorization mechanisms (e.g., gaining access to restricted portions of a web application) or to subvert the correct business logic of the application (e.g., skipping a required step in the checkout sequence of operations on an e-commerce web site).

³ This attack is sometimes referred to as “forceful browsing.”

3. A FORMAL CHARACTERIZATION OF MULTI-MODULE VULNERABILITIES

In the previous sections, we described how the state of a web application can be maintained in a number of different ways. In order to abstract away from the various language- or technology-specific mechanisms, we introduce the concept of state entity. A state entity E is similar to a variable in a traditional programming language, in that it can be used to store parts of the application’s state. Different modules can share information by accessing the same state entities. The set of all the state entities corresponds to what we defined in the introduction as the application’s extended state.

We classify the state entities into two classes: server-side and client-side. Server-side entities model the part of the extended state that is maintained on the server. For example, a server-side entity can represent a field in a database or a PHP session variable. Client-side entities are instead used to model the part of the extended state stored in and/or generated by the user’s browser. Cookies, GET and POST parameters are examples of this type of entity.

3.1 Module Views

To summarize the operations that each module performs on the application’s extended state, we introduce the concept of Module View (or simply view hereinafter). Each view represents all the state-equivalent execution paths in a single module, i.e., all the paths in the control-flow graph (CFG) of a module that perform the same operations on the state entities. When an application module is executed, e.g., as a consequence of a user request, the path followed by the execution in that module is completely included in one and only one of its views. In this case, we say that the view that contains the executed path is “entered” by the user. We describe the algorithm used to summarize a module into its views in Section 4.3.

Consider, for example, the login module of an application. When a user provides correct credentials, the module may define a set of new session variables (e.g., to track that the user is authenticated and to load her preferences). On the contrary, the module may redirect unauthorized users to an error page without changing the extended state. These two different behaviors depend on the current extended state of the application, namely on the values of the request parameters and the content of the database that stores the information about the users. The view abstraction allows us to associate with each behavior a compact representation that summarizes its effect on the extended state of the application.

Formally, a view V is represented as a triple (Φ, Π, Σ) where:

• Φ is the view’s pre-condition, which consists of a predicate on the values of the state entities. The program paths modeled by the view can be executed only when the view pre-condition is true (evaluated in the context of the current extended state).

• Π is the set of post-conditions of the view. These conditions model, as a sequence of write operations on state entities, the way in which the extended state is modified by the execution of the program paths represented by the view. Each write operation has the following form:

\[ \mathrm{write}(E_L, E_R, \Psi). \]

This operation copies the content of the left entity \(E_L\) (which can also be a constant value) to the right entity \(E_R\). The set Ψ contains the sanitization operations applied to the left entity before its value is transferred to the right entity. If the sanitization set is empty, no sanitization is applied.

• Σ is the set of sinks contained inside the view. Each sink is a pair (E, Op) where E is a state entity and Op is a potentially dangerous operation (such as a SQL query or an eval statement) that uses the entity unsanitized. Note that the unsanitized use of an entity is not necessarily a vulnerability, since the sanitization process may take place inside one of the other views (belonging to the same module or to another module).

The extended state of an application may change as the user moves from one web page to another, clicking on links, submitting forms, following redirects, or just jumping to a new URL. In fact, when a view is entered, the extended state S is updated by applying the view’s post-conditions to the extended state in which the application was before entering the view. Let \(V_i = (\Phi_i, \Pi_i, \Sigma_i)\) be the view entered at step i of the user’s navigation process, then:

\[ S_{init} = \emptyset, \qquad S_i = \mathrm{apply}(\Pi_i, S_{i-1}). \]

In addition to the set of the entity values, the extended state also keeps track of the current sanitization state of each entity. An entity E is sanitized in the application state \(S_i\) (represented by the predicate \(\mathrm{san}(E, S_i)\)) if its value is set by sanitizing write operations. In this work, we take the standard approach of assuming that sanitization operations are always effective in removing malicious content from user-provided data.

3.2 Application Paths

The presence of the pre-condition predicate in each view limits the possible paths that a user may follow inside the web application. We say that a path \(P = \langle V_0, V_1, \ldots, V_n \rangle\), where \(V_i\) is a view, belongs to the set of Navigation Paths N if and only if:

\[ \forall i < n, \; S_i \models \Phi_{i+1}, \]

that is, if and only if the state at each intermediate step satisfies the pre-condition of the following step.

Since at the beginning of the execution the application state is empty, it must be \(\emptyset \models \Phi_0\). In order for this to happen, the pre-condition \(\Phi_0\) must be empty or it must contain only predicates on client-side entities. This is justified by the fact that pre-conditions containing only client-side entities (for example, those requesting a particular value for a certain GET parameter) can always be satisfied if the user provides the right value. We define the set of
Application Entry Points η as the subset of views that can be used as starting points in a navigation path:

\[ V_i \in \eta \iff \emptyset \models \Phi_i. \]

The subset of navigation paths allowed by the application design is called the Intended Path set, \(I \subseteq N\). These paths represent the workflow of the web application, expressed either through the use of explicit links provided by the application or through other common user navigation behaviors. We say that a navigation path \(\langle V_0, \ldots, V_n \rangle\) belongs to the intended path set of the application if and only if:

\[ \forall i < n \; \bigl( V_{i+1} \in \eta \;\lor\; \exists\, \mathrm{Link}(V_i, V_{i+1}) \;\lor\; V_{i-1} = V_{i+1} \;\lor\; V_i = V_{i+1} \bigr). \]

In other words, at each step of the path the next view satisfies one of the following: it is an application entry point, is reachable through a link, is the same as the previous view (which corresponds to the user pressing the back button in her browser), or is the same as the current view (which corresponds to the use of the refresh button).

Given the previous definition, we can now provide a formal characterization of the two classes of vulnerabilities we introduce in this paper. A violation of the intended workflow of the application occurs when:

\[ \exists p \in N \mid p \notin I, \]

that is, when there exists a valid navigation path that is not an intended path.

A multi-module data-flow vulnerability is defined as:

\[ \exists p = \langle V_0, \ldots, V_n \rangle \in N, \; \exists E_x \in \Sigma_n \mid \neg\, \mathrm{san}(E_x, S_{n-1}), \]

that is, there is a path in the application such that some portion of the application’s extended state is used in a security-critical operation without being properly sanitized.

4. INTRA-MODULE ANALYSIS

The analysis performed by MiMoSA consists of two phases: an intra-module phase, which examines each module of the application in isolation, followed by an inter-module phase, where the application is considered as a whole.

The goal of the intra-module analysis is to summarize each application module into a set of views, by determining its pre-condition, post-conditions, and sinks. From each module, we also extract the list of all outgoing links and we associate them with the views they belong to. This information is then used by the inter-module analysis to reconstruct the intended workflow of the application.

The main steps of the intra-module phase are shown in Figure 1. Note that these steps are language-dependent. Even though in this paper we focus on applications written in the PHP language, our approach can be easily extended to extract views from modules written in other programming languages.

To better illustrate our technique, we will refer to a simple web application whose code is presented in Figure 2. The application is written in PHP and consists of three modules: index.php, which is the application entry point, create.php, which allows new users to create an account, and answer.php, which provides some information that should be accessible only to registered users. The application state is maintained using both a relational database, which contains the users’ accounts, and a PHP session variable, i.e., _SESSION["loggedin"].

Even though the application is very simple, it contains representative examples of the security problems that our approach is able to identify. In particular, the application contains two vulnerabilities. The first vulnerability is caused by the fact that the index.php module uses usernames retrieved from the database as part of its output page. Usernames are strings arbitrarily chosen by users during the registration process implemented by the create.php module. Since these strings are never sanitized in any module, the application is vulnerable to XSS attacks. The second vulnerability is contained in the answer.php module. The module incorrectly checks the value of the loggedin variable instead of _SESSION["loggedin"] in order to verify the user status. However, if the PHP register_globals option is activated and the _SESSION["loggedin"] variable has not been defined (i.e., the user is not logged in), an attacker can include a loggedin parameter in her GET or POST request, effectively shadowing the session variable with a value of her choosing. This could be leveraged to bypass the registration mechanism and access the restricted answer.php module without being previously authenticated, thus violating the intended workflow of the application.

As is clear from the examples above, these vulnerabilities are exploited in multiple steps and involve multiple modules. The ultimate goal of our analysis is to detect these multi-module vulnerabilities. However, in order to analyze the interactions between modules, it is first necessary to analyze the properties of each module. This analysis is the focus of the rest of this section.

4.1 Control-Flow and Data-Flow Graphs Extraction

The first step of the intra-module analysis is the extraction of the control-flow and data-flow graphs from each module of the application. Our implementation leverages Pixy [9], a static analysis tool for detecting intra-module vulnerabilities in PHP applications. We adopted Pixy’s PHP parser, control-flow graph derivation component, and alias analysis component. In addition, we extended Pixy with a data-flow component that computes the def-use chains for a module using a standard algorithm [1]. The resulting tool provides all the information needed for the following steps of the analysis.

The main limitation of Pixy, besides being limited to intra-module analysis only, is the lack of support for object-oriented code. Where needed, we manually pre-processed input modules to work around this problem.

4.2 Database Analysis

Databases are often used by web applications to store data permanently. This data is usually accessible by every module of the application. Therefore, it is important to characterize module-database interactions as they could be leveraged to perform a multi-module attack.

The goal of the database analysis is to translate the interaction between an application module and the back-end database into a set of variable assignments. By doing this, the following steps of the analysis (e.g., the view extraction process) can handle database operations and assignments to variables in a uniform way.

For example, consider the following SQL query that writes the content of the variable uname to the column username in the database table users:

UPDATE users SET username=$uname WHERE...

As a result of the database analysis, a new assignment is added after the call to the function that executes the query. In our example, MiMoSA generates the following assignment node:

$DB_dbname_users_username = $uname;
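The translation of an UPDATE query into assignment nodes can be sketched as follows. This is only an illustrative Python sketch, not MiMoSA's implementation: the function name `update_to_assignments`, the regular expression, and the assumption that the query string has already been reconstructed are hypothetical. Only the `$DB_<database>_<table>_<column>` naming scheme is taken from the example above.

```python
import re


def update_to_assignments(query: str, dbname: str) -> list[str]:
    """Translate a simple `UPDATE t SET c1=v1, c2=v2 WHERE ...` string into
    PHP-style assignment nodes named $DB_<dbname>_<table>_<column>."""
    match = re.match(
        r"\s*UPDATE\s+(\w+)\s+SET\s+(.*?)(?:\s+WHERE\b.*)?$",
        query,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if match is None:
        return []  # not an UPDATE statement this sketch understands
    table, set_clause = match.group(1), match.group(2)
    assignments = []
    # Naive split on commas: assumes the assigned values contain no commas.
    for pair in set_clause.split(","):
        column, _, value = pair.partition("=")
        assignments.append(
            f"$DB_{dbname}_{table}_{column.strip()} = {value.strip()};"
        )
    return assignments


print(update_to_assignments("UPDATE users SET username=$uname WHERE id=1", "dbname"))
```

A real analyzer would need a proper SQL parser; the point here is only that each written column becomes one synthetic variable, so later steps can treat database writes like ordinary assignments.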
[Figure 1 diagram. Box labels recoverable from the extraction: PHP module; Parsing and CFG construction; Data flow analysis; Database analysis; Views extraction; Links extraction.]

Figure 1: The main steps of the intra-module analysis. The parts in gray are implemented by Pixy.
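
The pipeline in Figure 1 produces, for each module, a set of views. As a rough illustration of what a view carries, the sketch below encodes the triple (Φ, Π, Σ) of Section 3.1, the state update S_i = apply(Π_i, S_{i-1}), and the data-flow check of Section 3.2 in Python. All names (`Write`, `State`, `View`, `apply_post`, `unsanitized_sinks`) are hypothetical. The pre-condition is simplified to a Python predicate, sanitization is tracked as one boolean per entity, and the two-view example is invented for illustration. It is one reading of the formal model, not MiMoSA's code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Write:
    """write(E_L, E_R, Psi): copy the left entity into the right entity."""
    left: str                              # E_L, named by a string key
    right: str                             # E_R
    sanitizers: frozenset = frozenset()    # Psi; empty means no sanitization


@dataclass
class State:
    """Extended state S: entity values plus the san(E, S) flag per entity."""
    values: dict = field(default_factory=dict)
    sanitized: dict = field(default_factory=dict)


@dataclass
class View:
    pre: Callable[[State], bool]           # Phi, simplified to a predicate
    post: List[Write]                      # Pi, a sequence of writes
    sinks: List[Tuple[str, str]]           # Sigma, (entity, operation) pairs


def apply_post(post: List[Write], state: State) -> State:
    """S_i = apply(Pi_i, S_{i-1}); returns a fresh state."""
    new = State(dict(state.values), dict(state.sanitized))
    for w in post:
        new.values[w.right] = new.values.get(w.left)
        # Assumption: the target is sanitized if this write sanitizes it or
        # the source already was; unknown (client-side) sources are not.
        new.sanitized[w.right] = bool(w.sanitizers) or new.sanitized.get(w.left, False)
    return new


def unsanitized_sinks(path: List[View]) -> Optional[List[Tuple[str, str]]]:
    """Return the sinks of the last view that are unsanitized in S_{n-1},
    or None if some pre-condition fails (not a navigation path)."""
    state = State()                        # S_init is empty
    for view in path[:-1]:
        if not view.pre(state):
            return None
        state = apply_post(view.post, state)
    last = path[-1]
    if not last.pre(state):
        return None
    return [(e, op) for (e, op) in last.sinks if not state.sanitized.get(e, False)]


# Hypothetical two-view path: one view stores a POST parameter in the
# database, a later view echoes that database field.
register = View(pre=lambda s: True,
                post=[Write("POST.user", "DB.users.username")],
                sinks=[])
index = View(pre=lambda s: True,
             post=[],
             sinks=[("DB.users.username", "echo")])

print(unsanitized_sinks([register, index]))
```

On this path the function reports the `echo` sink on `DB.users.username`, because the only write that reaches it carries an empty sanitization set. Giving that write a non-empty sanitizer set makes the result empty.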

Note that DB_dbname_users_username is a new variable created by our analysis to model the part of the database modified by the UPDATE operation.

The PHP language provides a number of internal functions to connect to different types of relational databases. In our prototype implementation, we focused on the MySQL library because of its popularity. However, if the target application uses a different database, our technique can be easily adapted to address a different set of primitives. In PHP, access to the MySQL database is usually performed by first calling the mysql_query function to execute a query, and then by using one of the mysql_fetch functions to access the results of the query in an iterative fashion.

The main challenge in the database analysis is to properly reconstruct the values that a query can assume at runtime, so that we can determine the tables and columns that are modified by the operation. To achieve this, we traverse the control-flow graph of the module, looking for calls to the mysql_query function. Since, in general, static analysis cannot provide the value that the query will assume at runtime, we apply a dynamic analysis technique to the block of PHP code that precedes the function call to derive the names and fields of the tables involved in the query. The analysis extracts the largest deterministic path Pe that precedes the mysql_query call. A deterministic path is a sequence of nodes in the control-flow graph that only contains branch instructions whose conditional expressions can be statically determined. We then remove from Pe any input/output related operation, and we replace any undefined variable in Pe with a placeholder.

The resulting code is passed to the PHP interpreter in order to dynamically determine the value that the query string can assume along the path Pe. If the resulting query performs an UPDATE or an INSERT operation, it is immediately parsed to extract the assignment nodes as shown before. Queries that contain a SELECT statement are instead analyzed only when the analysis finds that the corresponding mysql_fetch function is used to assign the result values to one or more PHP variables.

Consider for instance the mysql_fetch_assoc call at line 16 of index.php of our sample application. Following the data-flow edges we reach the corresponding query string at line 12. The dynamic analysis along the deterministic path reconstructs the query "SELECT * FROM users". The database analyzer then checks the database schema to resolve the "*" symbol to the corresponding list of column names and it finally generates the resulting assignment nodes:

$row["username"] = $DB_dbname_users_username;
$row["password"] = $DB_dbname_users_password;

Once these assignments are introduced to the module, the following analysis steps are able to treat the application state stored in a back-end database and the state stored in program variables in a uniform way.

4.3 Views Extraction

The goal of this step is to summarize a module into a set of views. This is a key step in our intra-module analysis, because it produces the module meta-information necessary to perform the inter-module vulnerability analysis.

To extract a module’s views, we first perform state analysis to determine all statements in the control-flow graph that are state-related, i.e., that either contain state entities or are control- or data-dependent on state-related statements. We consider state entities of a PHP application the variables used to refer to request parameters (_GET, _POST, _REQUEST), cookies (_COOKIE), session variables (_SESSION), and the database variables generated by the database analysis step. This allows us to exclude from further analysis statements that do not depend on or modify the application state. Therefore, in the rest of the analysis we consider only the subgraph of the CFG that contains state-related nodes. The algorithm we use in this step is based on the functional data-flow analysis framework of [19], as implemented in Pixy.

4.3.1 Identifying Sinks and State Entities

To identify sinks, we determine all nodes in the CFG that contain an operation relevant to our analysis. In particular, we look for two types of operations: state-related operations and sink-related operations. State-related operations are those statements that modify the server-side state. For example, we identify uses of the session mechanism, that is, assignments to the _SESSION array or calls to the session_register() function. Sink-related operations are statements where state entities are used in sensitive sinks. Our technique focuses on identifying inter-module XSS and SQL injection attacks, and, therefore, we keep track of state entities displayed to the user or used in a database query. Consider, for example, the create.php module in our example. The analysis identifies two relevant operations: at line 19, a database query is executed, and, at line 21, the variable _SESSION["loggedin"] is modified.

After the relevant operations have been identified, we derive their conditional guards, i.e., the conditions associated with the branches in the CFG that must be taken in order to reach the statement associated with the operation. Note that we only keep track of state-dependent conditions, as identified by the state analysis. In our example, the two operations that we identified in create.php are guarded by the conditional statement at line 9. The analysis also recognizes that the true branch of the conditional must be taken to trigger the operations.

Then, for each variable that occurs in a conditional guard or in a state- or sink-related statement, we reconstruct its dependency with respect to state entities. We currently model several types of dependencies. In particular, propagation dependencies model the assignment of one variable to another; call dependencies denote the fact that a variable takes its value from the result of a function call (in particular, we currently model sanitization functions); binary dependencies model the composition of two variables, for
index.php

 1 <html>
 2 <head>
 3 <title>The answer to Life, the
 4 Universe, and Everything</title>
 5 </head>
 6
 7 <body>
 8
 9 <?php
10   echo "People that know the answer:";
11
12   $sql = "SELECT * FROM users ";
13   mysql_select_db("dbname");
14   $res = mysql_query($sql);
15
16   while($row = mysql_fetch_assoc($res))
17     echo $row["username"];
18 ?>
19
20 <a href="create.php">Create User</a>
21
22 </body>
23 </html>

create.php

 1 <html>
 2 <head>
 3 <title>Create a new user</title>
 4 </head>
 5
 6 <body>
 7
 8 <?php
 9   if (isset($_POST["user"])) {
10
11     $user = addslashes($_POST["user"]);
12     $pass = addslashes($_POST["pass"]);
13
14     session_start();
15
16     $sql = 'INSERT INTO users ' .
17       'VALUES (\'' . $user .
18       '\', \'' . $pass . '\' )';
19     mysql_query($sql);
20
21     $_SESSION["loggedin"] = "ok";
22
23     header("Location: answer.php");
24     exit;
25   }
26 ?>
27
28 <form action="create.php"
29   method="POST">
30
31 UserName:
32 <input name="user" type="text"><br>
33 Password:
34 <input name="pass" type="password"><br>
35 <input name="create" type="submit">
36
37 </form>
38
39 </body>
40 </html>

answer.php

 1 <?php
 2   session_start();
 3
 4   if ($loggedin != "ok") {
 5     header("Location: index.php");
 6     exit;
 7   }
 8
 9   echo "42";
10 ?>
11
12 <html>
13 <head>
14 <title>The final answer is:</title>
15 </head>
16
17 <body>
18 <a href="index.php">Homepage</a>
19 </body>
20 </html>

Database schema

Table: users
+----------+-------------+
| Field    | Type        |
+----------+-------------+
| username | varchar(32) |
| password | varchar(32) |
+----------+-------------+

Figure 2: Example application.
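
The resolution of the "SELECT * FROM users" query of index.php into assignment nodes, described in the database analysis step, can be illustrated with a small Python sketch. It is not MiMoSA's implementation: the SCHEMA dictionary, the function name select_assignments, and the regular expression are assumptions made for illustration, and only simple single-table SELECT statements are handled.

```python
import re

# Hypothetical schema description: table name -> ordered column names.
SCHEMA = {"users": ["username", "password"]}


def select_assignments(query, result_var, db_name, schema):
    """Turn a simple SELECT into PHP-style assignment strings.

    Each fetched column becomes an assignment from a database state
    entity ($DB_<db>_<table>_<column>) to an element of the result array.
    """
    match = re.match(r"\s*SELECT\s+(.+?)\s+FROM\s+(\w+)", query, re.IGNORECASE)
    if match is None:
        raise ValueError("unsupported query: " + query)
    column_list, table = match.group(1), match.group(2)
    if column_list.strip() == "*":
        # Resolve "*" through the schema, as the database analyzer does.
        columns = schema[table]
    else:
        columns = [c.strip() for c in column_list.split(",")]
    return [
        f'${result_var}["{c}"] = $DB_{db_name}_{table}_{c};' for c in columns
    ]


assignments = select_assignments("SELECT * FROM users ", "row", "dbname", SCHEMA)
for line in assignments:
    print(line)
```

For the query at line 12 of index.php, the sketch prints the two assignments listed in the text.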

example through mathematical or string operators; constant dependencies denote that a variable takes a constant value; superglobal dependencies indicate that a variable takes a value from one of the superglobal objects in PHP, e.g., from a request or session variable. Multiple dependencies are composed together until each variable is reduced to either a constant or a state entity.
Note that an additional set of conditional guards can be discovered during the dependency reconstruction analysis: for example, a variable used in an operation might assume different values depending on some conditions. Such conditions are added to the set of conditional guards for the operation.
In our example, the variable _SESSION["loggedin"], used in the state-related statement at line 21 in create.php, is associated with a constant dependency that models the fact that it was assigned the constant value ok. The conditional guard at line 9 is reduced to the composition of a call dependency (to the isset() function) and a superglobal dependency (to the _POST["user"] variable).

4.3.2 Creating the View
After all sensitive operations and their complete set of conditional guards have been identified, we translate them into pre-conditions, post-conditions, and sinks. Currently the following predicates are used in pre-conditions: Exist(v) is true if and only if the entity v is defined in the current application state. Compare(v, u, op), where v and u are state entities and op is an operator, is true if and only if the expression v op u is true. MiMoSA currently supports the operators <, >, =, and their combinations. The Propagate predicate is used in post-conditions: Propagate(v, u, San) denotes that the value of the entity v is propagated to u applying the sanitization operations specified by the set San. For sinks, the following predicates are used: InSql(v) denotes that the state entity v is used in a SQL query; Displayed(v) indicates that v is displayed to the user. Conditions can be combined with the use of and, or, and not operators.
In addition, we introduce the special Unknown predicate, which is assumed to be always satisfiable, to model the cases where we cannot resolve the dependency of a program variable to a state entity. This happens, for example, when a variable takes its value from a complex series of calls to functions that we do not model.
As an example of the view creation process, consider the module create.php of our sample application. MiMoSA summarizes it into two views, corresponding to the two branches of the conditional statement at line 9. One view (corresponding to the false branch) has pre-condition not Exist($_POST["user"]) and empty post-conditions and sinks. The other view (corresponding to the true branch) has pre-condition Exist($_POST["user"]). The assignments introduced by the database analysis step to model the SQL query at line 19 are modeled with the post-conditions Propagate($_POST["user"], DB_dbname.users.username, {addslashes})
and Propagate($_POST["pass"], DB_dbname.users.password, {addslashes}). In both cases, the analysis keeps track of the sanitization operated by the addslashes() function. Finally, the assignment to the session variable _SESSION["loggedin"] is modeled with the post-conditions Exist($_SESSION["loggedin"]) and Propagate("ok", $_SESSION["loggedin"], ∅). The complete set of views for our example application is shown in Table 1.
In a module, the number of extracted views is exponential in the number of state-related conditional statements. As a consequence, the view extraction process is slow when dealing with very complex modules. Therefore, whenever the number of views is determined to be larger than a certain threshold, MiMoSA can be configured to switch to a simplified view construction approach. In this approach, instead of generating views for all the paths in the CFG of a module, we only generate the views corresponding to a number of paths sufficient to include all the state- and sink-related operations contained in the module. As a result, all the post-conditions and sinks of the module are extracted and will be analyzed during the detection phase. However, since not all their possible combinations are considered, the simplified approach might introduce inaccuracies.

4.4 Links Extraction
The last step before starting the inter-module vulnerability analysis is to extract the links contained in the module and associate them with the views they belong to.
We parse both PHP and HTML code looking for HTML hyperlinks, form actions and inputs, source attributes of frames, and calls to the PHP function header() (footnote 4). We also have limited support for link extraction from JavaScript code. If the URL of the link is dynamic, i.e., it is generated using a block of PHP code, the link extraction routine tries to determine its runtime value by applying a dynamic analysis technique similar to the one used in the database analysis phase.
Footnote 4: The header() function in PHP is commonly used to set the HTTP Location header to redirect users to a different page.
Once all the links have been extracted, we identify the set of views to which each link belongs. In order to do this, we determine the conditional branches in the CFG that must be taken in order for a link to be shown to the user and we compare these branch expressions with the pre-conditions of the extracted views. Consider, for instance, the link to answer.php contained in the create.php module of our example application. Our analysis recognizes that it is displayed only if the execution follows the true branch of the conditional statement at line 9. <create.php>.view_0 is the only view compatible with this execution and, therefore, it is identified as the source view of the link.
To correctly model the application workflow, in addition to having the names of the modules to which one can navigate from a given view, we also need to extract the set of inputs that are submitted along the link. In particular, we need to determine which GET and POST request parameters are submitted if a user follows the link. For example, in our sample application, if a user submits the form at line 28 of the create.php module, the user-provided parameters user and pass are submitted as a part of the POST request to create.php.

5. INTER-MODULE ANALYSIS
In the second phase of our analysis, we connect the views extracted during the intra-module analysis into a single graph. This graph models the intended workflow of the entire web application. We then use a model checking technique to identify multi-module data-flow vulnerabilities and violations of the intended workflow.

[Figure 4 graph: nodes <index.php>.view_0, <create.php>.view_1, <create.php>.view_0, <answer.php>.view_0, and <answer.php>.view_1, connected by edges labeled HREF, FORM, and REDIRECT.]
Figure 4: Intended workflow of our example application.

The main steps of the inter-module phase are shown in Figure 3. Note that since this phase is built on top of the view abstraction, it is completely independent of the programming languages in which the modules are developed.

5.1 Intended Workflow
In the first step of the inter-module phase, we use the link information extracted during the intra-module analysis to connect all the views of the application into a single graph.
We connect a source view Vi to a target view Vj if Vi contains a link l that references Vj's module and the parameters provided by l satisfy the pre-condition of Vj. In particular, we adopt the following two rules:
1. If Vj's pre-condition contains predicates over client-side state entities, we check that the extracted link satisfies these requirements. For example, if the pre-condition requires the presence of a particular GET parameter, we check that the link provides a parameter with the required name.
2. If Vj's pre-condition contains predicates over server-side state entities, we assume that these predicates are always satisfied. The rationale is that, in general, it is not possible to determine the extended state of the application considering the two views in isolation, because it depends on the path that the user has followed to reach Vi. Therefore, we conservatively assume that the state can satisfy Vj's pre-condition.
When both conditions are satisfied, we assume that there is an intended path between the two views and we connect them together. For example, the link in <index.php>.view_0 (line 20) is connected to the view <create.php>.view_1 but not to <create.php>.view_0. In fact, the pre-condition of <create.php>.view_0 requires the existence of a POST parameter named user that is obviously not provided if the user clicks on the link in index.php. The intended workflow for our example application is given in Figure 4.
Finally, the analysis identifies the application's entry points. We exclude the modules that appear inside an include statement from this step of the analysis, because they are generally not intended to be directly accessed by the user. Of the remaining modules, we consider as entry point any view that has either an empty pre-condition or a pre-condition that contains only predicates over GET parameters (see Section 3).
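
The two rules used to connect views (Section 5.1) can be sketched in Python as follows. This is an illustrative sketch, not MiMoSA's code: the tuple encoding of predicates, the function names, and the treatment of any entity other than $_GET/$_POST as server-side are assumptions, and Compare predicates are simply treated as satisfiable. The ownership of the form link (create.php, line 28) by view_1 is our reading of Figure 2.

```python
def is_client_side(entity):
    # Assumption: only request parameters count as client-side state.
    return entity.startswith(("$_GET", "$_POST"))


def may_hold(pred, params, positive=True):
    """Can `pred` be satisfied by a link that submits `params`?"""
    kind = pred[0]
    if kind == "Exist":
        if not is_client_side(pred[1]):
            return True  # rule 2: server-side predicates assumed satisfiable
        return (pred[1] in params) == positive  # rule 1: check the link
    if kind == "Compare":
        return True  # simplification: values are not modeled here
    if kind == "not":
        return may_hold(pred[1], params, not positive)
    parts = [may_hold(p, params, positive) for p in pred[1:]]
    conjunctive = (kind == "and") == positive  # De Morgan under negation
    return all(parts) if conjunctive else any(parts)


LOGGED_IN = ("and", ("Exist", "$loggedin"), ("Compare", "$loggedin", "ok", "="))
PRE = {  # pre-conditions from Table 1; ("and",) encodes the empty set
    ("index.php", "view_0"): ("and",),
    ("create.php", "view_0"): ("Exist", '$_POST["user"]'),
    ("create.php", "view_1"): ("not", ("Exist", '$_POST["user"]')),
    ("answer.php", "view_0"): ("not", LOGGED_IN),
    ("answer.php", "view_1"): LOGGED_IN,
}
LINKS = [  # (source view, link type, target module, submitted parameters)
    (("index.php", "view_0"), "HREF", "create.php", set()),
    (("create.php", "view_1"), "FORM", "create.php",
     {'$_POST["user"]', '$_POST["pass"]', '$_POST["create"]'}),
    (("create.php", "view_0"), "REDIRECT", "answer.php", set()),
]


def intended_edges(links, pre):
    edges = set()
    for source, kind, module, params in links:
        for target, condition in pre.items():
            if target[0] == module and may_hold(condition, params):
                edges.add((source, kind, target))
    return edges


EDGES = intended_edges(LINKS, PRE)
```

With these inputs, the link in <index.php>.view_0 is connected to <create.php>.view_1 only, and the redirect in <create.php>.view_0 is connected to both answer.php views because their pre-conditions involve only server-side state.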
Module: index.php, View ID: view_0
  Pre-conditions: ∅
  Post-conditions: ∅
  Sinks: Displayed(DB_dbname.users.username)

Module: create.php, View ID: view_0
  Pre-conditions: Exist($_POST["user"])
  Post-conditions:
    Propagate($_POST["user"], DB_dbname.users.username, {addslashes})
    Propagate($_POST["pass"], DB_dbname.users.password, {addslashes})
    Exist($_SESSION["loggedin"])
    Propagate("ok", $_SESSION["loggedin"], ∅)
  Sinks: ∅

Module: create.php, View ID: view_1
  Pre-conditions: not Exist($_POST["user"])
  Post-conditions: ∅
  Sinks: ∅

Module: answer.php, View ID: view_0
  Pre-conditions: not (Exist($loggedin) and Compare($loggedin, "ok", =))
  Post-conditions: ∅
  Sinks: ∅

Module: answer.php, View ID: view_1
  Pre-conditions: Exist($loggedin) and Compare($loggedin, "ok", =)
  Post-conditions: ∅
  Sinks: ∅

Table 1: Views generated for the example application of Figure 2.
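
To make the pre-conditions of Table 1 concrete, the following Python sketch encodes them as nested tuples and evaluates them against an application state. The encoding, the holds and matching_views functions, and the dictionary-based state are assumptions made for illustration. As a simplification, the second argument of Compare is treated as a literal value, as in the table, rather than as a second state entity.

```python
import operator

# Operators named in the text (<, >, =) plus two of their combinations.
OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq,
       "<=": operator.le, ">=": operator.ge}


def holds(pred, state):
    """Evaluate a pre-condition against a state (entity name -> value)."""
    kind = pred[0]
    if kind == "Exist":
        return pred[1] in state
    if kind == "Compare":
        _, entity, value, op = pred
        return entity in state and OPS[op](state[entity], value)
    if kind == "not":
        return not holds(pred[1], state)
    if kind == "and":
        return all(holds(p, state) for p in pred[1:])
    if kind == "or":
        return any(holds(p, state) for p in pred[1:])
    if kind == "Unknown":
        return True  # assumed to be always satisfiable
    raise ValueError("unknown predicate: " + kind)


LOGGED_IN = ("and", ("Exist", "$loggedin"), ("Compare", "$loggedin", "ok", "="))
PRECONDITIONS = {  # ("and",) encodes the empty pre-condition
    ("index.php", "view_0"): ("and",),
    ("create.php", "view_0"): ("Exist", '$_POST["user"]'),
    ("create.php", "view_1"): ("not", ("Exist", '$_POST["user"]')),
    ("answer.php", "view_0"): ("not", LOGGED_IN),
    ("answer.php", "view_1"): LOGGED_IN,
}


def matching_views(module, state):
    """Return the views of `module` whose pre-condition holds in `state`."""
    return sorted(view for (mod, view), pre in PRECONDITIONS.items()
                  if mod == module and holds(pre, state))


print(matching_views("create.php", {}))
print(matching_views("create.php", {'$_POST["user"]': "bob"}))
print(matching_views("answer.php", {"$loggedin": "ok"}))
```

In this encoding exactly one view of each module matches any given state, which reflects the fact that the two views of create.php and answer.php correspond to the two branches of a single conditional.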

[Figure 3 diagram: the views are processed by three steps (intended workflow determination, public view identification, and vulnerability detection), which produce a report.]

Figure 3: The main steps of the inter-module analysis.
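
The entry-point rule stated at the end of Section 5.1 (an empty pre-condition, or one that only contains predicates over GET parameters, in a module that is not included by another module) can be sketched as follows. The tuple encoding of predicates and the function names are assumptions made for this illustration and do not come from MiMoSA.

```python
def entities(pred):
    """Collect the state entities mentioned in a pre-condition."""
    if pred[0] in ("Exist", "Compare"):
        return {pred[1]}
    found = set()
    for sub in pred[1:]:  # operands of and / or / not
        found |= entities(sub)
    return found


def entry_points(preconditions, included):
    """Views with no pre-condition, or with predicates over GET parameters only.

    `included` is the set of modules that appear in an include statement
    and are therefore excluded from this step.
    """
    return sorted(
        (module, view)
        for (module, view), pre in preconditions.items()
        if module not in included
        and all(e.startswith("$_GET") for e in entities(pre))
    )


LOGGED_IN = ("and", ("Exist", "$loggedin"), ("Compare", "$loggedin", "ok", "="))
PRE = {  # pre-conditions from Table 1; ("and",) encodes the empty set
    ("index.php", "view_0"): ("and",),
    ("create.php", "view_0"): ("Exist", '$_POST["user"]'),
    ("create.php", "view_1"): ("not", ("Exist", '$_POST["user"]')),
    ("answer.php", "view_0"): ("not", LOGGED_IN),
    ("answer.php", "view_1"): LOGGED_IN,
}

ENTRY = entry_points(PRE, included=set())
print(ENTRY)
```

Under this reading of the rule, only <index.php>.view_0 qualifies in the example application, since every other view has a pre-condition over a POST parameter or over $loggedin.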

Unfortunately, in some cases it is not possible to differentiate between an application's entry point and the developer's failure to put the necessary safety checks into a module. For example, in our experiments we tested a web application where in one of the administration pages the developer forgot to put a check to verify that the user was actually logged in as administrator. Our technique classified the views of this module as entry points since they did not have any pre-condition at all. Nevertheless, the user of our tool can easily detect these vulnerabilities by inspecting the automatically generated list of entry points.

5.2 Detecting Public Views
The intended path introduced in Section 3 did not model a very important concept of a web application: the existence of publicly-accessible pages. These pages (such as FAQ pages) are very common in many web sites but they are rarely intended as entry points to the application. Therefore, we do not generate any security alert if it is possible to access these pages violating the intended workflow of the application.
For this reason, we adopted the following rules to detect and mark the publicly-accessible views:
• Starting from one of the application entry points, all the views that are reachable along some intended path traversing only views that have empty post-conditions are marked as public. This models the fact that if it is possible to reach a view through a path that does not change the extended state of the application, the access to the view is not supposed to be restricted.
• Any empty redirect view is public. An empty redirect view is a view that does not have any post-condition, any sink, and only contains a redirect link. This models all the views used to detect and redirect unauthenticated users that try to access a restricted page.
In the example, our algorithm marked <create.php>.view_0, <create.php>.view_1, and <answer.php>.view_0 as public views. The first two because they are reachable without any change in the application state and the last one because it is an empty redirect.

5.3 Detection Algorithm
Our graph exploration mechanism simulates a user that moves from one view to another. At each step, we select a new view to add to the current path, we evaluate its pre-condition against the current state, and, if the pre-condition is satisfied, we update the state to reflect the effects of the view's post-conditions.
Each path is analyzed to check if it satisfies the definition we provided in Section 2 for multi-step data-flow vulnerabilities and workflow violations. In general, if the graph is correct, it is possible to find all the vulnerabilities simply by trying each possible navigation path in the application. Our solution is similar to a model checking approach, and, unfortunately, it suffers from the same path explosion problem. Therefore, we limit our analysis to paths that contain up to one loop and with a total length limited by a user-defined upper bound. In our experiments, in fact, we observed that most of the vulnerabilities can be exploited using a very limited number of steps (usually fewer than 5).
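
The bounded exploration described in Section 5.3 can be sketched in Python as shown below. This is a simplified illustration under several assumptions, not MiMoSA's algorithm: views are hand-encoded from Table 1 with shortened entity names, $loggedin is identified with the session entity, the "one loop" limit is approximated by allowing at most one repeated view per path, only intended edges are followed, and the set HTML_SAFE of sanitizers considered adequate for display is hypothetical.

```python
# Assumption: sanitizers considered adequate before HTML output.
HTML_SAFE = {"htmlspecialchars", "htmlentities"}

# "pre" tests the visible entity names, "post" lists
# (source, target, sanitizers) propagations, "display" lists echoed entities.
VIEWS = {
    "index.0": {"pre": lambda s: True, "post": [],
                "display": ["DB.users.username"]},
    "create.1": {"pre": lambda s: "POST.user" not in s,
                 "post": [], "display": []},
    "create.0": {"pre": lambda s: "POST.user" in s,
                 "post": [("POST.user", "DB.users.username", {"addslashes"}),
                          ("POST.pass", "DB.users.password", {"addslashes"}),
                          ("ok", "SESSION.loggedin", set())],
                 "display": []},
    "answer.0": {"pre": lambda s: "SESSION.loggedin" not in s,
                 "post": [], "display": []},
    "answer.1": {"pre": lambda s: "SESSION.loggedin" in s,
                 "post": [], "display": []},
}

# Intended edges: source view -> [(target view, request parameters sent)].
EDGES = {
    "index.0": [("create.1", set())],
    "create.1": [("create.0", {"POST.user", "POST.pass"})],
    "create.0": [("answer.0", set()), ("answer.1", set())],
    "answer.0": [("index.0", set())],
    "answer.1": [("index.0", set())],
}


def explore(start, max_len):
    """Return (entity, path) alerts for unsanitized user data that is displayed."""
    alerts = set()
    stack = [([start], {}, set())]  # path, persistent state, request params
    while stack:
        path, state, request = stack.pop()
        view = VIEWS[path[-1]]
        if not view["pre"](set(state) | request):
            continue  # pre-condition not satisfied: abandon this path
        state = dict(state)  # copy so that sibling paths stay independent
        for entity in view["display"]:
            taint = state.get(entity)  # None: undefined or not user-controlled
            if taint is not None and not (taint & HTML_SAFE):
                alerts.add((entity, tuple(path)))
        for source, target, sanitizers in view["post"]:
            # User-controlled values carry the sanitizers applied to them.
            state[target] = frozenset(sanitizers) if source in request else None
        if len(path) < max_len:
            for target, params in EDGES.get(path[-1], []):
                new_path = path + [target]
                if len(new_path) - len(set(new_path)) <= 1:  # at most one repeat
                    stack.append((new_path, state, set(params)))
    return alerts


ALERTS = explore("index.0", 5)
for entity, path in sorted(ALERTS):
    print(entity, " -> ".join(path))
```

Starting from index.php, the sketch reports that the user name stored by create.php is later displayed by index.php after only addslashes() has been applied, which corresponds to the data-flow problem of the example application.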
Our detection algorithm traverses the graph following the intended paths. At each step it checks if it is possible to jump to one of the views that should not be reachable from the current position. If it succeeds, it raises a workflow violation alert and it does not go any further along that path. This means that some vulnerabilities may not be discovered because they are hidden "behind" other vulnerabilities. In this case, the user should fix the discovered vulnerability and run the analysis again.
By applying MiMoSA to our sample application, we identify the two existing vulnerabilities. Figure 5 shows the reports produced by MiMoSA for the example application of Figure 2.

Workflow Violation:
  Path:
  - index.php[view_0]
  - answer.php[view_1]

DISPLAY of unsanitized entity:
  Entity: DB_dbname.users.username
  Example of Exploitable Path:
  - create.php[view_0]
  - index.php[view_0]

Figure 5: Vulnerabilities detected in the sample application of Figure 2.

6. EVALUATION
To prove the effectiveness of our approach in detecting multi-module data-flow vulnerabilities and violations of the intended workflow of a web application, we ran our tool on five real-world web applications.
The selected applications satisfy three requirements: i) they are written in PHP and they contain multiple modules, ii) they use both session variables and database tables to maintain the application state, and iii) they do not contain object-oriented code. The list of chosen applications is shown in Table 2. The table also shows the list of known vulnerabilities for each application.
For each application we ran the intra-module analysis in order to extract the set of views corresponding to the application modules. We then ran the inter-module analysis to connect together the views and calculate the intended application workflow. Finally, we applied our detection algorithm to find anomalies in the possible navigation paths and to detect multi-module data-flow vulnerabilities.
The results of our tests are summarized in Table 3. For the intra-module phase, the table reports the number of views extracted and the time required by the analysis (footnote 5). In the inter-module phase, we explored up to one hundred million paths, covering at least all the paths of length 3. The table reports the time required to generate the paths and the alert messages raised by our tool. The alerts are grouped according to the entities involved (for the data-flow vulnerabilities) and the modules (for the workflow violations). For both data-flow and workflow vulnerabilities, we report the number of violations detected by our tool, the number of false positives, and how many of the remaining violations correspond to exploitable vulnerabilities.
Footnote 5: All the experiments were executed on a Pentium 4 3.6GHz with 2 GB of RAM.
MiMoSA was able to find all the known vulnerabilities and to discover several new ones.
With regard to multi-module data-flow vulnerabilities, we had only one false positive. In fact, in the MyEasyMarket application, the PHP variable REMOTE_ADDR is saved in the database and later printed to the user. Even though the value of the variable is never sanitized, it is automatically set to the IP address of the client's machine by the PHP engine. Therefore, it only has a limited range of valid values (numbers and dots) that do not allow a user to mount an attack against the application.
MiMoSA also reported several violations of the intended workflow of the web applications. Even though in most of the cases they corresponded only to anomalous paths into the application (e.g., directly jumping from the login to the logout page), we were also able to confirm that some of the reported violations correspond to actual vulnerabilities that could be exploited to gain unauthorized access to a restricted page.
While the inter-module analysis is the more time-consuming phase, the intra-module analysis is certainly the more fragile, since it is where the static analysis techniques that we use introduce most of the approximations. Any imperfection in this phase can result in an increasing number of both false positives and false negatives. For instance, during the construction of the intended paths, we observed that some of the views were isolated, with no connection to any other part of the application (footnote 6). This was probably caused by an error in the view extraction, such as a missing link or a wrong pre-condition predicate.
Footnote 6: These views were not taken into consideration by our path exploration algorithm since they could not provide any useful information to the user.
To better test the accuracy of our intra-module analysis and evaluate its impact on the final results, we selected one of the applications in our test suite (i.e., SimpleCMS) and manually analyzed the output of each step of the view extraction phase. The results are shown in Table 4. MiMoSA achieves a high accuracy in the extraction of database operations, links, post-conditions, and sinks. Also the rate of unknown conditions, i.e., the pre-conditions that MiMoSA was not able to correctly reconstruct, is reasonable, considering that we are using a static analysis technique.
In this application, the number of generated views is, instead, considerably higher than the number of views actually present in the application code. This happens for two main reasons. First, MiMoSA might generate views corresponding to paths that are infeasible in the program, such as the ones that traverse nodes with conflicting conditions. The presence of these views does not affect the final results since they are never entered during the detection phase. The second reason is that MiMoSA can generate duplicate views, i.e., views with different but equivalent pre-conditions. Even though this may lead to inaccuracy in the final results, in most of the cases its main effect is just to slow down the path generation phase.

7. RELATED WORK
In the introduction, we briefly mentioned some recent works in the areas of intrusion detection and application firewalls that focus on detecting and blocking web-based attacks. Since our work focuses on vulnerability analysis, and, consequently, deals with a different class of problems than the detection of attacks at runtime, we are not going to further review these works here.
There are a number of recent works in the area of vulnerability analysis of web-based applications. Most of these approaches are based on taint propagation analysis applied to applications written in PHP [7, 9, 10, 22] or Java [6, 14].
The WebSSARI tool [7] is one of the first works that applies static taint propagation analysis to find security vulnerabilities in PHP. WebSSARI targets three specific types of vulnerabilities: cross-site scripting, SQL injection, and general script injection. The tool uses flow-sensitive, intra-procedural analysis based on a lattice model and typestate. When the tool determines that tainted data reaches sensitive functions, it automatically inserts runtime
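+--------------------+-----------+----------------------------------+-----------------------+
| Application Name   | PHP Files | Description                      | Known Vulnerabilities |
+--------------------+-----------+----------------------------------+-----------------------+
| Aphpkb 0.71        | 59        | Knowledge-base management system | –                     |
| BloggIt 1.01       | 24        | Blog engine                      | CVE-2006-7014         |
| MyEasyMarket 4.1   | 23        | On-line shop                     | –                     |
| Scarf 2006-09-20   | 18        | Conference administration        | CVE-2006-5909         |
| SimpleCms          | 22        | Content management system        | BID 19386             |
+--------------------+-----------+----------------------------------+-----------------------+

Table 2: PHP applications used in our experiments. Vulnerabilities are referenced by their Common Vulnerabilities and Exposures ID (CVE) or their Bugtraq ID (BID).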

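The Views and first Time columns refer to the intra-module analysis; the remaining columns refer to the inter-module analysis.

+--------------+-------+----------+-------+--------------------+--------------------+--------------------+--------------------+
| Application  | Views | Time     | Time  | DF Violations-(FP) | DF Vulnerabilities | WF Violations-(FP) | WF Vulnerabilities |
+--------------+-------+----------+-------+--------------------+--------------------+--------------------+--------------------+
| Aphpkb       | 4680  | 31:24m   | 3:00h | 0-(0)              | 0                  | 17-(10)            | -                  |
| BloggIt      | 339   | 2:12m    | 0:31h | 14-(0)             | 14                 | 3-(0)              | -                  |
| MyEasyMarket | 449   | 1:12:00h | 6:36h | 2-(1)              | 1                  | 1-(0)              | 1 (a)              |
| Scarf        | 1721  | 7:30m    | 1:10h | 3-(0)              | 3                  | 3-(0)              | 1                  |
| SimpleCms    | 417   | 0:22m    | 2:50h | 8-(0)              | 8                  | 5-(0)              | 4                  |
+--------------+-------+----------+-------+--------------------+--------------------+--------------------+--------------------+
(a) Detected through inspection of the entry point list, as discussed in Section 5.1.

Table 3: Results of the experiments. DF: Data Flow, WF: Work Flow, FP: False Positives.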

guards, i.e., sanitization routines, that, at runtime, remove malicious content from user input.
Xie and Aiken [22] use intra-block, intra-procedural, and inter-procedural taint propagation analysis to find SQL injection vulnerabilities in PHP code. This approach uses symbolic execution to model the effect of statements inside functions. These effects are summarized into the pre- and post-condition sets for each analyzed function. The function pre-conditions contain a derived set of memory locations that have to be sanitized before the function invocation, while the post-conditions contain the set of parameters and global variables that are sanitized inside the function. To model the effects of sanitization routines, the approach uses a programmer-provided set of possible sanitization functions, considers certain forms of casting as a sanitization process, and, in addition, keeps a database of sanitizing regular expressions, whose effects are specified by the programmer.
Pixy [9, 10], which we have described in Section 4.1, specifically targets the identification of intra-module XSS vulnerabilities. This tool seems to be the most complete static PHP analyzer in terms of the PHP features modeled. To the best of our knowledge, it is the only publicly available tool for the analysis of PHP-based applications.
None of the described approaches performs inter-module analysis, that is, all the vulnerabilities identified by these approaches are local to a single application module. Unlike our approach, these techniques do not have any notion of the application's extended state, and, therefore, they are unable to capture the workflow vulnerabilities described in Section 2. By considering all inputs generated from outside of an application as being tainted, these approaches should be able to identify some types of multi-module data-flow vulnerabilities. However, because of the locality of the analysis, they are incapable of tracing the origins of multi-step attacks, and, as a result, are subject to a much higher false positive rate.
There are also a number of works that apply dynamic analysis techniques to the analysis of web-based applications. For example, approaches that use dynamic taint propagation analysis, conceptually similar to Perl's taint mode but often with a more refined granularity, have been applied to other languages as well: Nguyen-Tuong et al. [15] propose modifications of the PHP interpreter to dynamically track tainted data in PHP programs, and Haldar et al. [5] apply a similar approach to the Java Virtual Machine.
Pietraszek and Vanden Berghe [16] present a unifying view of injection vulnerabilities and describe a general approach for detecting and preventing injection attacks. This approach is based on instrumenting the platform, such as the PHP interpreter, to track the flow of untrusted data inside the applications. A context-sensitive string evaluation is then performed at each sensitive sink to detect injection attacks.
All dynamic approaches described above either are able or, at least in theory, can be extended to detect multi-module data-flow attacks. The main difference with our approach is that we are able to detect such vulnerabilities statically, considering all of the application's possible paths. Also, none of these approaches can detect workflow vulnerabilities because they do not model or take into account the application's intended workflow.
There are also several recent approaches that try to identify SQL injection attacks by building models of legitimate queries that can be performed by an application and comparing these models to the dynamically-generated queries. Whenever these queries structurally violate the static model, an attack is detected. For example, the AMNESIA tool [6] targets SQL injection attacks in Java-based applications. AMNESIA defines a SQL injection attack as an attack in which the logic or semantics of a legitimate SQL statement is changed due to malicious injection of new SQL keywords or operators. Thus, to detect such attacks, the semantics of dynamically-generated queries is checked against a derived model that represents the intended semantics of the query.
Su and Wassermann [20] propose another approach that uses the syntactic structure of the program-generated output to identify injection attacks, such as XSS, XPath injection, and shell injection attacks. The current implementation, called SqlCheck, is designed to detect SQL injection attacks only. The approach works by tracking sub-strings from the user input through the program execution. The tracking is implemented by augmenting input strings with special characters, which mark the start and the end of each sub-string.
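+-----------------+---------------+---------------+-------+------------+-------+----------------------------+
| Views Extracted | Views Optimal | DB Operations | Links | Post-Conds | Sinks | Rate of Unknown Conditions |
|                 |               | (accuracy)    | (acc.)| (accuracy) | (acc.)|                            |
+-----------------+---------------+---------------+-------+------------+-------+----------------------------+
| 417             | 47            | 96%           | 78%   | 100%       | 100%  | 15%                        |
+-----------------+---------------+---------------+-------+------------+-------+----------------------------+

Table 4: Accuracy of the view extraction step for SimpleCMS.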

Then, dynamically-generated queries are intercepted and checked by a modified SQL parser. Using the meta-information provided by the sub-string markers, the parser is able to determine if the query's valid syntactic form is modified by the sub-string derived from user input, and, in that case, it blocks the query.
Both AMNESIA and SqlCheck can successfully detect SQL injection attacks at the time of injection; however, without a significant implementation effort, neither of them can detect data-flow vulnerabilities such as persistent XSS attacks. Obviously, both approaches, being based on the syntactic structure of legitimate output, are incapable of detecting workflow vulnerabilities/attacks.

8. CONCLUSIONS
As web applications that perform security-critical tasks become more sophisticated, there is an increasing need for techniques and tools that can address the novel security issues introduced by these applications. In particular, because of the heterogeneous nature of web applications, it is important to develop new techniques that are able to analyze the interaction among multiple application modules and different technologies.
In this paper, we presented a novel vulnerability analysis approach that takes into account the multi-module, multi-technology nature of complex web applications. Our technique is able to model both the intended workflow and the extended state of a web application in order to identify both workflow and data-flow attacks that involve multiple modules.
We developed a prototype tool, called MiMoSA, that implements our approach and we tested it on a number of real-world applications. The results show that by modeling explicitly the state and workflow of a web application, it is possible to identify complex vulnerabilities that existing state-of-the-art approaches are not able to identify.
Future work will focus on two main directions. First, we will include additional technologies so that we can cover a larger class of applications. Second, we plan to leverage the findings of the static analysis to automatically generate test drivers to reduce the number of false positives.

[5] V. Haldar, D. Chandra, and M. Franz. Dynamic Taint Propagation for Java. In Proceedings of the Annual Computer Security Applications Conference (ACSAC'05), pages 303–311, December 2005.
[6] W. Halfond and A. Orso. AMNESIA: Analysis and Monitoring for NEutralizing SQL-Injection Attacks. In Proceedings of the International Conference on Automated Software Engineering (ASE'05), pages 174–183, November 2005.
[7] Y.-W. Huang, F. Yu, C. Hang, C.-H. Tsai, D. Lee, and S.-Y. Kuo. Securing Web Application Code by Static Analysis and Runtime Protection. In Proceedings of the International World Wide Web Conference (WWW'04), pages 40–52, May 2004.
[8] N. Jovanovic, E. Kirda, and C. Kruegel. Preventing Cross Site Request Forgery Attacks. In Proceedings of the IEEE International Conference on Security and Privacy for Emerging Areas in Communication Networks (Securecomm), pages 1–10, September 2006.
[9] N. Jovanovic, C. Kruegel, and E. Kirda. Pixy: A Static Analysis Tool for Detecting Web Application Vulnerabilities. In Proceedings of the IEEE Symposium on Security and Privacy, pages 258–263, May 2006.
[10] N. Jovanovic, C. Kruegel, and E. Kirda. Precise Alias Analysis for Static Detection of Web Application Vulnerabilities. In Proceedings of the ACM SIGPLAN Workshop on Programming Languages and Analysis for Security (PLAS'06), pages 27–36, June 2006.
[11] E. Kirda, C. Kruegel, G. Vigna, and N. Jovanovic. Noxes: A Client-Side Solution for Mitigating Cross Site Scripting Attacks. In Proceedings of the ACM Symposium on Applied Computing (SAC), pages 330–337, April 2006.
[12] A. Klein. Cross Site Scripting Explained. Technical report, Sanctum Inc., 2002.
[13] C. Kruegel and G. Vigna. Anomaly Detection of Web-based Attacks. In Proceedings of the ACM Conference on Computer and Communication Security (CCS '03), pages 251–261, October 2003.
[14] B. Livshits and M. Lam. Finding Security Vulnerabilities in Java Applications with Static Analysis. In Proceedings of the USENIX Security Symposium (USENIX'05), pages 271–286, August 2005.
[15] A. Nguyen-Tuong, S. Guarnieri, D. Greene, and D. Evans. Automatically Hardening Web Applications Using Precise Tainting. In Proceedings of the International Information Security Conference (SEC'05), pages 372–382, May 2005.
[16] T. Pietraszek and C. Vanden Berghe. Defending against Injection Attacks through Context-Sensitive String Evaluation. In Proceedings of the International Symposium on Recent Advances in Intrusion Detection (RAID'05), pages 372–382, 2005.
[17] I. Ristic. ModSecurity. http://www.modsecurity.org/, November 2006.
Acknowledgments [18] D. Scott and R. Sharp. Abstracting Application-Level Web
This research was partially supported by the National Science Security. In Proceedings of the International World Wide Web
Foundation, under grants CCR-0238492, CCR-0524853, and CCR- Conference (WWW’02), pages 396–407, May 2002.
0716095. [19] M. Sharir and A. Pnueli. Two Approaches to Interprocedural Data
Flow Analysis. In N. Jones and S. Muchnick, editors, Program
Flow Analysis: Theory and Applications, chapter 7. Prentice Hall,
9. REFERENCES 1981.
[20] Z. Su and G. Wassermann. The Essence of Command Injection
[1] A. V. Aho, R. Sethi, and J.D. Ullman. Compilers: Principles, Attacks in Web Applications. In Proceedings of the Annual
Techniques, and Tools. Addison-Wesley Longman Publishing Co., Symposium on Principles of Programming Languages (POPL’06),
Inc., 1986. pages 372–382, January 2006.
[2] M. Almgren, H. Debar, and M. Dacier. A Lightweight Tool for [21] G. Vigna, W. Robertson, V. Kher, and R.A. Kemmerer. A Stateful
Detecting Web Server Attacks. In Proceedings of the Network and Intrusion Detection System for World-Wide Web Servers. In
Distributed System Security Symposium (NDSS), pages 157–170, Proceedings of the Annual Computer Security Applications
February 2000. Conference (ACSAC 2003), pages 34–43, December 2003.
[3] C. Anley. Advanced SQL Injection in SQL Server Applications. [22] Y. Xie and A. Aiken. Static Detection of Security Vulnerabilities in
Technical report, Next Generation Security Software, Ltd, 2002. Scripting Languages. In Proceedings of the USENIX Security
[4] Common Vulnerabilities and Exposures. Symposium (USENIX’06), pages 271–286, August 2006.
http://www.cve.mitre.org/, 2006.
