the procedure, i.e., the number of input parameters along with their data types. We generate the first test case with a random value, e.g., id = 1. Symbolic execution with concrete input is also called concolic execution. The flow of execution of a stored procedure is given in Figure 2.

[Figure residue; recoverable diagram labels: Stored Procedures, Random Input (a), Stored Procedure Execution, New Test Case (d), Expression Evaluation Subsystem]
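The concolic idea mentioned above (run a concrete input, record the branch conditions it takes, then ask a solver for an input that flips one of them) can be sketched in a few lines. This is only an illustrative toy, not the instrumented PostgreSQL executor described in this paper: the function `procedure`, its branches, the brute-force `solve` helper (a stand-in for an SMT solver such as Z3) and the input domain are all assumptions made for the example.

```python
def procedure(id_, trace):
    """Toy stand-in for a stored procedure; records every branch it evaluates."""
    def branch(name, pred):
        outcome = pred(id_)
        trace.append((name, pred, outcome))
        return outcome

    if branch("id > 10", lambda v: v > 10):
        if branch("id % 2 == 0", lambda v: v % 2 == 0):
            return "even-large"
        return "odd-large"
    return "small"


def solve(constraints, domain=range(0, 50)):
    """Brute-force stand-in for an SMT solver: first value meeting all constraints."""
    for v in domain:
        if all(pred(v) == wanted for pred, wanted in constraints):
            return v
    return None  # treated as "path unreachable"


def generate_tests(first_input):
    """Concolic loop: execute concretely, then flip each branch along the path."""
    tests, seen_paths, worklist = [], set(), [first_input]
    while worklist:
        value = worklist.pop()
        trace = []
        result = procedure(value, trace)
        path = tuple((name, outcome) for name, _, outcome in trace)
        if path in seen_paths:
            continue  # this path is already covered by an earlier test case
        seen_paths.add(path)
        tests.append((value, result))
        for i, (_, pred, outcome) in enumerate(trace):
            # Keep the path prefix, negate the i-th branch condition.
            prefix = [(p, o) for _, p, o in trace[:i]]
            candidate = solve(prefix + [(pred, not outcome)])
            if candidate is not None:
                worklist.append(candidate)
    return tests


tests = generate_tests(1)  # start from the concrete input id = 1
print(tests)
```

Starting from id = 1, the loop ends with one test case per path of the toy procedure. A real executor would replace the enumeration in `solve` with solver queries over symbolic table data as well as inputs.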
can be recursive; for such cases we model chains of foreign key relations. We disable foreign key relations only if we detect a cyclic relation during the processing of foreign key constraints. We generate data by solving these constraints, applied to all rows in a table model, using the Z3 SMT solver, and populate the tables during setup for procedure execution.

4.2 Processing of SQL

Execution of a PL/pgSQL procedure can be divided into execution of 1) PL/pgSQL language constructs and 2) SQL statements. Because they share the same type system, both rely on the same expression processing subsystem, shown in Figure 6, to process conditions and expressions. Expressions can also contain function calls, allowing recursive procedure calls. Every time PL/pgSQL encounters an SQL statement, it calls the SQL processing system to get the results. PostgreSQL prepares multiple possible execution plans, each with an estimate of its execution cost. An optimal plan, represented as a tree, is selected for execution.

For example, Select * from table1 t1, table2 t2, table3 t3 where t1.col1 = t3.col1 and t1.col2 = t2.num1 and t2.num2 > 2 joins three tables and places extra conditions on a column of table2. The plan for this query is shown in Figure 7. During execution, the system scans table1 and table3, discarding the rows that do not meet the condition. The results of the two scans are joined using a nested loop join. Later, these results are joined with the output of the scan of table2.

Figure 7: Plan for Join Query

We treat the nodes in the plan tree as the basic building blocks of the program and model their execution in our symbolic executor. When a sequential scan executes, we extract the table identifier, the columns that appear in the output, and the condition(s). The extracted conditions decide the results of the scan. In the absence of a condition, all rows are selected as the result. In the presence of a WHERE clause, we impose the conditions on the table model such that each row in a table can either satisfy or fail the scan condition.

Here we draw inspiration from numerous works on symbolic execution that treat a simple IF condition as a choice point where the system can take either of two paths, depending on whether the condition or its negation is true. In our case we have a larger number of conditions that are not related to each other by a simple negation. Each of the conditions, if true, has a corresponding result model containing the rows selected by that condition. Therefore, we define a choice as a set of conditions with a corresponding result model. A ChoiceSet is the set of all choices at a node. During exploration of paths, at each node, we select the choices from the ChoiceSet one by one and append the table integrity constraints as well as the conditions from already processed nodes along this path. We then solve the resulting constraint to generate the corresponding result. However, if the solver gives no solution for the condition, it means that the choice corresponding to the condition is not possible. This is a possible limitation, but in our evaluation we did not encounter any case where the solver timed out. If the solver times out, we consider the path to be unreachable. When we reach a node that needs the results of an earlier node, we can easily provide symbolic models of the results processed earlier. In the example above, the second NestedLoop requires results from the first NestedLoop and a Sequential Scan. Since the output model of the first NestedLoop is compatible with the general result model, the plan can be symbolically modeled. This means our grammar coverage of SQL is not based on syntax; rather, it is based on the underlying execution plan of the SQL statement.

4.3 Symbolic Execution

We have instrumented PostgreSQL to record the execution of PL/pgSQL, SQL and expression processing in a file, the Trace Log. In our symbolic executor, a State object keeps track of the explored and unexplored paths and tells us about the execution of the program. The algorithm for generating test cases is outlined in Algorithm 1. This algorithm explores states in a depth-first manner. The Trace Log is processed line by line. Information extracted from each trace line is stored in our data structure, called StackFrame. Each state maintains its StackFrame, which allows processing of nested queries and functions. When a trace line comes in for processing, the State object can tell whether it has already processed that line. If the trace line has not been processed, a new stack frame is prepared in State to store information about the current trace line. The function ProcessTraceLine models all kinds of operations, generates ChoiceSets and adds them to State. If a trace line generates only one choice, which means there is only one path out of the state, then we are certain about the program flow. At this point we simply add the condition to the solver and get a new trace line for processing. If the ChoiceSet of State has more than one condition, then we are not certain about the path taken by the current trace, so we stop analyzing the trace and solve constraints to generate test cases.

On line 20, a new condition is fetched from the State. The call also tells us whether the State has advanced (moved to the next state) since the last call to NextChoice. If StateAdvanced is true, then we have a new stack frame in State and the condition should be added to the solver stack. Otherwise, we pick the next condition from the stack frame. We explore all conditions in the stack frame before advancing to the next state. This happens when the solver says that the condition is not satisfiable. If we are still on the same trace line, it means that State has
not advanced. In such a case the NextChoice function returns the second condition from the same ChoiceSet. As the new condition comes from the same ChoiceSet, we need to replace the condition in the solver stack instead of adding a new frame. The NextChoice function returns NULL if all choices are explored. This means that the end of the path is reached and we need to move back. The function Backtrack removes the StackFrame from the top of the stack for both the State and Solver objects, and the loop continues with its next iteration. The next iteration then gets a choice from the new stack top, leading to exploration of another path. When backtracking ends up emptying the State stack, we know that the complete tree has been explored. The Backtrack function then returns true to terminate the search.

 1  Solver = Create new solver object
 2  // Contains the path condition stack
 3  State = Create new state object
 4  // Keeps a stack of ChoiceSets for trace
 5  T = Create a test case with random values
 6  // Main Symbolic Execution Code
 7  while (T is not NULL)
 8      Execute test case T on database
 9      For Line in ExecutionTrace:
10          if (Line is already processed)
11              continue with next loop iteration
12          State.Advance() // prepare new stack frame
13          ChoiceSet = ProcessTraceLine(Line)
14          State.AddChoiceSet(ChoiceSet)
15          if (ChoiceSet.size == 1)
16              Add the condition from the only choice in Solver
17              continue with next loop iteration
18          break
19      while true
20          Condition, StateAdvanced = State.NextChoice()
21          if Condition == NULL
22              Terminate = Backtrack()
23              if Terminate
24                  T = NULL
25                  break
26              else
27                  continue with next iteration
28          else if StateAdvanced
29              Add condition in Solver Stack
30
31          else
32              Replace condition at Solver Top of Stack
33          Solve the path condition stack in solver
34          if (path condition is satisfied)
35              T = Make test case from solver result
36              break

Algorithm 1: Test Case Generation Algorithm

From the algorithm, it is clear that we progress towards a full path condition by generating many cases that are based on incomplete paths through the program. We skip the trace lines that have already been processed, on the assumption that the execution of those statements will be exactly the same. This is a reasonable assumption for PL/pgSQL language nodes, but it is not completely true for SQL. SQL, being a declarative language, leaves it to the database to decide, through a planner, how to execute the statement instead of simply executing programmer instructions. Hence it is possible for an SQL statement to join two tables using the NestedLoop algorithm in one execution and use sorting with the MergeJoin algorithm in the next execution. We refer to this problem as plan instability for SQL statements. Both plans would give the same final output, but the change throws off our symbolic executor, which expects exactly the same trace up to the last processed trace line. We addressed this issue by turning off many of the algorithms that the planner can use to optimize queries, such as BitmapScan, HashAgg, HashJoin, IndexScan, IndexOnlyScan and MergeJoin. This only deactivates possible planner optimizations; it does not restrict the language grammar in any way.

5. IMPLEMENTATION DETAILS

5.1 Data Type Models

Data elements in the tables and variables have specific data types. We represent these data elements as symbolic variables. Most data types do not translate directly to Z3 solver data types. Only three database types (integer, numeric and Boolean) map directly to corresponding solver types. We added support for the character, text and date types by modeling them as integers, because these are some of the most common data types in databases.

Characters naturally translate to integers through their ASCII values. To ensure that the solver does not assign invalid values to integers representing characters, a constraint is added to restrict the value of such integers to the ranges a to z, A to Z, or 0 to 9. This excludes special characters, but using special characters in character type columns is not a common practice. Strings are mapped to integers. This allows for exact string matches but does not support partial matches with the LIKE operation in SQL. To model a date as an integer, we maintain a reference date in the symbolic executor. The integer model of a date is an offset from this reference date. The timestamp data types are compatible with the date type and we use the same model for them. Any data element in the database can be initialized to NULL, so NULL is modeled as a special integer with value -101, as it is not a common hard-coded value. Data type models are summarized in Table 5.

Table 5: Data Type Models

Data Type    Solver Type    Model Summary
Integer      Integer        Direct mapping
Numeric      Real           Direct mapping
Boolean      Boolean        Direct mapping
Character    Integer        Integer represents ASCII value; restricted to A-Z, a-z, 0-9
Text         Integer        Integer is dictionary lookup value
Date         Integer        Integer is offset from base date

5.2 Expression Processing Models

While processing constraints we come across a variety of expressions. The expressions that we have modeled include binary operators, Boolean operators, tests for NULL values, Coalesce expressions and function calls. The arguments for these operators can be table columns, variables, constants or expressions. PostgreSQL represents an expression as a tree before processing it, and the tree is printed as a pre-order traversal in the Trace Log. The result of processing each type of expression is shown in Table 6. For many cases in the
Table 6: Expressions and Table Constraint Models

Expression Type                   Expression Model
Boolean Operator And              Result: And(Arg1, Arg2)
Boolean Operator Or               Result: Or(Arg1, Arg2)
Boolean Operator Not              Result: Not(Arg)
NULLTEST                          Result: Arg == NULL OR Arg != NULL
Conditional Binary Operator op    Result: (Arg1 op Arg2)
                                  Condition: And(Arg1 != NULL, Arg2 != NULL)
Arithmetic Binary Operator op     Result: ExprResult
                                  Condition: Or(And(Arg1 != NULL, Arg2 != NULL, ExprResult == Arg1 op Arg2),
                                                And(Or(Arg1 == NULL, Arg2 == NULL), ExprResult == NULL))
Coalesce Expression               Result: ExprResult
                                  Condition: Or(And(Arg1 != NULL, ExprResult == Arg1),
                                                And(Arg1 == NULL, Arg2 != NULL, ExprResult == Arg2), ...,
                                                And(All Args == NULL, ExprResult == NULL))
PL/pgSQL Function Call            Result: FunctionResult
                                  Condition on Function Start: And(Param1 == Arg1, Param2 == Arg2, ...)
                                  Condition on Function End: FunctionResult == ActualReturnExpression
Non PL/pgSQL Function Call        Result: FunctionResultFromModel(FunctionID, ArgumentList)

Constraint Type                   Constraint Model
Unique Constraint                 Condition: And(Not(And(row1.col1 == row2.col1, row1.col2 == row2.col2)),
(assuming 3 rows in table,                       Not(And(row1.col1 == row3.col1, row1.col2 == row3.col2)),
composite constraint                             Not(And(row2.col1 == row3.col1, row2.col2 == row3.col2)))
[col1, col2])
Foreign Key Constraint            Condition: Or(table1.row1.col2 == table2.row1.col1,
(assuming table1 = 1 row,                       table1.row1.col2 == table2.row2.col1,
table2 = 2 rows,                                table1.row1.col2 == NULL)
table1.col2 referencing
table2.col1)
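The arithmetic and coalesce rows of Table 6 can be checked with a small sketch. It is a hedged illustration only: Python's None stands in for the NULL integer of the data type model, a brute-force search over a tiny domain replaces the Z3 solver, and the names `arithmetic_condition`, `coalesce_condition` and `satisfying_results` are hypothetical helpers that do not come from the paper's implementation.

```python
from itertools import product
import operator

NULL = None                            # assumption: None replaces the NULL integer
ARG_DOMAIN = [NULL, 0, 1, 2]           # tiny argument space, for illustration only
RESULT_DOMAIN = [NULL, 0, 1, 2, 3, 4]  # candidate values for ExprResult


def arithmetic_condition(arg1, arg2, expr_result, op):
    """Condition of the 'Arithmetic Binary Operator' row of Table 6."""
    both_present = arg1 is not NULL and arg2 is not NULL
    first_and = both_present and expr_result == op(arg1, arg2)
    second_and = (arg1 is NULL or arg2 is NULL) and expr_result is NULL
    return first_and or second_and


def coalesce_condition(args, expr_result):
    """Condition of the 'Coalesce Expression' row: one disjunct per argument."""
    disjuncts = [
        all(a is NULL for a in args[:i]) and arg is not NULL and expr_result == arg
        for i, arg in enumerate(args)
    ]
    disjuncts.append(all(a is NULL for a in args) and expr_result is NULL)
    return any(disjuncts)


def satisfying_results(condition):
    """Brute-force stand-in for the solver: all ExprResult values that satisfy."""
    return [r for r in RESULT_DOMAIN if condition(r)]


# The disjuncts are mutually exclusive, so exactly one ExprResult satisfies them.
for a, b in product(ARG_DOMAIN, repeat=2):
    results = satisfying_results(lambda r: arithmetic_condition(a, b, r, operator.add))
    assert results == [NULL if a is NULL or b is NULL else a + b]

for args in product(ARG_DOMAIN, repeat=3):
    results = satisfying_results(lambda r: coalesce_condition(args, r))
    assert results == [next((x for x in args if x is not NULL), NULL)]
```

The two loops confirm, for this small domain, that each condition pins the new symbolic result variable to a single value, which is why the result can be returned as a fresh variable with the condition attached.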
table, we have specified the processing result expression together with a condition. The result is the expression that is sent back to the main trace processor. The main trace processor uses the result according to its needs. For example, the processing of an IF condition trace line uses it as the decision condition and generates two choices from it: the condition itself and its negation. The processing of an assignment trace line uses it to assert that the target variable is equal to the value of this expression. The condition expression represents the conditions that are automatically added to every choice by the trace processor that uses the result expression.

5.2.1 Boolean Operators, Binary Operations and Coalesce Expression

All three Boolean operations and the binary conditional operators map directly to the Z3 solver API. We have two types of NULLTEST, ISNULL and ISNOTNULL, which return true when the value is NULL and not NULL, respectively. The result expressions for ISNULL and ISNOTNULL are returned as shown in Table 6. For binary arithmetic operators with NULL arguments, the result of the expression is also NULL. For the arithmetic operation, a new symbolic variable is created and returned as the result, and conditions are imposed on this new symbolic variable. We have two conditions inside the OR function. The two conditions are mutually exclusive because of the NULL checks in each condition: if both arguments are not NULL, then the OR condition inside the second AND can never be true. Therefore the solver has to impose ExprResult == Arg1 op Arg2 in order to get a satisfying assignment. Similarly, if any of the arguments is NULL, then the OR condition is true but the first AND can never be true, so ExprResult == NULL must be imposed in order to get a satisfying assignment. A coalesce expression takes multiple inputs and returns the first input that is not NULL. As with arithmetic operators, the result is a new symbolic variable whose value comes from a set of mutually exclusive constraints.

5.2.2 Function Calls

We classify function calls according to the function language. Execution of functions written in PL/pgSQL is printed in our trace and we can follow the path through these functions. For function calls appearing in an expression, we simply return a new symbolic variable representing the result value, with no condition imposed on it. We also generate a condition that assigns values to the function input variables. We call this condition the Call Condition for the function; it links the variables in the current function with the variables in the new context. After the trace line of the expression containing function calls, we get the trace lines for the execution of the functions listed in that expression. The Call Condition is loaded into the solver when we get the trace for the start of the function. At the end of the function, the value returned by the function is stored in the result variable created earlier. Implementing this functionality requires tracking the call stack in the symbolic executor. The maximum depth of the call stack for the symbolic executor is a configurable constant.

5.2.3 Special Functions and Sequences

Functions not written in PL/pgSQL do not show execution details in the trace. These functions include SQL built-in functions such as nextval for sequences, date and time functions, and type conversion. The model for the nextval function relies on our model for sequences to output a symbolic expression. We model the sequence object as a symbolic variable of type integer with a starting value of 0. We return the expression
start value + integer whenever the nextval function is called for this object. The model for the current date and time function simply returns 0. That means we treat the base date of our symbolic executor as the current date. Another function that we have modeled takes in a sequence name and returns the sequence id in the database. To model this function we query the database for the relevant information and return it as the model output.

5.3 Constraint Models

We model check, unique, foreign key and primary key (NOT NULL check and unique) constraints. Check constraints are translated into pre-order traversed tree expressions and processed in the same way as other expressions to get conditions. The conditions are added to the solver for each row in the table(s). The models for foreign key constraints and unique constraints are shown in Table 6.

For the unique constraint in Table 6, the condition is generated for a composite constraint on col1 and col2. A composite constraint means that the values of both columns taken together must be unique. In general, the condition needs to be asserted for all row combinations. Table 6 also shows the foreign key constraint model. The column value has to be one of the values in the referenced column, or it needs to be NULL.

5.4 SQL Plan Node Processing

We support SequentialScan, NestedLoop, ResultNode and ModifyTable nodes. In the previous section we discussed the first two types of nodes. ResultNode is used to generate a single row of results. The values in the row being generated can be variables, constants or any expression containing variables and constants. We treat the results of the expressions as symbolic elements in a new symbolic row that ResultNode is supposed to produce. The conditions from the processing of multiple expressions are appended to generate a single condition. The row produced from the expression results by solving the conditions is the result for this node.

ModifyTable is the top plan node and supports inserts, updates and deletes. It relies on its child node to provide the data it needs to perform its job. For a DELETE operation, this node needs a list of row identifiers of the rows that need to be deleted. For an UPDATE operation, it needs the new rows along with the row identifiers of the rows that the new rows are supposed to replace. For INSERT operations, the new row must be the output of the child plan. The simple case, where we specify the list of values to be inserted, is covered by the ResultNode, which makes the row out of the values. In more complicated cases, where we insert the result of a query into a table, we simply have the whole query plan as the child plan of ModifyTable. In all three cases, we check that the table modifications do not violate the table constraints by creating constraint conditions on the modeled table and adding the result as a choice. We get our second choice as the negation of the first choice; this second choice is responsible for generating cases that violate constraints.

5.5 PL/pgSQL Construct Processing

We support PL/pgSQL language constructs such as IF conditions, FOR loops over SQL query results, assignment statements, variable initialization from SQL results, function start and function return statements. The condition of an IF statement can be a simple expression or a complicated condition with an SQL statement embedded in it. For the simple case, the condition for the choice can be obtained from the expression processor, whereas in the latter case the result comes from the SQL processing. Consequently, generating cases for the SQL statement is the way to explore the possible directions the code can take from the IF condition.

Assignment statements usually generate only one choice, i.e., target variable == expression result. Another kind of assignment occurs with the result of SQL queries. The keyword INTO allows SQL statements in PL/pgSQL procedures to directly assign their result values to variables. This is handled similarly, except that multiple variables can be assigned new values simultaneously. In both cases these nodes have ChoiceSets of size 1.

Function start and function return statements are part of the support for PL/pgSQL function calls. The ChoiceSets for these are described in Table 6. We have also added support for FOR loops over SELECT statements. The statement at the start of the FOR loop acts like an assignment statement and assigns a row from the FOR loop SELECT query results to the variables on which the loop runs. The FOR loop works with the currently modeled data types.

6. EVALUATION

We evaluated our technique on an open source accounting and Enterprise Resource Planning (ERP) system, PostBooks, which has a significant amount of its business logic written in functions in the database. It has a schema consisting of 251 tables. The functionality of 151 procedures is fully supported by our models. Manual inspection of some of these procedures indicated that 100% branch coverage was achieved. The symbolic execution was configured with a stack depth of 5, a result size of 2 (excluding unconditional scans and cross joins), and an initial table size of 2 rows each.

All experiments were performed on a 1.9 GHz Dell i7 machine. Many of the procedures in PostBooks have user defined exceptions to give a user friendly response to the client. During exploration of the procedures, we automatically generate cases that drive the procedure execution towards such exceptions, e.g., schema constraint violations. We found 93 test cases that trigger user defined exceptions or constraint violations.

Table 7: Exception Cases

Total Exception Cases                           93
No data found in SELECT                         Over 50
Exceptions due to sequence reset                23
Constraint violation due to unchecked inputs    10
User defined exceptions on input validation     4

The scalability of our technique is shown by the number of tables modeled for many stored procedures. PostBooks uses a very large number of constraints to ensure data integrity. In particular, the schema has over 400 foreign key constraints. This means that long chains of tables related by foreign keys are common. Even if a procedure directly uses only a few tables, we had to model the tables they reference to be able to set up data properly for execution of the procedure. In 71 procedures our symbolic executor ended up modeling over 30 tables and the constraints associated with them. The time taken for symbolic execution of a stored procedure usually depends on the number of tables being modeled.

Now we analyze the nature of the exception cases found. The breakdown of the 93 exception cases found in the fully
1 CREATE OR REPLACE FUNCTION attachcontact(integer, integer)
2 DECLARE
3   pcntctId ALIAS FOR $1;
4   pcrmacctId ALIAS FOR $2;
5 BEGIN
6   UPDATE cntct SET cntct_crmacct_id = pcrmacctId
7   WHERE cntct_id = pcntctId;
8   ...
9 END;

Figure 8: Code for NOT NULL Constraint Violation with Test Case attachcontact(2, NULL)

CREATE OR REPLACE FUNCTION createcrmacct(integer, integer,...)
DECLARE
  _crmacctid ALIAS FOR $1;
  _custid ALIAS FOR $2;
  ...
BEGIN
  INSERT INTO crmacct (crmacct_id,..., crmacct_cust_id,..., crmacct_prospect_id, crmacct_taxauth_id,...)
  VALUES (_crmacctid,..., _custid,...,_prospectid, _taxauthid,...);
  ...

Figure 9: Code for Foreign Key Constraint Violation with Test Case createcrmacct(2,...4,...,8,6,...)
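The way constraint-violating test cases such as those in Figures 8 and 9 arise from negating the constraint choice can be illustrated with a small sketch. It rests on several assumptions that are not from the paper: the table contents, the two-argument simplification of createcrmacct, the helper names, and a brute-force search over a tiny input domain in place of the SMT solver.

```python
from itertools import product

NULL = None                     # assumption: None stands in for SQL NULL
INPUT_DOMAIN = [NULL, 1, 2, 3]  # tiny input space replacing the solver search


def not_null_ok(rows, column):
    """NOT NULL constraint: no row may hold NULL in the column."""
    return all(row[column] is not NULL for row in rows)


def foreign_key_ok(rows, column, referenced_values):
    """Foreign key: value is NULL or appears in the referenced column."""
    return all(row[column] is NULL or row[column] in referenced_values for row in rows)


def attachcontact(cntct_rows, pcntct_id, pcrmacct_id):
    """Models the UPDATE of Figure 8 on a copy of the table model."""
    return [
        dict(row, cntct_crmacct_id=pcrmacct_id) if row["cntct_id"] == pcntct_id else dict(row)
        for row in cntct_rows
    ]


def createcrmacct(crmacct_rows, crmacct_id, cust_id):
    """Models the INSERT of Figure 9, reduced to two columns (simplification)."""
    return crmacct_rows + [{"crmacct_id": crmacct_id, "crmacct_cust_id": cust_id}]


def violating_inputs(apply_statement, constraint_holds):
    """Second choice of ModifyTable: inputs for which the constraint is negated."""
    return [
        args for args in product(INPUT_DOMAIN, repeat=2)
        if not constraint_holds(apply_statement(*args))
    ]


# Hypothetical two-row table models (the evaluation used 2 initial rows per table).
cntct = [{"cntct_id": 1, "cntct_crmacct_id": 1}, {"cntct_id": 2, "cntct_crmacct_id": 2}]
customer_ids = {1, 2}

not_null_cases = violating_inputs(
    lambda a, b: attachcontact(cntct, a, b),
    lambda rows: not_null_ok(rows, "cntct_crmacct_id"),
)
fk_cases = violating_inputs(
    lambda a, b: createcrmacct([], a, b),
    lambda rows: foreign_key_ok(rows, "crmacct_cust_id", customer_ids),
)
print(not_null_cases)  # includes (2, None), analogous to attachcontact(2, NULL)
print(fk_cases)
```

Under these assumptions, the NOT NULL search only succeeds when the first argument matches an existing row and the second is NULL, and the foreign key search only succeeds when the customer id is neither NULL nor present in the referenced table.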
explored procedures is listed in Table 7. Of these 93 cases, over 50 are user defined exceptions. Manual inspection of many of the procedures and cases involved in these exceptions indicates that the majority of them correspond to no data found cases for SELECT statements. The typical scenario for these exceptions is a SELECT statement that tries to fetch configuration data. For these cases, we observed that the programmer has raised exceptions in the code to give the client a meaningful message about the missing data. Another class of user defined exceptions is based on validation of input values. Exceptions are raised to notify the user exactly which input is incorrect.

We have 23 cases of primary key constraint violations. This is because most of the tables in PostgreSQL have sequences linked to the primary key columns as default values. In this situation, the only way a primary key violation can occur is when the programmer specifies the value of the primary key column explicitly or when the sequence is reset. From manual inspection of the procedure source we know that no value overrides the default value of the primary key columns. So the constraint violation is triggered by the resetting of the sequence. This is something that we have allowed in our sequence model because, in our experience, accidental sequence reset is a common problem during the implementation phase of ERP systems and results in buggy behavior. In 10 other cases, the symbolic executor found cases that violate a NOT NULL constraint during UPDATE statements. Inspection of the code indicated that the UPDATE statements in these functions directly use some of the input values of the procedure, allowing the SMT solver to set them to any value to trigger a constraint violation.

Just as assertions provide an oracle for test generation of normal programs, user defined exceptions and constraint violations provide an oracle against which our technique can be used to automatically generate valid test cases. Our technique also generates other valid and important test cases whose results should be verified by the application programmer to determine whether they are correct or whether they identify a mistake or missing exception handling in the stored procedure itself. Our test cases are useful because each covers a unique scenario in stored procedure execution.

In order to evaluate the effectiveness of the constraint models, we inject faults into the code and run our symbolic executor. Sample code for attachcontact is given in Figure 8. There is a NOT NULL check constraint on the field cntct_crmacct_id of table cntct. Our symbolic executor generates a test case attachcontact(2, NULL) for the check constraint violation. On reaching line 6 of the code, we get an exception. In the other stored procedure, createcrmacct in Figure 9, the value of crmacct_cust_id is a reference field, i.e., a foreign key from the customer table. The test case createcrmacct(2,...4,...,8,6,...) generated by our symbolic executor throws an exception for violation of the foreign key constraint.

7. RELATED WORK

Symbolic Execution. Clarke [8] and King [19] pioneered traditional symbolic execution for imperative programs with primitive types. Much progress has been made on symbolic execution during the last decade. PREfix [3] is among the first systems to show the bug finding ability of symbolic execution on real code. Generalized symbolic execution [18] defines symbolic execution for object-oriented code and uses lazy initialization to handle pointer aliasing.

DART [15] combines concrete and symbolic execution to collect the branch conditions along the execution path. DART negates the last branch condition to construct a new path condition that can drive the function to execute on another path. DART focuses only on path conditions involving integers. To overcome path explosion in large programs, SMART [14] introduced inter-procedural static analysis techniques to compute procedure summaries and reduce the paths to be explored by DART. CUTE [26] extends DART to handle constraints on references.

EGT [5] and EXE [6] also use the negation of branch predicates and symbolic execution to generate test cases. They increase the precision of symbolic pointer analysis to handle pointer arithmetic and bit-level memory locations. KLEE [4] is the most recent tool from the EGT/EXE family. KLEE has been shown to work for many off-the-shelf programs written in C/C++. Many recent research projects have proposed techniques for scaling symbolic execution by parallel and incremental execution [28, 32, 30, 29, 24, 36].

Testing Database Applications. The testing of stored procedures is closely related to the testing of database driven applications written in imperative languages. While we have not found any previous work on automated test case generation for stored procedures, the testing of database driven applications has received significant attention in the past decade.

Binnig et al. [2] introduced reverse relational algebra for generating a test case given an SQL SELECT statement and a desired output. In general, test oracles like this are not available. Further, a given output exercises one particular behavior of an SQL statement, whereas our work is concerned
with exhausting many possible behaviors under some bounds. Veanes et al. [34] modeled SQL queries as constraints and used SMT solvers to generate database table data. They also need some specification of the required output, e.g., that the result should be empty or should have a specified number of rows. While they use SMT solvers to improve the analysis, they are also concerned with finding one particular database state for one particular query. Tuya et al. [33] introduced a new coverage criterion for testing of SQL statements that considers the semantics of multiple SQL constructs. They later [9] worked on a constraint based approach to generate test cases for SQL queries that satisfy their proposed criteria. The criteria were written in Alloy [16], i.e., they required additional specifications, and were focused on single SQL statements. In contrast to the above approaches, our approach is focused on the analysis of complete stored procedures and does not need any additional specifications.

One of the first tools for automated test case generation for database driven applications was AGENDA [11]. AGENDA generates test cases for transactions in applications by considering the database schema constraints. To model the conditions imposed by the transaction logic, it relies on user supplied constraints and is focused on specific kinds of tests.

As far as we know, Emmi et al. [13] were the first to apply the idea of symbolic execution to database driven applications. They used concolic execution and two constraint solvers to obtain the test cases. The first constraint solver was used to solve arithmetic constraints, while the other was specialized to solve string constraints. They were able to support partial string matches that are expressed in SQL with the LIKE keyword. Although they supported a wide variety of constraints that appear in the WHERE clause, their supported SQL grammar was limited to queries using a single table, unlike our work, which supports joins with any number of tables. Emmi et al. designed their symbolic executor to maximize branch coverage in the code. Li et al. [20] and Pan et al. [23] extended their approach to different coverage criteria. Our work differs from the above symbolic approaches in that it is hosted inside the database, is able to apply symbolic analysis at a finer granularity, and is therefore able to generate test cases for complete stored procedures.

Marcozzi et al. [21] proposed an algorithm for testing the control flow graph of Java code interacting with the database. It generates Alloy [16] relational model constraints for a

Khalek and Khurshid [17] presented a framework that uses Alloy [16] to model a subset of SQL queries by automatically generating SQL queries, database state and expected results of the queries when executed on a database management system. They modeled SQL queries and the database schema using Alloy, which used a SAT solver to populate tables. The focus of their work is testing the correctness of the database management system itself and not of the applications using it.

A common approach in many of the above techniques is to use declarative specifications in Alloy [16] and the Alloy Analyzer to solve them. The solutions are often converted back automatically to INSERT queries that can populate a database. While Alloy is a powerful language, imperative constraints like those that are mixed with queries in a stored procedure are difficult to model in it, resulting in a substantially reduced SQL subset being modeled. In contrast, our technique of instrumenting the query nodes in the database query execution engine results in both declarative queries and imperative constraints being converted into a series of imperative sequential tasks on which standard symbolic execution techniques can be applied. While we do not support the entire SQL grammar, our limitations are not fundamental in nature and the technique can be easily extended to other SQL statements.

Commercial IDEs like Visual Studio (footnote 3) have support for unit testing stored procedures. However, this support is limited to automatically filling databases, executing the procedure, and comparing output. The database values and procedure inputs are not automatically generated. There is also work on preventing SQL injection attacks in stored procedures [35]. It combines static analysis to instrument SQL statements in stored procedures with a dynamic part that compares the statements to what was observed statically. However, this technique is specific to SQL injection and cannot be extended to generic test case generation.

8. CONCLUSIONS AND FUTURE WORK

We presented a novel approach of applying symbolic execution to automatically generate test cases for stored procedures. We instrumented the internal execution plans generated by the PostgreSQL database management system to extract constraints and used the Z3 SMT solver to generate test cases consisting of table data and procedure inputs. We
given database schema, a finite set of paths from the control treated values in database tables as symbolic, modeled the
flow graph, and variables along those paths (both method constraints on data imposed by the schema and by the SQL
variables and those used in SQL queries). It generates a statements executed by the stored procedure, and used a
symbolic variable for each value taken by the method vari- SMT solver to find values that will drive the stored procedure
ables or database tables during path exploration. The Alloy on a particular execution path.
model generated ensures the execution of the path that in- We showed in our evaluation on more than a hundred
volves these symbolic variables. Alloy Analyzer solves these stored procedures from a large business application that
constrains to generate test cases. In later work [22], they this technique can generate many useful test cases and also
described how an SMT solver can be potentially used to uncover bugs that lead to schema constraint violations or
model the constraints and generate test cases. This would user defined exceptions.
make analyzing larger applications possible. However, this is In future, we plan to extend our technique to handle the
an idea paper and they have not implemented or evaluated remaining query node types and release our tool for end-
it. Potentially such a technique would face the same hurdles to-end automated testing of PostgreSQL procedures. In
as other approaches implemented outside the database man- addition, we intend to make seamless symbolic execution
agement system. In our work, because of implementing it at from Java applications to PostgreSQL stored procedures
the query engine level, we are able to implement and eval- that can share a symbolic map. We are also working on a
uate complex queries like multi-table joins which previous technique to automatically generate larger tables if it enables
techniques are unable to handle. a particular path exploration in code to be executed.
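The central idea summarized in the conclusions, treating table contents and procedure inputs as unknowns and searching for values that satisfy both the schema constraints and the branch conditions of a chosen path, can be illustrated with a small sketch. The Python below is not the tool's implementation: the `withdraw_path` procedure, the one-row account table, and the `find_test_case` helper are hypothetical, and an exhaustive search over a tiny integer domain is swapped in for the Z3 SMT solver.

```python
from itertools import product

# Hypothetical stand-in for a stored procedure. It looks up an account row
# and returns the name of the execution path taken instead of a real result.
def withdraw_path(table, acc_id, amount):
    balances = [bal for (row_id, bal) in table if row_id == acc_id]
    if not balances:
        return "missing"        # SELECT returned no row
    if balances[0] < amount:
        return "insufficient"   # balance check failed
    return "ok"                 # withdrawal allowed

# Assumed schema constraints: unique id, non-negative balance.
def schema_ok(table):
    ids = [row_id for row_id, _ in table]
    return len(ids) == len(set(ids)) and all(bal >= 0 for _, bal in table)

def find_test_case(target_path, domain=range(0, 4)):
    """Brute-force stand-in for a solver query: find a one-row table and
    procedure inputs that satisfy the schema and reach target_path."""
    for row_id, bal, acc_id, amount in product(domain, repeat=4):
        table = [(row_id, bal)]
        if schema_ok(table) and withdraw_path(table, acc_id, amount) == target_path:
            return {"table": table, "acc_id": acc_id, "amount": amount}
    return None  # no assignment in the domain reaches this path

# One test case (table data plus inputs) per execution path.
test_cases = {p: find_test_case(p) for p in ("missing", "insufficient", "ok")}
for path, case in test_cases.items():
    print(path, case)
```

Each resulting test case pairs table data with procedure inputs, mirroring the shape of the test cases described in the text. A real solver replaces the enumeration with constraint solving, which is what makes larger domains and multi-row tables tractable.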
³ https://www.visualstudio.com
9. REFERENCES
[1] C. Barrett and C. Tinelli. CVC3. In Proc. 19th International Conference on Computer Aided Verification (CAV), pages 298–302, 2007.
[2] C. Binnig, D. Kossmann, and E. Lo. Reverse query processing. In IEEE 23rd International Conference on Data Engineering (ICDE), pages 506–515, 2007.
[3] W. R. Bush, J. D. Pincus, and D. J. Sielaff. A Static Analyzer for Finding Dynamic Programming Errors. Software Practice Experience, 30(7):775–802, June 2000.
[4] C. Cadar, D. Dunbar, and D. R. Engler. KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs. In Proc. 8th Symposium on Operating Systems Design and Implementation (OSDI), pages 209–224, 2008.
[5] C. Cadar and D. Engler. Execution Generated Test Cases: How to make systems code crash itself. In Proc. International SPIN Workshop on Model Checking of Software, pages 2–23, 2005.
[6] C. Cadar, V. Ganesh, P. M. Pawlowski, D. L. Dill, and D. R. Engler. EXE: Automatically Generating Inputs of Death. In Proc. 13th Conference on Computer and Communications Security (CCS), pages 322–335, 2006.
[7] L. A. Clarke. A System to Generate Test Data and Symbolically Execute Programs. IEEE Transactions on Software Engineering (TSE), 2(3):215–222, May 1976.
[8] L. A. Clarke. Test Data Generation and Symbolic Execution of Programs as an aid to Program Validation. PhD thesis, University of Colorado at Boulder, 1976.
[9] C. De La Riva, M. J. Suarez-Cabal, and J. Tuya. Constraint-based test database generation for SQL queries. In Proceedings of the 5th Workshop on Automation of Software Testing, pages 67–74, 2010.
[10] L. de Moura and N. Bjørner. Z3: An Efficient SMT Solver. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 337–340, 2008.
[11] Y. Deng, P. Frankl, and D. Chays. Testing database transactions with AGENDA. In Proceedings of the 27th international conference on Software engineering, pages 78–87, 2005.
[12] B. Elkarablieh, I. Garcia, Y. L. Suen, and S. Khurshid. Assertion-based Repair of Complex Data Structures. In Proc. 22nd International Conference on Automated Software Engineering (ASE), pages 64–73, 2007.
[13] M. Emmi, R. Majumdar, and K. Sen. Dynamic test input generation for database applications. In Proceedings of the 2007 International Symposium on Software Testing and Analysis, pages 151–162, 2007.
[14] P. Godefroid. Compositional Dynamic Test Generation. In Proc. 34th Symposium on Principles of Programming Languages (POPL), pages 47–54, 2007.
[15] P. Godefroid, N. Klarlund, and K. Sen. DART: Directed Automated Random Testing. In Proc. 2005 Conference on Programming Languages Design and Implementation (PLDI), pages 213–223, 2005.
[16] D. Jackson. Alloy: a lightweight object modelling notation. ACM Transactions on Software Engineering and Methodology (TOSEM), 11(2):256–290, 2002.
[17] S. A. Khalek and S. Khurshid. Systematic testing of database engines using a relational constraint solver. In Proceedings of the Fourth IEEE International Conference on Software Testing, Verification and Validation (ICST), pages 50–59, 2011.
[18] S. Khurshid, C. S. Pasareanu, and W. Visser. Generalized Symbolic Execution for Model Checking and Testing. In Proc. 9th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 553–568, 2003.
[19] J. C. King. Symbolic Execution and Program Testing. Communications ACM, 19(7):385–394, July 1976.
[20] C. Li and C. Csallner. Dynamic symbolic database application testing. In Proceedings of the Third International Workshop on Testing Database Systems (DBTest), 2010.
[21] M. Marcozzi, W. Vanhoof, and J.-L. Hainaut. A relational symbolic execution algorithm for constraint-based testing of database programs. In IEEE 13th International Working Conference on Source Code Analysis and Manipulation (SCAM), pages 179–188, 2013.
[22] M. Marcozzi, W. Vanhoof, and J.-L. Hainaut. Towards testing of full-scale SQL applications using relational symbolic execution. In Proceedings of the 6th International Workshop on Constraints in Software Testing, Verification, and Analysis, pages 12–17, 2014.
[23] K. Pan, X. Wu, and T. Xie. Database state generation via dynamic symbolic execution for coverage criteria. In Proceedings of the Fourth International Workshop on Testing Database Systems, page 4, 2011.
[24] S. Person, G. Yang, N. Rungta, and S. Khurshid. Directed Incremental Symbolic Execution. In Proc. 2011 Conference on Programming Languages Design and Implementation (PLDI), pages 504–515, 2011.
[25] D. A. Ramos and D. R. Engler. Practical, Low-Effort Equivalence Verification of Real Code. In Proc. 23rd International Conference on Computer Aided Verification (CAV), pages 669–685, 2011.
[26] K. Sen, D. Marinov, and G. Agha. CUTE: A Concolic Unit Testing Engine for C. In Proc. 5th joint meeting of the European Software Engineering Conference and Symposium on Foundations of Software Engineering (ESEC/FSE), pages 263–272, 2005.
[27] C. Seo, S. Malek, and N. Medvidovic. Component-Level Energy Consumption Estimation for Distributed Java-Based Software Systems. In Proc. 11th International Symposium on Component-Based Software Engineering, pages 97–113, 2008.
[28] J. H. Siddiqui and S. Khurshid. ParSym: Parallel Symbolic Execution. In Proc. 2nd International Conference on Software Technology and Engineering (ICSTE), pages V1: 405–409, 2010.
[29] J. H. Siddiqui and S. Khurshid. Scaling Symbolic Execution using Ranged Analysis. In Proc. 27th Annual Conference on Object Oriented Programming Systems, Languages, and Applications (OOPSLA), 2012.
[30] J. H. Siddiqui and S. Khurshid. Staged Symbolic Execution. In Proc. 27th Symposium on Applied Computing (SAC): Software Verification and Testing Track (SVT), 2012.
[31] N. Sorensson and N. Een. An Extensible SAT-solver. In Proc. 6th International Conference on Theory and Applications of Satisfiability Testing (SAT), pages 502–518, 2003.
[32] M. Staats and C. Pasareanu. Parallel Symbolic Execution for Structural Test Generation. In Proc. 19th International Symposium on Software Testing and Analysis (ISSTA), pages 183–194, 2010.
[33] J. Tuya, M. J. Suarez-Cabal, and C. de la Riva. Full predicate coverage for testing SQL database queries. Journal of Software Testing, Verification and Reliability, 20(3):237–288, 2010.
[34] M. Veanes, P. Grigorenko, P. De Halleux, and N. Tillmann. Symbolic query exploration. In Formal Methods and Software Engineering, pages 49–68. Springer, 2009.
[35] K. Wei, M. Muthuprasanna, and S. Kothari. Preventing sql injection attacks in stored procedures. In Proceedings of the Australian Software Engineering Conference (ASWEC), pages 191–198, 2006.
[36] G. Yang, C. S. Pasareanu, and S. Khurshid. Memoized Symbolic Execution. In Proc. 2012 International Symposium on Software Testing and Analysis (ISSTA), ISSTA 2012, pages 144–154, 2012.