3.2.1 Core JavaScript.
To provide a formal description of the taint tracking engine, we consider a core subset of non-strict ECMAScript 5.1 [12]. This subset, whose grammar is given in Figure 2, is subject to many simplifying assumptions and excludes many language features (e.g., exception handling) and much of the syntactic sugar described in the full ECMAScript specification. Nevertheless, it suffices to convey the key ideas of our taint tracking approach. Additional JavaScript features not included in our core subset but supported by our implementation are briefly discussed later.
The considered subset of JavaScript supports all the literals of primitive values and the null keyword. Objects are initialized through object expressions by providing the list of key-value pairs for data properties. Similarly, functions are anonymous and specified through function expressions by supplying the list of formal arguments and the body. As in traditional core programming languages, we also assume a set of unary and binary operators whose evaluation has no side effects and does not involve any implicit type conversion of the operands (e.g., in full JavaScript, the evaluation of "obj:" + {} requires the object to be converted into a string). Our core subset of JavaScript supports variable declarations to introduce bindings in the local scope, variable reads to load their value by means of the corresponding identifier, and variable assignments to store a given value. Object properties can likewise be read and written, but only using the bracket notation Expr[Expr], in contrast to the full language, which also supports the dot notation Expr.Id. Furthermore, although variable and property writes are expressions in JavaScript, here they are just statements.
Finally, the full JavaScript language provides three different semantics of invocation: (i) call, (ii) method call, and (iii) constructor invocation. To complicate matters, there are numerous situations in which a function is implicitly executed by the JavaScript engine, including implicit type conversions, accesses to accessor properties, asynchronous operations, and so on. For simplicity, here we consider only regular function calls, the simplest form of invocation. Moreover, we assume that functions are invoked with a number of actual arguments equal to the number of formal arguments and that invocations terminate with a return statement. This differs from the full language, in which functions can be invoked with fewer or more actual arguments than formal arguments and an invocation may exit without executing a return statement.
3.2.2 Abstract Machine.
The abstract machine manipulates a stack of abstract values reflecting the taint of values in the stack of the JavaScript program while also maintaining maps that associate abstract values with variables and object properties. With the aim of identifying the different types of sources that influence concrete values during the script execution (local storage, network, etc.), we model abstract values with label sets \(\tau\). Accordingly, the empty label set annotates a non-tainted concrete value, whereas a non-empty label set annotates a concrete value that is tainted by the labels it contains. A label is a triple \(\ell = ({\it type},{\it loc},{\it extras})\) that represents the operation that led to its creation, where \({\it type} \in \mathit {Str}\) is the type of label, which identifies a specific type of operation, \({\it loc} \in \mathit {Str}\) represents the code location where the operation was executed, and \({\it extras} \in \mathit {Str}^{*}\) is a sequence of extra information about that operation. For example, the label ("localStorage.getItem", "https://foo.com/index.js:15:48", 〈"theme","dark"〉) represents a call to the getItem method of localStorage, located at row 15 and column 48 of https://foo.com/index.js, with extra information about the key of the item being accessed, "theme", and its value, "dark". We also use labels to represent sinks in the taint analysis. When a source generates a value, we put a new label representing that operation into the set that annotates that value; afterward, when a tainted value (i.e., a value annotated with a non-empty label set \(\tau\)) reaches a sink, we record the information flow as a pair \((\tau ,s)\), where s is a label representing the sink.
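The label and flow definitions above can be illustrated with a minimal sketch. The helper names (makeLabel, isTainted, reachSink) and the "fetch" sink label are hypothetical and serve only as illustration; the text fixes only the shape of labels, label sets, and flows.

```javascript
// Hypothetical helpers: the text only fixes the shape (type, loc, extras).
function makeLabel(type, loc, extras = []) {
  return Object.freeze({ type, loc, extras: Object.freeze([...extras]) });
}

// A label set tau is modelled as a Set of labels; the empty set means "not tainted".
const isTainted = (tau) => tau.size > 0;

// A flow (tau, s) is recorded only when a tainted value reaches a sink.
const flows = [];
function reachSink(tau, sinkLabel) {
  if (isTainted(tau)) flows.push({ tau: new Set(tau), sink: sinkLabel });
}

// Source label taken from the example in the text.
const source = makeLabel(
  "localStorage.getItem",
  "https://foo.com/index.js:15:48",
  ["theme", "dark"]
);
// Sink label invented for this example.
const sink = makeLabel("fetch", "https://foo.com/index.js:20:4", ["https://foo.com/log"]);

reachSink(new Set(), sink);         // non-tainted value: nothing is recorded
reachSink(new Set([source]), sink); // tainted value: one flow is recorded
console.log(flows.length); // 1
```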
Figure 3 shows the list of instructions supported by our abstract machine. The instructions for manipulating the stack of abstract values (push, pop), accessing the maps of abstract values associated with local variables (initvar, readvar, writevar), and accessing object properties (initproperty, readproperty, writeproperty) are common between our tool and Ichnaea [19]. We point out that the identifiers that refer to variables and properties in the abstract machine are fully qualified names, which are fresh and unique throughout the whole script execution. In addition, our abstract machine supports the join instruction, which pops the two topmost abstract values from the stack and then pushes their join (in terms of label sets, their union). We need this instruction for our treatment of native functions, whose internal behavior is not observable because their code is not instrumented by Jalangi.
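As a rough illustration of the machine state and its instructions, the following sketch models label sets as JavaScript Set objects whose elements are plain strings standing in for label triples. The class name, the "oid:offset" key format, and the choice that initvar and initproperty consume the topmost value are assumptions made for this example; Figure 3 remains the reference.

```javascript
// Sketch of the abstract machine (semantics partly assumed, see comments).
class AbstractMachine {
  constructor() {
    this.stack = [];        // stack of abstract values (label sets)
    this.vars = new Map();  // fully qualified variable name -> label set
    this.props = new Map(); // "oid:offset" -> label set
  }
  top() { return this.stack[this.stack.length - 1]; }
  push(tau) { this.stack.push(new Set(tau)); }
  pop() { return this.stack.pop(); }
  // join: replace the two topmost label sets with their union.
  join() {
    const right = this.stack.pop();
    const left = this.stack.pop();
    this.stack.push(new Set([...left, ...right]));
  }
  // Assumption: initvar and initproperty consume the topmost value.
  initvar(name) { this.vars.set(name, this.stack.pop()); }
  initproperty(oid, offset) { this.props.set(`${oid}:${offset}`, this.stack.pop()); }
  // Reads push a copy of the stored label set (empty if nothing is stored).
  readvar(name) { this.stack.push(new Set(this.vars.get(name))); }
  readproperty(oid, offset) { this.stack.push(new Set(this.props.get(`${oid}:${offset}`))); }
  // Writes store the topmost value without popping it.
  writevar(name) { this.vars.set(name, new Set(this.top())); }
  writeproperty(oid, offset) { this.props.set(`${oid}:${offset}`, new Set(this.top())); }
}

const m = new AbstractMachine();
m.push(["storage"]);
m.push(["network"]);
m.join(); // the union of the two label sets is now on top
console.log([...m.top()]); // [ 'storage', 'network' ]
```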
3.2.3 Instruction Generation.
The idea behind taint tracking is to propagate the taint of the sub-expressions to the result upon expression evaluation. We hold the invariant that the last values pushed onto the stack represent the taint of the previously calculated sub-expressions, hence we generate instructions that push a number of abstract values on top of the stack equal to the number of sub-expressions. At the end of the evaluation, at the top of the stack there will be the value reflecting the taint of the whole expression. Taints are then propagated across statements (e.g., assignments).
Figure 4 shows how abstract machine instructions are generated for each type of operation specified in the core subset of JavaScript. For each expression and statement, we formalize a simple model of the callbacks implemented in Jalangi for taint tracking purposes [29]. Jalangi allows one to define callbacks both before and after the execution of operations: we denote these callbacks with pre() and post(), respectively. The callbacks have access to different parameters depending on the performed operation; for example, callbacks for binary expressions have access to both the operator and the values of the evaluated sub-expressions. For most of the core language constructs, we do not need the full generality of the Jalangi approach and we just define the post() callback. The figure is described in detail in the rest of this section.
Basic Operations. Literal and function expressions are constant values in the source code, which generally do not represent sensitive information; accordingly, a push\((\emptyset)\) instruction is issued for the abstract machine, indicating that such values are not tainted. Objects, instead, are more complex because object expressions define a number of properties, let us say n, whose values have been pushed onto the stack just before evaluating the object itself; those n properties must be initialized in reverse order using the n topmost values of the stack, and therefore we emit n initproperty instructions.
The taint of the result of unary operations corresponds to the taint of the single operand at the top of the stack, hence it is not necessary to issue any instructions for the abstract machine. Instead, in the case of binary operations, the taint of the two operands must propagate to the result; therefore, we emit a join\(()\) instruction to merge the two topmost taints into a single one.
Variables and object properties are both considered containers of data, so their taint is associated to the value they carry. A variable is declared in the abstract machine as a result of generating an initvar\((\mathit {name})\) instruction. Afterward, it can be accessed by issuing readvar\((\mathit {name})\) for reading and writevar\((\mathit {name})\) for writing. The \(\mathit {name}\) parameter includes both the variable name and the identifier of the activation frame where the variable has been declared. Analogously, the access to the property \(v_p\) of an object \(v_o\) is possible by emitting a readproperty\((\mathit {oid}(v_o),\mathit {offset}(v_p))\) to get the assigned value and writeproperty\((\mathit {oid}(v_o),\mathit {offset}(v_p))\) to put another one. In this case, \(\mathit {oid}\) and \(\mathit {offset}\) are mappings from JavaScript objects and property values to their corresponding unique identifiers in the abstract machine. It is also worth noting that writevar and writeproperty do not extract from the stack the value they store, hence those instructions must be followed by pop\(()\); furthermore, readproperty and writeproperty do not pull out of the stack the taints for the object and the property name, so once again a pop\(()\) instruction must be issued twice.
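To make the rules for literals, binary operations, and variable accesses concrete, the sketch below runs the instruction sequence that these rules would plausibly produce for a small program. The tiny interpreter, the "name@frame" format for fully qualified names, and the modelling of a source as a push of a non-empty label set are assumptions made for this example.

```javascript
// Minimal interpreter for the instructions used below (assumed semantics).
function run(instructions) {
  const state = { stack: [], vars: new Map() };
  const top = () => state.stack[state.stack.length - 1];
  for (const [op, arg] of instructions) {
    switch (op) {
      case "push": state.stack.push(new Set(arg)); break;
      case "pop": state.stack.pop(); break;
      case "join": {
        const right = state.stack.pop();
        const left = state.stack.pop();
        state.stack.push(new Set([...left, ...right]));
        break;
      }
      case "initvar": state.vars.set(arg, state.stack.pop()); break; // consumes the top
      case "readvar": state.stack.push(new Set(state.vars.get(arg))); break;
      case "writevar": state.vars.set(arg, new Set(top())); break;   // keeps the top
      default: throw new Error(`unknown instruction: ${op}`);
    }
  }
  return state;
}

const program = [
  // var x = <value produced by a source>;
  ["push", ["storage"]], ["initvar", "x@f0"],
  // var y = 1;     (literal, hence push of the empty label set)
  ["push", []], ["initvar", "y@f0"],
  // var z = null;
  ["push", []], ["initvar", "z@f0"],
  // z = x + y;     (two reads, one join, then a write followed by pop)
  ["readvar", "x@f0"], ["readvar", "y@f0"], ["join"], ["writevar", "z@f0"], ["pop"],
];

const state = run(program);
console.log([...state.vars.get("z@f0")], state.stack.length); // [ 'storage' ] 0
```

The final pop matters: without it, the taint of the assigned value would remain on the stack after the statement and the abstract stack would no longer mirror the concrete one.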
Function Calls. Handling function calls is the most challenging part of the analysis because the JavaScript specification defines a significant number of native functions, that is, built-in functions available in the browser that are not amenable to instrumentation because their internals are invisible to Jalangi. Ichnaea bridges the lack of information on data dependencies with the help of manually crafted models for specific native functions, which generate instructions for the abstract machine that mimic their internals. However, this strategy is only effective for the supported functions; thus, it cannot easily scale to the whole JavaScript standard library and is also hard to maintain in terms of engineering effort.
Since we do not have access to Ichnaea and its models of native functions, we design a different solution based on standard concepts, which is not as precise as Ichnaea’s manually crafted models but works for any invocation of both user-defined and native functions, thus being more general and easier to maintain. Essentially, since operations performed within a native function are unknown, in such a case we determine an over-approximation of the taint for the returned value. Along with the existing assumptions, we suppose that the instrumented code allows us to intercept the entry into and exit from a function invocation, be it user defined or native, except for the case of the invocation of a native function by another native function, because such calls are not observable by Jalangi due to lack of instrumentation. With this in mind, let us examine all the possible types of function calls.
The simplest instance is the one in which a user-defined function invokes another user-defined function: in this case, when the function is called, the stack contains the taints of n actual arguments as the n topmost values. Hence, first of all, the formal arguments must be initialized in reverse order by generating n initvar instructions. Then, upon reaching a return e statement, the taint for the value of e on top of the stack is stored in a special variable, called _ret_, with the writevar("_ret_") instruction, and is subsequently pulled out of the stack with pop\(()\). After the call, the top value of the stack is the taint for the called function, so it is discarded with pop\(()\), and finally the value associated to the special _ret_ variable is read by emitting a readvar("_ret_") instruction, to communicate the returned value to the caller.
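The sequence just described can be traced on a small example: a caller evaluating r = f(a, b), where f is function (p, q) { return p + q; }. The emit helper, the "name@frame" naming scheme, and the use of push with an empty set for the taint of the function value are assumptions made for this sketch.

```javascript
const stack = [];       // stack of abstract values (label sets)
const vars = new Map(); // fully qualified variable name -> label set
const trace = [];       // log of emitted instructions

// Executes one instruction and logs it (assumed semantics, see comments).
function emit(op, arg) {
  trace.push(arg === undefined ? `${op}()` : `${op}(${JSON.stringify(arg)})`);
  switch (op) {
    case "push": stack.push(new Set(arg)); break;
    case "pop": stack.pop(); break;
    case "join": {
      const right = stack.pop();
      const left = stack.pop();
      stack.push(new Set([...left, ...right]));
      break;
    }
    case "initvar": vars.set(arg, stack.pop()); break;                       // consumes the top
    case "readvar": stack.push(new Set(vars.get(arg))); break;
    case "writevar": vars.set(arg, new Set(stack[stack.length - 1])); break; // keeps the top
    default: throw new Error(`unknown instruction: ${op}`);
  }
}

// Caller state: a is tainted, b is not.
vars.set("a@f0", new Set(["network"]));
vars.set("b@f0", new Set());

// Caller: evaluate the callee expression and the two actual arguments.
emit("push", []);        // taint of the function value f
emit("readvar", "a@f0");
emit("readvar", "b@f0");

// Entering f: formal arguments are initialized in reverse order.
emit("initvar", "q@f1");
emit("initvar", "p@f1");

// Body of f: return p + q;
emit("readvar", "p@f1");
emit("readvar", "q@f1");
emit("join");
emit("writevar", "_ret_");
emit("pop");

// Back in the caller: discard the taint of f, then load the returned taint.
emit("pop");
emit("readvar", "_ret_");

console.log(trace.join(" "), [...stack[0]]);
```

After the last instruction the stack holds a single value, the taint of the call expression, which here contains the label inherited from a.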
When a user-defined function invokes a native function, the taints for n actual arguments combine into a single value as a result of generating \(n - 1\) join instructions; the obtained value on top of the stack is the over-approximated taint for the result of the invocation. If no argument has been passed, we push \(\emptyset\) onto the stack. Such taint is then recursively joined with the taints associated to the properties of objects passed as an argument because the values of these properties may influence the result as well. At the end, the taint for the result is stored into the special _ret_ variable if the returned value is primitive, and otherwise we recursively propagate it to the properties of the returned object and replace the taint associated to the _ret_ variable with \(\emptyset\), thus preserving the rule that objects are never tainted. The recursion through objects for updating the taint of their properties must take into account the possibility of cyclic references, such as in case of an object referring to itself; to avoid infinite recursion, we process an object only if it has not been visited yet in a single traversal.
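The over-approximation for native calls can be sketched directly on label sets, without going through machine instructions. In the sketch below, the WeakMap-based property map is a hypothetical stand-in for the oid/offset maps, all function names are invented for illustration, and propagation to the returned object is implemented as a union with any taint already present, which is an assumption.

```javascript
// Property taints: object -> (property name -> label set). Hypothetical stand-in
// for the oid/offset maps of the abstract machine.
const propTaints = new WeakMap();
const isObject = (v) => v !== null && (typeof v === "object" || typeof v === "function");

function getPropTaint(obj, key) {
  return propTaints.get(obj)?.get(key) ?? new Set();
}
function addPropTaint(obj, key, tau) {
  if (!propTaints.has(obj)) propTaints.set(obj, new Map());
  const byKey = propTaints.get(obj);
  byKey.set(key, new Set([...(byKey.get(key) ?? []), ...tau]));
}

// Join into acc the taints of all properties reachable from value,
// processing each object at most once to cope with cyclic references.
function collect(value, acc, visited) {
  if (!isObject(value) || visited.has(value)) return;
  visited.add(value);
  for (const key of Object.keys(value)) {
    for (const label of getPropTaint(value, key)) acc.add(label);
    collect(value[key], acc, visited);
  }
}

// Propagate tau to all properties reachable from value, again visiting once.
function propagate(value, tau, visited) {
  if (!isObject(value) || visited.has(value)) return;
  visited.add(value);
  for (const key of Object.keys(value)) {
    addPropTaint(value, key, tau);
    propagate(value[key], tau, visited);
  }
}

// Returns the label set to store in _ret_ after a native call.
function nativeCallTaint(args, argTaints, result) {
  const tau = new Set(); // stays empty when no argument is passed
  for (const t of argTaints) for (const label of t) tau.add(label);
  const visited = new Set();
  for (const arg of args) collect(arg, tau, visited);
  if (!isObject(result)) return tau;  // primitive result: taint goes to _ret_
  propagate(result, tau, new Set());  // object result: taint goes to its properties
  return new Set();                   // objects themselves are never tainted
}

const cfg = { name: "theme" };
cfg.self = cfg; // cyclic reference
addPropTaint(cfg, "name", new Set(["storage"]));

const tau1 = nativeCallTaint([cfg, 1], [new Set(), new Set(["network"])], 2);
const keys = Object.keys(cfg);
const tau2 = nativeCallTaint([cfg], [new Set()], keys);
console.log([...tau1], tau2.size, [...getPropTaint(keys, "0")]);
```

In the first call the result is primitive, so the returned set joins the argument taint with the taint found on cfg.name. In the second call the result is an array, so the returned set is empty and the taint is attached to the array elements instead.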
Finally, we discuss the case in which a native function invokes a user-defined function. This may happen in the case of higher-order native functions—that is, functions that accept another function as an argument. An example is the Array.prototype.map native function, which progressively applies a given callback to all the elements of an array and then creates a new array with the corresponding results. Note that we can observe this kind of invocation because we are able to detect when the execution enters a user-defined function, knowing that the last observed operation is a native function call. In this case, we duplicate the topmost value of the stack (i.e., the resulting taint of the native function) for each primitive value passed as an argument while issuing push\((\emptyset)\) for each passed object. This means that the taints for the arguments may depend on each of the arguments passed to the native caller. Duplication is achieved using an auxiliary _arg_ variable, which is written once and read as many times as necessary. On the return, we load the taint for the _ret_ variable and perform a weak update of the native function’s resulting taint by joining these two values: in fact, the user-defined callback may have returned a value annotated with a new label, which we have to consider for the final result of the native function call or the arguments of another invocation of the callback.
Our approach is formally described in Figure 5: we define a collection of procedures that emit instructions for the abstract machine whenever the execution enters and leaves a user-defined or native function, in accordance with the preceding rules. In particular, we call from inside user-defined functions the Enter-User-Function procedure before initializing formal arguments and the Leave-User-Function procedure before returning to the caller, and we call the Enter-Native-Function and Leave-Native-Function procedures respectively before and after the invocation of native functions, since we are able to observe the entry and exit of this kind of invocation only from the caller (see Figure 4). The behavior of these procedures is stateful with respect to a stack of abstract activation frames, which records whether each function invoked in the runtime call stack is user defined (with the "USER" string) or native (with the "NATIVE" string). We clarify that this abstract call stack is just an auxiliary data structure for generating abstract machine instructions, and hence the abstract machine is totally independent of it. The abstract call stack can be manipulated with the following standard procedures: Push-Frame pushes an abstract frame onto the stack, Pop-Frame removes an abstract frame from the stack, and Top-Frame gets the abstract frame at the top of the stack without removing it.
3.2.4 Additional JavaScript Features.
The JavaScript standard specification includes many additional features that we have not covered in the formal discussion. For the majority of them, including arrays, getters and setters, dynamic code evaluation (eval()), and arguments in function calls, we borrow the treatment from Ichnaea and so we refer readers to the work of Karim et al. [19] for further details. Instead, we briefly explain why and how we behave differently with respect to one important feature of the language: exception handling.
When a function executes a throw statement, the JavaScript engine interrupts that function and returns to the caller recursively, until it hits a try-catch statement or the stack of activation frames is empty. In the former case, the variable in the catch clause is bound to the thrown value. The description of Ichnaea clearly specifies that the taint of the thrown value is passed to the variable in the catch clause through another special variable called _throw_, similarly to the treatment of values returned by functions, and our tool does the same. However, the description of Ichnaea does not mention anything about cleaning up the stack of abstract values from the intermediate values of interrupted functions. This detail is important for the correctness of the taint tracking engine: if it is not handled properly, the state of the abstract machine may get out of sync with the concrete state every time an exception occurs, leading to erroneous analysis results. To remedy this problem, we extend our model of abstract activation frame with an additional numeric field, which we call the frame pointer. This value corresponds to the height of the stack of abstract values at the time of creation of the activation frame, that is, when the JavaScript engine enters the last invoked function. When the running function is interrupted by an exception, pop\(()\) instructions are emitted until the height of the stack of abstract values is equal to the corresponding frame pointer. This way, no taints associated with intermediate values of the interrupted function remain in the stack of abstract values, and the alignment between the abstract and the concrete state is preserved.
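The frame pointer mechanism can be sketched as follows. The function names and the exact moment at which the frame pointer is recorded (here, when the abstract frame is pushed) are assumptions made for illustration; the transfer of the thrown taint through _throw_ is omitted.

```javascript
const valueStack = []; // stack of abstract values (label sets)
const frames = [];     // stack of abstract activation frames
const emitted = [];    // log of emitted instructions

function push(tau) { emitted.push("push"); valueStack.push(new Set(tau)); }
function pop() { emitted.push("pop"); return valueStack.pop(); }

// Push-Frame, extended with the frame pointer: the height of the
// stack of abstract values at the time the frame is created.
function pushFrame(kind) {
  frames.push({ kind, framePointer: valueStack.length });
}

// Called for each function interrupted by an exception: emit pop
// instructions until the stack is back at the recorded height.
function unwindInterruptedFrame() {
  const frame = frames.pop();
  while (valueStack.length > frame.framePointer) pop();
}

pushFrame("USER");  // caller containing the try-catch, frame pointer 0
push([]);           // taint of the callee function value
pushFrame("USER");  // callee, frame pointer 1
push(["storage"]);  // intermediate value of the callee
push([]);           // another intermediate value of the callee
// The callee throws here, so its intermediate taints must be discarded.
unwindInterruptedFrame();

console.log(valueStack.length, frames.length); // 1 1
```

Only the two intermediate values of the interrupted callee are removed; the value pushed by the caller before the call is left for the caller to handle.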