1 Introduction
Commercial processors today have native support for
trusted execution environments (TEEs) to run user-level applications in isolation from other software on the system. A prime example of such a TEE is Intel
Software Guard eXtensions (SGX) [
59]. The hardware-isolated environment created by SGX, commonly referred to as an
enclave, runs a user-level application without trusting privileged software. Enclaves offer a good basis for isolation, as they do not necessarily place trust in the OS and allow us to restrict the trusted code base. Further, they open up the possibility of reverse sandboxing, where the enclaved application protects itself from attacks arising from the OS [
49].
SGX exposes an extremely large interface between the enclave and the OS, including the potential to transfer control to the OS at every memory access (e.g., via memory faults) or instruction executed (e.g., via timer interrupts and exceptions). Furthermore, the demand for running commodity applications inside SGX has surged, but these applications are not written to deal with the threat of a malicious OS. Therefore, the ability to interpose on all control and data passed on the enclave-OS interface is an important building block. Such interposition can be used for implementing compatibility frameworks, a host of well-known inline security monitors, and sandboxing techniques inside enclaves [
14,
47,
73,
83,
84].
Complete interposition on the enclave-OS interface is a known challenge. For example, a long line of work on frameworks that aim to run existing software on SGX highlights the difficulty of ensuring compatibility [
16,
20,
30,
72,
75]. In this work, we address this challenge by taking a new approach: we enable
dynamic binary translation (DBT), i.e., the ability to interpose on all instructions executed in the enclave. Our work enables DBT on Intel SGX enclaves for unmodified x86_64 Linux binaries by designing a system called
Ratel.
Ratel is available open-source [
1] and it builds on DynamoRIO, an industrial-strength DBT engine originally designed for non-enclave code [
24]. The
Ratel DBT engine does
not trust the OS, and enclave applications running on
Ratel are assumed to be unaware of its presence. A security monitor implemented using
Ratel can mediate and interpose on all instructions, entry-exits, system calls, dynamically generated code, asynchronous events, virtual address accesses, and runtime loading of code and data in the enclave—a foundation for implementing a wide variety of security-related instrumentation on enclaves in the future, without specializing to individual applications or language runtimes.
To illustrate one advantage of such seamless interposition, in this work, we use
Ratel to build a binary compatibility layer for SGX. Binary compatibility creates the illusion, for an unmodified application binary, that it is running in a normal OS process rather than in a restricted environment such as an enclave. In designing this layer, we observe that several trade-offs arise between ensuring
complete interposition on the OS-enclave boundary and the resulting performance. These trade-offs are orthogonal to security concerns pointed out in prior works (cf. Iago attacks [
31], side-channels [
90]). We observe that these trade-offs are somewhat fundamental and rooted in five specific restrictions imposed by the SGX design, which create sweeping incompatibility with multi-threading, memory mapping, synchronization, signal handling, shared memory, and other commodity OS abstractions. Our design consistently resolves these trade-offs in favor of complete interposition rather than performance. In this sense, our work departs from prior works.
Ratel is the first system that enables DBT and binary compatibility for SGX, to the best of our knowledge. Prior works have proposed a number of different ways of achieving partial compatibility—offering specific programming languages for authoring enclave code [
17,
64,
86], keeping compatibility with container interfaces [
16,
49], or conformance to specific versions of library interfaces provided by library OSes [
20,
30,
68,
72,
75]. All of these designs, however, assume that the application binaries run benign code that uses a particular prescribed interface to achieve compatibility on SGX—for example, application binaries are expected to be relinked against specific versions of libraries (e.g.,
\( {\tt musl} \) ,
\( {\tt libc} \) ,
\( {\tt glibc} \) ), ported to a customized OS, or containerized. In contrast,
Ratel interposes at the instruction-level execution of unmodified program binaries in the enclave, and this approach conceptually does not require such strong assumptions.
Results. We highlight three results showing the egalitarian compatibility offered by
Ratel. First, we find that
Ratel supports multiple language runtimes (e.g., Python and R) out-of-the-box, without requiring any language-specific design decisions. Second, we successfully run a total of 203 unique unmodified binaries across five benchmark suites (58 binaries), four real-world application use-cases (12 binaries), and 133 Linux utilities. These encompass various workload profiles including CPU-intensive (SPEC 2006), I/O system call intensive (FSCQ, IOZone), system stress-testing (HBenchOS), multi-threading support (Parsec-SPLASH2), a machine learning library (Torch), and real-world applications demonstrated in prior works on SGX.
Ratel offers compatibility but does not force applications to use any specific libraries or higher-level interfaces. At the same time, our presented techniques work without any specialization per target application or runtime, highlighting that DBT can be a general solution to compatibility on enclaves. Last, we show that
Ratel has comparable
or better compatibility than Graphene-SGX, which requires relinking with a particular version of
libc, and is one of the longest-maintained SGX compatibility infrastructures available publicly.
2 Why Is Complete Interposition Challenging?
Intel SGX allows execution of code inside a hardware-isolated environment called an enclave for running user-level application code [
35].
Our goal is to interpose on all the instructions executed inside the enclave. This is challenging on SGX because of its severe threat model and the restrictions placed on enclave code. The OS is not trusted in this threat model. SGX enforces confidentiality and integrity of enclave-bound code and data. All enclave memory is private and only accessible when executing in enclave mode. Data exchanged with the external world (e.g., the host application or OS) must reside in public memory, which is not protected. At runtime, execution control can only synchronously enter an enclave via
\( {\tt ECALL} \) s and exit an enclave via
\( {\tt OCALL} \) s, which are the primary interfaces provided by SGX for effecting system calls (syscalls). Any illegal instructions or exceptions in the enclave create asynchronous entry-exit points. SGX restricts these to pre-specified points in the program. If the enclave execution is interrupted asynchronously, then SGX saves the enclave code execution context and resumes it at the entry point later [
2].
2.1 Restrictions Imposed by SGX Design
Intel SGX protects the enclave by enforcing strict isolation at several points of interactions between the OS and the user enclave code. We outline five restrictions that the design of SGX imposes:
(R1.)
Spatial memory partitioning. SGX enforces spatial memory partitioning. It reserves a region that is private to the enclave and the rest of the virtual memory is public. Memory can either be public or private, not both.
(R2.)
Static memory partitioning. The enclave has to specify the spatial partitioning statically. The size, type (e.g., code, data, stack, heap), and permissions for its private memory have to be specified before creation and these attributes cannot be changed at runtime.
(R3.)
Non-shareable private memory. An enclave cannot share its private memory with other enclaves.
(R4.)
1-to-1 private virtual memory mappings. Private memory spans over a contiguous virtual address (VA) range, the start address of which is decided by the OS. The private VA space has a 1-to-1 mapping with the physical address (PA) space.
(R5.)
Fixed entry points. An enclave can only resume execution from its last point and context of exit. Any other entry points/contexts have to be statically pre-specified in the binary as valid ahead of time.
2.2 Ramifications on Incompatibility
Restrictions R1–R5 are a systematic way to understand the incompatibility created by design choices in SGX with the OS and application functionality. Table
1 summarizes the effect. All of the restrictions apply to SGX v1 [
12].
\( R1 \) –
\( R4 \) apply to SGX v2 [
58,
89], but
\( R2 \) is relaxed, because SGX v2 enables dynamic page creation and permission changes. Thus, for the rest of the article, we describe our design based on SGX v1. We discuss their specific differences to SGX v2 and its ramifications in Section
7.2.
R1. Since SGX spatially partitions the enclave memory, any data that is exchanged with the OS requires copying between private and public memory. In normal applications, an OS assumes that it can access all the memory of a user process, but this is no longer true for enclaves. Any syscall arguments that reside in enclave private memory are not accessible to the OS or the host process. The enclave has to explicitly manage a public and a private copy of the data to make it accessible externally and to shield it from unwanted modification when necessary. We refer to this as a
two-copy mechanism. Thus,
\( R1 \) breaks functionality (e.g., system calls, signal handling, futex), introduces non-transparency (e.g., explicitly synchronizing both copies), and introduces security gaps (e.g., TOCTOU attacks [
31,
43]).
R2. Applications often require changes to the size or permissions of enclave memory. For example, memory permissions change after dynamic loading of libraries (e.g., \( {\tt dlopen} \) ) or files (e.g., \( {\tt mmap} \) ), executing dynamically generated code, creating read-only zero-ed data segments (e.g., .bss), and for software-based isolation of security-sensitive data. The restriction R2 is incompatible with such functionality. To work with this restriction, applications require careful semantic changes: either weaken the protection (e.g., read-and-execute instead of read-or-execute), use the two-copy mechanism, or rely on some additional form of isolation (e.g., using segmentation or software instrumentation).
R3. SGX has no mechanism to allow two enclaves to share parts of their private memory directly. This restriction is incompatible with synchronization primitives like locks and with shared memory when there is no trusted OS synchronization service. Keeping two copies of a shared lock breaks its semantics and creates a chicken-and-egg issue: how to synchronize the two copies without another trusted synchronization primitive.
R4. When applications demand new virtual address mappings (e.g., \( {\tt malloc} \) ), the OS adds these mappings. Normally, applications can ask the OS to map the same physical page at several different offsets, either with same or different permissions—for example, say when the same file is mapped as read-only at two places in the program space. On SGX, however, the same PA cannot be mapped to multiple enclave VAs. Any such mappings lead to memory protection faults.
R5. SGX starts or resumes enclave execution only from controlled entry points, i.e., statically identified virtual addresses. Entry/exit points such as those via system calls are feasible to identify statically. However, there are several unexpected entry points to an application when we run it unmodified in an enclave (e.g., due to exception-generating instructions or faulting memory accesses). Determining all potential program points across the enclave boundary is not straightforward. When control re-enters the enclave after an exit (e.g., an OCALL), SGX requires that the program execution context at the time of exit and at re-entry be the same. This conflicts with typical program functionality. Normally, if the program wants to execute custom error handling code, say after a divide-by-zero ( \( {\tt SIGFPE} \) ) or illegal instruction ( \( {\tt SIGILL} \) ), it can resume execution at a handler function in the binary with the appropriate execution context set up by the OS. On the contrary, SGX will resume enclave execution at the same instruction with the same context (not the context the OS sets up for exception handling), thus re-triggering the exception if handled naively.
It is possible to enable compatibility with exceptions by using the SGX features for exception handling [
59]. However, it requires inserting logic to update exception handling data structures (e.g., the SSA, the enclave stack for EENTER and ERESUME) before and after the exception occurs. Making such changes in unmodified binaries, though not conceptually impossible, is complex and intricate. Our design needs to avoid overriding any pre-existing legitimate behavior of the enclave binary and to ensure that the context expected by the application binary remains the same, i.e., as if the binary were running outside the enclave. We explain the challenges and our design in Section
3, and Section
4 provides details.
3 Overview
Our work poses the following question: Can complete interposition on the OS-enclave interfaces be achieved on the SGX platform? We present the first system that allows interposing on all enclave-bound instructions, by enabling a widely-used DBT engine inside SGX enclaves. Our system is called Ratel.
Before we present the design of our DBT engine, we emphasize a key design trade-off:
Working with restrictions \( R1\text{--}R5 \) , we observe that one is forced to choose between complete interposition at the OS-enclave interfaces and performance. We explain these trade-offs in Section
3.2. Our design picks completeness of interposition over performance, wherever necessary. In this design principle, it fundamentally departs from prior work.
Several different approaches to enable applications in SGX enclaves have been proposed. In nearly all prior works, performance considerations dominate design decisions. A prominent way to side-step the performance costs of ensuring compatibility is to ask the application to use a prescribed program-level interface or API. The choice of interfaces varies. They include specific programming languages [
33,
41,
44,
86], application frameworks [
53], container interfaces [
16], and particular implementations of standard
\( {\tt libc} \) interfaces. Figure
1 shows the prescribed interfaces in three approaches, including library OSes and container engines, and where they intercept the application to maintain compatibility. Given that complete instruction-level interposition is not the objective of prior works, they handle only subsets of
\( R1\text{--}R5 \) . One drawback of these approaches is that if an application does not originally use the prescribed API, the application needs to be rewritten, recompiled from source, or relinked against specific libraries. Further, enclave programs may invoke the OS interfaces directly outside the prescribed API. The approach of complete interposition at the lowest level of interfaces (i.e., at each executed instruction) offers a powerful way of providing compatibility
without specializing to specific target applications or making any such assumptions on the application behavior.
As an illustration of its utility, we show that we can build a binary compatibility layer for Linux-based SGX enclaves on
Ratel. Application binaries are originally created with the intention of running on a particular OS in an unrestricted OS process environment. A binary compatibility layer runs below the application and translates any code illegal in the restrictive SGX environment to the appropriate enclave-OS interfaces. In concept, application code is thus free to use any library, direct assembly code, and runtime that uses the Linux system call interfaces. Furthermore,
Ratel has about 26 additional instruction-level runtime profilers and monitors, which are pre-existing in our baseline DBT engine (see Section
6.4). These become available to applications running on SGX enclaves directly. Such instrumentation can be used for debugging, resource accounting, or implementing inline security monitors for enclaved code in the future.
3.1 Background on DBT
Dynamic binary translation (DBT) is a well-known approach to binary code instrumentation and implementing inline reference monitors. It intercepts each instruction in a program before it executes [
52]. In this article, we choose DynamoRIO as our DBT engine, since it is open-source and widely used in industry [
24].
Vanilla DynamoRIO works much like a just-in-time compilation engine, which dynamically re-generates (and instruments) the code of the application running on it. At a high level, DynamoRIO first loads itself and then loads the application code in a separate part of the VA space, as shown in Figure
2. Similarly, it sets up two different contexts, one for itself and one for the application. DynamoRIO can update the code on-the-fly before placing it in the code cache by rewriting instructions (e.g., converting a syscall instruction to a stub or library function call). Such rewriting ensures that the DynamoRIO engine takes control before each block of code executes, enabling interposition on every instruction. Instrumented code blocks are placed in a region of memory called a code cache. When the code cache executes, DynamoRIO regains control as the instrumentation logic desires. It performs post-execution updates for its own book-keeping or to the program’s state. Additionally, DynamoRIO hooks all events intended for the process (e.g., signals). The application itself is prevented from accessing DynamoRIO memory via address-space isolation techniques [
52]. Thus, it acts as an arbiter between the application’s binary code and the external environment (e.g., OS, filesystem) with complete interposition.
The original DynamoRIO engine is designed to work for non-enclave code. We adapt it to work inside SGX enclaves, resulting in our
Ratel system. To contrast it with the approach of changing
\( {\tt libc} \) , DBT intercepts the application right at the point at which it interacts with the OS (Figure
1) for SGX compatibility.
Ratel retains the entire low-level instruction translation and introspection machinery of DynamoRIO, including the code cache and its performance optimizations. This enables reusing well-established techniques for application instrumentation and performance enhancements. We eliminate support for auxiliary client plugins to reduce the
trusted computing base (TCB), but a suite of built-in runtime profilers (see Table
12) that do not use the DynamoRIO client plugin interfaces is retained in
Ratel.
DynamoRIO’s instrumentation capability is also designed to execute a race-free application without introducing any new races [
26]. To achieve this, the vanilla DynamoRIO engine follows these design principles:
(a)
DynamoRIO injects itself into existing threads. DynamoRIO does not create any new threads during its own execution. Instead, it uses application threads to perform all translation operations.
(b)
Separate DynamoRIO and application locks. DynamoRIO does not re-use application locks for its own synchronization. It introduces new lock variables and exclusively uses them. Further, DynamoRIO does not modify any lock operations of the application code.
(c)
Lock ordering to avoid deadlocks. DynamoRIO does not hold any locks when the application code in the code cache holds locks. This way locks held by the application have higher priority in the order in which locks must be acquired, a well-known way of avoiding deadlocks.
(d)
Memory consistency for race-free applications. The original DynamoRIO implementation only inserts memory reads for indirect branches and certain system call arguments. To safeguard consistency of memory reads, DynamoRIO does not insert additional memory accesses or modify the order of accesses in the application. For system call processing, DynamoRIO uses locks following principles (a), (b), and (c). Applications that originally have race conditions are not protected by DynamoRIO, but this is unavoidable.
The four principles above avoid introducing any new concurrency bugs (e.g., race conditions) in the translated application. In Section
3.3, we explain how our approach in
Ratel piggybacks on this thread-safety to ensure there are no concurrency issues beyond those in the logic of the original translated program.
3.2 Ratel Approach
Ratel provides compatibility for both the DynamoRIO DBT engine as well as any application binary code that DynamoRIO translates. We provide a high-level overview of our design and explain its key trade-offs.
High-level Overview. As a first step,
Ratel loads the DynamoRIO dynamic translation engine at a specific location in virtual memory, which we denote as
\( A \) . Let us say that the vanilla dynamic translation engine in DynamoRIO is coded to access virtual address memory regions denoted as
\( B \) and the target application accesses regions
\( C \) , respectively. The basic principle behind the design of
Ratel is to ensure
referential transparency: whenever the DynamoRIO engine accesses any virtual address that would have been a location in
\( B \) (without
Ratel), it must now access the corresponding location with base at the relocated address
\( A \) in
Ratel. Similarly, if the application would have accessed a location in
\( C \) originally, then
Ratel must ensure that it accesses the corresponding translated location. To do this,
Ratel must (a) intercept all operations that create virtual memory maps (e.g., via static and dynamic loading), and (b) keep an address translation table in
\( A \) for dynamically translating accesses made by the target application. Note that such referential transparency for memory accesses provides compatibility with position-independent code, dynamically generated code, and shadow memory data structures (e.g., shadow stacks) that the application may have originally used—the memory references at runtime resolve consistently to the same translated address, and the values read/written are thus consistent with the original run. For security,
Ratel must ensure that all accesses to
\( A \) originate from the dynamic translation engine itself, and the application code is unable to access
\( A \) directly—a memory isolation policy.
Ratel enforces this policy for the dynamic translation engine by modifying its code statically. For the target application, memory isolation can be enforced at runtime through the instruction rewriting capability of DynamoRIO itself, as done in program shepherding [
52].
In addition, Ratel modifies DynamoRIO to adhere to SGX virtual memory limitations (R1–R4). In designing Ratel, we statically change the DynamoRIO code to load it at a fixed memory region \( A \) . Note that \( A \) can be fixed at the time of initialization of the process (loading), therefore, it does not break compatibility with address-space layout randomization. This allows us to load Ratel and start its execution without violating the memory semantics of SGX. We register a fixed entry point in Ratel when entering or resuming the enclave. This entry point acts as a unified trampoline, such that upon entry, Ratel decides where to redirect the control flow, depending on the previously saved context. In DynamoRIO code, we statically replace all instructions that are illegal in SGX with an external call that executes outside the enclave. Thus, Ratel execution itself is guaranteed to never violate R5.
Ratel has complete control over the loading and running of the translated application binary. Therefore, to ensure that the application adheres to R1–R5, Ratel dynamically rewrites instructions before they are executed from the code cache. To keep compatibility with R2, we statically initialize the virtual memory size of the application to the maximum allowed by SGX; the type and permissions of memory are set to those specified in the original binary. Ratel augments its memory manager to track and transparently update the application memory layout as it changes during execution. At runtime, the application can make direct changes to its own virtual memory layout via system calls. Ratel dynamically adapts these changes to SGX by making two copies, wherever necessary, or by relocating the virtual address regions. Ratel intercepts all application interactions with the OS. It modifies application parameters, OS return values, and events to monitor indirect changes to the memory (e.g., thread creation). Before executing any application logic, Ratel scans the code cache for any instructions (e.g., \( {\tt syscall} \) , \( {\tt cpuid} \) ) that may potentially be deemed illegal in SGX and replaces them with an external call. In the other direction, Ratel also intercepts OS events on behalf of the application. Upon re-entry, if the event has to be delivered to the application (e.g., a signal for the application itself), it sets/restores the appropriate execution context and resumes execution via the trampoline. In this way, Ratel remedies the application on-the-fly to adhere to R1–R5.
3.3 Resolving Key Design Trade-offs
Ratel helps to interpose on the enclave code without relying on the untrusted OS. In doing so, the SGX restrictions \( R1\text{--}R5 \) give rise to trade-offs between ensuring complete interposition and having low performance overheads. We point out that these are somewhat fundamental and apply to Ratel and other compatibility efforts equally. However, Ratel chooses completeness in its interposition over performance, whenever conflicts arise.
Two-copy mechanism for \( R1 \) & \( R2 \) . Due to restriction \( R1 \) , whenever the application wants to read or write data outside the enclave, the data needs to be placed in public memory. Computing on data in public memory, which is exposed to the OS, is insecure. Therefore, if the application wishes to compute securely on the data, a copy must necessarily be maintained in a separate private memory space, as \( R2 \) forbids changing memory permissions dynamically. Specifically, we use a two-copy mechanism, instances of which recur throughout the design. Consider the case where an enclave wishes to write data to a file. The OS cannot access the buffer in enclave private memory. Here, the enclave can utilize a two-copy design, in which we place the file data in public memory. The enclave keeps an additional copy, i.e., the second copy of that data, in its private memory. When the enclave wishes to update the file, it updates both the private and the public copy. When reading data from public memory such as a file, an enclave must copy the data to private memory and check it before further use.
The above two-copy design pattern is used in other scenarios in
Ratel too, specifically, when the enclave shares data with the OS (e.g., for system call handling) and when the data in its private memory needs different permissions over time. The OS can manipulate public memory content at any time—while the enclave is computing on the data, or after the enclave’s check but before its use. Such changes by the OS can result in TOCTOU bugs. Keeping a separate private copy in the enclave reduces this attack surface. The other scenario where the two-copy design comes in useful is when the enclave wants to change the permissions of its private memory, such as switching read-only data to executable/writable, within the enclave itself. Such permission switching for a memory region is disabled on SGX due to R2. To address this, the two-copy mechanism creates a copy of the data in an additional separate region inside private memory, such that the new region has the new permissions. The two-copy mechanism, however, incurs both space and computational overheads. Every update to the data causes at least one memory copy operation. The total overhead depends on the application’s workload characteristics (kinds and frequency of shared data accesses) and can vary from
\( 10\% \) for CPU-intensive workloads to
\( 10\times \) for IO-intensive workloads (see Section
6.3).
Memory Sharing for \( R3 \) & \( R4 \) . \( R3 \) creates an “all or none” trust model for enclaves. Either memory is shared with all entities (including the OS) or kept private to one enclave.
\( R4 \) restricts sharing memory within an enclave further. These restrictions conflict with semantics of shared memory and synchronization primitives. For instance, synchronization primitives such as
futexes are implemented with a single memory copy that the OS is trusted to manage securely—such a design is in direct conflict with the SGX security model. To implement such abstractions securely, designs on SGX must rely on a trusted software manager, which necessarily resides in an enclave, since the OS is untrusted (see Section
4.4). Applications can then maintain compatibility with locks and shared memory abstractions. But this comes at a performance cost: accesses to shared memory or synchronization primitives, which are originally inexpensive memory accesses, turn into (possibly remote) procedure calls to the trusted manager enclave.
Secure entry-points for \( R5 \) . Restriction \( R5 \) requires that whenever the enclave resumes control after an exit, the enclave state (or context) be the same as right before the exit. This implies that the security monitor (e.g., the DBT engine) must take control before all exit points and after resumption, to save and restore contexts—otherwise, the interposition can be incomplete, creating security holes and incompatibility. Without guarantees of complete interposition, the OS can return control into the enclave, bypassing security checks that the DBT engine implements. The price for complete interposition on binaries is performance—the DBT engine must intercept all entry/exit points and simulate additional context switches in software. Prior approaches, such as library OSes, choose performance over completeness of interposition by asking applications to link against specific library interfaces that constrict enclave-OS interaction to certain specified library interfaces. But this does not enforce complete interposition. Applications, due to bugs or when exploited, can make direct OS interactions without using the prescribed API, use inline assembly, or override entry handlers set up by the library OS. In all scenarios where applications go outside the prescribed interfaces, the library OS design requires special handling.
Several additional security considerations arise in the implementation details of our design. These include (a) avoiding naïve designs that have TOCTOU attacks; (b) saving and restoring the execution context from private memory; (c) maintaining
Ratel-specific metadata in private memory to ensure integrity of memory mappings that change at runtime; and (d) explicitly zeroing out memory content and pointers after use. We explain them inline in Section
4.
3.4 Threat Model and Scope
Ratel is best viewed as a general framework for instruction-level interposition on enclaved binaries, rather than a stand-alone sandboxing engine that protects enclaves against all possible OS attacks.
Ratel itself does
not trust the OS or any security guarantees the OS provides. Binaries running on top of
Ratel otherwise follow the same threat model as vanilla DBT engines [
82]. Application binaries are assumed to not be aware of the presence of
Ratel—they are benign binaries but they can be exploited via externally provided inputs. Under exploitation, malicious code may execute on
Ratel.
Ratel provides instruction-level instrumentation of all instructions executed and does not provide any higher-level security guarantees beyond that. For example, malicious code can readily determine that it is running on
Ratel [
42] and, therefore,
Ratel is not suitable for analyzing analysis-evading malicious code. Using the instruction-level monitoring capability, the vanilla DynamoRIO engine itself provides certain built-in security mechanisms, which are preserved in
Ratel. Specifically, ASLR for all heap regions is turned on and the code cache is randomized.
Ratel isolates the stack, dynamically allocated memory, and file-mapped I/O regions through instrumentation. The main new challenges highlighted in this work are those due to enabling DBT on SGX, while most other threats to DBT-based instrumentation are pre-existing and known. The design trade-offs we emphasize apply to any interposition framework that runs on SGX, but have not been articulated clearly in prior works.
Building an end-to-end secure sandbox on top of
Ratel requires additional security mechanisms, which are common to other systems and are previously known. These mechanisms include encryption/decryption of external file or I/O content [
16,
30,
49,
72], sanitization of OS inputs to prevent Iago attacks [
31,
51,
77,
81], defenses against known side-channel attacks [
22,
48,
66,
73,
74], additional attestation or integrity of dynamically loaded/generated code [
44,
45,
46,
85], and so on. These are important but largely orthogonal to our focus.
Our binary compatibility layer supports a large majority of, but not all, Linux system calls. The most notable unsupported system call is
\( {\tt fork} \) , which is used for multi-processing. Since
Ratel does not support multi-process applications, we support locks and synchronization primitives for threads within a single enclave. The basic design of
Ratel can be extended to support
\( {\tt fork} \) with the two-copy mechanism, similar to prior work [
30,
75]. However, maintaining compatibility with
\( {\tt fork} \) blindly is a questionable design decision, especially for enclaves, as has been argued extensively [
19]. A recent work on SGX compatibility has left out support for
\( {\tt fork} \) and multi-enclave locks based on the same observation [
72].
4 Ratel Design
We explain how Ratel handles syscalls, memory, threads, synchronization, and exceptions/signals inside SGX enclaves.
4.1 Syscalls and Unanticipated Entry-Exits
SGX does not allow enclaves to execute several instructions such as \( {\tt syscall} \) , \( {\tt cpuid} \) , and \( {\tt rdtsc} \) . If the enclave executes them, then SGX exits the enclave and generates a \( {\tt SIGILL} \) signal. Gracefully recovering from the failure requires re-entering the enclave at a different program point. Due to R5, this is disallowed by SGX. In Ratel, either DynamoRIO or the application can invoke illegal instructions, which may create unanticipated exits from the enclave.
Ratel changes the DynamoRIO logic to convert such illegal instructions into stubs that either delegate or emulate the functionality. For the target application, whenever Ratel observes an illegal instruction in the code cache, it replaces the instruction with a call to the Ratel syscall handler function. Ratel has three ways of handling system call execution:
(1)
Complete delegation: Entirely delegate the syscall instruction and handler to code outside the enclave;
(2)
Partial delegation: Execute the syscall instruction outside, and then update the private in-enclave state; or
(3)
Emulation: Completely simulate the syscall behavior with a handler inside the enclave.
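To illustrate, the choice among these three strategies can be sketched as a classification routine consulted by the in-enclave syscall handler. The sketch below is illustrative C, not Ratel's actual code; the enum, constants, and function name are hypothetical (the syscall numbers shown are x86_64 Linux values).

```c
#include <assert.h>

/* Hypothetical handling strategies, mirroring the three options above. */
typedef enum { COMPLETE_DELEGATION, PARTIAL_DELEGATION, EMULATION } strategy_t;

/* Illustrative x86_64 Linux syscall numbers (names chosen to avoid
 * clashing with system headers). */
enum { RSYS_READ = 0, RSYS_MMAP = 9, RSYS_CLONE = 56, RSYS_ARCH_PRCTL = 158 };

/* A sketch of the classification a Ratel-style handler might perform. */
static strategy_t classify_syscall(long sysno) {
    switch (sysno) {
    case RSYS_READ:       return COMPLETE_DELEGATION; /* file I/O: run outside   */
    case RSYS_MMAP:                                   /* memory: run outside,    */
    case RSYS_CLONE:      return PARTIAL_DELEGATION;  /* then fix private state  */
    case RSYS_ARCH_PRCTL: return EMULATION;           /* handled fully inside    */
    default:              return COMPLETE_DELEGATION;
    }
}
```

In this sketch, unrecognized syscalls default to complete delegation; a real implementation would enumerate every supported syscall explicitly.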
Ratel uses complete delegation for file, networking, and timer-related system calls. It uses partial delegation for memory management, threads, and signal handling. We outline the details of other syscall subsystems that are fully or partially emulated by
Ratel in Sections
4.2,
4.3,
4.4, and
4.5.
Ratel uses emulation for very few system calls. For example, the
\( {\tt arch\_prctl} \) syscall is used to read the FS base.
Ratel emulates it by executing a
\( {\tt rdfsbase} \) instruction.
Creating and Synchronizing Memory Copies. Syscalls access process memory for input-output parameters and error codes. Since enclaves do not allow this, for delegating the syscall outside the enclave, Ratel creates a copy of input parameters from private memory to public memory. This includes simple value copies as well as deep copies of structures. The OS then executes the syscall and generates results in public memory. After the syscall completes, Ratel copies back the OS-provided return values and error codes to private memory.
Memory copies alone are not sufficient. For example, when loading a library, the application uses \( {\tt dlopen} \) , which in turn calls \( {\tt mmap} \) , which must execute outside the enclave. Thus, the \( {\tt mmap} \) call outside the enclave will map the library in the untrusted public address space of the application. However, the original intent of the application is to map the library inside the enclave private memory. As another example, consider when the enclave code wants to create a new thread local storage (TLS) segment. Due to the restrictive SGX environment, Ratel must execute the system call outside the enclave, and the new thread is created for the DynamoRIO runtime instead of the target application. In all such cases, Ratel takes care to explicitly propagate changes to inside the enclave, i.e., reflect changes to private memory.
When the enclave code copies input arguments or results for external functions (e.g., syscalls) between private and public memory, other enclave threads should not change the private memory. To ensure this,
Ratel uses DynamoRIO locks to synchronize private memory accesses introduced by two-copy mechanism. Our implementation adheres to principles (a)–(d) outlined in Section
3.1. One caveat that arises in
Ratel is when two threads in the same enclave attempt to perform syscalls simultaneously. Consider an example where the application has two threads, one of which is waiting/polling to acquire a lock, while the other holds it. As per principle (c), the Ratel instance running in thread 1 will try to acquire a DynamoRIO-specific lock (inside the \( {\tt poll} \) syscall), which may already be held by the Ratel instance running in thread 2. This can lead to a deadlock. To avoid deadlocks in such cases, we examine each system call and add DynamoRIO locks only when the syscall updates private memory.
Checking Memory State after Syscalls. Ratel resumes execution in the enclave only after the syscall state has been completely copied inside the enclave. This allows the enclave to employ sanitization of OS return values before using them. Previously known sanitization checks for Iago attacks can be implemented here [
77]. Note that all such sanitization checks must execute inside the enclave and
after the state from the public memory is copied into the enclave private memory. This caveat is important to avoid TOCTOU attacks wherein the OS modifies public memory state before or midway through the execution of the sanitization checks.
4.2 Memory Management
Ratel utilizes partial delegation for syscalls that change the process virtual memory layout and permissions (e.g., mmap, mprotect, fsync, and so on). It executes the syscall outside the enclave and then explicitly reflects the memory layout changes inside the enclave. This is not straightforward. First, due to R1–R4, several layout configurations are not allowed for enclave virtual memory (e.g., changing memory permissions). Second, Ratel does not trust the OS information (e.g., via \( {\tt procmap} \) ). Hence, Ratel must use a two-copy mechanism when it uses the partial delegation approach.
Specifically, Ratel maintains its own \( {\tt procmap} \) -like structure to keep its own view of the process virtual memory inside the enclave, tracks memory-related events, and updates the enclave state. For example, in the case of an mmap syscall that maps a file in enclave private memory, the handler outside the enclave first creates a public memory region by invoking the OS. Then, Ratel allocates a region of private memory, which mirrors the content of the file mapped outside the enclave, and updates its internal \( {\tt procmap} \) -like structure to record the newly created virtual addresses. Further, Ratel synchronizes the two copies of memory to maintain execution semantics on all subsequent changes to \( {\tt mmap} \) ped memory. This is done whenever the application unmaps the memory or invokes the \( {\tt sync} \) / \( {\tt fsync} \) syscalls.
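The in-enclave \( {\tt procmap} \) -like structure can be sketched as a simple region table kept in private memory; the types, pool size, and function names below are hypothetical, and a real implementation would also handle splitting, merging, and removal of regions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-enclave mirror of /proc/self/maps: metadata kept in
 * private memory so the OS-provided view is never trusted. */
typedef struct { uint64_t start, len; int prot; int used; } region_t;

#define MAX_REGIONS 64
static region_t regions[MAX_REGIONS];

/* Record a new private-memory mapping after a delegated mmap completes. */
static int track_region(uint64_t start, uint64_t len, int prot) {
    for (int i = 0; i < MAX_REGIONS; i++)
        if (!regions[i].used) {
            regions[i] = (region_t){ start, len, prot, 1 };
            return 0;
        }
    return -1; /* metadata pool exhausted */
}

/* Look up the tracked region covering addr, if any. */
static region_t *find_region(uint64_t addr) {
    for (int i = 0; i < MAX_REGIONS; i++)
        if (regions[i].used &&
            addr >= regions[i].start &&
            addr < regions[i].start + regions[i].len)
            return &regions[i];
    return NULL;
}
```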
Ratel does not blindly replicate OS-dictated memory layout changes inside the enclave. It first checks if the resultant layout will violate any security semantics (e.g., mapping a buffer to zero-address). It proceeds to update enclave layout and memory content only if these checks succeed. To do this, Ratel keeps its metadata in private memory.
With interposition over memory management, Ratel transparently side-steps the SGX restriction R2. When the application dynamically changes the permissions of a memory region (say \( X \) ), Ratel moves the content to a memory region (say \( Y \) ) that has the required permissions. To do this, Ratel requires a stash of unused private memory regions ( \( Y \) ) that are originally not used by the application. These memory regions are allocated statically by Ratel at the start, and their page permissions are set to readable, writable, and executable. Subsequently, when the application binary accesses memory \( X \) , Ratel dynamically translates the access to the copy in memory \( Y \) . This allows Ratel to transparently emulate the permission change, which is otherwise a disallowed behavior inside the enclave.
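The core of this emulation is an address translation from the original region \( X \) to its shadow in the stash \( Y \) , which the DBT engine applies to instrumented accesses. A minimal sketch, with hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the permission-change workaround: accesses to region X are
 * redirected to a pre-allocated RWX stash region Y. Names are illustrative. */
typedef struct { uint64_t x_base, y_base, len; } redirect_t;

/* Translate an application address in X to its shadow in the stash Y;
 * addresses outside the redirected range pass through unchanged. */
static uint64_t translate(const redirect_t *r, uint64_t addr) {
    if (addr >= r->x_base && addr < r->x_base + r->len)
        return r->y_base + (addr - r->x_base);
    return addr;
}
```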
Note that the stash pool of private memory regions \( Y \) is private to the enclave, but has overly permissive access rights. This can be avoided by reserving regions with write-only and execute-only permissions, but this may decrease the size of usable non-stash memory. Alternatively, the access rights can be enforced through runtime monitoring of memory accesses, but this adds performance costs. These trade-offs are inherent to restrictions \( R1 \) and \( R2 \) .
4.3 Multi-Threading
Ratel supports multiple threads running in a single enclave. However, it has no support for fork to create multiple processes running in different enclaves. Therefore, our design restricts its concerns to enabling thread synchronization within a single enclave. This is still challenging, because restriction \( R2 \) requires the enclaved application to pre-declare a maximum number of threads before execution. SGX also does not allow the enclave to resume at arbitrary program points or execution contexts (restriction \( R5 \) ). This creates several challenges in adapting DynamoRIO to run in SGX.
In the vanilla DynamoRIO design, the dynamic translation engine and the target application share the same thread, but they have separate TLS segments for cleaner context switches. DynamoRIO keeps the default TLS segment for the target application and creates a new TLS segment for itself at a different address. It switches between these two TLS segments by changing the segment register—DynamoRIO uses \( {\tt gsbase} \) and the application uses \( {\tt fsbase} \) .
Multiplexing TLS Segments. Normally, to context switch between threads, one TLS segment to save the currently executing application thread context is sufficient. But with Ratel, we need an additional TLS segment to save the state of Ratel itself. Furthermore, SGX itself reserves one additional TLS segment for its own internal use. This brings the total number of required TLS segments, for a correct context switch, to 3 on SGX.
But, the x86_64 architecture itself provides only two base registers ( \( {\tt fsbase} \) and \( {\tt gsbase} \) ) for storing pointers to TLS segments. Therefore, when we attempt to run DynamoRIO inside SGX, there are not enough base registers to save three TLS segment offsets (one each for DynamoRIO, SGX, and the application). We circumvent this limitation of the SGX platform as follows. First, Ratel adds two fields in each TLS segment to store \( {\tt fsbase} \) and \( {\tt gsbase} \) register values for that segment. We use these TLS segment fields to save and restore pointers to the segment base addresses. This allows us to still maintain and switch between three clean TLS segment views per thread. Second, when Ratel has to restore a TLS segment, it searches through a list of TLS segment base addresses, to find the right one to restore—this is because it does not have enough base registers to store three TLS segment bases (which would have avoided the search).
Restoring TLS Segments on Context Switches. Ratel conceptually maintains a linked list of a maximum of three TLS segment base pointers. The head of the list is the \( {\tt fsbase} \) register, which serves as a pointer to the default first TLS segment created by SGX and reserved for its own use. This default segment is called the primary, and all TLS segments created subsequently are referred to as secondary. To point to the next TLS segment in the linked list, Ratel adds a new field in the TLS segment, which is NULL for the last element in the list. To traverse the list, the \( {\tt gsbase} \) register is used. The list search begins with the primary, and the right segment to restore is always the last element in the linked list. This way, Ratel can search and decide which of the three TLS segments to restore using only two registers ( \( {\tt fsbase} \) and \( {\tt gsbase} \) ).
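The bookkeeping described above can be sketched as a small linked structure; the struct layout and field names are hypothetical, but the traversal mirrors the rule that the segment to restore is always the last element reachable from the primary.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the TLS bookkeeping: each segment stores the base-register
 * values to restore, plus a next pointer; NULL terminates the list. */
typedef struct tls_seg {
    void *saved_fsbase, *saved_gsbase; /* fields Ratel adds per segment */
    struct tls_seg *next;
} tls_seg_t;

/* Walk from the primary (SGX-reserved) segment to the one to restore:
 * always the last element of the list. */
static tls_seg_t *segment_to_restore(tls_seg_t *primary) {
    tls_seg_t *s = primary;
    while (s->next != NULL)
        s = s->next;
    return s;
}
```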
There are two ways in which control can enter/exit the enclave: via synchronous exits (e.g., \( {\tt ECALL} \) / \( {\tt OCALL} \) used for syscalls) and via asynchronous exits (e.g., used for exceptions, timer interrupts, and so on). During synchronous exits, Ratel sets up the TLS segment linked list such that the TLS context state restored upon resumption is that of DynamoRIO and SGX, respectively. The subsequent exception handlers copy state from outside the enclave to inside, perform various Iago checks, and then set up the TLS segment to that of the application thread. During asynchronous exits, Ratel does not need to perform any special setup for the TLS segment linked list. When the exception handler executes on resumption, it sets up the execution context to that just before the exit. Ratel performs similar checks and operations as in the case of synchronous exits, and then restores the context to that before the exit (which may be DynamoRIO’s context or that of the application thread). Therefore, our design works correctly for both synchronous and asynchronous exits.
Dynamic Threading. Since the number of TCS entries is fixed at enclave creation time on SGX, the maximum number of threads supported is capped.
Ratel multiplexes the limited TCS entries available among the application threads dynamically, as shown in Figure
4. When an application wants to create a new thread (e.g., via
clone),
Ratel first checks if there is a free TCS slot; if not, it busy-waits until one is released. Once a TCS slot is available, Ratel performs an
\( {\tt OCALL} \) , which creates the new thread outside the enclave. After finishing thread creation, the parent thread returns to the enclave and resumes execution. The child thread explicitly performs an
\( {\tt ECALL} \) to enter the enclave and DynamoRIO resumes execution for the application’s child thread.
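The TCS multiplexing can be sketched as a fixed pool of slots with acquire/release bookkeeping. The names and pool size below are hypothetical, and the sketch returns failure where Ratel would busy-wait, so the logic is easy to exercise in isolation.

```c
#include <assert.h>

/* Sketch of dynamic thread multiplexing over a fixed TCS pool. In Ratel a
 * caller busy-waits until a slot frees; here acquisition simply fails when
 * the pool is full. */
#define NUM_TCS 4
static int tcs_busy[NUM_TCS];

static int acquire_tcs(void) {
    for (int i = 0; i < NUM_TCS; i++)
        if (!tcs_busy[i]) { tcs_busy[i] = 1; return i; }
    return -1; /* caller would busy-wait and retry */
}

static void release_tcs(int slot) { tcs_busy[slot] = 0; }
```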
For all threading operations, Ratel ensures transparent context switches to preserve binary compatibility as intended by DynamoRIO. For security, Ratel creates and stores all thread-specific context either inside the enclave or SGX’s secure hardware-backed storage at all times. It does not use any OS data structures or addresses for thread management.
Safety of Two-copy Mechanism. Ratel does not introduce any new concurrency issues due to its two-copy mechanism (i.e., one-way or two-way copy). Consider two threads in the same enclave sharing a region of memory and concurrently accessing it. If the original code itself has data races when accessing memory, then those races will be preserved in
Ratel with its two-copy mechanism. However, if the original code is race-free, then its execution with
Ratel will also be race-free. The two-copy mechanism ensures this, because it does not introduce any additional writes/reads to shared (public) memory—it only replaces the original reads/writes of the shared data with reads/writes to translated addresses in private memory. The use of a private copy does not alter the semantics of the original access. In Section
6.1.4, we evaluate the above scenarios for native, DynamoRIO, and
Ratel execution of real-world applications.
4.4 Thread Synchronization
SGX provides basic synchronization primitives (e.g., SGX mutexes and condition variables) backed by hardware locks. But they can only be used by enclave code. Thus, they are semantically incompatible with the lock mechanisms used by DynamoRIO or legacy applications, which use OS locks. For example, DynamoRIO implements a fast lock using the \( {\tt futex} \) syscall, where the lock is kept in shared memory accessible to all application threads and the OS. Here, the OS needs the ability to read the lock state to determine whether it should wait during the \( {\tt FUTEX\_WAIT} \) syscall.
A naive design would be to maintain the \( {\tt futex} \) lock in public memory, such that it is accessible to the enclave(s) and the OS. However, the OS can arbitrarily change the lock state and attack the application. Specifically, it can reset the lock during the execution of a critical section in an application thread. Therefore, this design choice is not safe.
As an alternative, we can employ a two-copy mechanism for locks. The enclave can keep the lock in private memory. When it wants to communicate a state change to the OS, Ratel can tunnel a futex \( {\tt OCALL} \) to the host OS. This approach is problematic as well. Threads inside the enclave may frequently update the locks in private memory. The futex state outside the enclave needs to be kept consistent with the private copy when the OS kernel and the untrusted part of the enclave access it, or else the semantics of the lock may not hold. The more frequent the local updates to the in-enclave copy of the lock state, the higher the chance of inconsistencies. In general, avoiding such race conditions usually involves using locks for synchronization. But requiring locks to synchronize copies of other locks, as suggested in this design alternative, only results in a chicken-and-egg problem.
Supporting such semantics efficiently, where the OS has a shared read access to the lock state, is difficult with SGX because of restrictions
\( R1 \) and
\( R3 \) . Figure
3 shows the schematics of design choices for implementing synchronization primitives available on SGX. Note that options (a) and (b) are insecure as discussed above.
Ratel implements a lock manager inside the same enclave that executes the application. Our simplification has one limitation: Only threads within the same application process (and enclave) can utilize
Ratel synchronization primitives.
Ratel Lock Manager Implementation. Our design choice of using a single enclave to execute both the DynamoRIO engine and all application threads eliminates considerable implementation complexity. It turns out that futex-based locks become unnecessary, since we do not need sharing across the process boundary or with the OS. The DynamoRIO usage of futexes can thus be replaced with a simpler primitive, such as spinlocks, to achieve the same functionality. Specifically, Ratel implements a lock manager using the hardware spinlock exposed by SGX. Ratel invokes its in-enclave lock manager either when DynamoRIO uses futexes or when the application binaries perform lock-based synchronization. For the DynamoRIO code, we manually change it to invoke our lock manager. In the case of application locks, Ratel loads the application binary into the code cache and replaces thread-related calls (e.g., \( {\tt pthread\_cond\_wait} \) ) in the enclave-OS interface with stubs that invoke our lock manager to use Ratel-provided safe synchronization primitives.
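The substitution of futexes by in-enclave spinlocks can be sketched with C11 atomics; the real lock manager uses the spinlock exposed by the SGX SDK, so the type and function names below are purely illustrative.

```c
#include <assert.h>
#include <stdatomic.h>

/* Sketch of an in-enclave spinlock of the kind the lock manager substitutes
 * for futex-based locks. The lock word lives in private memory, so the OS
 * can neither read nor reset it. */
typedef struct { atomic_int locked; } ratel_lock_t;

static void ratel_lock(ratel_lock_t *l) {
    int expected = 0;
    /* spin until we atomically flip the lock word from 0 to 1 */
    while (!atomic_compare_exchange_weak(&l->locked, &expected, 1))
        expected = 0;
}

static void ratel_unlock(ratel_lock_t *l) {
    atomic_store(&l->locked, 0);
}
```

Unlike a futex, a spinlock never blocks in the kernel, which is acceptable here precisely because all contending threads run inside the same enclave.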
4.5 Signal Handling
Ratel cannot piggyback on the existing signal handling mechanism exposed by SGX, due to restriction
\( R5 \) . Specifically, when DynamoRIO executes inside the enclave, the DynamoRIO signal handler needs to get a description of the event to handle it (Figure
5(a)). However, Intel’s SGX platform software removes all such information when it delivers the signal to the enclave. This breaks the functionality of programmer-defined handlers to recover from well known exceptions (e.g., divide by zero). Further, any illegal instructions inside the enclave generate exceptions, which are raised in the form of signals. Existing binaries may not have handlers for recovering from such illegal instructions. Therefore,
Ratel must provide handlers for all such exceptions.
Recall that SGX allows entering the enclave at fixed program points. Leveraging this,
Ratel employs a
primary signal handler that it registers with SGX. For any signals generated for DynamoRIO or the application, we always enter the enclave via the primary handler and we copy the signal number into the enclave. We then use the primary as a trampoline to route the control to the appropriate
secondary signal handler inside the enclave, based on the signal number. At a high level, we realize a virtualized trap-and-emulate signal handling design. We use SGX signal handling semantics for our primary. For the secondary, we set up and tear down a separate stack to mimic the semantics in software. The intricate details of handling the stack state at the time of such a context switch are elided here. Figure
5(b) shows a schematic of our design, and we explain the flow of control and associated issues here.
Registration. The original DynamoRIO code and the application binary use
\( {\tt sigaction} \) to register signal handlers for themselves. In
Ratel, first we change DynamoRIO logic to register only the primary signal handler with SGX. We then record the DynamoRIO and application registrations as secondary handlers. This way, when SGX or the OS delivers the signal to the enclave, SGX directs the control to our primary handler.
Since this is a pre-registered handler, SGX allows it. The primary handler checks the signal information (e.g., signal code) and explicitly routes execution to the secondary.
Delivery. A signal may arrive when the execution control is inside the enclave. In this case, Ratel executes a primary signal handler that delivers the signal to the enclave. However, if the signal arrives when the CPU is in a non-enclave context, SGX does not automatically invoke the enclave to redirect execution flow. To force this, Ratel has to explicitly enter the enclave. But it can only enter at a pre-registered program point with a valid context, as per restriction \( R5 \) . Thus, Ratel first wakes up the enclave at a valid point (via \( {\tt ECALL} \) ). Ratel copies the signal information, passed as an input argument to the \( {\tt ECALL} \) , to private memory. It then simulates the signal delivery by setting up the enclave stack in private memory to execute the primary handler.
Exit. After executing their logic, handlers use the \( {\tt sigreturn} \) syscall to return control to the point where the signal interrupted the execution. When Ratel observes this syscall in the secondary handler, it simulates a return to the primary handler instead. The primary handler then performs its own real \( {\tt sigreturn} \) . SGX then resumes execution from the point before the signal was generated.
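The registration, delivery, and routing flow above can be sketched as a dispatch table consulted by a single primary handler; the table size, names, and the flat signature are hypothetical simplifications of the real sigaction-style interface.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the primary/secondary trampoline: sigaction-style registrations
 * are recorded in a private table, and the single SGX-registered primary
 * handler routes each incoming signal number to its secondary. */
#define MAX_SIG 32
typedef void (*handler_t)(int);
static handler_t secondary[MAX_SIG];
static int last_handled = -1; /* test instrumentation */

/* Record a DynamoRIO or application handler as a secondary. */
static void record_secondary(int signum, handler_t h) {
    if (signum > 0 && signum < MAX_SIG)
        secondary[signum] = h;
}

/* The primary handler: the only signal entry point SGX ever invokes. */
static void primary_handler(int signum) {
    if (signum > 0 && signum < MAX_SIG && secondary[signum] != NULL)
        secondary[signum](signum); /* route to the registered secondary */
    /* on return, the primary performs the real sigreturn */
}

/* An example application-registered secondary handler. */
static void app_sigfpe(int signum) { last_handled = signum; }
```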
Handling Nested Signals. SGX supports both synchronous and asynchronous signals. Signals can be nested, in the sense that a signal can be delivered by the OS while the enclave is handling another one. The enclave cannot mask signals selectively at runtime on SGX, so the potential for subtle reentrancy bugs arises in the enclave signal handling code. At a high level, Ratel handles signals safely by ensuring that unsafe nesting is not possible. Specifically, when an exception is to be delivered, the SGX platform automatically saves enclave state in private memory regions pointed to by the hardware State Save Area (SSA). To support nesting, SGX provides an array of SSA frames, leaving the enclave the option to set the maximum size of the array (and hence, the maximum possible nesting depth). Ratel utilizes this feature to limit the nesting depth to 2: one SSA frame can be used by Ratel itself and the other by the target application. With the maximum nesting depth set to 2, if delivery of a signal nested at depth 3 is attempted, SGX securely aborts the enclave, since not enough SSA frames are available. With this design, there are only four possible combinations of reentrancy to reason about:
(1)
Ratel signal handler is interrupted with a signal to be delivered to the application;
(2)
application’s signal handler is interrupted with a signal to be delivered to Ratel;
(3)
Ratel signal handler is interrupted with a signal to be delivered to Ratel;
(4)
application handler is interrupted with a signal to be delivered to the application.
In Cases 1 and 3, DynamoRIO gets execution control. In Case 1, DynamoRIO has one additional SSA frame available to save the current state of the primary handler and deliver control to the beginning of the primary signal handler but with the new context. In Case 3, the two SSA frames are already used up to save
Ratel’s primary handler state and the application’s secondary handler state. Therefore, SGX is unable to deliver the signal and the enclave execution is aborted. In Cases 2 and 4, similar to Case 3, the two SSA frames are already in use to store the current state of the primary and secondary signal handlers, respectively. SGX, thus, cannot deliver the signal and the enclave is aborted. Thus, in all the above scenarios,
Ratel design handles reentrancy safely. Note that
Ratel reentrancy handling for synchronous exceptions, which are used for threads and syscalls (Section
4.3), and for asynchronous exceptions is the same.
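The SSA-frame accounting behind this case analysis can be sketched as a simple counter against the configured frame limit; the names are hypothetical, and "abort" is modeled as a flag rather than an actual enclave teardown.

```c
#include <assert.h>

/* Sketch of the nesting argument: with NSSA-style limit 2, a third
 * outstanding signal finds no free SSA frame and the enclave is aborted. */
#define MAX_SSA_FRAMES 2
static int frames_in_use;
static int aborted;

/* Model of signal delivery: hardware saves the interrupted state into a
 * free SSA frame, or aborts the enclave when none is available. */
static int deliver_signal(void) {
    if (frames_in_use >= MAX_SSA_FRAMES) { aborted = 1; return -1; }
    frames_in_use++;
    return 0;
}

/* Model of a handler finishing: its SSA frame is released. */
static void handler_done(void) { if (frames_in_use > 0) frames_in_use--; }
```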
Remark. Ratel signal handling extends the trap-and-emulate principle used by DynamoRIO. Since the DynamoRIO design enforces transparency, we preserve this in Ratel. This goes beyond merely working around SGX limitations, making our design different from existing frameworks. Specifically, existing library OS-based SGX frameworks (e.g., Graphene-SGX and Occlum) assume that all exception registration and execution will go via the prescribed library interfaces. So, they do not keep separate signal contexts for library OS signals and the application. These works do not discuss what happens during nested signals or when the enclave-OS interaction happens outside of the prescribed library registration and handler mechanisms.
5 Implementation
We implement
Ratel on DynamoRIO [
24]. We run DynamoRIO inside an enclave with the help of a standard Intel SGX development kit that includes user-land software (SDK v2.1), platform software (PSW v2.1), and a Linux kernel driver (v1.5). We make a total of 9667 LoC software changes to DynamoRIO and SDK infrastructure. We run
Ratel on unmodified hardware that supports SGX v1.
Ratel design makes several changes to DynamoRIO core (e.g., memory management, lock manager, signal forwarding). We discuss three high-level implementation challenges that arise in implementing our design while eliding lower-level challenges here for brevity. The root cause of our highlighted challenges is the way Intel SDK and PSW expose hardware features and what DynamoRIO expects.
Self-identifying Load Address. The vanilla DynamoRIO engine needs to know its own start location in memory to avoid overlapping its own address space with that of the target application, and so it uses a hard-coded address. Since our modifications change such hard-coded address assumptions,
Ratel uses a
\( {\tt call-pop} \) instruction sequence to self-identify the runtime location in memory for the DynamoRIO engine [
70,
87], aligns it at a page boundary, and updates the DynamoRIO logic to use code location-independent addressing.
Setting SSA Slots. The vanilla SGX SDK and PSW use just two SSA frames: one is used to process timer interrupts specially and the other to handle all other interrupts. As explained in Section
4.5, the
Ratel design aims to set the effective nesting depth to 2. Therefore, in the implementation, it uses three SSA slots: one reserved for the timer interrupt (to be handled by the SGX SDK), and the remaining two as described in Section
4.5. The timer interrupt handler simply routes control to whichever signal handler was interrupted. The SGX specification allows setting the number of SSA slots via the
\( {\tt NSSA} \) field, which we change in our SDK implementation.
Preserving Execution Contexts. To start execution of a newly created thread, Ratel invokes a pre-declared \( {\tt ECALL} \) to enter the enclave. This is a nested ECALL, which is not supported by the SGX SDK. To allow it, we modify the SDK to facilitate the entrance of child threads and to initialize the thread data structure. Specifically, we check if the copy of thread arguments inside the enclave matches the one outside before resuming thread execution. We save specific registers so that the thread can exit the enclave later. Note that the child thread has its own execution path, distinct from the parent’s; Ratel therefore redirects its return address to the point in the code cache where a new thread always starts. After the thread is initialized, we explicitly update DynamoRIO data structures to record the new thread (e.g., the TLS base for application libraries). This way, DynamoRIO is aware of the new thread and can control its execution in the code cache.
Propagating Implicit Changes and Metadata. A thread uses the \( {\tt exit/exit\_group} \) syscalls to terminate itself, after which the OS zeros out the child thread ID ( \( {\tt ctid} \) ). In Ratel, we explicitly create a new thread inside the enclave, so we have to terminate it explicitly by zeroing out the pointers to the IDs. Further, we clean up and free the memory associated with each thread inside and outside the enclave.
Built-in Profilers. DynamoRIO supports two modes of instrumentation—built-in profilers and client plugins. Profilers are readily available with the DynamoRIO core and provide basic functionalities such as instruction tracing, logging, and tuning the DynamoRIO parameters (see Table
12). In contrast, DynamoRIO clients are instrumentation plugins designed to perform user-desired tasks (e.g., a shadow stack). Since profilers are part of DynamoRIO,
Ratel supports all of them out-of-the-box. Our current implementation removes client plugin support to reduce TCB. We demonstrate
Ratel compatibility with these profilers in Section
6.4.