research-article
Open access

Dynamic Binary Translation for SGX Enclaves

Published: 09 July 2022

Abstract

    Enclaves, such as those enabled by Intel SGX, offer a hardware primitive for shielding user-level applications from the OS. While enclaves are a useful starting point, code running in the enclave requires additional checks whenever control or data is transferred to/from the untrusted OS. The enclave-OS interface on SGX, however, can be extremely large if we wish to run existing unmodified binaries inside enclaves. This article presents Ratel, a dynamic binary translation engine running inside SGX enclaves on Linux. Ratel offers complete interposition, the ability to interpose on all executed instructions in the enclave and monitor all interactions with the OS. Instruction-level interposition offers a general foundation for implementing a large variety of inline security monitors in the future.
    We take a principled approach in explaining why complete interposition on SGX is challenging. We draw attention to five design decisions in SGX that create fundamental trade-offs between performance and ensuring complete interposition, and we explain how to resolve them in favor of complete interposition. To illustrate the utility of the Ratel framework, we present the first attempt to offer binary compatibility with existing software on SGX. We report that Ratel offers binary compatibility with over 200 programs we tested, including micro-benchmarks and real applications, such as Linux shell utilities. Runtimes for two programming languages, namely Python and R, tested with standard benchmarks, work out-of-the-box on Ratel without any specialized handling.

    1 Introduction

    Commercial processors today have native support for trusted execution environments (TEEs) to run user-level applications in isolation from other software on the system. A prime example of such a TEE is Intel Software Guard eXtensions (SGX) [59]. The hardware-isolated environment created by SGX, commonly referred to as an enclave, runs a user-level application without trusting privileged software. Enclaves offer a good basis for isolation, as they do not require placing trust in the OS and allow us to restrict the trusted code base. Further, they open up the possibility of reverse sandboxing, where the enclaved application protects itself from attacks arising from the OS [49].
    SGX exposes an extremely large interface between the enclave and the OS, including the potential to transfer control to the OS at every memory access (e.g., via memory faults) or instruction executed (e.g., via timer interrupts and exceptions). Furthermore, the demand for running commodity applications inside SGX has surged, but these applications are not written to deal with the threat of a malicious OS. Therefore, the ability to interpose on all control and data passed on the enclave-OS interface is an important building block. Such interposition can be used for implementing compatibility frameworks, a host of well-known inline security monitors, and sandboxing techniques inside enclaves [14, 47, 73, 83, 84].
    Complete interposition on the enclave-OS interface is a known challenge. For example, a long line of work on frameworks that aim to run existing software on SGX highlights the difficulty of ensuring compatibility [16, 20, 30, 72, 75]. In this work, we address this challenge by taking a new approach: we enable dynamic binary translation (DBT), i.e., the ability to interpose on all instructions executed in the enclave. Our work enables DBT on Intel SGX enclaves for unmodified x86_64 Linux binaries by designing a system called Ratel. Ratel is available open-source [1] and it builds on DynamoRIO, an industrial-strength DBT engine originally designed for non-enclave code [24]. The Ratel DBT engine does not trust the OS, and enclave applications running on Ratel are assumed to be unaware of its presence. A security monitor implemented using Ratel can intercept and mediate all instructions, entry-exits, system calls, dynamically generated code, asynchronous events, virtual address accesses, and runtime loading of code and data in the enclave—a foundation for implementing a wide variety of security-related instrumentation on enclaves in the future, without specializing to individual applications or language runtimes.
    To illustrate one advantage of such seamless interposition, in this work, we use Ratel to build a binary compatibility layer for SGX. Binary compatibility creates the illusion for an unmodified application binary as if it is running in a normal OS process, rather than in a restricted environment such as an enclave. In designing this layer, we observe that several trade-offs arise between ensuring complete interposition on the OS-enclave boundary and the resulting performance. These trade-offs are orthogonal to security concerns pointed out in prior works (cf. Iago attacks [31], side-channels [90]). We observe that these trade-offs are somewhat fundamental and rooted in five specific restrictions imposed by the SGX design, which create sweeping incompatibility with multi-threading, memory mapping, synchronization, signal-handling, shared memory, and other commodity OS abstractions. Our design resolves these trade-offs consistently in favor of complete interposition rather than performance. In this sense, our work departs from prior works.
    Ratel is the first system that enables DBT and binary compatibility for SGX, to the best of our knowledge. Prior works have proposed a number of different ways of achieving partial compatibility—offering specific programming languages for authoring enclave code [17, 64, 86], keeping compatibility with container interfaces [16, 49], or conformance to specific versions of library interfaces provided by library OSes [20, 30, 68, 72, 75]. All of these designs, however, assume that the application binaries run benign code that uses a particular prescribed interface to achieve compatibility on SGX—for example, application binaries are expected to be relinked against specific versions of libraries (e.g., \( {\tt musl} \) , \( {\tt libc} \) , \( {\tt glibc} \) ), ported to a customized OS, or containerized. In contrast, Ratel interposes at the instruction-level execution of unmodified program binaries in the enclave, and this approach conceptually does not require such strong assumptions.
    Results. We highlight three results showing the egalitarian compatibility offered by Ratel. First, we find that Ratel supports multiple language runtimes (e.g., Python and R) out-of-the-box, without requiring any language-specific design decisions. Second, we successfully run a total of 203 unique unmodified binaries across five benchmark suites (58 binaries), four real-world application use-cases (12 binaries), and 133 Linux utilities. These encompass various workload profiles including CPU-intensive (SPEC 2006), I/O system call intensive (FSCQ, IOZone), system stress-testing (HBenchOS), multi-threading support (Parsec-SPLASH2), a machine learning library (Torch), and real-world applications demonstrated in prior works on SGX. Ratel offers compatibility but does not force applications to use any specific libraries or higher-level interfaces. At the same time, our presented techniques work without any specialization per target application or runtime, highlighting that DBT can be a general solution to compatibility on enclaves. Last, we show that Ratel has comparable or better compatibility than Graphene-SGX, which requires relinking with a particular version of libc, and is one of the longest-maintained publicly available SGX compatibility infrastructures.

    2 Why Is Complete Interposition Challenging?

    Intel SGX allows execution of code inside a hardware-isolated environment called an enclave for running user-level application code [35]. Our goal is to interpose on all the instructions executed inside the enclave. This is a challenge on SGX because of its strict threat model and the restrictions placed on enclave code. The OS is not trusted in this threat model. SGX enforces confidentiality and integrity of enclave-bound code and data. All enclave memory is private and only accessible when executing in enclave-mode. Data exchanged with the external world (e.g., the host application or OS) must reside in public memory, which is not protected. At runtime, execution control can only synchronously enter an enclave via \( {\tt ECALL} \) s and exit an enclave via \( {\tt OCALL} \) s, which are the primary interfaces provided by SGX for effecting system calls (syscalls). Any illegal instructions or exceptions in the enclave create asynchronous entry-exit points. SGX restricts these to pre-specified points in the program. If the enclave execution is interrupted asynchronously, then SGX saves the enclave code execution context and resumes it at the entry point later [2].

    2.1 Restrictions Imposed by SGX Design

    Intel SGX protects the enclave by enforcing strict isolation at several points of interactions between the OS and the user enclave code. We outline five restrictions that the design of SGX imposes:
    (R1.)
    Spatial memory partitioning. SGX enforces spatial memory partitioning. It reserves a region that is private to the enclave and the rest of the virtual memory is public. Memory can either be public or private, not both.
    (R2.)
    Static memory partitioning. The enclave has to specify the spatial partitioning statically. The size, type (e.g., code, data, stack, heap), and permissions for its private memory have to be specified before creation and these attributes cannot be changed at runtime.
    (R3.)
    Non-shareable private memory. An enclave cannot share its private memory with other enclaves.
    (R4.)
    1-to-1 private virtual memory mappings. Private memory spans over a contiguous virtual address (VA) range, the start address of which is decided by the OS. The private VA space has a 1-to-1 mapping with the physical address (PA) space.
    (R5.)
    Fixed entry points. An enclave can only resume execution from its last point and context of exit. Any other entry points/contexts have to be statically pre-specified in the binary as valid ahead of time.

    2.2 Ramifications on Incompatibility

    Restrictions R1–R5 are a systematic way to understand the incompatibility that the SGX design choices create with OS and application functionality. Table 1 summarizes the effect. All of the restrictions apply to SGX v1 [12]. They largely carry over to SGX v2 [58, 89], except that \( R2 \) is relaxed, because SGX v2 enables dynamic page creation and permission changes. Thus, for the rest of the article, we describe our design based on SGX v1. We discuss the specific differences in SGX v2 and their ramifications in Section 7.2.
    Table 1. Ramifications of SGX Design Restrictions on Common OS Abstractions

    OS abstraction                        Restrictions affecting abstraction
    System call arguments                 R1
    Dynamically loaded/generated code     R2
    Thread support                        R5, R2
    Signal handling                       R1, R5
    Thread synchronization                R3, R1
    File/memory mapping                   R1, R2, R3, R4
    IPC/shared memory                     R3, R4
    R1. Since SGX spatially partitions the enclave memory, any data that is exchanged with the OS requires copying between private and public memory. In normal applications, an OS assumes that it can access all the memory of a user process, but this is no longer true for enclaves. Any syscall arguments that reside in enclave private memory are not accessible to the OS or the host process. The enclave has to explicitly manage a public and a private copy of the data to make it accessible externally and to shield it from unwanted modification when necessary. We refer to this as a two-copy mechanism. Thus, \( R1 \) breaks functionality (e.g., system calls, signal handling, futex), introduces non-transparency (e.g., explicitly synchronizing both copies), and introduces security gaps (e.g., TOCTOU attacks [31, 43]).
    R2. Applications often require changes to the size or permissions of enclave memory. For example, memory permissions change after dynamic loading of libraries (e.g., \( {\tt dlopen} \) ) or files (e.g., \( {\tt mmap} \) ), executing dynamically generated code, creating zero-initialized data segments (e.g., .bss), and for software-based isolation of security-sensitive data. The restriction R2 is incompatible with such functionality. To work with this restriction, applications require careful semantic changes: either weaken the protection (e.g., allow write-and-execute instead of write-xor-execute permissions), use the two-copy mechanism, or rely on some additional form of isolation (e.g., using segmentation or software instrumentation).
    R3. SGX has no mechanism to allow two enclaves to share parts of their private memory directly. This restriction is incompatible with synchronization primitives such as locks and with shared memory when there is no trusted OS synchronization service. Keeping two copies of a shared lock breaks its semantics and creates a chicken-and-egg problem: how to synchronize the two copies without another trusted synchronization primitive.
    R4. When applications demand new virtual address mappings (e.g., \( {\tt malloc} \) ), the OS adds these mappings. Normally, applications can ask the OS to map the same physical page at several different offsets, with the same or different permissions—for example, when the same file is mapped as read-only at two places in the program's address space. On SGX, however, the same PA cannot be mapped to multiple enclave VAs. Any such mappings lead to memory protection faults.
    R5. SGX starts or resumes enclave execution only from controlled entry points, i.e., statically identified virtual addresses. Entry/exit points such as those via system calls are feasible to identify statically. However, there are several unexpected entry points to an application when we run it unmodified in an enclave (e.g., due to exception-generating instructions or faulting memory accesses). Determining all potential program points across the enclave boundary is not straightforward. When control re-enters the enclave after an exit (e.g., an OCALL), SGX requires that the program execution context at the time of exit and re-entry be the same. This conflicts with typical program functionality. Normally, if the program wants to execute custom error handling code, say after a divide-by-zero ( \( {\tt SIGFPE} \) ) or illegal instruction ( \( {\tt SIGILL} \) ), it can resume execution at a handler function in the binary with the appropriate execution context set up by the OS. On the contrary, SGX will resume enclave execution at the same instruction and same context (not the OS-provided context for exception handling), thus re-triggering the exception if naively handled.
    It is possible to enable compatibility with exceptions by using the SGX features for exception handling [59]. However, it requires inserting logic to update exception handling data structures (e.g., SSA, enclave stack for EENTER, ERESUME) before and after the exception occurs. Making such changes to unmodified binaries, though not conceptually impossible, is complex and intricate. Our design needs to avoid overriding any pre-existing legitimate behavior of the enclave binary and ensure that the context expected by the application binary remains the same, i.e., as if the binary were running outside the enclave. We explain the challenges and our design in Section 3, and Section 4 provides details.

    3 Overview

    Our work poses the following question: Can complete interposition on the OS-enclave interfaces be achieved on the SGX platform? We present the first system that allows interposing on all enclave-bound instructions, by enabling a widely-used DBT engine inside SGX enclaves. Our system is called Ratel.
    Before we present the design of our DBT engine, we emphasize a key design trade-off: Working with restrictions \( R1\text{--}R5 \) , we observe that one is forced to choose between complete interposition at the OS-enclave interfaces and performance. We explain these trade-offs in Section 3.2. Our design picks completeness of interposition over performance, wherever necessary. In this design principle, it fundamentally departs from prior work.
    Several different approaches to enable applications in SGX enclaves have been proposed. In nearly all prior works, performance considerations dominate design decisions. A prominent way to side-step the performance costs of ensuring compatibility is to ask the application to use a prescribed program-level interface or API. The choice of interfaces varies. They include specific programming languages [33, 41, 44, 86], application frameworks [53], container interfaces [16], and particular implementations of standard \( {\tt libc} \) interfaces. Figure 1 shows the prescribed interfaces in three approaches, including library OSes and container engines, and where they intercept the application to maintain compatibility. Given that complete instruction-level interposition is not the objective of prior works, they handle only subsets of \( R1\text{--}R5 \) . One drawback of these approaches is that if an application does not originally use the prescribed API, it needs to be rewritten, recompiled from source, or relinked against specific libraries. Further, enclave programs may invoke the OS interfaces directly outside the prescribed API. The approach of complete interposition at the lowest level of interfaces (i.e., at each executed instruction) offers a powerful way of providing compatibility without specializing to specific target applications or making any such assumptions on application behavior.
    Fig. 1.
    Fig. 1. Different abstraction layer choices for compatibility. Black shaded regions are untrusted, gray shaded regions are modifications or additions, thick solid lines are enclave boundaries, dotted lines are container/process boundaries, clear boxes are unmodified components, zigzag lines show break in compatibility. (a) Container abstraction with \( {\tt musl} \) or \( {\tt libc} \) interface (Scone [16]). (b) Library OS with \( {\tt glibc} \) interface (Graphene-SGX [30]). (c) Process abstraction with POSIX interface (Panoply [75]). (d) Dynamic Binary Translation with DynamoRIO in Ratel (This work).
    As an illustration of its utility, we show that we can build a binary compatibility layer for Linux-based SGX enclaves on Ratel. Application binaries are originally created with the intention of running on a particular OS in an unrestricted OS process environment. A binary compatibility layer runs below the application and translates any code illegal in the restrictive SGX environment to the appropriate enclave-OS interfaces. In concept, application code is thus free to use any library, direct assembly code, and runtime that uses the Linux system call interfaces. Furthermore, Ratel has about 26 additional instruction-level runtime profilers and monitors, which are pre-existing in our baseline DBT engine (see Section 6.4). These become available to applications running on SGX enclaves directly. Such instrumentation can be used for debugging, resource accounting, or implementing inline security monitors for enclaved code in the future.

    3.1 Background on DBT

    Dynamic binary translation (DBT) is a well-known approach to binary code instrumentation and implementing inline reference monitors. It intercepts each instruction in a program before it executes [52]. In this article, we choose DynamoRIO as our DBT engine, since it is open-source and widely used in industry [24]. Vanilla DynamoRIO works much like a just-in-time compilation engine, which dynamically re-generates (and instruments) the code of the application running on it. At a high level, DynamoRIO first loads itself and then loads the application code in a separate part of the VA space, as shown in Figure 2. Similarly, it sets up two different contexts, one for itself and one for the application. DynamoRIO can update the code on-the-fly before putting it in the code cache by re-writing instructions (e.g., convert a syscall instruction to a stub or library function call). Such rewriting ensures that the DynamoRIO engine takes control before each block of code executes, enabling the ability to interpose on every instruction. Instrumented code blocks are placed in a region of memory called a code cache. When the code cache executes, DynamoRIO regains control as the instrumentation logic desires. After execution, it performs book-keeping updates to its own state or to the program's state. Additionally, DynamoRIO hooks on all events intended for the process (e.g., signals). The application itself is prevented from accessing DynamoRIO memory via address-space isolation techniques [52]. Thus, it acts as an arbiter between the application's binary code and the external environment (e.g., OS, filesystem) with complete interposition.
    Fig. 2.
    Fig. 2. Ratel overview.
    The original DynamoRIO engine is designed to work for non-enclave code. We adapt it to work inside SGX enclaves, resulting in our Ratel system. In contrast with the approach of changing \( {\tt libc} \) , DBT intercepts the application right at the point at which it interacts with the OS (Figure 1) for SGX compatibility. Ratel retains the entire low-level instruction translation and introspection machinery of DynamoRIO, including the code cache and its performance optimizations. This enables reusing well-established techniques for application instrumentation and performance enhancements. We eliminate the support for auxiliary client plugins to reduce the trusted computing base (TCB), but a suite of built-in runtime profilers (see Table 12), which do not use the DynamoRIO client plugin interfaces, is retained in Ratel.
    DynamoRIO's instrumentation is also designed to execute a race-free application without introducing any new races [26]. To achieve this, the vanilla DynamoRIO engine follows these design principles:
    (a)
    DynamoRIO injects itself into existing threads. DynamoRIO does not create any new threads during its own execution. Instead, it uses application threads to perform all translation operations.
    (b)
    Separate DynamoRIO and application locks. DynamoRIO does not re-use application locks for its own synchronization. It introduces new lock variables and exclusively uses them. Further, DynamoRIO does not modify any lock operations of the application code.
    (c)
    Lock ordering to avoid deadlocks. DynamoRIO does not hold any locks when the application code in the code cache holds locks. This imposes a fixed order in which locks must be acquired, with application-held locks taking priority, a well-known way of avoiding deadlocks.
    (d)
    Memory consistency for race-free applications. The original DynamoRIO implementation only inserts memory reads for indirect branches and certain system call arguments. To safeguard consistency for memory reads, DynamoRIO does not insert additional memory access or modify the order of accesses in the application. For system call processing, DynamoRIO uses locks following the principles (a), (b), and (c). Applications that originally have race conditions are not protected by DynamoRIO, but this is unavoidable.
    The four principles above avoid introducing any new concurrency bugs (e.g., race conditions) in the translated application. In Section 3.3, we explain how Ratel piggybacks on this thread-safety to ensure that no concurrency issues arise beyond those that exist in the logic of the original translated program.

    3.2 Ratel Approach

    Ratel provides compatibility for both the DynamoRIO DBT engine as well as any application binary code that DynamoRIO translates. We provide a high-level overview of our design and explain its key trade-offs.
    High-level Overview. As a first step, Ratel loads the DynamoRIO dynamic translation engine at a specific location in virtual memory, which we denote as \( A \) . Let us say that the vanilla dynamic translation engine in DynamoRIO is coded to access virtual address memory regions denoted as \( B \) and the target application accesses regions \( C \) , respectively. The basic principle behind the design of Ratel is to ensure referential transparency: whenever the DynamoRIO engine accesses any virtual address that would have been a location in \( B \) (without Ratel), it must now access the corresponding location with base at the relocated address \( A \) in Ratel. Similarly, if the application would have accessed a location in \( C \) originally, then Ratel must ensure that it accesses the corresponding translated location. To do this, Ratel must (a) intercept all operations that create virtual memory maps (e.g., via static and dynamic loading), and (b) keep an address translation table in \( A \) for translating program accesses made by the target application dynamically. Note that such referential transparency for memory accesses provides compatibility with position-independent code, dynamically generated code, and shadow memory data structures (e.g., shadow stacks) that the application may have originally used—the memory references at runtime resolve consistently to the same translated address, and the values read/written are thus consistent with the original run. For security, Ratel must ensure that all accesses to \( A \) originate from the dynamic translation engine itself, and the application code is unable to access \( A \) directly—a memory isolation policy. Ratel enforces this policy for the dynamic translation engine by modifying its code statically. For the target application, memory isolation can be enforced at runtime through the instruction rewriting capability of DynamoRIO itself, as done in program shepherding [52].
    In addition, Ratel modifies DynamoRIO to adhere to SGX virtual memory limitations (R1–R4). In designing Ratel, we statically change the DynamoRIO code to load it at a fixed memory region \( A \) . Note that \( A \) can be fixed at the time of initialization of the process (loading), therefore, it does not break compatibility with address-space layout randomization. This allows us to load Ratel and start its execution without violating the memory semantics of SGX. We register a fixed entry point in Ratel when entering or resuming the enclave. This entry point acts as a unified trampoline, such that upon entry, Ratel decides where to redirect the control flow, depending on the previously saved context. In DynamoRIO code, we statically replace all instructions that are illegal in SGX with an external call that executes outside the enclave. Thus, Ratel execution itself is guaranteed to never violate R5.
    Ratel has complete control over the loading and running of the translated application binary. Therefore, to ensure that the application adheres to R1–R5, Ratel dynamically rewrites the instructions before they are executed from the code cache. To keep compatibility with R2, we statically initialize the virtual memory size of the application to the maximum allowed by SGX; the type and permissions of memory are set to the specified type in the original binary. Ratel augments its memory manager to keep track of and transparently update the application memory layout as it changes during execution. At runtime, the application can make direct changes to its own virtual memory layout via system calls. Ratel dynamically adapts these changes to SGX by making two copies, wherever necessary, or by relocating the virtual address regions. Ratel intercepts all application interactions with the OS. It modifies application parameters, OS return values, and events for monitoring indirect changes to the memory (e.g., thread creation). Before executing any application logic, Ratel scans the code cache for any instructions (e.g., \( {\tt syscall} \) , \( {\tt cpuid} \) ) that may potentially be deemed illegal in SGX and replaces them with external calls. In the other direction, Ratel also intercepts OS events on behalf of the application. Upon re-entry, if the event has to be delivered to the application (e.g., signals for the application itself), it sets/restores the appropriate execution context and resumes execution via the trampoline. In this way, Ratel adapts the application on the fly to adhere to R1–R5.

    3.3 Resolving Key Design Trade-offs

    Ratel helps to interpose on the enclave code without relying on the untrusted OS. In doing so, the SGX restrictions \( R1\text{--}R5 \) give rise to trade-offs between ensuring complete interposition and having low performance overheads. We point out that these are somewhat fundamental and apply to Ratel and other compatibility efforts equally. However, Ratel chooses completeness in its interposition over performance, whenever conflicts arise.
    Two-copy mechanism for \( R1 \) & \( R2 \) . Due to restriction \( R1 \) , whenever the application wants to read or write data outside the enclave, the data needs to be placed in public memory. Computing on data in public memory, which is exposed to the OS, is insecure. Therefore, if the application wishes to securely compute on the data, then a copy must necessarily be maintained in a separate private memory space, as \( R2 \) forbids making changes to the memory permissions dynamically. Specifically, we use a two-copy mechanism, instances of which repeat throughout the design. Consider the case where an enclave wishes to write data to a file. The OS cannot access the buffer in enclave private memory. Here, the enclave can utilize a two-copy design, in which we place the file data in public memory. The enclave keeps a second copy of that data in its private memory. When the enclave wishes to update the file, the enclave updates both the private and the public copy. When reading data from public memory such as a file, an enclave must copy the data to private memory and check it before further use.
    The above two-copy design pattern is used in other scenarios in Ratel too, specifically, when the enclave shares data with the OS (e.g., for system call handling) and when the data in its private memory needs to have different permissions over time. The OS can manipulate the public memory content at any time—while the enclave is computing on the data, after the enclave’s check but before its use. Such changes by the OS can result in TOCTOU bugs. Keeping a separate private copy in the enclave reduces the attack surface. The other scenario where the two-copy design is useful is when the enclave wants to change the permissions of its private memory, such as switching read-only data to executable/writable, within the enclave itself. Such permission switching for a memory region is disallowed on SGX due to R2. To address this, the two-copy mechanism creates a copy of the data in an additional separate memory region inside private memory, such that the new region has the new permissions. The two-copy mechanism, however, incurs both space and computational performance overheads. Every update to the data causes at least a memory copy operation. The total overhead depends on the application’s workload characteristics (kinds and frequency of shared data accesses) and can vary from \( 10\% \) for CPU-intensive workloads to \( 10\times \) for IO-intensive workloads (see Section 6.3).
    Memory Sharing for \( R3 \) & \( R4 \) . \( R3 \) creates an “all or none” trust model for enclaves. Either memory is shared with all entities (including the OS) or kept private to one enclave. \( R4 \) further restricts sharing memory within an enclave. These restrictions conflict with the semantics of shared memory and synchronization primitives. For instance, synchronization primitives such as futexes are implemented with a single memory copy that the OS is trusted to manage securely—such a design is in direct conflict with the SGX security model. To implement such abstractions securely, designs on SGX must rely on a trusted software manager, which necessarily resides in an enclave, since the OS is untrusted (see Section 4.4). Applications can then maintain compatibility with locks and shared memory abstractions. But this comes at a performance cost: accesses to shared memory or synchronization primitives, which are originally inexpensive memory accesses, turn into (possibly remote) procedure calls to the trusted manager enclave.
    Secure entry-points for \( R5 \) . Restriction \( R5 \) requires that whenever the enclave resumes control after an exit, the enclave state (or context) should be the same as right before the exit. This implies that the security monitor (e.g., the DBT engine) must take control before all exit points and after resumption, to save and restore contexts; otherwise, the interposition can be incomplete, creating security holes and incompatibility. Without guarantees of complete interposition, the OS can return control into the enclave, bypassing security checks that the DBT engine implements. The price for complete interposition on binaries is performance: the DBT engine must intercept all entry/exit points and simulate additional context switches in software. Prior approaches, such as library OSes, choose performance over completeness in interposition by asking applications to link against specific library interfaces that constrict enclave-OS interaction. But this does not enforce complete interposition. Applications, due to bugs or when exploited, can make direct OS interactions without using the prescribed API, use inline assembly, or override entry handlers set up by the library OS. In all scenarios where applications go outside the prescribed interfaces, the library OS design requires special handling.
    Several additional security considerations arise in the implementation details of our design. These include (a) avoiding naïve designs that have TOCTOU attacks; (b) saving and restoring the execution context from private memory; (c) maintaining Ratel-specific metadata in private memory to ensure integrity of memory mappings that change at runtime; and (d) explicitly zeroing out memory content and pointers after use. We explain them inline in Section 4.

    3.4 Threat Model and Scope

    Ratel is best viewed as a general framework for instruction-level interposition on enclaved binaries, rather than a stand-alone sandboxing engine that protects enclaves against all possible OS attacks. Ratel itself does not trust the OS or any security guarantees the OS provides. Binaries running on top of Ratel otherwise follow the same threat model as vanilla DBT engines [82]. Application binaries are assumed to not be aware of the presence of Ratel—they are benign binaries but they can be exploited via externally provided inputs. Under exploitation, malicious code may execute on Ratel. Ratel provides instruction-level instrumentation of all instructions executed and does not provide any higher-level security guarantees beyond that. For example, malicious code can readily determine that it is running on Ratel [42] and, therefore, Ratel is not suitable for analyzing analysis-evading malicious code. Using the instruction-level monitoring capability, the vanilla DynamoRIO engine itself provides certain built-in security mechanisms, which are preserved in Ratel. Specifically, ASLR for all heap regions is turned on and the code cache is randomized. Ratel isolates the stack, dynamically allocated memory, and file-mapped I/O regions through instrumentation. The main new challenges highlighted in this work are those due to enabling DBT on SGX, while most other threats to DBT-based instrumentation are pre-existing and known. The design trade-offs we emphasize apply to any interposition framework that runs on SGX, but have not been emphasized clearly in prior works.
    Building an end-to-end secure sandbox on top of Ratel requires additional security mechanisms, which are common to other systems and are previously known. These mechanisms include encryption/decryption of external file or I/O content [16, 30, 49, 72], sanitization of OS inputs to prevent Iago attacks [31, 51, 77, 81], defenses against known side-channel attacks [22, 48, 66, 73, 74], additional attestation or integrity of dynamically loaded/generated code [44, 45, 46, 85], and so on. These are important but largely orthogonal to our focus.
    Our binary compatibility layer has support for a large majority of, but not all, Linux system calls. The most notable unsupported system call is \( {\tt fork} \) , which is used for multi-processing. Since Ratel does not support multi-process applications, we support locks and synchronization primitives for threads within a single enclave. The basic design of Ratel can be extended to support \( {\tt fork} \) with the two-copy mechanism, similar to prior work [30, 75]. However, blindly maintaining compatibility with fork is a questionable design decision, especially for enclaves, as has been argued extensively [19]. A recent work on SGX compatibility has left out support for \( {\tt fork} \) and multi-enclave locks based on the same observation [72].

    4 Ratel Design

    We explain how Ratel handles syscalls, memory, threads, synchronization, and exceptions/signals inside SGX enclaves.

    4.1 Syscalls and Unanticipated Entry-Exits

    SGX does not allow enclaves to execute certain instructions, such as \( {\tt syscall} \) , \( {\tt cpuid} \) , and \( {\tt rdtsc} \) . If the enclave executes them, then SGX exits the enclave and generates a \( {\tt SIGILL} \) signal. Gracefully recovering from the failure requires re-entering the enclave at a different program point, which is disallowed by SGX due to R5. In Ratel, either DynamoRIO or the application can invoke illegal instructions, which may create unanticipated exits from the enclave.
    Ratel changes the DynamoRIO logic to convert such illegal instructions into stubs that either delegate or emulate the functionality. For the target application, whenever Ratel observes an illegal instruction in the code cache, it replaces the instruction with a call to the Ratel syscall handler function. Ratel has three ways of handling system call execution:
    (1)
    Complete delegation: Entirely delegate the syscall instruction and handler to code outside the enclave;
    (2)
    Partial delegation: Execute the syscall instruction outside, and then update the private in-enclave state; or
    (3)
    Emulation: Completely simulate the syscall behavior with a handler inside the enclave.
    Ratel uses complete delegation for file, networking, and timer-related system calls. It uses partial delegation for memory management, threads, and signal handling. We outline the details of other syscall subsystems that are fully or partially emulated by Ratel in Sections 4.2, 4.3, 4.4, and 4.5. Ratel uses emulation for very few system calls. For example, the \( {\tt arch\_prctl} \) syscall is used to read the FS base. Ratel emulates it by executing a \( {\tt rdfsbase} \) instruction.
    Creating and Synchronizing Memory Copies. Syscalls access process memory for input-output parameters and error codes. Since enclaves do not allow this, for delegating the syscall outside the enclave, Ratel creates a copy of input parameters from private memory to public memory. This includes simple value copies as well as deep copies of structures. The OS then executes the syscall and generates results in public memory. After the syscall completes, Ratel copies back the OS-provided return values and error codes to private memory.
    Memory copies alone are not sufficient. For example, when loading a library, the application uses \( {\tt dlopen} \) , which in turn calls \( {\tt mmap} \) , which must execute outside the enclave. Thus, the \( {\tt mmap} \) call outside the enclave will map the library in the untrusted public address space of the application. However, the original intent of the application is to map the library inside the enclave private memory. As another example, consider when the enclave code wants to create a new thread local storage (TLS) segment. Due to the restrictive SGX environment, Ratel must execute the system call outside the enclave, and the new thread is created for the DynamoRIO runtime instead of the target application. In all such cases, Ratel takes care to explicitly propagate changes into the enclave, i.e., to reflect them in private memory.
    When the enclave code copies input arguments or results for external functions (e.g., syscalls) between private and public memory, other enclave threads should not change the private memory. To ensure this, Ratel uses DynamoRIO locks to synchronize the private memory accesses introduced by the two-copy mechanism. Our implementation adheres to principles (a)–(d) outlined in Section 3.1. One caveat that arises in Ratel is when two threads in the same enclave attempt to perform syscalls simultaneously. Consider an example where the application has two threads, one of which is waiting/polling to acquire a lock, while the other holds it. As per principle (c), the Ratel instance running in thread 1 will try to acquire a DynamoRIO-specific lock (inside the \( \tt {poll} \) syscall), which might already be held by the Ratel instance running in thread 2. This can lead to a deadlock. To avoid deadlocks in such cases, we examine each system call and add DynamoRIO locks only when the syscall updates private memory.
    Checking Memory State after Syscalls. Ratel resumes execution in the enclave only after the syscall state has been completely copied inside the enclave. This allows the enclave to sanitize OS return values before using them. Previously known sanitization checks for Iago attacks can be implemented here [77]. Note that all such sanitization checks must execute inside the enclave and after the state from the public memory is copied into the enclave private memory. This caveat is important to avoid TOCTOU attacks wherein the OS modifies the public memory state before or midway through the execution of the sanitization checks.

    4.2 Memory Management

    Ratel utilizes partial delegation for syscalls that change the process virtual memory layout and permissions (e.g., mmap, mprotect, fsync, and so on). It executes the syscall outside the enclave and then explicitly reflects the memory layout changes inside the enclave. This is not straightforward, for two reasons. First, due to R1–R4, several layout configurations are not allowed for enclave virtual memory (e.g., changing memory permissions). Second, Ratel does not trust the OS information (e.g., via \( {\tt procmap} \) ). Hence, Ratel must use a two-copy mechanism when it uses the partial delegation approach.
    Specifically, Ratel maintains its own \( {\tt procmap} \) -like structure to keep its own view of the process virtual memory inside the enclave, tracks memory-related events, and updates the enclave state. For example, in the case of an mmap syscall that maps a file in enclave private memory, the handler outside the enclave first creates a public memory region by invoking the OS. Then, Ratel allocates a region of private memory, which mirrors the content of the file mapped outside the enclave, and updates its internal \( {\tt procmap} \) -like structure to record the new virtual addresses created. Further, Ratel synchronizes the two copies of memory to maintain execution semantics on all subsequent changes to \( {\tt mmap} \) ped memory. This is done whenever the application unmaps the memory or invokes the \( {\tt sync} \) / \( {\tt fsync} \) syscalls.
    Ratel does not blindly replicate OS-dictated memory layout changes inside the enclave. It first checks if the resultant layout will violate any security semantics (e.g., mapping a buffer to zero-address). It proceeds to update enclave layout and memory content only if these checks succeed. To do this, Ratel keeps its metadata in private memory.
    With interposition over memory management, Ratel transparently side-steps the SGX restriction due to R2. When the application changes the permissions of a memory region (say \( X \) ) dynamically, Ratel moves the content to a memory region (say \( Y \) ) that has the required permissions. To do this, Ratel requires a stash of unused private memory regions ( \( Y \) ) that are not originally used by the application. These memory regions are allocated statically by Ratel at startup and their page permissions are set to readable, writable, and executable. Subsequently, when the application binary accesses memory \( X \) , Ratel dynamically translates the access to the copy in memory \( Y \) . This allows Ratel to transparently emulate the permission change, which is otherwise a disallowed behavior inside the enclave.
    Note that the stash pool of private memory regions \( Y \) is private to the enclave, but has overly permissive access rights. This can be avoided by reserving regions with write-only and execute-only permissions, but doing so may decrease the size of usable non-stash memory. Alternatively, the access rights can be enforced through runtime monitoring of memory accesses, but this adds performance costs. These trade-offs are inherent to restrictions \( R1 \) and \( R2 \) .

    4.3 Multi-Threading

    Ratel supports multiple threads running in a single enclave. However, it has no support for fork to create multiple processes running in different enclaves. Therefore, our design restricts our concerns to enabling thread synchronization within a single enclave. This is still challenging, because restriction \( R2 \) requires the enclaved application to pre-declare a maximum number of threads before execution. SGX also does not allow the enclave to resume at arbitrary program points or execution contexts (restriction \( R5 \) ). This creates several challenges in adapting DynamoRIO to run in SGX.
    In the vanilla DynamoRIO design, the dynamic translation engine and the target application share the same thread, but they have separate TLS segments for a cleaner context switch. DynamoRIO keeps the default TLS segment for the target application and creates a new TLS segment for itself at a different address. It switches between these two TLS segments by changing the segment register: DynamoRIO uses \( {\tt gsbase} \) and the application uses \( {\tt fsbase} \) .
    Multiplexing TLS Segments. Normally, one TLS segment, used to save the context of the currently executing application thread, is sufficient for a context switch between threads. But with Ratel, we need an additional TLS segment to save the state of Ratel itself. Furthermore, SGX reserves one more TLS segment for its own internal use. This brings the total number of TLS segments required for a correct context switch on SGX to three.
    But, the x86_64 architecture itself provides only two base registers ( \( {\tt fsbase} \) and \( {\tt gsbase} \) ) for storing pointers to TLS segments. Therefore, when we attempt to run DynamoRIO inside SGX, there are not enough base registers to save three TLS segment offsets (one each for DynamoRIO, SGX, and the application). We circumvent this limitation of the SGX platform as follows. First, Ratel adds two fields in each TLS segment to store \( {\tt fsbase} \) and \( {\tt gsbase} \) register values for that segment. We use these TLS segment fields to save and restore pointers to the segment base addresses. This allows us to still maintain and switch between three clean TLS segment views per thread. Second, when Ratel has to restore a TLS segment, it searches through a list of TLS segment base addresses, to find the right one to restore—this is because it does not have enough base registers to store three TLS segment bases (which would have avoided the search).
    Restoring TLS Segments on Context Switches. Ratel conceptually maintains a linked list of at most three TLS segment base pointers. The head of the list is the \( {\tt fsbase} \) register, which serves as a pointer to the default first TLS segment, created by SGX and reserved for its own use. This default segment is called the primary, and all TLS segments created subsequently are referred to as secondary. To point to the next TLS segment in the linked list, Ratel adds a new field to the TLS segment, which is NULL for the last element in the list. To traverse the list, the \( {\tt gsbase} \) register is used. The list search begins with the primary, and the right segment to restore is always the last element in the linked list. This way, Ratel can search for and decide which of the three TLS segments to restore using only two registers ( \( {\tt fsbase} \) and \( {\tt gsbase} \) ).
    There are two ways in which control can enter/exit the enclave: via synchronous exits (e.g., \( {\tt ECALL} \) / \( {\tt OCALL} \) used for syscalls) and via asynchronous exits (e.g., used for exceptions, timer interrupts, and so on). During synchronous exits, Ratel sets up the TLS segment linked list such that the restored TLS context state upon resumption is for DynamoRIO and SGX, respectively. The subsequent exception handlers copy state from outside the enclave to inside, perform various Iago checks, and then set the TLS segment to that of the application thread. During asynchronous exits, Ratel does not need to perform any special setup for the TLS segment linked list. When the exception handler executes on resumption, it sets up the execution context as it was just before the exit. Ratel performs similar checks and operations as in the case of synchronous exits, and then restores the context to that before the exit (which may be DynamoRIO’s context or that of the application thread). Therefore, our design works correctly for both synchronous and asynchronous exits.
    Dynamic Threading. Since the number of TCS entries is fixed at enclave creation time on SGX, the maximum number of threads supported is capped. Ratel multiplexes the limited TCS entries available among the application threads dynamically, as shown in Figure 4. When an application wants to create a new thread (e.g., via clone), Ratel first checks if there is a free TCS slot; if not, it busy-waits until a TCS slot is released. Once a TCS slot is available, Ratel performs an \( {\tt OCALL} \) that creates the new thread outside the enclave. After finishing thread creation, the parent thread returns back to the enclave and resumes execution. The child thread explicitly performs an \( {\tt ECALL} \) to enter the enclave, and DynamoRIO resumes execution for the application’s child thread.
    For all threading operations, Ratel ensures transparent context switches to preserve binary compatibility as intended by DynamoRIO. For security, Ratel creates and stores all thread-specific context either inside the enclave or SGX’s secure hardware-backed storage at all times. It does not use any OS data structures or addresses for thread management.
    Safety of Two-copy Mechanism. Ratel does not introduce any new concurrency issues due to its two-copy mechanism (i.e., one-way or two-way copy). Consider two threads in the same enclave sharing a region of memory and concurrently accessing it. If the original code itself has data races when accessing memory, then those races will be preserved in Ratel with its two-copy mechanism. However, if the original code is race-free, then its execution with Ratel will also be race-free. The two-copy mechanism ensures this, because it does not introduce any additional writes/reads to shared (public) memory—it only replaces the original reads/writes of the shared data with reads/writes to translated addresses in private memory. The use of a private copy does not alter the semantics of the original access. In Section 6.1.4, we evaluate the above scenarios for native, DynamoRIO, and Ratel execution of real-world applications.

    4.4 Thread Synchronization

    SGX provides basic synchronization primitives (e.g., SGX mutexes and condition variables) backed by hardware locks. But they can only be used by enclave code. Thus, they are semantically incompatible with the lock mechanisms used by DynamoRIO or legacy applications, which use OS locks. For example, DynamoRIO implements a fast lock using the \( {\tt futex} \) syscall, where the lock is kept in shared memory accessible to all application threads and the OS. Here, the OS needs the ability to read the lock state to determine whether it should wait during the \( {\tt FUTEX\_WAIT} \) syscall.
    A naive design would be to maintain the \( {\tt futex} \) lock in public memory, such that it is accessible to the enclave(s) and the OS. However, the OS can arbitrarily change the lock state and attack the application. Specifically, it can reset the lock during the execution of a critical section in an application thread. Therefore, this design choice is not safe.
    As an alternative, we can employ a two-copy mechanism for locks. The enclave can keep the lock in private memory. When it wants to communicate a state change to the OS, Ratel can tunnel a futex \( {\tt OCALL} \) to the host OS. This approach is problematic as well. Threads inside the enclave may frequently update the locks in private memory. The futex state outside the enclave needs to be kept consistent with the private copy when the OS kernel and the untrusted part of the enclave access it, or else the semantics of the lock may not hold. The more frequent the local updates to the in-enclave copy of the lock state, the higher the chance of inconsistencies. In general, avoiding such race conditions usually involves using locks for synchronization. But requiring locks to synchronize copies of other locks, as suggested in this design alternative, only results in a chicken-and-egg problem.
    Supporting such semantics efficiently, where the OS has a shared read access to the lock state, is difficult with SGX because of restrictions \( R1 \) and \( R3 \) . Figure 3 shows the schematics of design choices for implementing synchronization primitives available on SGX. Note that options (a) and (b) are insecure as discussed above. Ratel implements a lock manager inside the same enclave that executes the application. Our simplification has one limitation: Only threads within the same application process (and enclave) can utilize Ratel synchronization primitives.
    Fig. 3.
    Fig. 3. Lock synchronization design choices: (a) Futex. (b) Two-copy design with futex in public memory. (c) Ratel case: the threads and lock manager are in the enclave.
    Fig. 4.
    Fig. 4. Design for multi-threading in Ratel.
    Ratel Lock Manager Implementation. Our design choice of using a single enclave to execute both the DynamoRIO engine and all application threads eliminates considerable implementation complexity. It turns out that futex-based locks become unnecessary, since we do not need sharing across the process boundary or with the OS. The DynamoRIO usage of futexes can thus be replaced with a simpler primitive, such as spinlocks, to achieve the same functionality. Specifically, Ratel implements a lock manager using the hardware spinlock exposed by SGX. Ratel invokes its in-enclave lock manager either when DynamoRIO uses futexes or when the application binaries perform lock-based synchronization. For the DynamoRIO code, we manually change it to invoke our lock manager. In the case of application locks, Ratel loads the application binary into the code cache and replaces thread-related calls (e.g., \( {\tt pthread\_cond\_wait} \) ) in the enclave-OS interface with stubs that invoke our lock manager to use Ratel-provided safe synchronization primitives.

    4.5 Signal Handling

    Ratel cannot piggyback on the existing signal handling mechanism exposed by SGX, due to restriction \( R5 \) . Specifically, when DynamoRIO executes inside the enclave, the DynamoRIO signal handler needs a description of the event to handle it (Figure 5(a)). However, Intel’s SGX platform software removes all such information when it delivers the signal to the enclave. This breaks the functionality of programmer-defined handlers that recover from well-known exceptions (e.g., divide by zero). Further, any illegal instructions inside the enclave generate exceptions, which are raised in the form of signals. Existing binaries may not have handlers for recovering from such illegal instructions. Therefore, Ratel must provide handlers for all such exceptions.
    Fig. 5.
    Fig. 5. (a) Original signal handling in DynamoRIO. (b) Signal handling in Ratel.
    Recall that SGX allows entering the enclave only at fixed program points. Leveraging this, Ratel employs a primary signal handler that it registers with SGX. For any signal generated for DynamoRIO or the application, we always enter the enclave via the primary handler and copy the signal number into the enclave. We then use the primary as a trampoline to route control to the appropriate secondary signal handler inside the enclave, based on the signal number. At a high level, we realize a virtualized trap-and-emulate signal handling design. We use SGX signal handling semantics for our primary. For the secondary, we set up and tear down a separate stack to mimic the semantics in software. The intricate details of handling the stack state at the time of such a context switch are elided here. Figure 5(b) shows a schematic of our design, and we explain the flow of control and associated issues below.
    Registration. The original DynamoRIO code and the application binary use \( {\tt sigaction} \) to register signal handlers for themselves. In Ratel, we first change the DynamoRIO logic to register only the primary signal handler with SGX. We then record the DynamoRIO and application registrations as secondary handlers. This way, when SGX or the OS delivers the signal to the enclave, SGX directs control to our primary handler.4 Since this is a pre-registered handler, SGX allows it. The primary handler checks the signal information (e.g., signal code) and explicitly routes execution to the secondary.
    Delivery. A signal may arrive when the execution control is inside the enclave. In this case, Ratel executes a primary signal handler that delivers the signal to the enclave. However, if the signal arrives when the CPU is in a non-enclave context, SGX does not automatically invoke the enclave to redirect execution flow. To force this, Ratel has to explicitly enter the enclave. But it can only enter at a pre-registered program point with a valid context, as per restriction \( R5 \) . Thus, Ratel first wakes up the enclave at a valid point (via \( {\tt ECALL} \) ). Ratel copies the signal information, passed as an input argument to the \( {\tt ECALL} \) , to private memory. It then simulates the signal delivery by setting up the enclave stack in private memory to execute the primary handler.
    Exit. After executing their logic, handlers use \( {\tt sigreturn} \) to return control to the point where the signal interrupted the execution. When Ratel observes this in the secondary handler, it has to simulate a return back to the primary handler instead. The primary handler then performs its own real \( {\tt sigreturn} \) . SGX then resumes execution from the point before the signal was generated.
    Handling Nested Signals. One issue on platforms like SGX is that both synchronous and asynchronous signals are supported. Signals can be nested, in the sense that signals can be delivered by the OS while the enclave is handling another one. The enclave cannot mask signals selectively at runtime on SGX. Accordingly, the potential for subtle reentrancy bugs arises in the enclave signal handling code. At a high level, Ratel handles signals safely by ensuring that unsafe nesting is not possible. Specifically, the SGX platform automatically saves enclave state in private memory regions pointed to by the hardware State Save Area (SSA) when an exception is to be delivered. To support nesting, SGX provides an array of SSA frames, leaving the enclave the option to set the maximum size of the array (and hence the maximum possible nesting depth). Ratel utilizes this feature to limit the nesting depth to 2. This is needed because in Ratel one SSA frame can be used by Ratel itself and the other by the target application. With the maximum nesting depth set to 2, if delivery of a nested signal of depth 3 is attempted, SGX securely aborts the enclave, since not enough SSA frames are available. With this design, there are only four possible combinations of reentrancy to reason about:
    (1)
    Ratel signal handler is interrupted with a signal to be delivered to the application;
    (2)
    application’s signal handler is interrupted with a signal to be delivered to Ratel;
    (3)
    Ratel signal handler is interrupted with a signal to be delivered to Ratel;
    (4)
    application handler is interrupted with a signal to be delivered to the application.
    In Cases 1 and 3, DynamoRIO gets execution control. In Case 1, DynamoRIO has one additional SSA frame available to save the current state of the primary handler and deliver control to the beginning of the primary signal handler, but with the new context. In Case 3, the two SSA frames are already used up to save Ratel’s primary handler state and the application’s secondary handler state. Therefore, SGX is unable to deliver the signal and the enclave execution is aborted. In Cases 2 and 4, similar to Case 3, the two SSA frames are already in use to store the current state of the primary and secondary signal handlers, respectively. SGX, thus, cannot deliver the signal and the enclave is aborted. Thus, in all the above scenarios, the Ratel design handles reentrancy safely. Note that Ratel handles reentrancy identically for synchronous exceptions, which are used for threads and syscalls (Section 4.3), and for asynchronous exceptions.
    Remark. Ratel signal handling extends the trap-and-emulate principle used by DynamoRIO. Since the DynamoRIO design enforces transparency, we preserve this in Ratel. This goes beyond merely working around the SGX limitations, making our design different from existing frameworks. Specifically, existing library OS-based SGX frameworks (e.g., Graphene-SGX and Occlum) assume that all exception registration and execution will go via the prescribed library interfaces. So, they do not keep a separate signal context for library OS signals and the application. These works do not discuss what happens during nested signals or when the enclave-OS interaction happens outside of the prescribed library registration and handler mechanisms.

    5 Implementation

    We implement Ratel on DynamoRIO [24]. We run DynamoRIO inside an enclave with the help of a standard Intel SGX development kit that includes user-land software (SDK v2.1), platform software (PSW v2.1), and a Linux kernel driver (v1.5). We make a total of 9667 LoC software changes to DynamoRIO and SDK infrastructure. We run Ratel on unmodified hardware that supports SGX v1.
    The Ratel design makes several changes to the DynamoRIO core (e.g., memory management, lock manager, signal forwarding). We discuss three high-level implementation challenges that arise in implementing our design, eliding lower-level challenges for brevity. The root cause of the highlighted challenges is the mismatch between the way the Intel SDK and PSW expose hardware features and what DynamoRIO expects.
    Self-identifying Load Address. The vanilla DynamoRIO engine needs to know its own start location in memory to avoid overlapping its own address space with that of the target application, and so it uses a hard-coded address. Since our modifications change such hard-coded address assumptions, Ratel uses a \( {\tt call-pop} \) instruction sequence to self-identify the runtime location in memory for the DynamoRIO engine [70, 87], aligns it at a page boundary, and updates the DynamoRIO logic to use code location-independent addressing.
    Setting SSA Slots. The vanilla SGX SDK and PSW use just two SSA frames: one to process timer interrupts specially and the other to handle all other interrupts. As explained in Section 4.5, the Ratel design aims to set the effective nesting depth to 2. Therefore, the implementation uses three SSA slots: one reserved for the timer interrupt (to be handled by the SGX SDK), and the remaining two as described in Section 4.5. The timer interrupt handler simply routes control to whichever signal handler was interrupted. The SGX specification allows setting the required number of SSA slots by changing the \( {\tt NSSA} \) field, which we do in our SDK implementation.
    Preserving Execution Contexts. To start executing a newly created thread, Ratel invokes a pre-declared \( {\tt ECALL} \) to enter the enclave. This is a nested ECALL, which the SGX SDK does not support. To allow it, we modify the SDK to facilitate the entry of child threads and to initialize the thread data structure for them. Specifically, we check that the copy of the thread arguments inside the enclave matches the one outside before resuming thread execution. We save specific registers so that the thread can exit the enclave later. Since the child thread follows its own execution path, distinct from the parent's, Ratel redirects its return address to the point in the code cache where a new thread always starts. After the thread is initialized, we explicitly update DynamoRIO data structures to record the new thread (e.g., the TLS base for application libraries). This way, DynamoRIO is aware of the new thread and can control its execution in the code cache.
    Propagating Implicit Changes and Metadata. A thread terminates itself with the \( {\tt exit/exit\_group} \) syscall, after which the OS zeroes out the child thread ID ( \( {\tt ctid} \) ). In Ratel, we explicitly create the new thread inside the enclave, so we also have to terminate it explicitly by zeroing out the pointers to the IDs. Further, we clean up and free the memory associated with each thread inside and outside the enclave.
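    The kernel normally clears the child-tid word and wakes joiners itself (the \( {\tt CLONE\_CHILD\_CLEARTID} \) behavior). The sketch below (a hypothetical helper of our own, not Ratel's code) shows what a runtime must replicate when the tid word lives in enclave memory that the kernel cannot write.

```c
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical sketch: on thread exit, zero the child-tid word ourselves
 * and wake any thread blocked waiting on it (e.g., in pthread_join),
 * mimicking what the kernel does for CLONE_CHILD_CLEARTID. */
static void clear_child_tid(atomic_int *ctid) {
    atomic_store(ctid, 0);                          /* kernel would zero *ctid */
    syscall(SYS_futex, (int *)ctid, FUTEX_WAKE, 1,  /* wake one joiner, if any */
            NULL, NULL, 0);
}
```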
    Built-in Profilers. DynamoRIO supports two modes of instrumentation: built-in profilers and client plugins. Profilers ship with the DynamoRIO core and provide basic functionality such as instruction tracing, logging, and tuning DynamoRIO parameters (see Table 12). DynamoRIO clients, in contrast, are instrumentation plugins designed to perform user-desired tasks (e.g., a shadow stack). Since profilers are part of DynamoRIO, Ratel supports all of them out-of-the-box. Our current implementation removes client plugin support to reduce the TCB. We demonstrate Ratel's compatibility with these profilers in Section 6.4.

    6 Evaluation

    We evaluate the following properties of Ratel empirically:
    Binary compatibility. How well does Ratel provide binary compatibility with common Linux programs on SGX?
    TCB. What is the size of the TCB for Ratel?
    Performance. How much overhead does Ratel introduce for applications?
    Graphene-SGX. Does Ratel compare in compatibility and performance to Graphene-SGX, the state-of-the-art library OS for SGX?
    Instrumentation capability. What kinds of low-level monitors does Ratel provide for end applications?
    Setup. All our experiments are performed on a Lenovo machine with SGX v1 support: 128 MB EPC (of which approximately 90 MB is available for user enclaves), 12 GB RAM, 64 KB L1, 256 KB L2, and 4,096 KB L3 caches, at 3.4 GHz. We use Ubuntu 16.04, Intel SGX SDK v2.1, PSW v2.1, driver v1.5, and DynamoRIO v6.2.17. All performance statistics reported are the geometric mean over 5 runs. To foster open science and reproducibility, we have made our implementation and evaluation public. The Ratel code base, our benchmarks, and case studies are open-source and available [1].
    To compare Ratel’s binary compatibility and performance with other approaches, we chose Graphene-SGX, a library OS that runs inside the enclave. Graphene-SGX offers the lowest compatibility barrier of all prior systems to our knowledge, specifically offering compatibility with \( {\tt glibc} \) . It is a mature and publicly available system that has been maintained for over 3 years as of this writing.

    6.1 Compatibility

    To evaluate compatibility, we initially select 310 binaries that cover an extensive set of benchmarks, utilities, and large-scale applications. These are commonly used as evaluation targets by DynamoRIO and the prior enclave-based systems [16, 25, 30, 34, 47, 76, 77] we surveyed for our study. Further, they represent a mix of memory-intensive, CPU-intensive, multi-threading, network-intensive, and file I/O workloads. A total of 69 binaries are from micro-benchmarks: 29 from SPEC 2006 (CPU), 1 from IOZone v3.487 (I/O), 9 from FSCQ v1.0 (file API), 21 from HBenchOS v1.0 (system stress-test), and 9 from Parsec-SPLASH2 (multi-threading). We run 12 binaries from real-world applications: cURL v7.65.0 (server-side utilities), SQLite v3.28.0 (database), Memcached v1.5.20 (key-value store), and nine applications from Privado (secure ML framework). We selected all 229 Linux utilities available in our test system's \( {\tt /bin} \) and \( {\tt /usr/bin} \) directories. Apart from these 310 binaries, we tested two language runtimes, Python (v2.7.17 and v3.8.2) and R v3.6.3. Python is tested with Python-CPU-Benchmark [6], Python Programming Examples [8], and PyBench v2.0 [7]. R is tested with R-benchmark-25 [3].

    6.1.1 Compatibility Gains.

    Python and R benchmarks run out-of-the-box on Ratel, highlighting how multiple languages can be supported without any special handling. The R interpreter has JIT enabled, which illustrates how Ratel handles dynamically generated code gracefully, whereas the Python interpreter is bytecode-interpretation based. For the remaining 310 benchmarks and applications, we download the source code and compile it with the default flags required to run them natively on our machine. We directly use the existing binaries for Linux utilities. We test the same binaries on native hardware, with DynamoRIO, and with Ratel, without changing the original source code or the binaries. Of the 310 binaries, 272 targets execute successfully on native Linux and with (unmodified) vanilla DynamoRIO. The remaining 38 binaries either use unsupported devices (e.g., NTFS) or do not run on our machine. We discard them from our Ratel experiments, since vanilla DynamoRIO also does not work on them. Of the remaining 272 binaries that work on the baselines, Ratel has support for the system calls used by 208; Ratel runs them out-of-the-box with no additional porting effort.
    System Call Support & Coverage. Ratel supports a total of \( 212/318 \) ( \( 66.7\% \) ) syscalls exposed by the Linux kernel. We emulate six syscalls purely inside the enclave and delegate 193 of them via OCALLs. For the remaining 13, we use partial emulation and partial delegation. Table 2 gives a detailed breakdown of our syscall support. Syscall usage is not uniform across frequently used applications and libraries [80]; hence, we empirically evaluate the degree of expressiveness supported by Ratel. Across all 272 binaries in our evaluation, we observe a total of 121 unique syscalls, of which Ratel supports 115. Table 2 shows the syscalls supported by Ratel and their usage in our benchmarks and real-world applications (see Section 6.1.2). Figures 6(a) and 6(b) show the distribution of unique syscalls and their frequency as observed over the binaries supported by Ratel. Thus, our empirical study shows that Ratel supports \( 115/121 \) ( \( 95.0\% \) ) of the syscalls observed in our benchmark programs.
    Fig. 6. System call statistics over all 208 binaries: (a) unique syscalls for each binary; and (b) frequency per syscall.
    | Subsystem | Total | Impl | Del | Emu | P.Emu | Covered (DR + binaries) |
    |---|---|---|---|---|---|---|
    | Process | 12 | 8 | 4 | 2 | 2 | 3 |
    | Filename based | 37 | 25 | 25 | 0 | 0 | 16 |
    | Signals | 12 | 7 | 3 | 4 | 0 | 6 |
    | Memory | 18 | 10 | 6 | 0 | 4 | 4 |
    | Inter-process communication | 12 | 4 | 4 | 0 | 0 | 0 |
    | File descriptor based | 65 | 53 | 48 | 0 | 5 | 30 |
    | File name or descriptor based | 19 | 9 | 9 | 0 | 0 | 5 |
    | Networks | 19 | 17 | 15 | 0 | 2 | 15 |
    | Misc | 124 | 79 | 79 | 0 | 0 | 36 |
    | Total | 318 | 212 | 193 | 6 | 13 | 115 |
    Table 2. Ratel Syscall Support
    Columns 2 and 3: total Linux system calls and support in Ratel. Columns 4–6: syscalls implemented by full delegation, full emulation, and partial emulation, respectively. Column 7: syscalls tested in Ratel.
    To support the 212 syscalls, we added 3,233 LoC (15 LoC per syscall on average). In the future, Ratel can be extended to increase the number of supported syscalls; readers are referred to Section 6.1.2 for more details. Ratel handles 31 of the 32 standard Linux signals; the SIGPROF signal is not handled because vanilla DynamoRIO itself does not support it.
    Library vs. Binary Compatibility. We maintain full binary compatibility with all 208 tested binaries for which Ratel has system call support; Ratel works on them out-of-the-box in our experiments. Given the same inputs as native execution, Ratel produces the same outputs. The advantage of Ratel is that it makes no assumptions about which specific implementation or version of \( {\tt libc} \) or higher-level API the application uses. To validate that this property indeed holds empirically, we test Ratel with binaries that use different \( {\tt libc} \) implementations. Specifically, we compile the HBenchOS benchmark (12 binaries), an OS stress-testing benchmark, with two different \( {\tt libc} \) implementations: \( {\tt glibc} \) v2.23 and \( {\tt musl} \) \( {\tt libc} \) v1.2.0. Ratel executes these benchmarks out-of-the-box with both libraries, without any modification or specialization of the Ratel implementation.
    Last, as a point of comparison, we report our experience porting our micro-benchmarks to the state-of-the-art library-compatibility system for SGX (Graphene-SGX) in Section 6.1.3. Of the 75 programs tested, Graphene-SGX fails on 13. Ratel works correctly for all except 1, which fails only due to the virtual memory limits of the SGX hardware.

    6.1.2 Detailed Breakdown of Compatibility Tests.

    We provide a detailed breakdown of the compatibility observed for our Linux utilities and other benchmarks.
    Linux Utilities. Our tests include all the Linux built-in binaries available on our experimental Ubuntu system: 229 shared-object binaries in total, which typically reside in the directories \( {\tt /bin} \) and \( {\tt /usr/bin} \) .
    We run each utility with options and inputs representative of its common usage. The specific input configurations are reported as a script in our public release. Of the 229 benchmarked utilities, 195 worked natively on our test machine and with vanilla DynamoRIO. Of these 195 binaries, a total of 138 have all their system calls presently supported in Ratel, and all of them worked correctly in our tests out-of-the-box. The 57 programs that did not work fail for two reasons: missing syscall support and the virtual memory limits imposed by SGX. Tables 3 and 4 list all Linux utilities and binaries from real-world applications and benchmarks that ran successfully, along with the number of unique system calls for each. Tables 5 and 6 summarize the reasons for failure of all binaries that fail in Ratel, and in native and DynamoRIO runs, respectively.
    | Utility | # of sys. | Utility | # of sys. | Utility | # of sys. | Utility | # of sys. | Utility | # of sys. | Utility | # of sys. |
    |---|---|---|---|---|---|---|---|---|---|---|---|
    | ed | 18 | ppdpo | 29 | dirmngr | 21 | hcitool | 20 | systemctl | 32 | systemd-cgtop | 25 |
    | cvt | 13 | psnup | 14 | enchant | 20 | bluemoon | 21 | vim.basic | 45 | dirmngr-client | 13 |
    | eqn | 36 | t1asm | 13 | epsffit | 13 | btattach | 21 | hciconfig | 25 | systemd-escape | 18 |
    | gtf | 18 | troff | 15 | faillog | 18 | fwupdate | 19 | brltty-ctb | 21 | systemd-notify | 19 |
    | pic | 13 | uconv | 13 | gendict | 14 | gatttool | 24 | fusermount | 19 | wpa_passphrase | 18 |
    | tbl | 13 | bccmd | 25 | hex2hcd | 18 | gencnval | 13 | journalctl | 26 | gamma4scanimage | 13 |
    | xxd | 14 | btmgmt | 24 | icuinfo | 15 | lessecho | 13 | sudoreplay | 16 | systemd-analyze | 31 |
    | curl | 32 | busctl | 37 | kbxutil | 14 | loginctl | 31 | watchgnupg | 18 | systemd-inhibit | 31 |
    | derb | 23 | catman | 16 | lastlog | 14 | makeconv | 14 | xmlcatalog | 12 | systemd-resolve | 31 |
    | find | 27 | cd-it8 | 21 | lesskey | 13 | ppdmerge | 26 | zlib-flate | 13 | ulockmgr_server | 18 |
    | gawk | 25 | expiry | 14 | lexgrog | 15 | psresize | 14 | cupstestdsc | 26 | systemd-tmpfiles | 34 |
    | grep | 21 | genbrk | 14 | manpath | 14 | psselect | 14 | cupstestppd | 18 | gpg-connect-agent | 22 |
    | htop | 26 | gencfu | 13 | obexctl | 35 | t1binary | 13 | hostnamectl | 30 | kerneloops-submit | 20 |
    | kmod | 17 | grotty | 13 | pkgdata | 14 | t1binary | 13 | systemd-run | 29 | evince-thumbnailer | 21 |
    | ppdc | 30 | l2ping | 21 | ppdhtml | 27 | t1disasm | 13 | timedatectl | 32 | fcitx-dbus-watcher | 18 |
    | ppdi | 30 | l2test | 27 | preconv | 15 | t1disasm | 13 | brltty-trtxt | 22 | systemd-detect-virt | 18 |
    | qpdf | 14 | psbook | 14 | sdptool | 22 | transfig | 15 | dbus-monitor | 31 | dbus-cleanup-sockets | 19 |
    | gpg2 | 27 | pstops | 14 | ssh-add | 20 | vim.tiny | 34 | dbus-uuidgen | 13 | systemd-stdio-bridge | 25 |
    | wget | 29 | rctest | 25 | t1ascii | 13 | dbus-send | 30 | fcitx-remote | 23 | systemd-ask-password | 20 |
    | btmon | 23 | soelim | 12 | udevadm | 27 | gpg-agent | 18 | gpgparsemail | 14 | webapp-container-hook | 26 |
    | genrb | 13 | whatis | 20 | volname | 15 | hciattach | 18 | systemd-hwdb | 17 | systemd-machine-id-setup | 18 |
    | grops | 14 | rfcomm | 19 | xmllint | 15 | localectl | 30 | systemd-path | 18 | systemd-tty-ask-password-agent | 25 |
    | mandb | 27 | bootctl | 19 | ciptool | 20 | pg_config | 16 | enchant-lsmod | 13 | dbus-update-activation-environment | 31 |
    Table 3. List of GNU Utilities (138) Tested with Ratel and the Number of Unique System Calls Invoked in a Single Execution
    | App | # of sys. | App | # of sys. | App | # of sys. | App | # of sys. | App | # of sys. | App | # of sys. |
    |---|---|---|---|---|---|---|---|---|---|---|---|
    | gcc | 43 | dealII | 43 | leslie3d | 43 | xalancbmk | 49 | bw_mmap_rd | 42 | water_spatial | 44 |
    | fmm | 45 | soplex | 43 | calculix | 41 | LFS-write | 43 | resnet50app | 41 | inceptionv3app | 42 |
    | curl | 55 | povray | 43 | GemsFDTD | 42 | multiopen | 41 | densenetapp | 42 | multicreatemany | 41 |
    | milc | 42 | barnes | 45 | specrand | 42 | multiread | 41 | multicreate | 47 | multicreatewrite | 41 |
    | namd | 42 | iozone | 47 | specrand | 42 | Memcached | 59 | lat_fslayer | 42 | lat_syscall(sbrk) | 42 |
    | bzip2 | 48 | lat_fs | 41 | lenetapp | 41 | radiosity | 44 | lat_connect | 42 | lat_syscall(write) | 42 |
    | gobmk | 42 | bw_tcp | 42 | vgg19app | 43 | ocean_ncp | 44 | resnext29app | 42 | lat_syscall(getpid) | 42 |
    | hmmer | 41 | h264ref | 42 | raytrace | 45 | bw_mem_cp | 42 | resnet110app | 41 | lat_syscall(sigaction) | 42 |
    | sjeng | 42 | omnetpp | 43 | ocean_cp | 45 | bw_mem_rd | 42 | wideresnetapp | 43 | lat_syscall(getrusage) | 43 |
    | tonto | 44 | gromacs | 46 | volrend | 45 | bw_mem_wr | 42 | squeezenetapp | 41 | lat_syscall(gettimeofday) | 42 |
    | astar | 43 | lat_sig | 44 | bw_bzero | 42 | libquantum | 42 | LFS-smallfile | 49 | | |
    | sqlite | 47 | lat_tcp | 42 | lat_mmap | 42 | multiwrite | 41 | LFS-largefile | 42 | | |
    | zeusmp | 43 | lat_udp | 42 | cactusADM | 43 | bw_file_rd | 42 | water_nsquare | 44 | | |
    Table 4. List of Applications (12) and Individual Benchmarks (63) Tested with Ratel and the Number of Unique System Calls Invoked in a Single Execution
    | Reason category | # unsuccessful | Case examples |
    |---|---|---|
    | fork | 49 | strace, scp, lat_proc and lat_pipe from HBenchOS, etc. |
    | execv | 1 | systemd-cat |
    | signal | 5 | colormgr, cd-iccdump, bluetoothctl, etc. |
    | Unsupported syscalls | 6 | webapp-container, webbrowser-app, etc. |
    | Out-of-memory | 3 | shotwell, mcf from SPEC 2006, lat_memsize from HBenchOS |
    Table 5. Summary of the Reasons for Failure of All 64 Unsuccessful Binaries Tested with Ratel
    | Reason category | # unsuccessful | Case examples |
    |---|---|---|
    | NTFS related | 16 | ntfs-3g, ntfs-3g.probe, ntfs-3g.secaudit, etc. |
    | Printer related | 7 | lp, lpoptions, lpq, lpr, lprm, etc. |
    | Scanner related | 2 | sane-find-scanner, scanimage |
    | Failure in native run | 5 | umax_pp, cd-create-profile, and bwaves from SPEC 2006, etc. |
    | Failure in DynamoRIO run | 8 | ssh, ssh-keygen, dig, etc. |
    Table 6. Summary of the Reasons for Failure of All 38 Unsuccessful Binaries Tested with Linux and DynamoRIO
    For the incompatible cases, 45 fail due to the lack of multi-processing ( \( {\tt fork} \) ) support in Ratel. As explained in Section 3.4, not supporting \( {\tt fork} \) is an explicit design decision in Ratel. Five utilities use POSIX signals outside the 32 standard signals, for which Ratel presently has incomplete support (e.g., real-time signals \( \tt {SIGRTMIN+n} \) ). Another five utilities fail because they invoke system calls that restriction \( R3 \) in SGX fundamentally does not permit (e.g., shared memory syscalls such as \( {\tt shmat} \) , \( {\tt shmdt} \) , \( {\tt shmctl} \) , etc.); these syscalls are not supported in Ratel.5 One utility fails because of the virtual memory limit in SGX, as it loads more than 100 shared libraries. The remaining utility fails because Ratel has no support for the \( {\tt execv} \) syscall.
    Other Benchmarks & Applications. Of the 81 binaries from micro-benchmarks and real applications, 11 do not work with Ratel. Five binaries from HBenchOS ( \( {\tt lat\_proc} \) , \( {\tt lat\_pipe} \) , \( {\tt lat\_ctx} \) , \( {\tt lat\_ctx2} \) , \( {\tt bw\_pipe} \) ) either use fork or shared memory system calls disallowed by \( R3 \) . Two binaries ( \( {\tt lat\_memsize} \) from HBenchOS and mcf from SPEC 2006) require, under DynamoRIO, more virtual memory than the SGX limits of our experimental setup allow. The remainder (e.g., bwaves from SPEC 2006) fail to run even on the baseline setup of our Linux OS with vanilla DynamoRIO.

    6.1.3 Comparison to Graphene-SGX.

    Applications using Graphene-SGX have to work with a specific library interface, namely a custom \( {\tt glibc} \) , which requires re-linking and changes to the build process. Ratel, in contrast, has been designed for binary compatibility, which is a fundamental difference in design. To demonstrate the practical difference, we reported in Section 6.1 that the HBenchOS benchmarks work out-of-the-box when built with both \( {\tt glibc} \) and \( {\tt musl} \) .
    Graphene-SGX requires a manifest file for each application that specifies the main binary name as well as the dynamic libraries, directories, and files used by the application. By default, Graphene-SGX does not allow the creation of new files at runtime; we use the \( {\tt allow\_file\_creation} \) option to disable this default. We tested all 75 benchmark and application binaries (HBenchOS, Parsec-SPLASH2, SPEC, IOZone, FSCQ, SQLite, cURL, Memcached, Privado), of which 62 work with Graphene-SGX. Of the 13 that fail on Graphene-SGX, all except 1 work on Ratel, the only failure being due to virtual memory limits. On Graphene-SGX, three of the nine Parsec-SPLASH2 binaries ( \( {\tt water\_nsquare} \) , \( {\tt water\_spatial} \) , and \( {\tt volrend} \) ), the IOZone binary, and the SQLite database workload [56] failed due to an I/O error (e.g., Reference [40]), which is an open issue. Three of the 24 binaries from SPEC 2006 failed: \( {\tt cactusADM} \) fails due to a signal failure, which is mentioned as an existing open issue on the Graphene-SGX public project page [15]; the \( {\tt calculix} \) program fails with a segmentation fault; and \( {\tt omnetpp} \) could not process the input file despite the input file being allowed in the corresponding manifest file. Four networking-related binaries from HBenchOS, namely \( {\tt lat\_connect} \) , \( {\tt lat\_tcp} \) , \( {\tt lat\_udp} \) , and \( {\tt bw\_tcp} \) , could not run, resulting in a \( {\tt bad} \) \( {\tt address} \) error while connecting to localhost. Finally, \( {\tt lat\_memsize} \) from HBenchOS fails on Graphene-SGX, as it does on Ratel, due to the virtual memory limit.

    6.1.4 Implications of Two-copy Design.

    The two-copy design employed by Ratel replaces the original reads and writes in the application code with accesses to updated memory addresses. As discussed in Section 4.3, this design is safe for all programs, including multi-threaded ones, as long as the enclave code is thread-safe. However, Ratel does not guarantee correctness if the original program has race conditions. To demonstrate this, we test synthetic and real-world multi-threaded applications under various race conditions.
    Listing 1. Example code snippet with a potential data race. Thread 1 reads data from a file into a global buffer. Thread 2 reads this global buffer at a particular offset and checks if the value is valid. If the check on line 13 fails, then we flag it as a data race. For Ratel, the copy operations happen at line 6, where trusted thread 1 copies data into trusted enclave memory (e.g., buf). If thread 2 sleeps for little or no time, data races may occur, since thread 1 needs time to complete the copy operations.
    First, we hand-coded a synthetic multi-threaded program to study the effect of bad synchronization. Listing 1 shows a snippet of application code that creates two threads to perform different tasks on the same data (e.g., a buffer write and a buffer read). However, we do not add any explicit locks before accessing the data. Instead, to exhibit thread-unsafe execution, we add \( \tt {sleep} \) statements with varying durations, specifically 0, 1, 10, 100, and \( 1,\!000 \) ms. We execute these configurations on native Linux, vanilla DynamoRIO, and Ratel for 100 runs each. For each configuration, we log the number of data races and report it in Table 7. We observe that the shorter the sleep duration, the higher the frequency of data races across all platforms. However, Linux is less prone to race conditions than DynamoRIO and Ratel. Thus, for a thread-unsafe program, Ratel's two-copy mechanism exacerbates race conditions.
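    A minimal program in the spirit of Listing 1 can be sketched as follows. This is our own reconstruction for illustration, not the paper's exact code; varying the delay varies the size of the race window.

```c
#include <pthread.h>
#include <string.h>
#include <time.h>

/* Illustrative reconstruction (not the paper's Listing 1): thread 1 fills
 * a shared buffer; thread 2 reads it after an optional delay. No lock
 * protects the buffer, so the read may observe uninitialized data. */
enum { BUF_SIZE = 4096 };
static char buf[BUF_SIZE];

static void *writer(void *arg) {
    (void)arg;
    memset(buf, 'A', BUF_SIZE);   /* simulates reading file data into buf */
    return NULL;
}

static void *reader(void *arg) {
    unsigned delay_ms = *(unsigned *)arg;
    struct timespec ts = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
    /* unsynchronized validity check on the shared buffer */
    return (void *)(long)(buf[BUF_SIZE / 2] == 'A');
}

/* Returns 1 if the reader saw valid data, 0 if the race was observed. */
int race_check(unsigned delay_ms) {
    pthread_t t1, t2;
    void *ok = NULL;
    memset(buf, 0, BUF_SIZE);
    pthread_create(&t1, NULL, writer, NULL);
    pthread_create(&t2, NULL, reader, &delay_ms);
    pthread_join(t1, NULL);
    pthread_join(t2, &ok);
    return (int)(long)ok;
}
```

    With a long enough delay the check virtually always passes; with no delay it frequently fails, matching the trend in Table 7.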
    | Setting | 0 ms | 1 ms | 10 ms | 100 ms | 1,000 ms |
    |---|---|---|---|---|---|
    | Linux | 74% | 1% | 0 | 0 | 0 |
    | DR | 97% | 1% | 0 | 0 | 0 |
    | Ratel | 100% | 100% | 4% | 1% | 0 |
    Table 7. The Frequency of Data Races for the Example in Listing 1 when Executed on Linux, DynamoRIO, and Ratel
    Next, we select multi-threaded programs: nine individual benchmarks from Parsec-SPLASH2 and one real-world application, Memcached. We execute them 25 times with four threads per program on native Linux, vanilla DynamoRIO, and Ratel. For each run, we check the correctness of their outputs. We do not observe any program crashes, non-termination, or incorrect outputs on Ratel. This empirically shows that, for thread-safe code, Ratel does not introduce race conditions.
    Finally, we emulate thread-unsafe code in these 10 applications. For each application, we inject one synchronization bug by randomly removing lock operations. We execute the 10 modified applications 25 times with four threads on native Linux, vanilla DynamoRIO, and Ratel. On execution, we observe four different outcomes: correct output, wrong output, crash, and non-termination. As shown in Table 8, three applications are unaffected, while another three fail to terminate on all platforms. The remaining four applications exhibit mixed behavior. More importantly, the execution behavior on Ratel is similar to that on native Linux or DynamoRIO, but not identical.
    | | water_nsquare | water_spatial | barnes | fmm | raytrace | radiosity | ocean_cp | ocean_ncp | volrend | Memcached |
    |---|---|---|---|---|---|---|---|---|---|---|
    | Linux | 25R | 25R | 25C | 25T | 17C/8R | 21C/4R | 25T | 25T | 21T/4R | 25R |
    | DR | 25R | 25R | 21C/4T | 25T | 25R | 11C/14R | 25T | 25T | 25T | 25R |
    | Ratel | 25R | 25R | 25C | 25T | 25R | 12C/13R | 25T | 25T | 25T | 25R |
    Table 8. Testing Data Race Impact in Parsec-SPLASH2 and Memcached with One Crafted Synchronization Bug (in 25 Runs)
    R denotes correct, W denotes incorrect outputs, C denotes crash, and T denotes non-termination.
    In summary, we empirically validate that if the application is thread-safe, Ratel does not introduce any race conditions. However, Ratel does not guarantee race-free execution of applications that have thread-unsafe code; further, the two-copy mechanism exacerbates the impact of race conditions.

    6.2 TCB Breakdown

    We trust the Intel SGX support software (SDK and PSW) that executes inside the enclave and interfaces with the hardware. This choice is the same as for any other system that uses enclaves. Ratel comprises one additional trusted component: DynamoRIO. Put together, Ratel amounts to a TCB of 277,803 LoC. This is comparable to existing SGX frameworks that have 100K to 1M LoC [16, 30] but provide library-based compatibility at best.
    Table 9 (columns 2 and 3) summarizes the breakdown of the LoC included in the trusted components of the PSW, the SGX SDK, and the DynamoRIO system, as well as the code contributed by each of the sub-systems supported by Ratel. The original DynamoRIO engine comprises 353,139 LoC. We reduce it to 129,875 LoC (trusted) and 66,629 LoC (untrusted) by removing the components that are not required or used by Ratel. Then, we add 8,589 LoC to adapt DynamoRIO to SGX as per the design outlined in Section 4. Apart from this, as described in Section 5, we change the libraries provided by Intel SGX (SDK and PSW), adding 1,078 LoC.
    | Function | Trusted: SDK+PSW | Trusted: DR | Trusted: Total | Untrusted: SDK+PSW | Untrusted: DR | Untrusted: Driver | Untrusted: Host | Untrusted: Total |
    |---|---|---|---|---|---|---|---|---|
    | Original | 147,928 | 129,875 | 277,803 | 49,838 | 66,629 | 2,880 | 1,769 | 121,116 |
    | Loader | 69 | 1,604 | 1,673 | 27 | 89 | N/A | 332 | 448 |
    | MM | 46 | 2,241 | 2,287 | 4 | 40 | N/A | 0 | 44 |
    | Syscalls | 0 | 1,801 | 1,801 | 0 | 0 | N/A | 1,432 | 1,432 |
    | Instr | 0 | 45 | 45 | 0 | 0 | N/A | 26 | 26 |
    | TLS | 18 | 60 | 78 | 18 | 0 | N/A | 0 | 18 |
    | Signals | 201 | 236 | 437 | 136 | 0 | N/A | 0 | 136 |
    | Threading | 389 | 393 | 782 | 130 | 0 | N/A | 157 | 287 |
    | Sync | 0 | 173 | 173 | 0 | 0 | N/A | 0 | 0 |
    Table 9. Breakdown of Ratel TCB (LoC)
    Of the 277,803 LoC of trusted code, 123,322 LoC is from the original DynamoRIO code base responsible for loading the binaries, code cache management, and syscall handling. A total of 110,848 LoC and 37,080 LoC are from the Intel SGX SDK and PSW, respectively. The Ratel implementation adds only 6,553 LoC on top of this. A large fraction of our added TCB (27.5%) consists of the \( {\tt OCALL} \) wrappers, which are amenable to automated testing and verification [51, 77]. The remaining 4,752 LoC are for memory management, signal handling, TLS, and the multi-threading interface.
    Ratel relies on, but does not trust, the code executing outside the enclave in the host process (e.g., \( {\tt OCALL} \) s). This includes 2,391 LoC of changes. We give a detailed breakdown in Table 9 (columns 5–9).

    6.3 Performance Analysis

    We present the performance implications of the design choices made in Ratel. We have two main findings. First, the performance overheads vary significantly based on the application workload. Second, most of the overheads come from the SGX restrictions R1–R5 and the enclave physical memory limits specific to our present test hardware. We point out that future SGX implementations may have over 1,000× larger private physical memory (1 TB EPC) compared to our test system [4]. Therefore, we expect that the performance bottlenecks due to physical memory limitations can be eliminated and are not fundamental to the Ratel design. For completeness, we report the Ratel memory footprint and the impact of the 90 MB limit in Section 6.3.2.
    Methodology for Performance Measurement. For each target binary, we record the execution time in three settings:
    Baseline 1 (Linux). We execute the application binary with the native Linux (without SGX and DynamoRIO).
    Baseline 2 (DynamoRIO). We execute the application binary directly with DynamoRIO (without SGX) on Linux.
    Ratel. We use Ratel to execute the application binary in the enclave. We offset the execution time by deducting the overhead to create, initialize, load DynamoRIO and the application binary inside the enclave, and to destroy the enclave. It is well-known that SGX incurs a high overhead for enclave creation and attestation. However, this is a one-time cost per application. Server-end applications, as studied in this work, have a long execution time and can tolerate high initialization time. To avoid skewing the performance overheads, we deduct the enclave setup and tear-down overheads. This allows us to present a fair comparison of the actual execution overheads of Ratel with respect to Linux and DynamoRIO. Several previous works adopt the same measurement setup [20, 75].
    To measure performance overheads, we collect various statistics of the execution profile of 58 programs in our micro-benchmarks and four real-world applications (12 binaries in total). Specifically, we log the target application LoC, binary size, number of \( {\tt OCALL} \) s, \( {\tt ECALL} \) s, syscalls, enclave memory size, peak virtual memory (VmPeak) for Linux and DynamoRIO, untrusted and trusted VmPeak for Ratel, number of page faults, and number of context switches. We refer readers to Appendix A.1 and A.2 for detailed performance breakdowns. Table 11 provides detailed statistics. We also provide the overheads comparison between our Baseline 2 (DynamoRIO) and Baseline 1 (Linux) in all the related tables and figures to provide a breakdown of the performance overhead.

    6.3.1 Performance Breakdown.

    There are two main avenues of overhead costs we observe.
    First, fundamental limitations of SGX result in increased memory-to-memory operations (e.g., the two-copy design) or the use of slower constructs (e.g., spin-locks instead of fast futexes). Our evaluation on system stress workloads for each subsystem measures the worst-case cost of these operations. We report that, on average, SPEC CPU benchmarks incur \( 217.80\% \) and \( 34.91\% \) overheads (Figure 7) when compared to the vanilla Linux (without DynamoRIO or SGX) and DynamoRIO (without SGX) baselines, respectively, while I/O-intensive workloads incur \( 87.5\% \) and \( 66.2\% \) slowdowns (Figure 8 for IOZone benchmarks). Further, the performance overheads increase with larger I/O record sizes; the same is observed for the HBenchOS binaries, as reported in Table 10. The expensive spin-locks incur a cost that increases with the number of threads (Figure 9 for Parsec-SPLASH2 benchmarks). Overall, we observe that benchmarks that require large memory copies consistently exhibit significant slowdowns compared to others, highlighting the costs imposed by the two-copy design. The cost of signal handling also increases due to the added context saves and restores in Ratel, as seen in a dedicated HBenchOS benchmark (see the last two rows of Table 10).
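    The spin-lock cost pattern is easy to see from a minimal sketch. This illustrates the general construct, not Ratel's lock implementation: a waiter burns CPU in the loop below, whereas a futex-based lock would let the kernel put it to sleep, which enclaves cannot do directly.

```c
#include <stdatomic.h>

/* Minimal test-and-set spin-lock of the kind enclaves fall back on when
 * futex-based blocking is unavailable; contention cost grows with the
 * number of spinning threads. */
typedef struct {
    atomic_flag held;
} spinlock_t;

static void spin_lock(spinlock_t *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;  /* busy-wait: a futex-based lock would sleep in the kernel here */
}

static void spin_unlock(spinlock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```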
    Fig. 7. Ratel performance for SPEC 2006 (CPU): vanilla DynamoRIO execution time w.r.t. Linux, Ratel execution time w.r.t. Linux, and Ratel execution time w.r.t. vanilla DynamoRIO; a lower overhead value indicates better performance.
    Fig. 8. Ratel performance for IOZone: vanilla DynamoRIO bandwidth w.r.t. Linux, Ratel bandwidth w.r.t. Linux, and Ratel bandwidth w.r.t. vanilla DynamoRIO; a value close to 0 indicates smaller bandwidth loss and hence better performance.
    Fig. 9. Ratel performance for Parsec-SPLASH2 (multi-threading): panels (a), (b), (c), and (d) show vanilla DynamoRIO execution time overhead w.r.t. Linux, Ratel execution time overhead w.r.t. Linux, and Ratel execution time overhead w.r.t. vanilla DynamoRIO, with 1, 4, 8, and 16 thread(s), respectively; the data for 2 threads is included in Table 11; a lower value indicates better performance.
    | Property | Sub-property | Linux | DR | Ratel | Graphene-SGX | DR (%) | Ratel (%) | R-DR (%) | Graphene-SGX (%) |
    |---|---|---|---|---|---|---|---|---|---|
    | Memory-intensive operations, bandwidth (MB/s), more iterations / smaller chunk size | Raw Memory Read | 17,915.33 | 24,976.73 | 24,665.05 | 23,210.89 | 39.42 | 37.68 | -1.25 | 29.56 |
    | | Raw Memory Write | 9,928.48 | 12,615.13 | 12,580.36 | 12,114.76 | 27.06 | 26.71 | -0.28 | 22.02 |
    | | Bzero Bandwidth | 62,393.29 | 60,877.42 | 65,072.41 | 47,844.32 | -2.43 | 4.29 | 6.89 | -23.32 |
    | | Memory copy libc aligned | 41,565.35 | 56,883.67 | 60,377.04 | 63,423.44 | 36.85 | 45.26 | 6.14 | 52.59 |
    | | Memory copy libc unaligned | 9,497.17 | 56,270.52 | 61,543.81 | 69,444.44 | 492.5 | 548.02 | 9.37 | 631.21 |
    | | Memory copy unrolled aligned | 9,221.96 | 12,272.93 | 12,351.22 | 12,161.04 | 33.08 | 33.93 | 0.64 | 31.87 |
    | | Memory copy unrolled unaligned | 9,151.18 | 12,279.55 | 12,295.40 | 10,079.86 | 34.19 | 34.36 | 0.13 | 10.15 |
    | | Mmapped Read | 706.63 | 423.85 | 190.73 | 3,814.69 | -40.02 | -73.01 | -55.00 | 439.84 |
    | | File Read | 74.16 | 29.15 | 12.05 | 325.52 | -60.69 | -83.75 | -58.66 | 338.94 |
    | Memory-intensive operations, bandwidth (MB/s), fewer iterations / larger chunk size | Raw Memory Read | 10,708.64 | 13,292.24 | 5,717.96 | 5,310.06 | 24.13 | -46.6 | -56.98 | -50.41 |
    | | Raw Memory Write | 9,008.42 | 10,664.76 | 4,563.60 | 3,495.86 | 18.39 | -49.34 | -57.21 | -61.19 |
    | | Bzero Bandwidth | 21,794.54 | 31,315.64 | 4,166.62 | 4,046.79 | 43.69 | -80.88 | -86.69 | -81.43 |
    | | Memory copy libc aligned | 12,948.37 | 12,969.82 | 1,570.94 | 1,534.59 | 0.17 | -87.87 | -87.89 | -88.15 |
    | | Memory copy libc unaligned | 12,870.26 | 13,141.00 | 1,556.58 | 1,545.08 | 2.1 | -87.91 | -88.15 | -87.99 |
    | | Memory copy unrolled aligned | 6,609.18 | 6,714.93 | 2,054.55 | 2,009.46 | 1.6 | -68.91 | -69.40 | -69.6 |
    | | Memory copy unrolled unaligned | 6,035.19 | 5,853.26 | 2,081.05 | 1,999.58 | -3.01 | -65.52 | -64.45 | -66.87 |
    | | Mmapped Read | 4,839.57 | 7,163.38 | 3,299.77 | 1,454.30 | 48.02 | -31.82 | -53.94 | -69.95 |
    | | File Read | 285.39 | 3,724.39 | 769.66 | 134.42 | 1,205 | 169.68 | -79.33 | -52.9 |
    | File system latency (μs) | Filesystem create | 2.37 | 32.43 | 115.34 | 1,272.86 | 1,268.35 | 4,766.67 | 255.66 | 53,607.17 |
    | | Filesystem delforward | 0.94 | 18.41 | 33.53 | 1,185.10 | 1,858.51 | 3,467.02 | 82.13 | 125,974.47 |
    | | Filesystem delrand | 0.92 | 21.17 | 37.28 | 1,073.69 | 2,201.09 | 3,952.17 | 76.10 | 116,605.43 |
    | | Filesystem delreverse | 0.99 | 18.31 | 33.13 | 1,266.38 | 1,749.49 | 3,246.46 | 80.94 | 127,817.17 |
    | System call latency (μs) | getpid | 0.0087 | 0.0065 | 0.0058 | 0.0901 | -25.12 | -33.18 | -10.77 | 938.02 |
    | | getrusage | 0.4831 | 0.6401 | 7.4504 | 0.0903 | 32.50 | 1,442.21 | 1,063.94 | -81.31 |
    | | gettimeofday | 0.0276 | 0.0239 | 6.5986 | 6.8500 | -13.28 | 23,842.67 | 27,509.21 | 24,754.86 |
    | | sbrk | 0.0076 | 0.0064 | 0.0065 | 0.0102 | -16.23 | -14.92 | 1.56 | 33.51 |
    | | sigaction | 0.6269 | 2.2101 | 2.7903 | 0.5904 | 252.56 | 345.11 | 26.25 | -5.82 |
    | | write | 0.4927 | 0.5104 | 7.2301 | 0.5303 | 3.59 | 1,367.44 | 1,316.56 | 7.63 |
    | Signal handler latency (μs) | Installing Signal | 0.48 | 2.24 | 2.79 | 0.60 | 365.75 | 480.11 | 24.55 | 24.76 |
    | | Handling Signal | 1.18 | 8.88 | 81.58 | 0.37 | 652.9 | 6,816.84 | 818.69 | -68.63 |
    Table 10. Summary of HBenchOS Benchmark Results for Graphene-SGX along with Linux, DynamoRIO, and Ratel
    Second, the current SGX hardware implementation has limited secure physical memory (the EPC), of which about 90 MB is usable. Executing anything on such a severely limited memory resource results in large slow-downs (e.g., increased page faults). Further, the cost of each page-in and page-out operation is itself higher in SGX because of hardware-based memory encryption. We measure the impact of this limitation by executing benchmarks and applications whose working set exceeds 90 MB for either data or code. For example, we test varying download sizes in cURL (Figure 11(a)) and database sizes in SQLite (Figure 10(a)). When the data exceeds 90 MB, we observe a sharp increase in throughput loss. Similarly, when we execute ML models of varying sizes that require increasing amounts of code page memory, we observe an increase in page faults and lowered performance (Figure 11(b)). We observe a similar degradation in latency and throughput when applications reach a critical point in memory usage, as in \( {\tt Memcached} \) (Figure 10(b)). Appendices A.1 and A.2 detail the performance breakdown. The memory footprint of our system is reported separately in Section 6.3.2 as well.
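    The sharp knee at the 90 MB boundary can be illustrated with a toy cost model (ours, purely for intuition, not from our measurements): assume accesses to pages resident in the usable EPC cost `t_hit`, accesses that must be paged in cost `t_miss`, and the working set is accessed uniformly.

    ```python
    # Toy model (illustrative only): expected cost per memory access once the
    # working set exceeds the usable EPC. t_hit/t_miss are hypothetical unit
    # costs; real EPC paging also pays per-page encryption/integrity overheads.
    EPC_MB = 90  # approximate usable EPC on SGX v1

    def effective_access_cost(working_set_mb, t_hit=1.0, t_miss=1000.0):
        """Expected cost per access under uniform access to the working set."""
        if working_set_mb <= EPC_MB:
            return t_hit
        resident_fraction = EPC_MB / working_set_mb
        return resident_fraction * t_hit + (1 - resident_fraction) * t_miss

    for ws in (64, 90, 128, 256):
        print(ws, round(effective_access_cost(ws), 1))
    ```

    Below 90 MB every access is an EPC hit; just past it, a constant fraction of accesses becomes a paging operation, which is why throughput drops abruptly rather than degrading gradually.
    
    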
    Fig. 10.
    Fig. 10. Ratel performance for SQLite and Memcached. Panel (a) shows SQLite’s average time per operation (micros/op) with increasing database size represented as number of primary keys in thousands (K) across Linux, vanilla DynamoRIO, and Ratel; panel (b) shows the throughput versus latency of Memcached on Linux, vanilla DynamoRIO, and Ratel.
    Fig. 11.
    Fig. 11. Ratel performance for (a) cURL and (b) Privado. Vanilla DynamoRIO execution time w.r.t. Linux, Ratel execution time w.r.t. Linux, and Ratel execution time w.r.t. vanilla DynamoRIO.
    Performance Comparison with Graphene-SGX. The performance overheads of Ratel vary with the workload. This holds for Graphene-SGX as well. As a direct point of comparison, we tested HBenchOS, a benchmark with varying workloads, on Graphene-SGX and find similar workload-dependent variation in performance. The performance overheads of Graphene-SGX for the HBenchOS benchmarks, compared to Linux, DynamoRIO, and Ratel, are reported in Table 10. The slowdown of Ratel and Graphene-SGX is comparable for I/O benchmarks, since both incur two copies. Graphene-SGX is significantly faster than the DynamoRIO baseline and Ratel for syscalls and signal handling, because it implements a library OS inside the enclave and avoids expensive context switches. Ratel delegates most system calls to the OS rather than emulating them as Graphene-SGX does, and in exchange offers compatibility with multiple libraries. Further, Ratel offers instruction-level instrumentation capability. Some performance overheads of Graphene-SGX are expected to be better than Ratel's due to differing design choices: Graphene-SGX does not use spin-locks and tunnels all signal handling through \( {\tt libc} \), prioritizing performance over binary compatibility, and hence has lower overheads than Ratel. However, Ratel offers better binary compatibility than Graphene-SGX, which provides compatibility only with \( {\tt glibc} \), as shown in Section 6.1.1.

    6.3.2 Effects of Memory Constraints on Ratel.

    On our current experimental setup, SGX has a maximum of 128 MB EPC, i.e., private physical memory, of which approximately 90 MB is available for user-enclaves. Further, the platform supports at most 64 GB of virtual memory per enclave. These limitations adversely impact Ratel performance.
    Physical Memory Footprint. We report the physical memory required to execute each application with Ratel (Column 5, Table 11), the smallest being 236 MB for the FSCQ benchmark binaries. Ratel uses this memory for the target application as well as for the DynamoRIO binaries and the code cache. Since the EPC size is only 90 MB, executing enclaves with a physical memory footprint larger than this (236 MB or more in our experiments) causes a high number of page faults (Column 12, Table 11). This is one of the main sources of Ratel's performance overhead.
    Table 11.
    Suite Name | Benchmark/Application Name | Compile Stats | Runtime Stats | Time (s) | Overhead (in %)
    | | LOC | Binary Size | Mem Size | Linux VmP. | DR VmP. | Trust. VmP. | Untru. VmP. | Out Calls | Sys Calls | Page Faults | Ctx Swt | Linux | DR | Ratel | DR | Ratel | R-DR
    SPEC CINT2006
    astar | 4,280 | 56 KB | 237.68 | 21.48 | 3,101.62 | 257.94 | 101.34 | 26,561 | 618,555 | 271,544 | 181 | 7.25 | 8.77 | 10.71 | 20.97 | 47.72 | 22.01
    bzip2 | 5,734 | 73 KB | 519.05 | 208.98 | 3,289.13 | 1,024 | 248.13 | 26,048 | 618,115 | 443,869 | 238 | 19.21 | 21.74 | 34.49 | 13.17 | 79.54 | 58.99
    gobmk | 157,650 | 4.4 MB | 244.84 | 35.70 | 3,115.85 | 258.07 | 107.43 | 26,594 | 618,629 | 272,957 | 150 | 0.82 | 1.73 | 4.37 | 110.98 | 432.93 | 152.60
    hmmer | 20,680 | 331 KB | 238.25 | 7.93 | 3,088.11 | 258.62 | 101.66 | 26,144 | 629,203 | 271,106 | 139 | 0.19 | 0.98 | 0.85 | 415.79 | 347.37 | -13.27
    sjeng | 10,549 | 162 KB | 355.39 | 178.86 | 3,259.00 | 512 | 217.64 | 26,606 | 618,638 | 2,049,404 | 382 | 2.98 | 5.08 | 4.57 | 70.47 | 53.36 | -10.04
    libquantum | 2,611 | 51 KB | 237.29 | 7.32 | 3,087.46 | 257.1 | 101.33 | 25,969 | 618,004 | 271,071 | 150 | 0.03 | 0.50 | 0.47 | 1,566.67 | 1,466.67 | -6.75
    h264ref | 36,097 | 602 KB | 240.59 | 34.64 | 3,114.79 | 259.16 | 102.18 | 27,033 | 619,033 | 272,100 | 284 | 8.87 | 18.25 | 34.83 | 105.75 | 292.67 | 90.16
    omnetpp | 26,652 | 871 KB | 239.82 | 20.67 | 3,100.81 | 258.42 | 102 | 26,961 | 618,990 | 271,813 | 151 | 0.26 | 2.47 | 2.72 | 850.00 | 946.15 | 10.12
    Xalan | 267,376 | 6.3 MB | 248.73 | 23.53 | 3,103.68 | 261.93 | 105.84 | 28,121 | 620,185 | 273,953 | 198 | 0.05 | 5.53 | 4.05 | 10,960.00 | 8,000.00 | -26.76
    gcc | 385,783 | 3.8 MB | 251.16 | 20.17 | 3,100.28 | 265.15 | 105.44 | 25,758 | 656,241 | 56,201 | 454 | 5.38 | 12.85 | 6.30 | 138.85 | 17.10 | -51.16
    gromac | 87,921 | 1.1 MB | 239.91 | 24.17 | 3,104.32 | 259.62 | 102.28 | 26,783 | 654,600 | 55,019 | 633 | 0.48 | 2.85 | 4.79 | 493.75 | 897.92 | 68.07
    SPEC CFP2006
    leslie3d | 2,983 | 177 KB | 238.86 | 28.49 | 3,108.63 | 258.54 | 101.39 | 26,831 | 618,865 | 271,723 | 204 | 5.25 | 18.80 | 21.63 | 258.10 | 312.00 | 14.89
    milc | 9,580 | 150 KB | 238.41 | 16.04 | 3,096.18 | 256 | 101.45 | 32,551 | 624,587 | 271,506 | 192 | 7.27 | 13.05 | 22.41 | 79.50 | 208.25 | 70.99
    namd | 3,892 | 330 KB | 238.46 | 58.14 | 3,138.29 | 256 | 101.6 | 28,550 | 620,582 | 271,665 | 173 | 8.13 | 18.86 | 19.41 | 131.98 | 138.75 | 2.65
    cactusADM | 60,235 | 819 KB | 596.88 | 415.05 | 3,495.19 | 1,027.35 | 454.19 | 27,619 | 619,634 | 370,217 | 190 | 1.13 | 5.34 | 7.78 | 372.57 | 588.50 | 45.69
    calculix | 105,123 | 1.8 MB | 395.25 | 169.34 | 3,249.48 | 512 | 208.48 | 273,196 | 29,243 | 313,313 | 174 | 0.03 | 3.06 | 3.90 | 10,100.00 | 12,900.00 | 27.45
    dealII | 94,458 | 4.3 MB | 277.37 | 97.08 | 3,177.76 | 515.73 | 138.61 | 26,858 | 618,872 | 273,471 | 240 | 10.29 | 24.68 | 24.73 | 139.84 | 140.33 | 0.00
    GemsFDTD | 4,883 | 440 KB | 1,021.82 | 841.85 | 3,924.16 | 2,048 | 883.48 | 25,207 | 617,226 | 366,889 | 177 | 1.24 | 5.44 | 2.08 | 338.71 | 67.74 | -61.76
    povray | 78,684 | 1.2 MB | 242.70 | 16.54 | 3,096.69 | 262.25 | 102.47 | 29,082 | 621,108 | 272,267 | 166 | 0.35 | 4.42 | 4.72 | 1,162.86 | 1,248.57 | 6.79
    soplex | 28,282 | 507 KB | 239.68 | 16.90 | 3,097.04 | 256 | 101.71 | 26,880 | 618,909 | 271,861 | 164 | 0.01 | 2.07 | 2.08 | 20,600.00 | 20,700.00 | 0.48
    specrand (998) | 54 | 8.7 KB | 236.76 | 4.25 | 3,084.39 | 256.01 | 101.3 | 25,863 | 617,897 | 270,924 | 150 | 0.23 | 0.35 | 0.27 | 52.17 | 17.39 | -22.41
    specrand (999) | 54 | 8.7 KB | 236.76 | 4.25 | 3,084.39 | 256 | 101.3 | 25,863 | 617,897 | 270,990 | 164 | 0.21 | 0.34 | 0.33 | 61.90 | 57.14 | -2.94
    tonto | 107,228 | 4.6 MB | 248.28 | 20.54 | 3,100.68 | 264.75 | 105.57 | 30,562 | 622,574 | 273,669 | 186 | 0.43 | 6.89 | 6.65 | 1,502.33 | 1,446.51 | -3.48
    zeusmp | 19,030 | 280 KB | 1,357.84 | 1,132.98 | 4,213.13 | 2,051.73 | 1,219.49 | 27,163 | 619,201 | 1,755,130 | 434 | 7.55 | 20.65 | 52.03 | 173.51 | 589.14 | 152.43
    IOZONE
    read/reread | 26,545 | 1.1 MB | 241.64 | 48.23 | 3,128.37 | 257.86 | 105.12 | 27,254 | 622,791 | 23,785 | 1,215 | 0.06 | 0.88 | 0.89 | 1,366.67 | 1,383.33 | 1.14
    random r./w. | 26,545 | 1.1 MB | 241.64 | 48.23 | 3,128.37 | 257.86 | 105.12 | 27,376 | 622,913 | 23,744 | 844 | 0.07 | 0.88 | 1.09 | 1,157.14 | 1,457.14 | 23.86
    backward read | 26,545 | 1.1 MB | 241.64 | 48.23 | 3,128.37 | 257.86 | 105.12 | 27,431 | 622,968 | 23,854 | 1,159 | 0.07 | 0.84 | 1.38 | 1,100.00 | 1,871.43 | 64.29
    fwrite/frewrite | 26,545 | 1.1 MB | 241.64 | 48.36 | 3,128.50 | 257.86 | 105.12 | 27,212 | 622,750 | 24,317 | 581 | 0.07 | 0.87 | 0.86 | 1,142.86 | 1,128.57 | -1.15
    fread/freread | 26,545 | 1.1 MB | 241.64 | 48.36 | 3,128.50 | 257.86 | 105.12 | 27,223 | 622,760 | 23,742 | 374 | 0.06 | 0.86 | 0.65 | 1,333.33 | 983.33 | -24.42
    FSCQ
    fscq large file | 383 | 25 KB | 236.82 | 4.26 | 3,084.40 | 256.27 | 101.3 | 25,889 | 1,165,892 | 270,914 | 168 | 0.12 | 0.47 | 3.41 | 291.67 | 2,741.67 | 625.53
    fscq small file | 161 | 19 KB | 236.82 | 4.29 | 3,084.43 | 256.19 | 101.34 | 26,352 | 929,795 | 270,959 | 181 | 0.01 | 0.34 | 0.17 | 3,300.00 | 1,600.00 | -50.00
    fscq write file | 74 | 18 KB | 236.82 | 4.25 | 3,084.39 | 256.04 | 101.32 | 62,015 | 930,226 | 270,867 | 143 | 0.01 | 0.31 | 0.13 | 3,000.00 | 1,200.00 | -58.06
    multicreatewrite | 20 | 11 KB | 236.81 | 4.24 | 3,084.39 | 257.5 | 101.36 | 5,721 | 969,595 | 270,969 | 248 | 0.11 | 0.38 | 0.83 | 245.45 | 654.55 | 120.74
    multiopen | 14 | 9.8 KB | 236.81 | 4.24 | 3,084.39 | 257.5 | 101.32 | 25,719 | 1,129,593 | 270,842 | 452 | 0.16 | 0.57 | 2.44 | 256.25 | 1,425.00 | 328.07
    multicreate | 18 | 9.9 KB | 236.81 | 4.24 | 3,084.39 | 257.5 | 101.35 | 5,720 | 949,625 | 270,866 | 212 | 0.07 | 0.31 | 0.67 | 342.86 | 857.14 | 116.13
    multiwrite | 16 | 9.9 KB | 236.81 | 4.24 | 3,084.39 | 257.5 | 101.33 | 5,720 | 939,594 | 270,864 | 152 | 0.01 | 0.23 | 0.30 | 2,200.00 | 2,900.00 | 28.21
    multicreatemany | 19 | 11 KB | 236.81 | 4.24 | 3,084.39 | 257.5 | 101.34 | 5,729 | 959,605 | 271,034 | 198 | 0.07 | 0.36 | 0.77 | 414.29 | 1,000.00 | 115.08
    multiread | 17 | 9.9 KB | 236.81 | 4.24 | 3,084.39 | 257.5 | 101.33 | 25,721 | 1,229,595 | 270,901 | 589 | 0.21 | 0.62 | 3.70 | 195.24 | 1,661.90 | 494.86
    PARSEC-SPLASH2
    water_nsquare | 2,885 | 46 KB | 239.27 | 18.07 | 3,098.21 | 258.53 | 129.22 | 27,109 | 622,992 | 25,093 | 199 | 0.05 | 0.94 | 0.88 | 1,780.00 | 1,660.00 | -6.08
    water_spatial | 3,652 | 46 KB | 456.63 | 145.71 | 3,225.85 | 514.88 | 256.86 | 27,171 | 622,991 | 61,992 | 164 | 0.04 | 1.05 | 2.31 | 2,525.00 | 5,675.00 | 120.00
    barnes | 4,942 | 46 KB | 247.13 | 67.42 | 3,147.56 | 257.68 | 178.57 | 26,801 | 622,748 | 28,191 | 287 | 0.19 | 0.85 | 1.02 | 347.37 | 436.84 | 20.14
    fmm | 7,611 | 64 KB | 455.38 | 146.11 | 3,226.25 | 513.14 | 257.26 | 27,218 | 622,909 | 62,025 | 58 | 0.01 | 0.97 | 2.18 | 9,600.00 | 21,700.00 | 125.44
    raytrace | 200,091 | 92 KB | 455.63 | 186.00 | 3,266.15 | 512.05 | 297.15 | 27,291 | 623,175 | 65,323 | 194 | 0.05 | 1.23 | 2.59 | 2,360.00 | 5,080.00 | 110.57
    radiosity | 21,586 | 230 KB | 455.63 | 63.15 | 3,143.30 | 512.01 | 174.3 | 27,609 | 623,118 | 26,162 | 139 | 1.92 | 4.30 | 5.82 | 123.96 | 203.13 | 35.35
    ocean_cp | 10,519 | 81 KB | 238.63 | 31.76 | 3,111.91 | 258.31 | 142.91 | 27,234 | 622,990 | 25,480 | 86 | 0.05 | 1.19 | 1.05 | 2,280.00 | 2,000.00 | -11.76
    ocean_ncp | 6,275 | 65 KB | 238.25 | 40.36 | 3,120.51 | 257.11 | 151.51 | 27,052 | 622,948 | 25,362 | 91 | 0.05 | 1.03 | 1.08 | 1,960.00 | 2,060.00 | 4.85
    volrend | 27,152 | 271 KB | 238.00 | 146.02 | 3,226.17 | 256.01 | 129.17 | 27,309 | 623,167 | 25,082 | 178 | 0.01 | 0.75 | 0.88 | 7,400.00 | 8,700.00 | 17.02
    Applications
    SQLite (10K keys) | 140,420 | 1.3 MB | 241.24 | 24.58 | 3,104.72 | 256 | 101.39 | 400,548 | 1,818,195 | 272,323 | 1,261 | 5.05 | 5.82 | 6.94 | 15.25 | 37.43 | 19.24
    cURL (10 MB) | 22,064 | 30 KB | 266.07 | 76.77 | 3,156.91 | 512.95 | 127.22 | 35,897 | 940,802 | 272,552 | 1,031 | 0.07 | 1.78 | 1.17 | 2,442.86 | 1,571.43 | -34.27
    Memcach. (100K) | 44,921 | 795 KB | 1,589.42 | 595.55 | 3,547.69 | 2,048 | 1,408.99 | 1,021,241 | 1,118,691 | 540,649 | 104,765 | 5.28 | 5.99 | 9.46 | 13.45 | 79.17 | 57.93
    densenetapp | 12,551 | 32 MB | 752.04 | 575.70 | 3,655.41 | 1,028.5 | 614.41 | 27,826 | 616,749 | 123,894 | 354 | 3.74 | 7.25 | 12.90 | 93.85 | 244.92 | 77.93
    lenetapp | 230 | 313 KB | 237.29 | 8.12 | 3,088.26 | 256.38 | 101.59 | 26,029 | 616,411 | 21,166 | 362 | 0.01 | 0.49 | 0.26 | 4,800.00 | 2,500.00 | -47.05
    resnet110app | 9,528 | 110 MB | 270.43 | 94.42 | 3,175.39 | 512 | 134.38 | 27,238 | 696,291 | 23,716 | 200 | 0.34 | 1.98 | 2.45 | 482.35 | 620.59 | 23.74
    resnet50app | 2,826 | 98 MB | 605.50 | 430.67 | 3,511.61 | 1,025.7 | 470.62 | 26,591 | 616,291 | 136,139 | 274 | 5.09 | 7.64 | 11.81 | 50.10 | 132.02 | 54.45
    resnext29app | 1,753 | 132 MB | 575.01 | 400.85 | 3,481.00 | 1,025.08 | 439.99 | 26,410 | 616,728 | 187,284 | 411 | 9.76 | 11.38 | 16.33 | 16.60 | 67.32 | 42.98
    squeezenetapp | 914 | 4.8 MB | 242.15 | 59.41 | 3,139.55 | 258.15 | 106.07 | 26,258 | 616,290 | 23,001 | 252 | 0.40 | 1.22 | 1.11 | 205.00 | 177.50 | -9.02
    vgg19app | 990 | 77 MB | 345.78 | 171.65 | 3,252.65 | 514.15 | 211.46 | 26,192 | 630,872 | 96,647 | 402 | 0.66 | 1.40 | 2.44 | 112.12 | 269.70 | 74.29
    wideresnetapp | 1,495 | 140 MB | 564.14 | 390.19 | 3,470.72 | 1,025.43 | 429.71 | 26,352 | 631,004 | 172,712 | 303 | 19.25 | 20.02 | 55.38 | 4.00 | 187.69 | 177.00
    inceptionv3 | 4,875 | 92 MB | 656.53 | 481.82 | 3,561.96 | 1,024 | 520.96 | 26,862 | 1,088,344 | 250,880 | 355 | 11.39 | 13.25 | 24.63 | 16.33 | 116.24 | 84.96
    Table 11. Ratel Statistics for Benchmarks and Real-world Applications
    Columns 3 and 4: total application LoC and binary size. Column 5: maximum physical memory size (in MB) required to execute each application with Ratel. Columns 6 and 7: peak virtual memory usage (in MB) on Linux and DynamoRIO. Columns 8 and 9: trusted and untrusted peak virtual memory usage (in MB) on Ratel. Columns \( 10\text{--}13 \) : total \( {\tt OCALL} \) s, system calls, page faults, and context switches recorded in one run. Columns \( 14\text{--}16 \) : execution time on Linux, vanilla DynamoRIO, and Ratel. Columns \( 17\text{--}19 \) : execution overhead of DynamoRIO w.r.t. Linux, Ratel w.r.t. Linux, and Ratel w.r.t. DynamoRIO. Ratel performs better than DynamoRIO in some cases (denoted by negative overheads); SGX loads the binaries during enclave attestation, and we do not include the enclave creation time in Ratel execution time.
    Virtual Memory Footprint. We monitor the peak virtual memory usage on Linux and DynamoRIO using \( {\tt ptrace} \) and \( {\tt procmaps} \) . We use \( {\tt sgxtop} \) [9] to monitor the enclave's peak virtual memory at runtime (Column 8, Table 11). Further, we monitor the peak virtual memory used by the untrusted host application corresponding to the enclave (Column 9, Table 11). On average, DynamoRIO incurs a high memory overhead of \( 205\times \) compared to Linux, whereas Ratel imposes only \( 24\times \) . There are two reasons why Ratel incurs significantly lower virtual memory usage than DynamoRIO.
    First, DynamoRIO reserves a 2 GB heap memory region as a scalability improvement on Linux x64 [5]. Our analysis of DynamoRIO with several binaries shows that this region is rarely used. SGX has a limited EPC and requires pre-specifying the maximum heap size, so we disable this reservation logic in Ratel to reduce its virtual memory usage and the subsequent page faults for loading physical pages. Further, a low memory footprint speeds up enclave creation and attestation, because SGX has to initialize and measure a smaller memory region. Thus, Ratel's virtual memory peak is always smaller than DynamoRIO's by a margin of at least 2 GB (see Columns 7 and 8 in Table 11).
    Second, DynamoRIO executes directly on Linux. Thus, it can demand arbitrarily large physical memory (as long as it is available in RAM) and uses the default Linux memory manager to optimize memory allocations. Ratel, however, has to pre-specify the maximum physical footprint before execution and uses a modified SGX SDK memory manager for its own heap (see Section 4.2). We initialize just enough enclave memory for Ratel to execute the target application, and no more. Such conservative allocation allows Ratel to quickly create and launch enclaves. This explains why Ratel has lower virtual memory peaks (typically between 256 MB and 2 GB) than DynamoRIO (1–2 GB, after subtracting the 2 GB reservation discussed above). To empirically verify our hypothesis, we perform a controlled experiment. For each binary from SPEC 2006, we run Ratel with increasing maximum heap sizes (ranging from 256 MB to 4 GB) and measure the virtual memory peak. The virtual memory peak continues to grow with the maximum heap size and then plateaus at a certain point; the plateau point of each binary matches the corresponding DynamoRIO peak. This confirms our claim that Ratel has a smaller virtual memory peak because we limit the maximum heap size in our configuration. These two phenomena explain why Ratel has a much smaller virtual memory footprint than DynamoRIO.
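    The controlled experiment can be sketched as a sweep that doubles the configured maximum heap and stops once the observed peak stops growing. Here `measure_vm_peak` is a hypothetical stand-in for launching Ratel with a given heap cap and reading the peak via sgxtop:

    ```python
    def find_vm_plateau(measure_vm_peak, start_mb=256, stop_mb=4096):
        """Sweep max-heap sizes; return (heap_mb, peak_mb) at the plateau."""
        heap_mb, prev_peak = start_mb, None
        while heap_mb <= stop_mb:
            peak = measure_vm_peak(heap_mb)
            if prev_peak is not None and peak == prev_peak:
                # Peak stopped growing: it plateaued at the previous cap.
                return heap_mb // 2, peak
            prev_peak, heap_mb = peak, heap_mb * 2
        return stop_mb, prev_peak

    # Hypothetical binary whose true demand is ~1.5 GB: the peak tracks the
    # heap cap until the cap exceeds the demand, then plateaus (as in the
    # SPEC 2006 runs described above).
    fake = lambda cap: min(cap, 1536)
    print(find_vm_plateau(fake))  # → (2048, 1536)
    ```

    The returned plateau value is what we compare against the corresponding DynamoRIO peak.
    
    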
    Note that Ratel additionally incurs virtual memory overheads in the untrusted host application (Column 9, Table 11). The two-copy model used in Ratel's design accounts for the high virtual memory peak in the untrusted part of the process.
    Code Cache Size. Ratel and DynamoRIO allow users to configure the maximum code cache size via a configuration file before launching the enclave. The cache is used to store basic blocks and traces. At runtime, the DBT engine is allowed to use a cache up to this size. Often, the peak cache size is smaller than the maximum, because the basic blocks and traces may fit in a smaller memory for a given application. We execute applications with different code cache sizes for both DynamoRIO and Ratel. Our tests start with a maximum cache size of 4 KB and we double the size up to 64 MB. For each run, we measure the peak basic block cache and trace cache size in DynamoRIO and Ratel.
    Once the maximum cache size is large enough for the application, both systems execute successfully, and the peak size stays constant even if we keep increasing the maximum. For Memcached executing YCSB workload A, DynamoRIO peaks at a 231.24 KB basic block cache and a 92.78 KB trace cache; Ratel peaks at 201.54 KB and 32.84 KB, respectively. Increasing the cache size beyond the peak value does not improve the performance of DynamoRIO or Ratel. Specifying a large cache size for Ratel results in a larger enclave physical memory: Ratel takes more time to initialize and create the enclave, and it incurs more frequent page faults. Thus, beyond the peak size, these two factors slow down the application as the cache size increases. When we reduce the cache size below the peak value, DynamoRIO suffers an order-of-magnitude slowdown. This is well-known and expected behavior [27]. Ratel, in comparison, does not suffer such a slowdown, partly because a smaller cache size results in fewer page faults. However, if the specified code cache size is smaller than the peak value, Ratel fails to execute the given application.
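    The cache-size sweep can be sketched as follows; `runs_ok` is a hypothetical predicate for "the application executed successfully with this maximum cache size" (recall that Ratel fails outright below the peak demand, whereas DynamoRIO merely slows down):

    ```python
    def minimal_cache_size(runs_ok, start_kb=4, stop_kb=64 * 1024):
        """Double the max code-cache size from 4 KB up to 64 MB and return
        the first size (in KB) at which the application runs, else None."""
        size_kb = start_kb
        while size_kb <= stop_kb:
            if runs_ok(size_kb):
                return size_kb
            size_kb *= 2
        return None

    # Hypothetical workload whose peak cache demand is ~232 KB (cf. the
    # 231.24 KB basic-block cache observed for Memcached): under Ratel,
    # sizes below that demand fail.
    print(minimal_cache_size(lambda kb: kb >= 232))  # → 256
    ```

    Because the doubling schedule only visits powers of two times 4 KB, the first working size can overshoot the true peak demand slightly, which is acceptable for this experiment.
    
    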
    In summary, the limited EPC size in SGX v1 not only results in high execution overhead but also invalidates expected performance gains via a large code cache.

    6.4 Compatibility with Built-in Profilers

    Ratel primarily achieves binary compatibility by leveraging the complete interposition offered by DynamoRIO. Additionally, this instruction-level interposition allows Ratel to monitor various in-enclave behaviors (e.g., events, instructions, control-flows) out-of-the-box. Specifically, DynamoRIO provides 26 built-in profilers for dynamically tracing, analyzing, and fine-tuning the target application. Table 12 summarizes their names and the profiling services they offer. When we run vanilla DynamoRIO on our experiment platform, 25 of the 26 profilers work stably. One profiler, -prof_pcs, is unstable and causes time-outs; this is a well-documented issue with DynamoRIO [10, 11]. Ratel retains support for all 25 profilers that work with vanilla DynamoRIO. To demonstrate this experimentally, we randomly choose four application binaries from each of the six categories listed in Table 11 and run these 24 applications, which exhibit diverse execution behavior, with all 25 profilers. All 25 profilers worked with all of our applications.
    Table 12.
    Built-in Profiler | Support | Description
    -opt_memory | ✓ | Reduce memory usage, but potentially at the cost of performance.
    -prof_pcs | ✗ | A simple sampler to periodically interrupt the DBT engine and query which part of the DBT engine was running.
    -stack_size <number> | ✓ | Increase the size of the DBT engine's per-thread stack.
    -signal_stack_size <number> | ✓ | Specify the size of the signal handling stack.
    -thread_private | ✓ | Request code caches that are private (shared across threads by default) to each thread.
    -disable_traces | ✓ | Disable trace building (e.g., basic block cache and trace cache), which can have a negative performance impact.
    -enable_full_api | ✓ | Default internal options balance performance with API usability.
    -max_bb_instrs | ✓ | Stop building a basic block if it hits this application instruction count limit.
    -max_trace_bbs | ✓ | Build a trace with less than this number of constituent basic blocks.
    -synch_at_exit | ✓ | In debug builds, synchronize with all remaining threads at process exit time.
    -syntax_intel | ✓ | Output all disassembly using Intel syntax rather than the default AT&T-style syntax.
    -tracedump_text | ✓ | A text dump option to output all traces that were created to the log file traces-shared.0.TID.html.
    -tracedump_binary | ✓ | A binary dump option to output all traces that were created to the log file traces-shared.0.TID.html.
    -tracedump_origins | ✓ | Dump only a text list of the constituent basic block tags of each trace to the trace log file.
    -reachable_heap | ✓ | Guarantee all of the heap memory is reachable from the code cache, at the risk of running out of memory.
    -multi_thread_exit | ✓ | Avoid synchronizing with all remaining threads at process exit time.
    -cache_bb_max | ✓ | Set the maximum basic block code cache size.
    -cache_trace_max | ✓ | Set the maximum trace code cache size.
    -msgbox_mask 0xN | ✓ | Control whether the system waits for a key press when presenting information.
    -stderr_mask 0xN | ✓ | Control the output to standard error.
    -pause_on_error | ✓ | Suspend the process so that a debugger can be attached when encountering an assert or crash.
    -debug | ✓ | Use the DBT engine debug library for debugging.
    -loglevel N | ✓ | Print out a log of the DBT engine's actions. The greater the value of N, the more information the system prints.
    -logmask 0xN | ✓ | Select which DBT engine modules print out logging information, at the -loglevel level.
    -ignore_assert_list '*' | ✓ | Ignore all DBT engine asserts of the form "<file>:1234".
    -logdir <path> | ✓ | Specify the directory to use for log files.
    Table 12. Names and Description of Built-in Profilers in DynamoRIO that are Available Directly in Ratel
    ✓ indicates that the profiler is supported out-of-the-box in Ratel. ✗ indicates that the profiler is not supported, because it crashes even in vanilla DynamoRIO [10, 11].

    7 Related Work

    7.1 SGX Frameworks

    Several prior works have targeted SGX compatibility, overcoming these challenges in two main ways. The first approach is to fix the application interface: the target application is re-compiled or re-linked against such an interface. The variant that enables the best compatibility exposes a specific libc (glibc or musl libc) version as the interface, which allows adapting to SGX restrictions at a layer below the application. Container or library OS solutions use this to execute re-compiled/re-linked code inside the enclave, as done in Haven [20], Scone [16], Graphene-SGX [30], Ryoan [49], SGX-LKL [68], and Occlum [72]. Another line of work is compiler-based solutions, which require applications to modify their source code to use a language-level interface [41, 62, 75, 86].
    Both styles of approaches can have better performance than Ratel but require recompiling or relinking applications. For example, library OSes like Graphene-SGX and containerization engines like Scone expose a particular \( {\tt glibc} \) or \( {\tt musl} \) version that applications are asked to link with. New library versions and interfaces can be ported incrementally, but this creates a dependence on the underlying platform interface provider and incurs a porting effort for each library version. Applications that use inline assembly or runtime code generation also become incompatible, as they invoke system calls directly without going through the API. Ratel's approach of handling R1–R5 comprehensively offers complete interposition, without any assumptions about specific interfaces beyond those implied by binary compatibility.
    Security Considerations. As in Ratel, other approaches to SGX compatibility eventually have to use \( {\tt OCALL} \) s, \( {\tt ECALL} \) s, and syscalls to exchange information between the enclave and the untrusted software. This interface is known to be vulnerable [31, 37, 81]. Several shielding systems for file [28, 77] and network I/O [18] provide specific mechanisms to safeguard the OS interface against these attacks. On the defense side, prior techniques offer compiler-based tools that harden enclave code with memory safety [54], ASLR [71], protection against controlled-channel leakage [73], data location randomization [22], secure page fault handlers [66], and defenses against branch information leakage [47].
    Ratel uses DynamoRIO's built-in intra-process isolation primitives to separate application code from DynamoRIO code. It supports multi-threading within an enclave but does not support multi-processing. Recent library OSes such as Occlum support multi-process applications by executing them together in the same enclave, with software-based fault isolation (SFI) providing isolation within the process boundary. Combining SFI with the Ratel instrumentation engine, instead of a library OS, constitutes promising future work to support multi-processing.
    Performance. Several other works build optimizations by modifying existing enclave-compliant library OSes. Hotcalls [88] and Eleos [65] add exit-less calls to reduce the overheads of \( {\tt OCALL} \) s. These optimizations are now available in the default Intel SGX SDK.
    Language Runtimes. A recent body of work has shown that executing either entire [85] or partial [33] language runtimes inside an enclave can help port existing code written in languages such as Python [60, 69], Java [33], WebAssembly [45], Go [44], and JavaScript [46].
    Programming TEE Applications. Intel provides a C/C++ SGX software stack, which includes an SDK and OS drivers for simulation, and a PSW for running local enclaves. There are also SDKs developed in memory-safe languages such as Rust [41, 62, 86]. Frameworks such as Asylo [17], OpenEnclave [64], and MesaTEE [61] expose a high-level front-end for writing native TEE applications using a common interface. They support several back-end TEEs, including Intel SGX and ARM TrustZone. Many of the challenges faced by Ratel are common to these frameworks.

    7.2 Future TEEs and SGX v2

    New enclave TEE designs have been proposed [29, 36, 39, 55, 76]. Micro-architectural side-channels [21] and new oblivious execution capabilities [36, 57] are significant concerns in these designs. Closest to our underlying TEE is the recent Intel SGX v2 [13, 58, 89].
    Dynamic Permission Management. On SGX v1, once an enclave is completely initialized, enclave pages can no longer be added or removed, nor can their permissions be modified. On SGX v2, this restriction is lifted by new SGX instructions, so Ratel can address R2. Recall that, to dynamically load programs into an enclave, DynamoRIO has to reserve a large block of enclave memory with full permissions (RWX) during Ratel enclave setup. This is a common design choice in existing SGX runtimes, such as Occlum and Graphene-SGX. With SGX v2, the RWX code cache region in Ratel can be protected (e.g., made execute-only) via dynamic permission management.
    Larger EPC. Until recently, SGX v1 supported only 128 MB of EPC memory. Intel has since started shipping SGX v1 machines with a 256 MB EPC and has publicly released plans to support a 1 TB EPC [4]. With a larger EPC, the performance effects due to R1, R3, R4, and R5 will be greatly reduced. Specifically, a larger code cache capacity can speed up Ratel enclave creation and application execution. Moreover, the page swapping frequency will be significantly reduced, given Ratel's large memory usage (see Columns 8 and 9 in Table 11).
    Dynamic Scaling of Virtual Memory and TCS. SGX v2 introduces several instructions for dynamic enclave memory management, allowing an enclave to perform dynamic heap allocation, stack expansion, and thread context creation [58, 89]. These features can improve performance by alleviating R2. Ratel performs dynamic heap allocation, stack expansion, and TCS management. With SGX v2, Ratel can manage these allocations on demand instead of using fixed pre-allocated memory, which saves EPC memory and speeds up program execution through reduced page swapping. With dynamic TCS control, Ratel no longer needs to multiplex the limited TCS entries that are fixed at enclave creation time (Section 4.3); instead, it can create threads on demand.
    Conditional Exception Handling. SGX categorizes hardware exceptions (AEX events) into two groups: unconditional and conditional exceptions. On SGX v1, only eight exception types (e.g., #DE) can be handled unconditionally, while three others (#GP, #PF, and #CP) are not supported by default in the vanilla SDK. Ratel enables these three exceptions by modifying the SGX SDK. On SGX v2, Ratel can configure the enclave parameters (e.g., \( {\tt MiscSelect} \) ) to support these exceptions.
    In summary, SGX v2 features can improve Ratel performance and simplify security. The features can address \( R2 \) to some extent but do not address other restrictions. Thus, Ratel design largely applies to SGX v2 as well.

    8 Conclusion

    We present the design of Ratel, which enables dynamic binary translation inside SGX enclaves. It offers the ability to interpose on all the instructions executed in an enclave, which serves as a foundation for implementing other security monitors to safeguard enclaves from bugs and from the untrusted OS. Ratel also provides the first evidence that binary compatibility with existing Linux software on SGX is feasible. We empirically report on an extensive evaluation with over 200 common Linux applications and multiple scripting language runtimes. Our observations about the restrictive design choices made in SGX may be of independent interest to designers of next-generation enclave systems.

    Availability

    Ratel's implementation, including the modified Intel SGX SDK, PSW, and driver, is available at https://ratel-enclave.github.io/. Our project webpage and GitHub repository also contain unit tests, benchmarks, Linux utilities, and the case studies evaluated in this article.

    Footnotes

    1
    We chose not to support fork in Ratel [19].
    2
    Unless stated otherwise, we use the term Intel SGX v1 to refer to the hardware as well as the trusted platform software (PSW) and the trusted software development kit (SDK), as shown in Figure 2.
    3
    Another option is Intel Pin [50], but it is not open-source.
    4
    Vanilla SGX PSW does not provide an API to register for any signals for the enclave. Other frameworks circumvent this limitation by piggybacking on default signal handlers of SGX PSW [30, 72]. For simplicity, we have directly changed the PSW to register our primary signal handler.
    5
    Note that the \( {\tt ioctl} \) syscall involves more than 100 variable parameter layouts. Ratel's syscall stubs currently do not cover all of them.

    A Performance Breakdown

    A.1 Detailed Breakdown for Micro-benchmarks

    We measure the performance on diverse workloads to explain the costs associated with executing with Ratel.
    System Stress Workloads. We use HBenchOS [23], a benchmark that measures the performance of primitive functionality provided by the OS and hardware platform. In Table 10, we show the cost of each system-level operation, such as system calls, memory operations, context switches, and signal handling. Memory-intensive operation latencies vary with the benchmark setting: (a) when the operations are done with more iterations (in millions) and a smaller memory chunk size (4 KB), the performance is comparable; (b) when the operations are done with fewer iterations (1 K) and a larger memory chunk size (4 MB), Ratel incurs bandwidth loss ranging from \( -169.68 \) % to 87.91% over Linux and from 53.94% to 88.15% over DynamoRIO. This happens because when the chunk size is large, we need to allocate and de-allocate memory inside the enclave for every iteration, as well as copy large amounts of data.
    These file operation latencies match the latencies we observed in our I/O-intensive workloads (Figure 8). Specifically, the write operation incurs a large overhead. Hence, the \( {\tt create} \) workload incurs 4,766.67% and 255.66% overhead over Linux and DynamoRIO, respectively, because the benchmark creates a file and then writes data of a predefined size to it. The costs of system calls executed as \( {\tt OCALL} \) s vary depending on the type of the system call and its return value. For example, system calls such as \( {\tt getpid, sbrk, sigaction} \) that return integer values are much faster. Syscalls such as \( {\tt getrusage, gettimeofday} \) return structures or nested structures; copying these structures back and forth to/from the enclave causes much of the performance slowdown. Ratel has a custom mechanism for registering and handling signals (Section 4.5); it introduces overheads of 480.11% (installing) and 6,816.84% (handling) with respect to Linux, and 24.55% and 818.69% with respect to DynamoRIO, respectively. Registering a signal is cheaper because, unlike handling one, it does not cause a context switch. Further, after accounting for the \( {\tt OCALL} \) costs, our custom forwarding mechanism does not introduce any significant slowdown.
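    All overhead percentages in Tables 10 and 11 follow the usual relative-slowdown convention; a minimal helper makes the arithmetic explicit (small discrepancies with the tables come from rounding of the raw measurements):

    ```python
    def overhead_pct(t_base, t_new):
        """Relative overhead of t_new w.r.t. baseline t_base, in percent.
        Negative values mean t_new is faster than the baseline."""
        return (t_new / t_base - 1.0) * 100.0

    # E.g., signal installation taking 0.48 us natively and 2.79 us under
    # Ratel (Table 10 reports 480.11%, computed from unrounded measurements):
    print(round(overhead_pct(0.48, 2.79), 2))
    ```

    The R-DR columns apply the same formula with vanilla DynamoRIO, rather than Linux, as the baseline.
    
    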
CPU-bound Workloads. Ratel incurs 217.80% and 34.91% overhead, averaged over 24 applications from SPEC 2006 [78], with respect to Linux and DynamoRIO, respectively. Table 11 shows the individual overheads for each application with respect to all baselines. From Table 11, we observe that applications that incur a higher number of page faults and OCALLs suffer larger performance slowdowns. Thus, similar to other SGX frameworks, the costs of enclave context switches and the limited EPC size are the main bottlenecks in Ratel.
IO-bound Workloads. Ratel performs OCALLs for file I/O by copying read and write buffers to and from the enclave. We measure the per-API latencies for file operations using the FSCQ test suite [32]. Table 11 shows the cost of each file operation and file access pattern. Apart from the cost of the OCALL, writes are generally more expensive than reads; the multiple copy operations in Ratel amplify the performance gap between them. Next, we use IOZone [63], a commonly used benchmark for measuring file I/O latencies. Figure 8 shows the bandwidth for common access patterns over file sizes varying from 16 MB to 1,024 MB and record sizes from 4 KB to 4,096 KB. The trend of writes being more expensive holds for IOZone too. Ratel incurs an average slowdown of 87.5% and 66.2% over all operations, record sizes, and file sizes with respect to Linux and DynamoRIO, respectively.
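The extra copies behind these numbers can be sketched in a few lines: on a read, the kernel fills an untrusted staging buffer outside the enclave, which must then be copied into enclave-private memory before the application sees it (writes go the other way). A simplified Python model of the read path, with hypothetical names:

```python
import os
import tempfile

def enclave_read(fd: int, n: int) -> bytes:
    """Model of a read OCALL: the kernel fills an untrusted staging
    buffer, which is then copied into enclave-private memory."""
    untrusted_buf = os.read(fd, n)           # copy 1: kernel -> untrusted memory
    enclave_buf = bytearray(untrusted_buf)   # copy 2: untrusted -> "enclave" memory
    return bytes(enclave_buf)

# Round-trip demo on a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello enclave")
    path = f.name
fd = os.open(path, os.O_RDONLY)
data = enclave_read(fd, 1024)
os.close(fd)
os.unlink(path)
```

Each 4 MB record in the IOZone write path pays this staging copy on every OCALL, which is why large record sizes amplify the slowdown.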
Multi-threaded Workloads. We use the standard Parsec-SPLASH2 [67] benchmark suite, which comprises a variety of high-performance computing and graphics applications, to measure Ratel overheads for multi-threaded applications. Since some of the programs in Parsec-SPLASH2 mandate that the thread count be a power of 2 (e.g., ocean_ncp), we fixed the maximum number of threads in our experiments at 16. Ratel changes the existing SGX design to handle thread creation and synchronization primitives, as described in Sections 4.3 and 4.4. We measure the effect of this specific change on application execution by configuring the enclave to use a varying number of threads between 1 and 16. The data for 2 threads is shown in Table 11.
Figure 9 shows a performance overhead of 10,156.52% and 92.55% compared to Linux and DynamoRIO, respectively, on average across all benchmarks and thread configurations. Notably, DynamoRIO alone imposes an average overhead of 5,010.07% over Linux in the same setting. For single-threaded execution, on average, Ratel causes an overhead of 2,050.92% and 3.17% with respect to Linux and DynamoRIO, respectively, while these increase to 21,238.0% and 159.78% for 16-thread execution. We measured the breakdown of costs and observe that, on average: (a) creating each thread contributes a fixed cost of 57 ms; (b) shared access to variables becomes 1-7 times more expensive than the elapsed time of futex synchronization as the number of threads increases. This is expected, because synchronization is cheaper in the Linux and DynamoRIO executions, which use the unsafe futex primitives exposed by the kernel, whereas Ratel uses the more expensive spinlock mechanism exposed by the SGX hardware for security. In particular, individual benchmarks such as water_spatial, fmm, and raytrace involve many lock-contention events and an extremely high frequency of spinlock calls (e.g., about 423,000 ms spent spinning in Ratel versus about 500 ms in futex calls in DynamoRIO for raytrace with eight threads), and thus incur large synchronization overheads.
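The spinlock-versus-futex gap can be illustrated with a minimal test-and-set spinlock (a sketch in Python, not Ratel's actual implementation): a futex waiter sleeps in the kernel until woken, whereas a spinlock waiter burns cycles retrying, which is what the spin counts above measure.

```python
import threading

class SpinLock:
    """Naive test-and-set spinlock; `spins` counts wasted retry iterations
    (the counter update is racy, so it is approximate under contention)."""
    def __init__(self):
        self._flag = threading.Lock()  # stands in for an atomic flag
        self.spins = 0

    def acquire(self):
        # Busy-wait instead of sleeping in the kernel, as a futex would.
        while not self._flag.acquire(blocking=False):
            self.spins += 1

    def release(self):
        self._flag.release()

counter = 0
lock = SpinLock()

def worker(iterations: int):
    global counter
    for _ in range(iterations):
        lock.acquire()
        counter += 1
        lock.release()

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Under heavy contention, `lock.spins` grows with the thread count while the protected work stays fixed, mirroring the blow-up seen for water_spatial, fmm, and raytrace.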

    A.2 Real-world Case Studies

We work with four representative real-world applications: a database server (SQLite), a command-line utility (cURL), a machine learning inference-as-a-service framework (Privado), and a key-value store (Memcached). These applications have been used in prior work [75].
SQLite is a popular database [79]. We select it as a case study because of its memory-intensive workload. We configure it as a single-threaded instance, use a database benchmark workload [56], and measure the throughput (ops/sec) of each database operation for varying database sizes (total number of entries). Table 11 shows a detailed breakdown of the runtime statistics for a database with 10,000 entries. Figure 10(a) shows the average throughput over all operations. With Ratel, we observe a throughput loss of 36.88% and 28.71% on average over all database sizes compared to Linux and DynamoRIO, respectively. The throughput loss increases with the database size. The drop is noticeable at 500 K entries, where the database size crosses the maximum enclave size threshold and results in a significant number of page faults. This result matches observations from other SGX frameworks that report SQLite performance [16].
cURL is a widely used command-line utility for downloading data from URLs [38]; it is network intensive. We test it with Ratel via the standard library test suite. Table 11 shows a detailed breakdown of the execution time on Ratel. We measure the cost of executing cURL with Ratel when downloading files of various sizes from an Apache (2.4.41) server on the local network. Figure 11(a) shows the throughput for the various baselines and file sizes. On average, Ratel causes a throughput loss of 604.11% and 142.11% compared to Linux and DynamoRIO, respectively. For all baselines, small files (below 100 MB) have shorter download times, while larger files naturally take longer. This can be explained by the direct copying of packets to non-enclave memory, which does not add any memory pressure on the enclave. The only remaining bottleneck is the cost of dispatching OCALLs, which increases linearly with the requested file size.
Privado is a machine learning framework that provides secure inference-as-a-service [47]. It comprises several state-of-the-art models available as binaries that execute on an input image to predict its class. The binaries are CPU intensive, with sizes ranging from 313 KB to 140 MB (see Table 11). We execute the Privado models on all images from the corresponding image dataset (CIFAR or ImageNet) and measure the inference time. Figure 11(b) shows the performance of the baselines and Ratel for nine models in increasing order of binary size. We observe that Ratel's performance degrades as the binary size increases. This is expected, because the limited enclave physical memory leads to page faults. Hence, the largest model (140 MB) exhibits the highest inference time and the smallest model (313 KB) the lowest. Thus, Ratel, and enclaves in general, can add significant overheads even for CPU-intensive server workloads if they exceed the working-set size of 90 MB.
Memcached is an in-memory key-value cache. We evaluate it with all four popular YCSB workloads: A (50% read, 50% update), B (95% read, 5% update), C (100% read), and D (95% read, 5% insert). We run it with the four default worker threads in the Linux, DynamoRIO, and Ratel settings. We vary the number of YCSB client threads for the Load and Run phases (which load the data and then run the workload tests, respectively). We fix the data size at 1,000,000 records with a Zipfian distribution of key popularity. We increase the number of clients from 1 to 100 to find the saturation point of the targeted/scaled throughput for each setting. Here, we present only workload A (throughput vs. average latency for reads and updates); the other workloads display similar behavior.
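Workload A's access pattern can be reproduced with a small generator: keys are drawn from a Zipfian distribution (the rank-r key has weight proportional to 1/r^s), and each operation is a read or an update with equal probability. A sketch with an assumed skew parameter s = 0.99 (YCSB's default Zipfian constant); the function is ours, not part of YCSB:

```python
import random

def make_workload_a(num_keys: int, num_ops: int, s: float = 0.99, seed: int = 42):
    """Generate (op, key) pairs: 50% read / 50% update, Zipfian key popularity."""
    rng = random.Random(seed)
    # Rank-r key is drawn with weight proportional to 1/r^s.
    weights = [1.0 / (rank ** s) for rank in range(1, num_keys + 1)]
    keys = rng.choices(range(num_keys), weights=weights, k=num_ops)
    ops = [rng.choice(["read", "update"]) for _ in range(num_ops)]
    return list(zip(ops, keys))

workload = make_workload_a(num_keys=1000, num_ops=10_000)
reads = sum(1 for op, _ in workload if op == "read")
hits = {}
for _, key in workload:
    hits[key] = hits.get(key, 0) + 1
# Popular (low-rank) keys dominate, as in YCSB's Zipfian generator; under
# Ratel this concentrates contention on a few hot locks.
```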
As shown in Figure 10(b), the client latencies of the DynamoRIO and Ratel settings for a given throughput are similar until approximately 10,000 ops/s, while the latency on Linux remains nearly unchanged until beyond 40,000 ops/s. Specifically, Ratel jitters until it achieves its maximum throughput of around 17,000 ops/s, while DynamoRIO stays flat until 15,000 ops/s (its maximum is 21,000 ops/s). The common reason for the deceleration of both is that DynamoRIO slows down reads and updates. For Ratel, the additional bottleneck is the high frequency of lock contention with the spinlock primitive; for example, Ratel spends about 18,320,000 ms spinning, while DynamoRIO's futex calls cost only around 500 ms, at a throughput of 10,000 ops/s with 10 clients.

    Acknowledgments

We thank David Kohlbrenner, Zhenkai Liang, and Roland Yap for their feedback on improving earlier drafts of the article. We thank Shipra Shinde for help with formatting the figures in this article.

    References

    [1]
    Ratel Team. 2020. Dynamic Binary Translation for SGX Enclaves. Retrieved from https://ratel-enclave.github.io/.
    [2]
    SGX 101. [n.d.]. Enclave Layout - SGX 101. Retrieved 01 May, 2020 from http://www.sgx101.com/portfolio/enclave_layout/.
    [3]
    Simon Urbanek. [n.d.]. R Benchmark 2.5. Retrieved from https://www.nscee.edu/R-benchmark-25.R-big.
    [4]
    Dylan Martin. [n.d.]. Intel Xeon Ice Lake CPUs To Get SGX With Expanded Security Features. Retrieved 01 May, 2020 from https://www.crn.com/news/components-peripherals/intel-xeon-ice-lake-cpus-to-get-sgx-with-expanded-security-features.
    [5]
    Derek Bruening. [n.d.]. Make -vm_size 2G by default for 64-bit. Retrieved 01 Sept., 2020 from https://github.com/DynamoRIO/dynamorio/issues/3570.
    [6]
    Alexander Ochs. [n.d.]. Noxmiles/Python-CPU-Benchmark: Small Python CPU Benchmark for Linux Systems over CLI/Terminal. Retrieved 01 Sep., 2020 from https://github.com/Noxmiles/Python-CPU-Benchmark.
    [7]
    Florian Rathgeber. [n.d.]. pybench—PyPI. Retrieved 09 Jan., 2020 from https://pypi.org/project/pybench/.
    [8]
    GeeksforGeeks. [n.d.]. Python Programming Examples–GeeksforGeeks. Retrieved 01 Sept., 2020 from https://www.geeksforgeeks.org/python-programming-examples/.
    [9]
    Fortanix. [n.d.]. sgxtop—Fortanix. Retrieved 01 Sept., 2020 from https://github.com/fortanix/sgxtop.
    [10]
    HEWLETT-PACKARD COMPANY and Massachusetts Institute of Technology. [n.d.]. Using DynamoRIO. Retrieved 01 Sep., 2020 from http://groups.csail.mit.edu/pag/application_communities_cvs/manuals/dynamorio_beta_documentation/using.html.
    [11]
Toshi Piazza. [n.d.]. “Virtual timer expired” on ARM with -prof_pcs—Issue #2907—DynamoRIO. Retrieved 01 Sep., 2020 from https://github.com/DynamoRIO/dynamorio/issues/2907.
    [12]
    Intel Corporation. 2013. Software Guard Extensions Programming Reference. Retrieved from https://software.intel.com/sites/default/files/329298-001.pdf.
    [13]
    Intel Corporation. 2014. Software Guard Extensions Programming Reference Rev. 2. Retrieved from https://software.intel.com/sites/default/files/329298-002.pdf.
    [14]
Martín Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti. [n.d.]. Control-flow integrity. In Proceedings of CCS’05.
    [15]
    Aexerr [n.d.]. AEX vector error graphene. Retrieved from https://github.com/oscarlab/graphene/issues/1155.
    [16]
    Sergei Arnautov, Bohdan Trach, Franz Gregor, Thomas Knauth, Andre Martin, Christian Priebe, Joshua Lind, Divya Muthukumaran, Daniel O’Keeffe, Mark L. Stillwell, David Goltzsche, Dave Eyers, Rüdiger Kapitza, Peter Pietzuch, and Christof Fetzer. 2016. SCONE: Secure linux containers with intel SGX. In Proceedings of OSDI’16.
    [17]
    Asylo 2019. Google Asylo: An open and flexible framework for enclave applications. Retrieved from https://asylo.dev/.
    [18]
    Pierre-Louis Aublin, Florian Kelbert, Dan O’Keeffe, Divya Muthukumaran, Christian Priebe, Joshua Lind, Robert Krahn, Christof Fetzer, David M. Eyers, and Peter R. Pietzuch. 2017. TaLoS: Secure and Transparent TLS Termination inside SGX Enclaves. Technical Report. Department of Computing, Imperial College London.
    [19]
Andrew Baumann, Jonathan Appavoo, Orran Krieger, and Timothy Roscoe. 2019. A fork() in the road. In Proceedings of HotOS’19.
    [20]
    Andrew Baumann, Marcus Peinado, and Galen Hunt. 2014. Shielding applications from an untrusted cloud with haven. In Proceedings of OSDI’14.
    [21]
    Thomas Bourgeat, Ilia A. Lebedev, Andrew Wright, Sizhuo Zhang, Arvind, and Srinivas Devadas. 2019. MI6: Secure enclaves in a speculative out-of-order processor. In Proceedings of MICRO’19.
    [22]
Ferdinand Brasser, Srdjan Capkun, Alexandra Dmitrienko, Tommaso Frassetto, Kari Kostiainen, Urs Müller, and Ahmad-Reza Sadeghi. 2017. DR.SGX: Hardening SGX enclaves against cache attacks with data location randomization. Retrieved from arXiv:1709.09917.
    [23]
    Aaron B. Brown. 2019. HBench-OS Operating System Benchmarks. Retrieved from https://www.eecs.harvard.edu/margo/papers/sigmetrics97-os/hbench/.
    [24]
    Derek Bruening, Evelyn Duesterwald, and Saman Amarasinghe. 2001. Design and implementation of a dynamic optimization framework for windows. In Proceedings of the ACM Workshop on Feedback-Directed and Dynamic Optimization.
    [25]
    D. Bruening and Q. Zhao. 2011. Practical memory checking with Dr. Memory. In Proceedings of CGO’11.
    [26]
    Derek Bruening, Qin Zhao, and Saman Amarasinghe. 2012. Transparent dynamic instrumentation. In Proceedings of VEE’12.
    [27]
    Derek L. Bruening and Saman Amarasinghe. 2004. Efficient, Transparent, and Comprehensive Runtime Code Manipulation. Ph.D. Dissertation.
    [28]
    Dorian Burihabwa, Pascal Felber, Hugues Mercier, and Valerio Schiavoni. 2018. SGX-FS: Hardening a file system in user-space with intel SGX. In Proceedings of CloudCom’18.
    [29]
D. Champagne and R. B. Lee. 2010. Scalable architectural support for trusted software. In Proceedings of HPCA’10.
    [30]
Chia-Che Tsai, Donald E. Porter, and Mona Vij. 2017. Graphene-SGX: A practical library OS for unmodified applications on SGX. In Proceedings of ATC’17.
    [31]
Stephen Checkoway and Hovav Shacham. 2013. Iago attacks: Why the system call API is a bad untrusted RPC interface. In Proceedings of ASPLOS’13.
    [32]
Haogang Chen, Daniel Ziegler, Tej Chajed, Adam Chlipala, M. Frans Kaashoek, and Nickolai Zeldovich. 2015. Using Crash Hoare logic for certifying the FSCQ file system. In Proceedings of SOSP’15.
    [33]
Chia-Che Tsai, Jeongseok Son, Bhushan Jain, John McAvey, Raluca Ada Popa, and Donald E. Porter. 2020. Civet: An efficient Java partitioning framework for hardware enclaves. In Proceedings of USENIX Security’20.
    [34]
    JaeWoong Chung, Michael Dalton, Hari Kannan, and Christos Kozyrakis. 2008. Thread-safe dynamic binary translation using transactional memory. In Proceedings of HPCA’08.
    [35]
    Victor Costan and Srinivas Devadas. 2016. Intel SGX Explained. Cryptology ePrint Archive, Report 2016/086. Retrieved from http://eprint.iacr.org/2016/086.
    [36]
    Victor Costan, Ilia Lebedev, and Srinivas Devadas. 2016. Sanctum: Minimal hardware extensions for strong software isolation. In Proceedings of USENIX Security’16.
    [37]
    Jinhua Cui, Jason Zhijingcheng Yu, Shweta Shinde, Prateek Saxena, and Zhiping Cai. 2021. SmashEx: Smashing SGX enclaves using exceptions. In Proceedings of ACM SIGSAC’21. 779–793.
    [38]
    Curl. 2019. curl Home Page. Retrieved from https://curl.haxx.se/.
    [39]
    Andrew Ferraiuolo, Andrew Baumann, Chris Hawblitzel, and Bryan Parno. 2017. Komodo: Using verification to disentangle secure-enclave hardware from software. In Proceedings of SOSP’17.
    [40]
    Filemaperr. [n.d.]. File map error graphene. Retrieved from https://github.com/oscarlab/graphene/issues/433.
    [41]
Fortanix-rust-sgx. [n.d.]. fortanix/rust-sgx: The Fortanix Rust Enclave Development Platform. Retrieved from https://github.com/fortanix/rust-sgx.
    [42]
Tal Garfinkel, Keith Adams, Andrew Warfield, and Jason Franklin. 2007. Compatibility is not transparency: VMM detection myths and realities. In Proceedings of HotOS’07.
    [43]
    Tal Garfinkel, Mendel Rosenblum, and Dan Boneh. 2003. Flexible OS support and applications for trusted computing. In Proceedings of HotOS’03, Michael B. Jones (Ed.).
    [44]
    Adrien Ghosn, James R. Larus, and Edouard Bugnion. 2019. Secured routines: Language-based construction of trusted execution environments. In Proceedings of USENIX ATC’19.
    [45]
    David Goltzsche, Manuel Nieke, Thomas Knauth, and Rüdiger Kapitza. 2019. AccTEE: A webassembly-based two-way sandbox for trusted resource accounting. In Proceedings of Middleware’19.
    [46]
    David Goltzsche, Colin Wulf, Divya Muthukumaran, Konrad Rieck, Peter Pietzuch, and Rüdiger Kapitza. 2017. Trustjs: Trusted client-side execution of Javascript. In Proceedings of the 10th European Workshop on Systems Security.
    [47]
    Karan Grover, Shruti Tople, Shweta Shinde, Ranjita Bhagwan, and Ramachandran Ramjee. 2018. Privado: Practical and Secure DNN Inference with Enclaves. Retrieved from arXiv:1810.00602.
    [48]
    Daniel Gruss, Julian Lettner, Felix Schuster, Olya Ohrimenko, Istvan Haller, and Manuel Costa. 2017. Strong and efficient cache side-channel protection using hardware transactional memory. In Proceedings of USENIX Security’17.
    [49]
    Tyler Hunt, Zhiting Zhu, Yuanzhong Xu, Simon Peter, and Emmett Witchel. 2016. Ryoan: A distributed sandbox for untrusted computation on secret data. In Proceedings of OSDI’16.
    [50]
    Intel. [n.d.]. Pin—A Dynamic Binary Instrumentation Tool. Retrieved from https://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool.
    [51]
Mustakimur Rahman Khandaker, Yueqiang Cheng, Zhi Wang, and Tao Wei. 2020. COIN attacks: On insecurity of enclave untrusted interfaces in SGX. In Proceedings of ASPLOS’20.
    [52]
Vladimir Kiriansky, Derek Bruening, Saman P. Amarasinghe, et al. 2002. Secure execution via program shepherding. In Proceedings of USENIX Security’02.
    [53]
    Roland Kunkel, Do Le Quoc, Franz Gregor, Sergei Arnautov, Pramod Bhatotia, and Christof Fetzer. 2019. TensorSCONE: A secure tensorflow framework using intel SGX. Retrieved from arXiv:1902.04413.
    [54]
Dmitrii Kuvaiskii, Oleksii Oleksenko, Sergei Arnautov, Bohdan Trach, Pramod Bhatotia, Pascal Felber, and Christof Fetzer. 2017. SGXBOUNDS: Memory safety for shielded execution. In Proceedings of EuroSys’17.
    [55]
    Dayeol Lee, David Kohlbrenner, Shweta Shinde, Krste Asanović, and Dawn Song. 2020. Keystone: An open framework for architecting trusted execution environments. In Proceedings of EuroSys’20.
    [56]
    Leveldb [n.d.]. LevelDB Benchmarks. Retrieved from http://www.lmdb.tech/bench/microbench/benchmark.html.
    [57]
Martin Maas, Eric Love, Emil Stefanov, Mohit Tiwari, Elaine Shi, Krste Asanovic, John Kubiatowicz, and Dawn Song. 2013. PHANTOM: Practical oblivious computation in a secure processor. In Proceedings of CCS’13.
    [58]
    F. McKeen, I. Alexandrovich, I. Anati, D. Caspi, S. Johnson, R. Leslie-Hurd, and C. Rozas. 2016. SGX instructions to support dynamic memory allocation inside an enclave. In Proceedings of HASP’16.
    [59]
Frank McKeen, Ilya Alexandrovich, Alex Berenzon, Carlos V. Rozas, Hisham Shafi, Vedvyas Shanbhogue, and Uday R. Savagaonkar. 2013. Innovative instructions and software model for isolated execution. In Proceedings of HASP’13.
    [60]
    Mesa-py [n.d.]. mesalock-linux/mesapy: A Fast and Safe Python based on PyPy. Retrieved from https://github.com/mesalock-linux/mesapy.
    [61]
    Mesatee [n.d.]. MesaTEE: A Framework for Universal Secure Computing. Retrieved from https://mesatee.org/.
    [62]
    Mesatee-sgx [n.d.]. apache/mesatee-sgx: Rust SGX SDK provides the ability to write Intel SGX applications in Rust Programming Language. Retrieved from https://github.com/apache/mesatee-sgx.
    [63]
    W. Norcott and D. Capps. 2019. IOzone file system benchmark. Retrieved from www.iozone.org.
    [64]
    Open-enclave [n.d.]. Open Enclave SDK. Retrieved from https://openenclave.io/sdk/.
    [65]
Meni Orenbach, Pavel Lifshits, Marina Minkin, and Mark Silberstein. 2017. Eleos: ExitLess OS services for SGX enclaves. In Proceedings of EuroSys’17.
    [66]
    Meni Orenbach, Yan Michalevsky, Christof Fetzer, and Mark Silberstein. 2019. CoSMIX: A compiler-based system for secure memory instrumentation and execution in enclaves. In Proceedings of USENIX ATC’19.
    [67]
    Parsec-splash2 2019. The PARSEC Benchmark Suite. Retrieved from https://parsec.cs.princeton.edu/.
    [68]
Christian Priebe, Divya Muthukumaran, Joshua Lind, Huanzhou Zhu, Shujie Cui, Vasily A. Sartakov, and Peter Pietzuch. 2019. SGX-LKL: Securing the host OS interface for trusted execution. Retrieved from arXiv:1908.11143.
    [69]
    Python-sgx [n.d.]. adombeck/python-sgx: Python interface to the SGX SDK. Retrieved from https://github.com/adombeck/python-sgx.
    [70]
Prateek Saxena. 2007. Static Binary Analysis And Transformation For Sandboxing Untrusted Plugins. Master’s Thesis. Computer Science Department, Stony Brook University. Section 3.1.4.
    [71]
    Jaebaek Seo, Byoungyoung Lee, Seong Min Kim, Ming-Wei Shih, Insik Shin, Dongsu Han, and Taesoo Kim. 2017. SGX-shield: Enabling address space layout randomization for SGX programs. In Proceedings of NDSS’17.
    [72]
Youren Shen, Hongliang Tian, Yu Chen, Kang Chen, Runji Wang, Yi Xu, Yubin Xia, and Shoumeng Yan. 2020. Occlum: Secure and efficient multitasking inside a single enclave of intel SGX. In Proceedings of ASPLOS’20.
    [73]
    Ming-Wei Shih, Sangho Lee, Taesoo Kim, and Marcus Peinado. 2017. T-SGX: Eradicating controlled-channel attacks against enclave programs. In Proceedings of NDSS’17.
    [74]
    Shweta Shinde, Zheng Leong Chua, Viswesh Narayanan, and Prateek Saxena. 2016. Preventing page faults from telling your secrets. In Proceedings of AsiaCCS’16.
    [75]
    Shweta Shinde, Dat Le Tien, Shruti Tople, and Prateek Saxena. 2017. Panoply: Low-TCB linux applications with SGX enclaves. In Proceedings of NDSS’17.
    [76]
    Shweta Shinde, Shruti Tople, Deepak Kathayat, and Prateek Saxena. 2015. PodArch: Protecting Legacy Applications with a Purely Hardware TCB. Technical Report. National University of Singapore.
    [77]
    Shweta Shinde, Shengyi Wang, Pinghai Yuan, Aquinas Hobor, Abhik Roychoudhury, and Prateek Saxena. 2020. BesFS: A POSIX filesystem for enclaves with a mechanized safety proof. In Proceedings of USENIX Security’20.
    [78]
    SPEC 2006 Benchmarks 2019. SPEC 2006 Benchmarks. Retrieved from https://www.spec.org/.
    [79]
    Sqlite 2019. SQLite Home Page. Retrieved from https://www.sqlite.org/index.html.
    [80]
    Chia-Che Tsai, Bhushan Jain, Nafees Ahmed Abdul, and Donald E. Porter. 2016. A study of modern linux API usage and compatibility: What to support when you’re supporting. In Proceedings of EuroSys’16.
    [81]
Jo Van Bulck, David Oswald, Eduard Marin, Abdulla Aldoseri, Flavio D. Garcia, and Frank Piessens. 2019. A tale of two worlds: Assessing the vulnerability of enclave shielding runtimes. In Proceedings of CCS’19.
    [82]
    A. Vasudevan and R. Yerraballi. [n.d.]. Cobra: Fine-grained malware analysis using stealth localized-executions. In Proceedings of SP’06.
    [83]
    Robert Wahbe, Steven Lucco, Thomas E. Anderson, and Susan L. Graham. 1993. Efficient software-based fault isolation. In Proceedings of SOSP’93.
    [84]
    Guanhua Wang, Sudipta Chattopadhyay, Ivan Gotovchits, Tulika Mitra, and Abhik Roychoudhury. 2019. oo7: Low-overhead Defense against Spectre Attacks via Program Analysis. Retrieved from arXiv:1807.05843.
    [85]
Huibo Wang, Erick Bauman, Vishal Karande, Zhiqiang Lin, Yueqiang Cheng, and Yinqian Zhang. 2019. Running language interpreters inside SGX: A lightweight, legacy-compatible script code hardening approach. In Proceedings of AsiaCCS’19.
    [86]
Huibo Wang, Pei Wang, Yu Ding, Mingshen Sun, Yiming Jing, Ran Duan, Long Li, Yulong Zhang, Tao Wei, and Zhiqiang Lin. [n.d.]. Towards memory safe enclave programming with rust-SGX. In Proceedings of CCS’19.
    [87]
Richard Wartell, Vishwath Mohan, Kevin W. Hamlen, and Zhiqiang Lin. [n.d.]. Binary stirring: Self-randomizing instruction addresses of legacy X86 binary code. In Proceedings of CCS’12.
    [88]
Ofir Weisse, Valeria Bertacco, and Todd Austin. 2017. Regaining lost cycles with HotCalls: A fast interface for SGX secure enclaves. In Proceedings of ISCA’17.
    [89]
Bin Cedric Xing, Mark Shanahan, and Rebekah Leslie-Hurd. [n.d.]. Intel software guard extensions (Intel SGX) software support for dynamic memory allocation inside an enclave. In Proceedings of HASP’16.
    [90]
    Yuanzhong Xu, Weidong Cui, and Marcus Peinado. [n.d.]. Controlled-channel attacks: Deterministic side channels for untrusted operating systems. In Proceedings of S&P’15.

Cited By

• (2023) A verified confidential computing as a service framework for privacy preservation. In Proceedings of the 32nd USENIX Conference on Security Symposium, 4733–4750. DOI: 10.5555/3620237.3620502. Online publication date: 9-Aug-2023.
• (2023) Confidential Consortium Framework: Secure multiparty applications with confidentiality, integrity, and high availability. Proceedings of the VLDB Endowment 17, 2, 225–240. DOI: 10.14778/3626292.3626304. Online publication date: 1-Oct-2023.
• (2023) SoK: A systematic review of TEE usage for developing trusted applications. In Proceedings of the 18th International Conference on Availability, Reliability and Security, 1–15. DOI: 10.1145/3600160.3600169. Online publication date: 29-Aug-2023.
• (2023) Intel Software Guard Extensions applications: A survey. ACM Computing Surveys 55, 14s, 1–38. DOI: 10.1145/3593021. Online publication date: 17-Jul-2023.
• (2023) A comparison study of the compatibility approaches for SGX enclaves. In Proceedings of the 2023 IEEE 32nd Asian Test Symposium (ATS), 1–6. DOI: 10.1109/ATS59501.2023.10317965. Online publication date: 14-Oct-2023.

    Published In

    ACM Transactions on Privacy and Security  Volume 25, Issue 4
    November 2022
    330 pages
    ISSN:2471-2566
    EISSN:2471-2574
    DOI:10.1145/3544004
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 July 2022
    Online AM: 02 May 2022
    Accepted: 01 April 2022
    Revised: 01 January 2022
    Received: 01 March 2021
    Published in TOPS Volume 25, Issue 4


    Author Tags

    1. Trusted execution environments
    2. trusted computing
    3. TEEs
    4. enclaves
    5. SGX design restrictions
    6. complete interposition
    7. dynamic binary translation
    8. dynamorio
    9. compatibility
    10. instrumentation
    11. porting
    12. lift and shift

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • National Research Foundation, Prime Ministers Office, Singapore
    • National Cybersecurity R&D Program
    • National Cybersecurity R&D Directorate
    • National Science Foundation
    • Center for Long-Term Cybersecurity

