Security Analysis of Processor Instruction Set Architecture For Enforcing Control-Flow Integrity
ABSTRACT
Intel has developed Control-flow Enforcement Technology (CET) [27] that provides CPU instruction set architecture (ISA) capabilities to defend against Return-oriented Programming (ROP) and call/jmp-oriented programming (COP/JOP) style control-flow subversion attacks. This attack methodology uses code sequences in authorized modules with at least one instruction in the sequence being a control transfer instruction that depends on attacker-controlled data either in the return stack or in a register/memory for the target address. Attackers stitch these sequences together by diverting the control flow instruction (e.g. RET, CALL, JMP) from its original target address to a new target (via modification in the data stack or in the register or memory used by these instructions). This paper describes CET security objectives, threat model and various architectural design choices to ensure that the design meets the security objectives. We conclude the paper with performance data and related work in this domain.

CCS CONCEPTS
• Security and privacy → Security in hardware → Hardware security implementation; • Security and privacy → Intrusion/anomaly detection and malware mitigation.

KEYWORDS
Control-flow integrity, control flow subversion attacks, shadow stack, ROP, JOP, COP.

ACM Reference format:
Vedvyas Shanbhogue, Deepak Gupta and Ravi Sahita. 2019. Security analysis of processor instruction set architecture for enforcing control-flow integrity. In Proceedings of the 8th International workshop on Hardware and Architectural Support for Security and Privacy (HASP’ 19), June 23, 2019, Phoenix, AZ, USA. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3337167.3337175

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
HASP’19, June 23, 2019, Phoenix, AZ, USA
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
978-1-4503-7226-8/19/06...$15.00
https://doi.org/10.1145/3337167.3337175

1. INTRODUCTION
Introduction of security features like No-Execute [1] to prevent data execution (DEP), supervisory mode execute prevention (SMEP) [1], supervisory mode access prevention (SMAP) [1], etc. have pushed the state of art of exploitation of software to code-reuse attacks based exploitation techniques like Return-Oriented Programming (ROP) [2], Jump-Oriented Programming (JOP) [3] and Call-oriented programming (COP) [4]. Defending against such exploits requires prevention of malicious attempts to invoke control flows that are not part of the program’s control-flow graph. Control-flow Integrity (CFI) [3, 16] proposed that a program should only execute control flows that are programmed in by the programmer. Specifically, CFI requires that an indirect branch should only target instructions in the program that have been designated as targets of indirect branches. Consequently, a forward branch through an indirect call or jump should only transfer control to valid targets for calls and jumps in the program and a return instruction should only transfer control to the call site that initiated the call to the procedure being returned from.
Intel Control-flow Enforcement Technology (CET) is a CPU instruction set extension to implement CFI and defend against ROP/JOP style control-flow subversion attacks. It adds the following capabilities to the Intel instruction set architecture:

• Shadow Stack – return address protection to prevent Return Oriented Programming
• Indirect branch tracking – free branch protection to enforce security properties on Jump/Call Oriented Programming.

CET shadow stack is a second stack used exclusively during control transfer operations to store a copy of the return address pointer. The shadow stack is write protected using a new extension to page permissions to prevent software-originated unintended writes to the shadow stack. The CALL instruction pushes a copy of the return address on the shadow stack in addition to the legacy behavior of pushing the return address on the data stack. The RET instruction is modified to pop the return address from both stacks and if the two return addresses do not match, causes an exception and thus prevents and reports attempts to modify the return address maliciously (or in error).
CET Indirect branch tracking introduces a new instruction ENDBRANCH that is used to mark valid code targets for indirect
calls and jumps in the program. If an indirect call or jump targets an instruction other than ENDBRANCH, the processor generates an exception and thus prevents and reports attempts to redirect control flow to unintended code targets in the program. The approach taken here is a coarse-grained forward branch enforcement that can be further restricted using software instrumentation and compiler techniques.
CET introduces a new fault-type exception, control protection (#CP), to notify privileged software when control flow violations are detected. The #CP exception also reports an error code that tells the exception handler the reason for the fault.

1.1. Adversary model and CET objectives
We enumerate the following threats (written as adversary capabilities) to define the scope of mitigations we aim to provide with CET:

• T1: Can find software vulnerabilities that allow the adversary to read and write anywhere in software-accessible memory.
• T2: Can discover the complete layout of the address space (e.g. where stacks, heaps, and images are mapped).
• T3: Can repeat memory reads and writes at will.
• T4: Can produce stimuli that make code take different paths and observe the state of the program, including the stack state, in these paths.
• T5: Can send data to a “computation server” to compute the payload needed for subsequent reads/writes.
• T6: Can perform control transfers to existing code to execute (code re-use) in order to change the state of processor registers.

The following assumptions were made about constraints on the adversary:

• R1: Cannot add new code without verification and all existing code is read-only and not modifiable (e.g. W ⊕ X policy).

While defining the technology, we enumerate the following design goals and constraints:

• D1: Must provide a protection mechanism for new architectural assets (i.e. new hardware register, memory) against code re-use and memory safety errors.
• D2: Must be applicable to CPU-enforced privilege levels (user/supervisor).
• D3: Must be applicable to CPU modes used by commodity software, such as 32/64-bit, hypervisor, system management mode, enclaves, etc.
• D4: Must ensure no loss of control-flow enforcement protection occurs at transition points between modes and context switches.
• D5: Must have minimal overheads on performance, memory usage and code size growth.
• D6: Avoid embedding programming language specific constructs in the ISA.
• D7: Provide only the necessary (minimal) capabilities for software to be able to impose instruction alignment and return address protection for language and usage-specific policies.
• D8: Preserve the stack and function call ABI.
• D9: Must not place any restrictions on common software constructs like tail calls, co-routines, etc.

In section 2, we describe the security model of the shadow stack capability and the new instructions introduced to manage the shadow stack. In section 3, we describe the security model of the indirect branch tracking capability. In section 4, we discuss the speculation-safe hardening properties of CET. In later sections, we present the performance evaluation, security metrics, and discuss related work, with conclusions.

2. CET SHADOW STACK
A shadow stack is a second stack used exclusively for control transfer operations to store and to retrieve the return address pointers. The shadow stack is distinct from the ordinary data stack, and holds no data.

2.1. SSP register encoding and type safety
CET defines a shadow stack pointer (SSP) register that contains the linear address of the top of the current shadow stack. SSP is a new architectural register and must be protected (per design goal D1) against an adversary that can use code re-use capability (per threat T6) to change it. Thus the opcode encoding does not allow the SSP register to be directly encoded as a source, destination or memory operand in instructions other than shadow stack management instructions. This reduces potentially useful code re-use gadgets in program memory and helps mitigate techniques like stack pivoting used to subvert control flow.
CET enforces type safety by requiring that values generated as part of CET instructions or implicit ISA flows (e.g. privilege transitions) are consumed only by complementary CET instructions or complementary ISA flows. Examples of such enforcement are cited in sections 2.3, 2.4, 2.7 and 2.8.

2.2. Shadow stack for each privilege and exception delivery
One of the design goals of CET has been to provide support for control flow enforcement at each privilege level of the processor (D2, D3 and D4). The x86 segmentation protection mechanism supports four privilege levels, numbered from 0 to 3. The greater numbers mean lesser privileges. Most operating systems running on x86 architecture only use two privilege levels, where the operating system kernel and its services execute at privilege level 0 and the applications execute in user mode at privilege level 3. CET considers programs executing at privilege level 0, 1 and 2 to be equally privileged supervisory programs, and thus when CET is enabled for supervisor mode it is active at privilege levels 0, 1 and 2 (numerically lower than 3). In the x86 architecture, call gates facilitate controlled transfer of program control between different privilege levels. When a call gate is used to transfer control to a more privileged level, the processor automatically switches to the data stack for the
destination privilege level using the pointers to the privilege level 0, 1 and 2 stacks stored in the Task-State Segment (TSS) data structure of the current running task.
Intel 64 bit architecture supports an interrupt stack table (IST) to provide a method for specific interrupts (such as NMI, double-fault, and machine check) to always execute with a known good stack. The IST mechanism provides up to seven IST pointers that can be selected using a 3-bit IST index from the interrupt-gate descriptor in the interrupt descriptor table (IDT). When the index is not 0, it is used to select the pointer to the data stack to switch to when delivering those specific interrupts.
CET extends this automatic stack switching to also switch the SSP when CET is enabled in supervisor mode. The SSP for privilege level 0, 1 or 2 is obtained from one of the following new model specific registers (MSRs), depending on the target privilege level:
• IA32_PL2_SSP, if transitioning to ring 2
• IA32_PL1_SSP, if transitioning to ring 1
• IA32_PL0_SSP, if transitioning to ring 0

The collection of these three MSRs is referred to as the IA32_PL0/1/2_SSP MSRs in the following sections. When a stack switch occurs through the IST mechanism and CET is enabled at the privilege level of the interrupt handler, a new SSP is selected using the IST index from a table of SSPs pointed to by the IA32_INTERRUPT_SSP_TABLE_ADDR MSR defined by CET.
CET treats the OS kernel as being in the trust boundary of the user mode programs. Thus CET does not attempt to restrict any control transfers initiated by the OS to user mode programs. In x86 architecture, ring 0 is the most privileged ring and can invoke privileged instructions. Rings 1 and 2 cannot invoke privileged instructions; however, they have the same memory access privileges as ring 0. Since they have equal privilege for memory accesses, the CET architecture treats them as being in the “same privilege class”. The architecture therefore enforces the control transfers between these privileged rings through shadow stacks and indirect branch tracking when CET is enabled for supervisor mode.

2.3. CALL operation
In the Intel 64 and IA-32 architecture (x86 architecture), the near CALL instruction allows control transfers to local procedures within the current code segment. The far call allows control transfer to procedures in a different code segment and can be used to access operating system procedures. A far call also allows transitioning to a 32-bit code segment to allow legacy (32 bit) binary to co-exist with 64-bit binary in 64-bit mode. The near CALL instruction pushes the Return Instruction Pointer on the data stack. When CET is enabled, the near CALL additionally pushes the Return Instruction Pointer on the shadow stack. The far CALL instruction, or an interrupt or exception flow, pushes the Code Segment (CS) selector and the Return Instruction Pointer on the data stack of the called procedure. When CET is enabled, the far CALL additionally pushes the CS, the Linear Instruction Pointer (LIP is computed by the CPU as the base of the Code Segment descriptor plus the logical address value of the Return Instruction Pointer) and the SSP at the time of initiating the far transfer on the
shadow stack of the called procedure. The pushes on the shadow stack are always performed as 8 byte pushes. Pushing the previous SSP on the shadow stack prevents a far CALL from being paired with a near return, as the addresses on the shadow stack are non-executable (enforcing type safety). For these far transfers, the choice to store the Linear Instruction Pointer on the shadow stack instead of the logical address Return Instruction Pointer was made to ensure detection of conditions where the base address of the Code Segment descriptor may have been changed between a far CALL and the matching far RET. When the far transfer to a higher privilege level originates in user mode, i.e. privilege level 3, the processor switches both the data stack and the shadow stack to those of the new privilege level. The user mode shadow stack pointer is saved into a new MSR called IA32_PL3_SSP. The processor does not push any return addresses on the new supervisor shadow stack. This follows the trust model where the user level programs have the supervisor in their trust boundary, so the OS can change the address to subsequently return to or the SSP of the user space program. If the far transfer is to 32-bit mode, the processor causes a general protection fault if the SSP is not in the lower 4 GB of the linear address space. By faulting and not implicitly truncating the SSP to 32 bits, the CET architecture avoids any unintended or malicious aliasing to another shadow stack. A far transfer in the x86 architecture may or may not involve a privilege change. When there is a privilege change it is associated with a stack switch and the processor requires the new stack to be 8 byte aligned. When there is no privilege change, the processor, prior to pushing the return address information on the shadow stack, aligns it to the next 8 byte boundary and zeroes out any alignment hole created, to avoid unknown data from appearing on the shadow stack. Section 2.7 discusses stack switching on privilege changes. Alignment of the shadow stack to 8 byte boundaries and saving the return address information in 8 byte elements avoids type confusion when transitioning between 64-bit and 32-bit modes.

2.4. RET/IRET operation
The RET instruction allows near and far returns to match the near and far versions of the CALL instruction. The IRET instruction returns program control from an interrupt or exception handler to the interrupted procedure. When CET is enabled, the near RET instruction pops the return address from both the shadow stack and the data stack. If the return address values popped from the two stacks are not equal then the processor causes a control protection exception (#CP) with error code “NEAR-RET”. When CET is enabled, the far RET and IRET (except when transitioning to user space) pop the return-SSP, LIP and the CS from the shadow stack. If the CS and LIP do not match the return address as determined by popping the CS and Return Instruction Pointer from the data stack, the processor causes a #CP exception with error code “FAR-RET/IRET”. The error code provided with the resulting #CP exception helps identify the type of call frame that caused the fault. If the return was successful then the SSP is set to the return-SSP. If a RET or IRET instruction is used to return to user space, i.e. to privilege level 3, the processor establishes the SSP for the user mode using the contents of the IA32_PL3_SSP MSR. No return address verification is done. The OS is allowed to switch shadow
stacks and return to any address in user mode. This follows the trust model where the user level programs have the OS in their trust boundary.

2.5. Write protecting the shadow stack
The adversary model for CET assumes that the attacker can perform reads and writes at will, any number of times (T1 and T3), using some memory safety bug. CET further assumes that the attacker has computational capabilities (T5) to make intelligent decisions on exercising memory writes. CET addresses attempts to corrupt the shadow stack through malicious writes that exploit vulnerabilities like buffer overflows, use-after-free, etc. by extending the page tables such that pages mapped as shadow stack pages are not writeable by software use of memory store instructions. The CPU enforces that software writes to the shadow stack occur only in the context of a CALL instruction and the new CET ISA for shadow stack management invoked by software.
Control transfer instructions/flows and shadow stack management instructions perform loads/stores to the shadow stack. Such loads/stores from control transfer instructions and shadow stack management instructions are termed shadow_stack_load and shadow_stack_store (or collectively shadow_stack_accesses; enforcing type safety in accesses) to distinguish them from loads/stores performed by other instructions like MOV, XSAVES, etc. that are performed by software.

2.5.1. x86 paging protections. CET extends the x86 paging architecture to allow pages to be mapped in the linear address space as shadow stack pages. A page mapped as not-writeable-but-dirty, i.e. W=0, D=1, is treated by the CPU as a shadow stack page. By using this software-unused encoding of writeable and dirty attributes we avoid introducing new paging attribute bits. The chosen page control bit encodings for shadow stack mappings also ensure that shadow stack pages are not writeable and hence naturally protected from unintended or malicious software writes. CET further enforces that shadow_stack_accesses must be to shadow stack regions by causing a page fault if the shadow stack addresses are not mapped to shadow stack pages. This helps detect any attempts to pivot the SSP to writeable memory or to overflow/underflow the SSP beyond the bounds of the current active shadow stack. CET also enforces that shadow_stack_accesses from supervisor mode must be to shadow stacks mapped as supervisor pages, i.e. using a user shadow stack in supervisor mode is disallowed by the CPU. Lastly, CET enforces that paging write protection (CR0.WP) cannot be disabled when CET is enabled, to prevent unintended writes to the shadow stack by disabling paging write protection.

2.5.2. Second level page table protection. OS/supervisor shadow stacks can be write-protected using the extended page tables (EPT) established by a virtual machine manager (VMM) by using a new EPT attribute “supervisor shadow stack” to designate the (guest) physical pages used by the OS for shadow stacks as supervisor shadow stack pages. When this functionality is enabled, shadow_stack_accesses to supervisor shadow stacks are only allowed to (guest) physical pages mapped as “supervisor shadow stack pages” under EPTs by the VMM. Shadow stack writes to
pages mapped as “supervisor shadow stack” pages in EPT do not require the EPT to provide write permission. This allows a VMM to write protect OS/guest supervisor shadow stack pages from CPU initiated stores as well as device DMA accesses (when the EPT is shared by the IOMMU).

2.6. Shadow stack tokens
As stated earlier, direct manipulation of the SSP register using move instructions is not supported by CET. To allow multiple execution contexts within application programs and in the operating system, CET provides mechanisms for saving and restoring the shadow stack pointer (i.e. the SSP register) to and from memory without compromising design goals and security properties. In order to establish a new shadow stack in the SSP register while continuing to provide guarantees against memory safety bugs, the following properties are enforced by the CPU:

• Secure storage: Memory storing the shadow stack pointer must be protected against memory safety errors (T1 and T3).
• Immutability: Even if an adversary is able to obtain a write primitive to this secure storage, they shouldn’t be able to write any value of their choice. If an adversary changes the pointer value, using that value should result in a processor fault and notify the operating system of that violation.
• One time use: A pointer stored in memory to establish a new shadow stack can be used once and further usage should result in a fault. This prevents any possibility of two execution contexts (e.g. two program threads) establishing the same shadow stack.

As described earlier, CET shadow stack memory access-control satisfies the property of read-only permissions while still allowing stores using the shadow_stack_store primitive in selected architectural flows. This access-control model allows CET to use shadow stack memory itself as secure storage for shadow stack pointers. To enforce immutability, CET enforces that the shadow stack pointer itself should be a function of the address at which it is stored. As part of the save sequence, the shadow stack is first aligned on an 8 byte boundary and then the shadow stack pointer is saved on it. To further harden type checking and the one time usage property, CET uses the lower 2 bits of the stored shadow stack value for keeping extra information to track the state of the pointer (see sections 2.7 and 2.8 for usage of the lower 2 bits). This results in saving the shadow stack pointer in a specific format; collectively these stored shadow stack pointer formats are called ‘shadow stack tokens’.
Sections 2.7 and 2.8 describe the different forms of shadow stack tokens and their usage in shadow stack switching mechanisms.

2.7. Processor initiated stack switch
The OS is required to program the IA32_PL0/1/2_SSP MSRs to point to the bottom of the supervisor shadow stacks of the current task and to ensure that no two logical processors have the MSRs pointing to the same shadow stack. However, the operating system may context switch the IA32_PL0/1/2_SSP MSRs and save them in memory where they are susceptible to being modified (adversary capabilities T1 and T3). Likewise shadow stacks pointers in
• Loads the “supervisor shadow stack token” from the address in IA32_PL0/1/2_SSP.
• Verifies that the busy bit and all reserved bits in the token are 0. This prevents a given shadow stack from being made active on two logical processors simultaneously.
• Verifies that the address programmed in the MSR matches the address in the “supervisor shadow stack token”.
• If the checks 2 and 3 are successful then the busy bit in the token is set to 1 and the processor switches the SSP to the value specified in the IA32_PL0/1/2_SSP MSRs.

The load in step 1 and store in step 4 are done as shadow stack accesses to ensure that the address points to a page mapped as a shadow stack page. Step 3 ensures that the address in the MSR is pointing to the bottom of the shadow stack, i.e. an empty shadow stack. This check relies on the property that an 8 byte aligned location on the shadow stack having a value that is the address of that 8 byte location never occurs on a shadow stack except when created by the OS by storing this “supervisor shadow stack token”. The steps 1 through 4 are done as an atomic transaction to avoid TOCTOU issues. If the checks 2 or 3 fail then the busy bit is not set and a general protection (#GP) exception is caused.
Figure 1 illustrates this token check to make the shadow stack active. In this example, the IA32_PL0_SSP MSR points to address 0xFF8. The token check loads the 8 byte token at address 0xFF8 and verifies that the busy bit is 0 and that the address in the token matches the address in the MSR. As the token check succeeds, the busy bit in the token is set to 1 and the SSP is now updated to point to 0xFF8, making this shadow stack active. The next push on this shadow stack saves at the address 0xFF0.

The load in step 1 and store in step 4 are done as shadow stack accesses to ensure that the address points to a page mapped as a shadow stack page. The steps 1 through 4 are done as an atomic transaction to avoid TOCTOU issues. The checks 2 and 3, when successful, indicate that the SSP is at the bottom of the shadow stack, i.e. there are no valid call frames on the shadow stack. If there are valid call frames on the shadow stack then the shadow stack remains busy.

2.8. Shadow stack management instructions
Shadow stack management instructions provide controlled and safe ways to manipulate the SSP to implement common software constructs like stack unwinding, thread switching, etc. The following descriptions group the instructions by their typical usage.

2.8.1. Stack unwinding.
Like the data stack, the shadow stack grows from high to low address and thus unwinding the shadow stack involves incrementing the SSP. To support unwinding the shadow stack, the RDSSP instruction may be used to read the contents of the SSP as needed in the program – for example by the setjmp function. To unwind to the snapshot, the INCSSP instruction can be used – for example by the longjmp function – to unwind the current SSP to the value recorded at the previous snapshot. Since the shadow stack only holds return addresses, the number of bytes to unwind is usually small. For example, to unwind from a call depth of 100 functions the INCSSP instruction would be invoked with operand 100. Here are summary descriptions of INCSSP and RDSSP.

• INCSSP – increment the SSP by ‘n * operand size of shadow stack’, where n is an 8 bit operand. The instruction does a ‘pop-and-discard’ on the first and last frame in the range. The
‘pop-and-discard’ and the restriction of ‘n’ to be at most 255 prevents using INCSSP to roll off one shadow stack into another by skipping over an intervening guard page.
• RDSSP – instruction used to read the contents of the SSP register into a GPR.

2.8.2. Software initiated stack switching.
Stack switching is required when the OS scheduler schedules a new task and switches from the current task stack to the next task stack. Similar thread switching may be performed in user space to support user space thread schedulers and co-routines. The RSTORSSP and SAVEPREVSSP instructions are provided to perform the stack switching in a controlled manner. When the scheduler switches away from an active shadow stack and later switches back to that shadow stack, CET ensures that the SSP established is the same as at the time of switching away.
The shadow stack switching sequence is a two-step process: execute RSTORSSP to verify and switch to the new shadow stack, then execute SAVEPREVSSP to record a restore point on the old shadow stack. A restore point is recorded by saving a “Shadow Stack Restore token” at the top of the old shadow stack. Alternatively, the OS can create the restore point when setting up a new shadow stack.
CET enforces that there can be only one restore point valid on the shadow stack (one time use property of token) and that if a restore point is valid on the shadow stack then that shadow stack is not active. CET further enforces that when a shadow stack is activated the SSP is restored to the last value of SSP when that shadow stack was previously active. CET further enforces that the restore point that records the last active SSP is protected from unintended writes. Lastly, CET enforces that a restore point created in 32-bit or 64-bit mode can be restored only in the matching mode (enforcing type safety).
The RSTORSSP instruction verifies a “Shadow Stack Restore” token referenced by the memory operand of this instruction to determine a valid restore point on the new shadow stack. This “Shadow stack restore token” is a 64-bit value formatted as follows:
• Bit 63:2 – 4-byte aligned SSP for which this restore point was created. This SSP must be at an address that is 8 or 12 bytes above the address where this token itself is found. The RSTORSSP instruction verifies this property.
• Bit 1 – reserved. Must be zero.
• Bit 0 – Mode bit. If 0 then this shadow stack restore token can be used by the RSTORSSP instruction in 32-bit mode. If 1 then this shadow stack restore token can be used by the RSTORSSP instruction in 64-bit mode.

• Loads the “shadow stack restore token” from the address specified as the memory operand.
• Verifies reserved bits in the token are 0.
• Verifies that the SSP recorded in bits 63:2 of the token is 8 bytes or 12 bytes higher than the address of the token.
• Verifies that if the current mode of the machine is 64-bit then the bit 0 is 1 else it must be 0.
• If the checks 2, 3 and 4 succeed, replaces the “shadow stack restore token” on the shadow stack with a “previous SSP token” which records the SSP active when the RSTORSSP instruction was invoked.
• Switches SSP to the address of the token such that now the “previous SSP token” is at the top of the stack.

The load in step 2 and store in step 6 are done as shadow_stack_accesses to ensure that the address points to a page mapped as a shadow stack page. The steps 2 through 6 are done as an atomic transaction to avoid TOCTOU issues. The property verified by step 1 and 4 ensures the token is a valid token, as SAVEPREVSSP pushes the “shadow stack restore token” after alignment to the next 8 byte boundary.
The “previous SSP token” records the SSP that was active at the time the RSTORSSP instruction was invoked and is formatted as follows:

• Bit 63:2 – previous SSP pointing to the top of the old shadow stack, i.e. the SSP active when RSTORSSP was invoked
• Bit 1 – set to 1 to indicate this is a “previous SSP token”
• Bit 0 – Mode bit. If 0 then this “previous SSP token” can be used by SAVEPREVSSP in 32-bit mode. If 1 then this “previous SSP token” can be used by SAVEPREVSSP in 64-bit mode.

This is illustrated by the following example (Figure 2):
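
As a rough software model of the RSTORSSP token checks described in this section, the following Python sketch may help make the bit layout concrete. It is an illustration under stated assumptions, not the processor's implementation: the names `rstorssp`, `make_restore_token` and `TokenCheckFault`, and the use of a dictionary as shadow stack memory, are hypothetical. The type of fault raised on a failed check is not modeled, and neither are atomicity or any operand alignment check.

```python
"""Toy software model of the RSTORSSP token checks (illustration only)."""

MODE_BIT = 0x1    # bit 0: 1 = token usable in 64-bit mode, 0 = 32-bit mode
TYPE_BIT = 0x2    # bit 1: 0 in a restore token, 1 in a "previous SSP token"
ADDR_MASK = ~0x3  # bits 63:2 carry the (4-byte aligned) SSP value


class TokenCheckFault(Exception):
    """Hypothetical stand-in for the fault raised when a check fails."""


def make_restore_token(ssp, mode64):
    """Build a restore token that records `ssp` for the given mode."""
    return (ssp & ADDR_MASK) | (MODE_BIT if mode64 else 0)


def rstorssp(shadow_mem, token_addr, current_ssp, mode64):
    """Verify the restore token at `token_addr` and return the new SSP.

    `shadow_mem` is a dict {address: 64-bit value} standing in for shadow
    stack pages; a missing key models memory that is not a shadow stack page.
    """
    if token_addr not in shadow_mem:
        raise TokenCheckFault("operand is not on a shadow stack page")
    token = shadow_mem[token_addr]
    if token & TYPE_BIT:
        raise TokenCheckFault("reserved bit 1 is set")
    recorded_ssp = token & ADDR_MASK
    if recorded_ssp - token_addr not in (8, 12):
        raise TokenCheckFault("recorded SSP is not 8 or 12 bytes above token")
    if bool(token & MODE_BIT) != mode64:
        raise TokenCheckFault("token mode does not match machine mode")
    # Consume the restore token: overwrite it with a "previous SSP token"
    # that records the SSP that was active when RSTORSSP was invoked.
    shadow_mem[token_addr] = (
        (current_ssp & ADDR_MASK) | TYPE_BIT | (MODE_BIT if mode64 else 0)
    )
    return token_addr  # the new SSP points at the "previous SSP token"


if __name__ == "__main__":
    mem = {0x3FF8: make_restore_token(0x4000, mode64=True)}
    new_ssp = rstorssp(mem, 0x3FF8, current_ssp=0x1000, mode64=True)
    print(hex(new_ssp), hex(mem[0x3FF8]))
```

In this sketch the token at 0x3FF8 records SSP 0x4000, and the old SSP of 0x1000 is an arbitrary assumption. A second `rstorssp` call on the same address raises `TokenCheckFault`, because the slot now holds a “previous SSP token” with bit 1 set, which mirrors the one time use property.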
the new SSP to restore as 0x4000. The RSTORSSP instruction is
invoked with the memory operand specifying the address of the
“shadow stack restore token” as 0x3FF8. The RSTORSSP
instruction verifies the mode of the machine against the mode M
recorded in the token, verifies that the reserved bit at position 1 is 0
and that the address in the token, 0x4000 in this example, is 8 or
12 bytes from the address of the token itself. Since these checks
succeed, the SSP is now set to 0x3FF8 and the “shadow stack
restore token” is replaced by the “previous SSP token”. Subsequent
to switching to the new shadow stack, a restore point can be
created on the old shadow stack using the SAVEPREVSSP
instruction. The SAVEPREVSSP instruction uses the “previous
SSP token” created by the RSTORSSP instruction to create a
“shadow stack restore token” on the old shadow stack. The
SAVEPREVSSP instruction does not take any operand but
consumes a “previous SSP token” at the top of the shadow stack,
i.e. at the current SSP, as follows:
• Verifies that the SSP is an 8-byte aligned address.
• Pops 8 bytes of “previous SSP token” from the shadow stack.
• Verifies that bit 1 is set to 1.
• Verifies that if the current mode of the machine is 64-bit then
bit 0 is 1, else it must be 0.
• Aligns the previous SSP recorded in the “previous SSP token”
to the next 8-byte boundary and pushes a “shadow stack restore
token” to the old shadow stack.
In this example, continuing with the state following the
RSTORSSP, the SAVEPREVSSP instruction is invoked. The
SAVEPREVSSP instruction finds the “previous SSP token” with
the previous SSP recorded as 0x1000 and verifies it. Following this
verification, the processor pushes a “shadow stack restore token”
on the previous shadow stack at address 0xFF8. If a restore point
on the old shadow stack is not needed, then the “previous SSP
token” created by the RSTORSSP instruction on the current
shadow stack can be popped using the INCSSP instruction.
2.8.3. Shadow stack fixup
CET defines two instructions to enable software to fix up the
shadow stack contents if required. The first instruction, WRUSS
(Writes User Shadow Stacks), is a privileged instruction that can
only be invoked by the OS. The OS may use WRUSS to, for
example, create a bootstrap “shadow stack restore token” for a
user mode thread or for actions like creating a call frame for signal
delivery. A second instruction, WRSS, writes to the
shadow stack. WRSS is expected to be used only in specific
instances to support a software construct (e.g. if the program
implements an unusual control transfer using a push followed by a
RET) for the short term before the software can be updated to not
require such fix-ups.
WRUSS can be used by the OS to write to user mode shadow
stacks but not to supervisor mode shadow stacks. A page fault
exception occurs if the address operand of the instruction does not
reference a user mode shadow stack, which prevents any attempt to
maliciously modify the parameters of this instruction to point to a
supervisor shadow stack.
WRSS can only write to user shadow stacks when invoked in user
mode and can only write to supervisor shadow stacks in supervisor
mode. CET provides supervisory controls that allow an OS to
enable this instruction for user and supervisor mode if the current
user program or OS needs this function. For most applications it is
expected that this instruction will be disabled, and when disabled,
invocation of this instruction leads to an invalid opcode fault.
2.8.4. Fast system call support
The Intel 64 architecture defines SYSCALL and SYSENTER
instructions to invoke an OS system call handler at privilege level
0 and switch to the OS data stack. When CET is enabled, these
instructions save the user mode SSP to the IA32_PL3_SSP MSR
and set the SSP to 0 (invalid). The OS returns to user mode
following the system call handling using the SYSRET or
SYSEXIT instructions. These instructions restore the user mode
SSP from the IA32_PL3_SSP MSR. An OS that needs to make
function calls from the system call handler must first activate a
supervisor mode shadow stack because the SSP following
SYSCALL/SYSENTER is 0 (invalid). CET provides the
SETSSBSY instruction to activate the privilege level 0 shadow
stack referenced by IA32_PL0_SSP. The SETSSBSY instruction
verifies the “supervisor shadow stack token” referenced by the
IA32_PL0_SSP MSR and, if verification is successful, makes the
token busy and sets SSP to the content of the IA32_PL0_SSP MSR. If
token verification fails, the processor will raise a #CP exception
with error code “SETSSBSY”. If a system call handler has
activated a shadow stack, it must use the CLRSSBSY instruction to
deactivate this shadow stack. The CLRSSBSY instruction takes a
memory operand that points to the “supervisor shadow stack
token” of the stack to deactivate and, if the token verifies, clears the
busy bit in the token. If token verification fails, the processor sets the
carry flag (CF) as an error indicator. If the CF is set following the
CLRSSBSY instruction, the OS should consider this a fatal error.
The SSP following the CLRSSBSY instruction is set to 0 (invalid).
3. INDIRECT BRANCH TRACKING
To detect and prevent attempts to redirect control flow to
unintended targets, CET added support for indirect branch
tracking. Indirect branch tracking introduces new branch
termination instructions: ENDBR32 for 32-bit programs and
ENDBR64 for 64-bit programs. CET detects and prevents attempts
to redirect control flow to unintended targets in the program by
causing a #CP exception if the instruction at the target of an
indirect call or jump is not a matching branch termination
instruction.
The ENDBR32 and ENDBR64 opcodes are selected such that they
are NOP instructions on Intel 64 processors that do not support
CET. On processors supporting CET, these instructions are still
NOP-like as they do not affect the execution state of the program,
do not cause any additional register pressure and are minimally
intrusive from a power and performance perspective. This allows
CET instrumented programs to execute on processors that do not
support CET.
To track indirect call/jump for terminations, the processor
implements two state machines; one for user mode and one for
HASP’19, June, 2019, Phoenix, Arizona USA Vedvyas Shanbhogue et al.
supervisor mode. At reset the user and supervisor mode state
machines are in IDLE state. When instructions other than indirect
call/jump retire the state machine stays in the IDLE state. On an
indirect call or jump instruction completion, the state machine
transitions to WAIT_FOR_ENDBRANCH state. In this state, the
state machine will cause a #CP fault with error code
“ENDBRANCH” if the next instruction (i.e. the instruction at the
target of the indirect call or jump) is not ENDBR64 in 64-bit mode
or ENDBR32 in 32-bit mode. If the instruction is a proper
ENDBRANCH, the state machine moves back to IDLE state.
(1Eh)” and “STI (FBh) or CLI (FAh)” to form an unintended
ENDBRANCH instruction. If the last 3 bytes of an instruction are
to be F3 0F 1E then the next instruction must be “STI (FBh) or
CLI (FAh)” to form an unintended ENDBRANCH instruction.
CLI/STI and PUSH DS are not typically compiler generated
instructions. PUSH DS is not a valid instruction in 64-bit mode. If
an instruction encodes an immediate that matches the
ENDBR32/ENDBR64 instruction then the compiler/code
generator should elide those using techniques like constant
blinding.
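The IDLE / WAIT_FOR_ENDBRANCH tracking described above can be sketched as a small Python state machine. This is a simplified illustration under stated assumptions, not the processor's implementation: instructions are represented by mnemonic strings, the class and method names are hypothetical, and only the transitions described in this section are modeled.

```python
from enum import Enum


class IbtState(Enum):
    IDLE = "IDLE"
    WAIT_FOR_ENDBRANCH = "WAIT_FOR_ENDBRANCH"


class ControlProtectionFault(Exception):
    """Stands in for a #CP fault with error code ENDBRANCH."""


class IbtTracker:
    """One tracker models one state machine (user or supervisor mode)."""

    def __init__(self, mode_64bit: bool = True) -> None:
        self.state = IbtState.IDLE  # state at reset
        self.expected = "ENDBR64" if mode_64bit else "ENDBR32"

    def retire(self, mnemonic: str, is_indirect_branch: bool = False) -> None:
        """Advance the state machine by one retired instruction."""
        if self.state is IbtState.WAIT_FOR_ENDBRANCH:
            # The instruction at the indirect branch target must be the
            # branch termination instruction matching the current mode.
            if mnemonic != self.expected:
                raise ControlProtectionFault(
                    f"expected {self.expected}, got {mnemonic}"
                )
            self.state = IbtState.IDLE
        if is_indirect_branch:
            # An indirect call or jump arms the check for the next instruction.
            self.state = IbtState.WAIT_FOR_ENDBRANCH


if __name__ == "__main__":
    tracker = IbtTracker(mode_64bit=True)
    tracker.retire("call *%rax", is_indirect_branch=True)
    tracker.retire("ENDBR64")  # valid landing point, back to IDLE
    print(tracker.state.value)
```

A fuller model would keep two such trackers, one for user mode and one for supervisor mode, and switch between them on privilege transitions.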
shadow stack and the RET instruction model updated to pop return
address from shadow stack and compare against the return address
from the data stack. The geometric mean of instruction-per-cycle
(IPC) loss across workload traces is around 1.65%. IPC loss
ranged from 0.08% (HPC and multimedia kernels traces)
to 2.71% (sysmark benchmark traces).
The performance impact of indirect branch tracking was evaluated
by compiling C and C++ programs from the SPEC CPU 2006
C/C++ benchmarks using a modified ICC compiler with CET support. As
ENDBRANCH instructions execute as NOP on current shipping
processors (and will execute as NOP on future processors that
support CET), these programs were executed with ENDBRANCH
instrumentation on a Core i7-6500U processor to measure the
performance impact. No perceptible slowdown was measured on
average.
5.2. Security Metrics
The shadow stack restricts the flexibility available in creating ROP
gadget chains by enforcing matching calls and returns and also
enforcing a LIFO order on the returns. The shadow stack being
write protected blocks attempts to inject return address frames on
the shadow stack through arbitrary writes (thwarting adversary
capabilities T1 and T3). The shadow stack pointer register not being
directly writeable and paging checks that require the page
referenced by call and returns to be mapped as a shadow stack
page block attempts to pivot the shadow stack to writeable
memory or to another shadow stack. Re-using old call frames on
the shadow stack is not possible as the only instructions provided
are to unwind the shadow stack through INCSSP.
With indirect branch tracking, COP/JOP gadgets are now limited
to only calling or jumping to indirect callable functions, as only
such functions would have an ENDBRANCH instruction. The
exploit author will also need to precisely control the parameters
needed to be passed to each of the functions in the chain. Likewise,
since entire functions are called and the use of “unintended
gadgets” is blocked by the indirect branch tracking, the parameters
to such functions will need to be carefully controlled to not have a
return in the function path, as a function that was jumped to but
returned from will be fatal to the gadget chain due to the shadow
stack enforcement. Requiring JOP/COP chains to call or jump
to the entry point of functions also constrains the attacker's ability
to retain control on the stack and registers, as the x86 calling
convention requires the called procedure to restore all of the
registers and so they begin with pushing the registers on the stack
and end by popping them off. Not being able to exploit function
tails to do register restores creates an impediment in gadget
chaining. Exploit techniques like call-preceded ROP [4] are
effectively blocked by the shadow stack and indirect branch
tracking. Characteristics of the x86 ISA allow finding sufficient
byte sequences [18] that decode to a jmp through register instruction.
However, unlike ret, the indirect jmp through register is much less
frequent in programs [19]. Indirect branch tracking significantly
restricts the gadget catalog by requiring that gadgets must be of the
endbranch; jmp *y form and be valid instruction sequences in the
program. Chaining of these gadgets through an update-load-branch
gadget [19, 3] of the form endbranch; pop x; jmp *x requires the
property that the register y used to link back to the dispatcher be
preserved, which further restricts choice from the restricted gadget
catalog. Likewise, calling functions is also restricted to functions
that have their address taken and the invocation has to be at the
function entry point, placing further constraints on control of stack
and register contents. With the CPU providing the indirect branch
tracking and return address protection, software and toolchains can
further augment protection with language and platform specific
policies and restrictions on control flow enforcement to increase its
precision. One example policy could be to restrict the indirect calls
to only land on functions that have the same prototype as intended
by the call site [20, 17, 22]. With this policy a call site may look
like: mov $0xaabbccdd, %rax; call *%rbx and a hash check
performed in the prolog of the address-taken functions as:
endbranch; cmp $0xaabbccdd, %rax; jne error. Other policies
may be to restrict sensitive kernel functions to core OS and not
drivers, restricting sensitive functions to be invoked only from
specific call sites, etc.
Average indirect target reduction
Average indirect target reduction (AIR) [9] is a metric proposed by
Zhang et al. to measure the strength of control flow integrity (CFI) of
a program; it reflects the set of reachable program addresses via
indirect control transfer sites in the program. We use the AIR metric as
a measure of the improvement to a program using CET, using the
equation below:
$$\mathrm{AIR} = \frac{1}{n} \sum_{i=1}^{n} \left(1 - \frac{|T_i|}{S}\right)$$
Here n is the total number of indirect branch transfer sites in the
program, S represents the number of program addresses to which the
indirect branch transfer sites can direct control flow with no CFI
protections, and T_i represents the set of program addresses to which
the i-th indirect branch transfer site can direct control flow with CFI
protections. On x86, indirect control transfers can target any byte
in the program, thus S is the program code size. A lower AIR value
represents a bigger (weak CFI) set of reachable addresses via
indirect control transfer sites, while a higher AIR value represents a
smaller (strong CFI) set of reachable addresses via indirect control
transfer sites.
With CET enabled, the ret instruction can target exactly one target
in the program, which is the return address at the top of the shadow
stack, and the indirect call/jmp can only target an endbranch. For
the SPEC CPU 2006 C/C++ benchmark the average AIR metric
was computed as 99.98% and for individual programs ranges from
99.93% (xalancbmk) to 99.99%.
Linux Kernel Gadget Analysis
We analyzed the Linux kernel (v4.9.9) binary for available gadgets
using the ROPgadget tool [21]. The ROPgadget tool searches
binaries for gadgets to facilitate research related to ROP exploits.
The Linux binary we used was a default configuration kernel build
of size 25 MB. We restricted our ROPgadget tool scan up to a
gadget depth of 10 bytes, yielding a sum of 197241 gadgets; we
expect the usable gadget count to be more than that sum since we
restricted our search space to a depth of 10 bytes due to limitations
of the tool. The distribution of the counts for different gadget sizes
is shown in Figure 4. Gadgets harvested via the ROPgadget tool
end in an indirect branch and are linkable/chainable. In contrast,
in a CET-enabled binary, the exploit writer is restricted to using
exported functions that have an ENDBRANCH and returning to
the last address on the shadow stack. In the Linux kernel binary
analyzed, 18412 exported functions were found. These exported
functions need to be chained using an indirect call/jump and not
through malicious use of ret. The measured average size of the
kernel exported functions is 214 bytes, which indicates the increase in
complexity of using COP due to larger side-effects.
Figure 4: Gadgets found in Linux Kernel v4.9.9
We also analyzed the 18412 exported functions for the presence of
an outgoing indirect CALL/JMP from those functions, and
possible dispatch loops (the presence of a loop around an outgoing
indirect branch that allows for functions to be chained). We found
2988 functions with outgoing indirect CALLs and none with
indirect JMP. Of the 2988 exported functions with forward links,
we found 148 functions that have at least one dispatch loop. This
elimination of unintended gadgets and the small number of exported
functions that can be linked indicate that the attack surface can
now be analyzed systematically to eliminate un-needed cases and
address un-safe code constructions via redesign or focused checks
via known software techniques.
Figure 5: Linux exported function size distribution
the associated performance overheads. Pointer Authentication Code
(PAC) [23] and CCFI [24] have proposed using cryptographic
message authentication codes (MACs) to protect control flow
elements such as return addresses. Safestack separates the program
stack into two regions to protect return addresses [25]. Davi et al.
propose HAFIX [12], which uses a hidden label stack based hardware
implementation to restrict returns to active call sites. Lee et al.
propose a Secure Return Address Stack (SRAS) [14] that modifies
call and return instructions to implement a secure stack in
hardware and uses pinned physical memory as a backing store. The CET
approach to shadow stack has parallels to the SRAS scheme but,
unlike SRAS, it does not implement a hidden shadow stack and
supports the shadow stack in the linear address space of the program. A
desirable property of the shadow stack is to protect it from
unintended writes. Shadow stack schemes using software
instrumentation have relied on information hiding [11] to prevent
writes to the shadow stack whereas hardware schemes have been
proposed [12, 14] using on-chip memory for the shadow stack.
CET extends x86 paging and EPT architecture to allow the OS and
VMM to write-protect the shadow stacks. While protecting the
return addresses using shadow stacks it is also important to be able
to preserve the last-in-first-out (LIFO) [12, 13] property of control
flow. CET defines a LIFO shadow stack and defines instructions to
enable non-LIFO software constructs in a safe manner. An indirect
call or jump can target any executable byte in the program and,
given the dense encoding of the x86 ISA, the byte stream thus
targeted may be interpreted as a valid sequence of instructions with
high probability. Control-flow integrity schemes have tried to
address this issue by introducing instrumentation to check if the
target of the indirect call or jump is a valid target. Abadi et al. [6]
propose using a prefetch instruction to embed an ID at valid indirect
call/jump targets and inject a code sequence prior to the indirect
call/jump to check the ID. Microsoft Control Flow Guard (CFG)
[7] introduces a bitmap where each bit indicates whether there is a
start of a function in the 16 bytes of process address space
corresponding to that bit. A guard check function is invoked prior
to an indirect call to test the CFG bitmap to determine if the target
is a valid target. The Hardware Control Flow Integrity proposal [5]
proposes a pair of new instructions called “jump landing point”
(JLP) and “call landing point” (CLP) that can be used to mark
destinations of control flow branches in a program. LLVM Indirect
Function Call Check (IFCC) [17] generates jump tables for
indirect-call targets and adds code at indirect-call sites to transform
function pointers such that they point to a jump table entry.
Schuster et al. [15] have proposed defenses by restricting the
invocation of sensitive functions to specific call sites in the
program. CET provides the branch terminating instructions
(ENDBR32/64) to enforce instruction alignment and restrict
control transfers to valid indirect call targets in the program with
low overhead and enables complementary software and language
specific policies to be anchored around the hardware enforcement.
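To make the bitmap-based check attributed above to Control Flow Guard concrete, the following Python sketch models one bit per 16 bytes of address space and a guard check run before an indirect call. It follows only the description given in this section. The class and function names are hypothetical and it is not the actual CFG data structure or API.

```python
GRANULE = 16  # bytes of address space covered by one bitmap bit (from the text)


class TargetBitmap:
    """One bit per 16-byte granule; a set bit means a function starts there."""

    def __init__(self, address_space_size: int) -> None:
        n_bits = (address_space_size + GRANULE - 1) // GRANULE
        self._bits = bytearray((n_bits + 7) // 8)

    def mark_function_start(self, address: int) -> None:
        index = address // GRANULE
        self._bits[index // 8] |= 1 << (index % 8)

    def is_valid_target(self, address: int) -> bool:
        index = address // GRANULE
        return bool(self._bits[index // 8] & (1 << (index % 8)))


def guard_check(bitmap: TargetBitmap, target: int) -> None:
    """Hypothetical guard invoked before an indirect call; rejects unmarked targets."""
    if not bitmap.is_valid_target(target):
        raise RuntimeError(f"indirect call to invalid target {target:#x}")


if __name__ == "__main__":
    bitmap = TargetBitmap(address_space_size=0x10000)
    bitmap.mark_function_start(0x1000)
    guard_check(bitmap, 0x1000)             # passes: a function starts in this granule
    print(bitmap.is_valid_target(0x1008))   # same 16-byte granule as 0x1000
    print(bitmap.is_valid_target(0x1010))   # next granule, not marked
```

In this model any address inside a marked 16-byte granule passes the check, so the bitmap lookup is coarser than a marker placed at the exact target instruction.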
to restrict COP/JOP attacks. CET design strives for minimal
performance and memory overheads, while meeting strong security
and compatibility objectives. In this paper, we perform an analysis
of the enforcement of the security objectives for CET. In future
work, we aim to evaluate how the CET ISA can be leveraged by
software for further strengthening CFI properties for specific
software domains.
ACKNOWLEDGMENTS
We thank the anonymous reviewers for their review and feedback.
REFERENCES
[1] Intel® 64 and IA-32 Architectures Software Developer Manuals. https://software.intel.com/en-us/articles/intel-sdm
[2] R. Roemer, E. Buchanan, H. Shacham, and S. Savage. 2012. Return-oriented programming: Systems, languages, and applications. ACM Transactions on Information and System Security (TISSEC).
[3] T. Bletsch, X. Jiang, V. W. Freeh, and Z. Liang. 2011. Jump-oriented programming: a new class of code-reuse attack. In Proceedings of the 6th ACM Symposium on Information, Computer and Communications Security.
[4] N. Carlini and D. Wagner. 2014. ROP is Still Dangerous: Breaking Modern Defenses. In 23rd USENIX Security Symposium (USENIX Security 14).
[5] Systems and security services analysis office. 2015. Hardware Control Flow Integrity (CFI) for an IT ecosystem. https://github.com/iadgov/Control-Flow-Integrity/tree/master/paper.
[6] M. Abadi, M. Budiu, U. Erlingsson, and J. Ligatti. 2009. Control-flow integrity principles, implementations, and applications. ACM Transactions on Information and System Security.
[7] Control Flow Guard. https://msdn.microsoft.com/en-us/library/windows/desktop/mt637065(v=vs.85).aspx
[8] T. H. Dang, P. Maniatis, and D. Wagner. 2015. The performance cost of shadow stacks and stack canaries. In ACM Symposium on Information, Computer and Communications Security, ASIACCS ’15.
[9] M. Zhang and R. Sekar. 2013. Control Flow Integrity for COTS Binaries. In USENIX Security.
[10] Intel® 64 and IA-32 Architectures Optimization Reference Manual. https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf.
[11] A. Oikonomopoulos, C. Giuffrida, E. Athanasopoulos, and H. Bos. 2016. Poking holes into information hiding. In USENIX SEC.
[12] L. Davi, P. Koeberl, and A.-R. Sadeghi. 2014. Hardware-assisted fine-grained control-flow integrity: Towards efficient protection of embedded systems against software exploitation. In Annual Design Automation Conference - Special Session: Trusted Mobile Embedded Computing, DAC ’14.
[13] M. Theodorides and D. Wagner. 2017. Breaking Active-Set Backward-Edge CFI. Proceedings of the IEEE International Symposium on Hardware Oriented Security and Trust (HOST '17)
[14] R. B. Lee, D. K. Karig, J. P. McGregor, and Z. Shi. 2003. Enlisting Hardware Architecture to Thwart Malicious Code Injection, Proceedings of the International Conference on Security in Pervasive Computing, Boppard, Germany.
[15] F. Schuster, T. Tendyck, C. Liebchen, L. Davi, A.-R. Sadeghi, and T. Holz. 2015. Counterfeit object-oriented programming: On the difficulty of preventing code reuse attacks in C++ applications, in IEEE Symposium on Security and Privacy (S&P).
[16] M. Abadi, M. Budiu, U. Erlingsson, and J. Ligatti. 2005. Control-flow integrity. In Proceedings of the 12th ACM conference on Computer and communications security.
[17] C. Tice, T. Roeder, P. Collingbourne, S. Checkoway, U. Erlingsson, L. Lozano, and G. Pike. 2014. Enforcing forward-edge control-flow integrity in GCC & LLVM. In USENIX conference on Security.
[18] S. Checkoway and H. Shacham. 2010. Escape from return-oriented programming: Return-oriented programming without returns (on the x86). Technical Report CS2010-0954, UC San Diego.
[19] Checkoway, S., Davi, L., Dmitrienko, A., Sadeghi, A.-R., Shacham, H., and Winandy, M. 2010. Return-oriented programming without returns. In ACM Conference on Computer and Communications Security (CCS).
[20] PaX Team. 2015. RAP: RIP ROP https://pax.grsecurity.net/docs/PaXTeam-H2HC15-RAP-RIP-ROP.pdf
[21] Salwan, J. Ropgadget https://github.com/JonathanSalwan/ROPgadget
[22] Victor van der Veen, Enes Goktas, Moritz Contag, Andre Pawlowski, Xi Chen, Sanjay Rawat, Herbert Bos, Thorsten Holz, Elias Athanasopoulos, and Cristiano Giuffrida. 2016. A Tough call: Mitigating Advanced Code-Reuse Attacks At The Binary Level. In 2016 IEEE Symposium on Security and Privacy.
[23] Pointer Authentication on ARMv8.3. https://www.qualcomm.com/media/documents/files/whitepaper-pointer-authentication-on-armv8-3.pdf
[24] Mashtizadeh, A. J., Bittau, A., Mazieres, D., and Boneh, D. 2014. Cryptographically enforced control flow integrity. In arXiv:1408.1451[cs.CR].
[25] Volodymyr Kuznetsov, László Szekeres, Mathias Payer, George Candea, R. Sekar, and Dawn Song. 2014. Code-pointer integrity. In Proceedings of the 11th USENIX conference on Operating Systems Design and Implementation (OSDI'14).
[26] Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz and Yuval Yarom. 2019. In Proceedings of the 40th IEEE Symposium on Security and Privacy.
[27] Intel® Control-flow Enforcement Technology Preview document. https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf