1 Introduction
There exist two predominant field-programmable gate array (FPGA) technologies on the market today: static random access memory (SRAM)-based and flash-based technologies. Both of these technologies are featured in Figure 1. Because SRAM cells are volatile, SRAM-based FPGA devices require an external non-volatile memory (NVM) to store their configuration information between power cycles. In contrast, flash-based FPGA devices retain their configuration when powered off.
According to a 2019 market study by Gartner, the FPGA market consists of four major manufacturers [21]: AMD-Xilinx with 51.1% of the market, Intel with 35.8%, Microsemi with 6.6%, and Lattice with 5.0%. The two largest FPGA manufacturers, AMD-Xilinx and Intel, produce SRAM-based FPGA devices, while the other two, Microsemi and Lattice [80, 100], produce flash-based FPGA devices. Since SRAM-based FPGAs account for the largest share of the market today, the rest of this article focuses on this technology.
FPGA devices are particularly interesting to cybersecurity researchers because their hardware structure is defined mainly by the content of a memory array located within the device. This memory array, which we refer to as the FPGA’s configuration memory, is loaded with a bit file commonly known as the bitstream. The bitstream configures the switches, logic blocks, and interconnects within the device’s configurable logic, also known as its fabric, and thus acts as a blueprint for the fabric. Many research works on FPGA security evaluate the vulnerability of FPGAs to cyber attacks that compromise their bitstream: an attacker who successfully extracts a target’s plaintext bitstream from an FPGA’s configuration memory can reverse engineer it to either clone the intellectual property it contains or add malicious functionality back into the target. To protect their devices from attacks that specifically target the bitstream, FPGA manufacturers began adding complex encryption and authentication schemes [86, 127]. Although these schemes raise the bar for potential attackers, this article shows that multiple researchers have successfully compromised them.
More recently, FPGA manufacturers have begun incorporating elaborate processing systems on the same die as the FPGA fabric. We commonly refer to these devices as system-on-chip (SoC) FPGA devices. By keeping the communication between the processor and the FPGA on chip rather than routing it through external signals, SoC FPGA devices add a layer of security to this communication path. Although this improves the security of the processor-FPGA link, the complexity of the chip’s design exposes these devices to new cybersecurity concerns. In cybersecurity terms, the complexity of SoC FPGA devices expands their attack surface.
When discussing cybersecurity, we divide attacks into two general classes: active and passive. Attacks in both classes seek to break the confidentiality, integrity, or availability of a target of exploitation (TOE). Active attacks seek to break a TOE by perturbing its normal function. A classic example of an active attack is a fault injection (FI) attack, in which one intentionally drives the TOE into an unintended state in an attempt to make it leak restricted information [13, 68]. Passive attacks, in contrast, seek to extract information from a TOE without interfering with its regular operation. Side-channel attacks (SCAs), in which one analyzes a TOE’s external outputs and emissions to extract secret information, are classic examples of passive attacks [135].
By leveraging active and passive attacks, malicious actors can pose various threats to assets contained within FPGA and SoC FPGA devices. To secure a system against these actors, system designers need to follow a methodology that thoroughly examines where security measures (i.e., controls) need to be applied. Muckin and Fitch [114] propose one such methodology: the IDDIL/ATC methodology outlined in Figure 2.
The IDDIL/ATC methodology is divided into two phases: discovery and implementation. To keep the scope of this work general, we focus primarily on the discovery phase and the activities it contains. The system of choice is a generic SoC FPGA with no specific design or operating environment. However, since basic security features such as encryption, authentication, and debug port disabling are available on most recent SoC FPGA devices, we assume that these are implemented. The final result is a generic threat model that outlines the attack surface, potential attack vectors, and threat actors of typical SoC FPGA devices. Since the in-system execution of SoC FPGA devices is design dependent and varies from one application to the next, we review applicable controls, functions, and interfaces without fully decomposing the system.
This article is organized as follows. Section 2 discusses past works tackling the subject of FPGA cybersecurity. Next, Section 3 describes FPGA and SoC FPGA assets and their attack surface. We then describe techniques to identify threats in Section 4 and analyze multiple attack vectors that apply to FPGA and SoC FPGA devices in Section 5. Section 6 looks at security measures and controls one can implement in FPGA and SoC FPGA devices. Finally, Section 7 identifies future directions for FPGA research and development.
5 Literature Review of FPGA Attack Vectors
In this section, we review the attack vectors that can affect FPGA and SoC FPGA devices. Each discussion begins with a brief background on the attack vector, followed by a literature review of works that have applied the corresponding techniques directly to FPGA devices.
Recalling that the primary asset of FPGA devices is the bitstream, we note that all of these attack vectors have something in common: however an attacker extracts a plaintext bitstream from the FPGA, that bitstream still needs to be reverse engineered. Therefore, as a starting point, we review bitstream reverse engineering tools and techniques found in the literature.
5.1 Bitstream Reverse Engineering
FPGA manufacturers facilitate the design process for their respective platforms with electronic design automation (EDA) tools such as Vivado (AMD-Xilinx) [9], Quartus Prime (Intel) [60], Libero (Microsemi) [79], and Lattice Diamond (Lattice Semiconductor) [5].
All of these EDA tools follow broadly similar design flows. Developers start with register-transfer level (RTL) code, which the tool synthesizes into a gate-level implementation, or netlist, before creating a target-specific floorplan large enough to accommodate the design. The tool then places logic cells within this floorplan and routes them together. At the end of the process, the tool outputs a stream of bits destined to fill the FPGA’s configuration memory and thus configure its logic cells and routing resources.
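To make the notion of a "stream of bits" more concrete, the sketch below walks the packet framing of an AMD-Xilinx 7-series bitstream. It assumes the packet format as we understand it from AMD-Xilinx’s 7 Series configuration user guide (UG470): a 32-bit sync word 0xAA995566, then Type 1 packets (3-bit header type, 2-bit opcode, register address in bits 26:13, 11-bit word count) and Type 2 packets (27-bit word count, reusing the previous register, typically for long frame-data writes). The register table is a small subset, and the function name `parse_packets` is hypothetical; the field widths should be checked against UG470 before relying on them. Note that this recovers only the framing; the meaning of the configuration frame bits is exactly what reverse engineering must establish.

```python
import struct
from typing import Iterator, List, Tuple

# Assumed 7-series packet constants (verify against AMD-Xilinx UG470).
SYNC_WORD = 0xAA995566
OPCODES = {0: "NOP", 1: "READ", 2: "WRITE", 3: "RESERVED"}
REGISTERS = {0x00: "CRC", 0x01: "FAR", 0x02: "FDRI", 0x04: "CMD", 0x0C: "IDCODE"}


def parse_packets(data: bytes) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (opcode, register, payload words) for every non-NOP packet."""
    start = data.find(SYNC_WORD.to_bytes(4, "big"))
    if start < 0:
        raise ValueError("sync word not found")
    body = data[start + 4:]                     # .bit file headers need not be word aligned
    body = body[: len(body) - len(body) % 4]    # drop any trailing partial word
    words = [w for (w,) in struct.iter_unpack(">I", body)]

    i, last_reg = 0, None
    while i < len(words):
        header = words[i]
        i += 1
        header_type = header >> 29
        opcode = OPCODES[(header >> 27) & 0x3]
        if header_type == 1:
            reg = (header >> 13) & 0x3FFF       # register address field
            count = header & 0x7FF              # 11-bit word count
        elif header_type == 2:
            reg = last_reg                      # Type 2 continues the previous register
            count = header & 0x7FFFFFF          # 27-bit word count
        else:
            raise ValueError(f"unexpected header 0x{header:08X} at word {i - 1}")
        last_reg = reg
        payload = words[i:i + count]
        i += count
        if opcode != "NOP":
            name = REGISTERS.get(reg, f"REG_0x{reg:02X}") if reg is not None else "UNKNOWN"
            yield opcode, name, payload
```

Searching for the sync word at byte granularity, rather than assuming word alignment, is deliberate: the text header at the start of a `.bit` file has a variable length.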
To reverse engineer a bitstream back to RTL code, one must first know the relationship between the bits contained within the bitstream and the gate-level implementation they represent. This relationship is manufacturer dependent and varies between FPGA families and series. In this survey, we found that most of the work on the subject thus far has focused on AMD-Xilinx, Lattice Semiconductor, and Microsemi bitstreams [20, 30, 65, 71, 121, 130, 152, 175, 179]. To the best of our knowledge, there have been no significant reverse engineering efforts on Intel FPGA bitstreams. Given that Intel is a predominant FPGA manufacturer, one would expect advances similar to those seen on AMD-Xilinx FPGAs. We believe, however, that researchers have so far focused elsewhere because Intel, unlike AMD-Xilinx, provides very little information about the layout of its devices.
5.1.1 AMD-Xilinx.
As AMD-Xilinx was one of the pioneers of the FPGA domain, the earliest research aimed at reverse engineering bitstreams began with its devices. These early works mainly sought to understand the FPGA’s internal structure in order to implement partial reconfiguration [40, 50, 82, 148, 173]. One of the first publications to analyze a bitstream from a cybersecurity perspective was presented by Ziener et al. [179], who sought a way of identifying unlicensed proprietary cores within an FPGA from its bitstream. This work looked at the Virtex-II platform, whose user guide [6], like those of earlier AMD-Xilinx platforms, revealed valuable details about the Virtex-II’s logic cell structure and bitstream. The authors identified which packets within the bitstream represented lookup table content and used this content to identify specific unlicensed cores with high certainty.
Following these early attempts, Note and Rannaud [121] presented the first coherent algorithm to reverse engineer a bitstream back into its netlist. They made use of the now-discontinued AMD-Xilinx tool ncd2xdl, which translated the native circuit description (NCD) file into a clear-text representation of the netlist, an AMD-Xilinx design language (XDL) file. The XDL file provided detailed information on the FPGA’s internal state, from which configurable logic blocks and programmable interconnect points could be identified. By combining the configurable logic blocks and programmable interconnect points that the XDL file reported for a particular design with the recurring structure of the FPGA, Note and Rannaud could relate an active site within the XDL file to its corresponding site within the bitstream. Using this relationship, they created a program that automated the association of a particular portion of a bitstream with its corresponding XDL description. Their next step remains the basis of even the most recent bitstream reverse engineering tools: the creation of a database that holds a collection of associations between the locations of bits within the bitstream and the corresponding netlist information. Using this database, one could step backward and retrieve critical portions of a plaintext bitstream in XDL format. Although Note and Rannaud’s work was a significant advancement for the reverse engineering of AMD-Xilinx bitstreams, a considerable amount of information was still missing. To address this deficiency, Benz et al. [20] turned to the AMD-Xilinx XDL resource report (XDLRC) file, which AMD-Xilinx uses to describe the structure and resources of its various FPGA platforms. This allowed Benz et al. to fully retrieve the FPGA architecture from the bitstream.
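The core idea behind such a bit-to-netlist database can be sketched in a few lines. The toy below assumes that the analyst can generate many sample designs, knows which netlist features each one enables (as Note and Rannaud obtained from XDL), and can observe which configuration bits are set in each resulting bitstream. A bit is attributed to a feature when it is set in every sample that uses the feature and clear in every sample that does not. The sample format, feature names, and the functions `build_bit_database` and `decode` are hypothetical; real tools must also handle bits that a feature clears, default values, and resources shared between features.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# (netlist features enabled in one sample design, indices of bitstream bits set to 1)
Sample = Tuple[Set[str], Set[int]]


def build_bit_database(samples: List[Sample]) -> Dict[str, FrozenSet[int]]:
    """Attribute to each feature the bits set in all samples using it and in none without it."""
    features = sorted(set().union(*(feats for feats, _ in samples)))
    database: Dict[str, FrozenSet[int]] = {}
    for feat in features:
        with_feat = [bits for feats, bits in samples if feat in feats]
        without_feat = [bits for feats, bits in samples if feat not in feats]
        candidates = set.intersection(*with_feat)   # always 1 when the feature is used
        for bits in without_feat:
            candidates -= bits                      # never 1 when it is not
        database[feat] = frozenset(candidates)
    return database


def decode(bits: Set[int], database: Dict[str, FrozenSet[int]]) -> List[str]:
    """Step backward: report every feature whose attributed bits are all set."""
    return [feat for feat, fbits in database.items() if fbits and fbits <= bits]
```

In this toy, a bit that is always set (for example, a background configuration bit) is automatically excluded because it also appears in samples that lack the feature.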
At the time of writing, a collaborative project named F4PGA is attempting to create a universal open-source computer-aided design platform for FPGA development [116]. The project currently targets FPGA devices from two manufacturers, Lattice Semiconductor and AMD-Xilinx, but aims to support a much wider variety of FPGA architectures. At its lowest level, F4PGA describes the logic hardware in a generic FPGA assembly (FASM) format devised for the project. F4PGA then relies on sub-projects to handle the translation from FASM to the bitstreams of various FPGA manufacturers. These sub-projects gather the results of many individual bitstream generation runs to create device-specific databases. For AMD-Xilinx devices, this sub-project is called Project X-Ray [152]. Project X-Ray has successfully mapped out Artix-7 devices and is working on the remainder of the AMD-Xilinx 7-series FPGAs. The same databases created for F4PGA bitstream generation can also convert a bitstream into the FASM format, from which one can derive which features are enabled by the target bitstream [153].
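To give a feel for how enabled features can be read off a FASM listing, the sketch below turns a few FASM lines into the set of single-bit features set to 1. It handles only a simplified subset of the syntax as we understand it: a dotted feature name, an optional `[n]` or `[hi:lo]` address range, an optional Verilog-style binary or hexadecimal value, and `#` comments. FASM annotations are not supported, the helper `enabled_features` is hypothetical, and the example feature names are modeled loosely on Project X-Ray naming for illustration only.

```python
import re
from typing import Set

# Simplified FASM line: FEATURE, optional [n] or [hi:lo], optional = <width>'b... or 'h...
_FASM_LINE = re.compile(
    r"^(?P<feat>[A-Za-z0-9_.]+)"
    r"(?:\[(?P<hi>\d+)(?::(?P<lo>\d+))?\])?"
    r"(?:\s*=\s*\d+'(?P<base>[bh])(?P<digits>[0-9a-fA-F_]+))?$"
)


def enabled_features(fasm_text: str) -> Set[str]:
    """Return the single-bit features that a (simplified) FASM listing sets to 1."""
    enabled: Set[str] = set()
    for raw in fasm_text.splitlines():
        line = raw.split("#", 1)[0].strip()     # strip comments and whitespace
        if not line:
            continue
        m = _FASM_LINE.match(line)
        if m is None:
            raise ValueError(f"unsupported FASM line: {raw!r}")
        if m["digits"] is None:
            value = 1                           # a bare feature means "enabled"
        else:
            value = int(m["digits"].replace("_", ""), 2 if m["base"] == "b" else 16)
        feat = m["feat"]
        if m["hi"] is None:
            if value & 1:
                enabled.add(feat)
            continue
        hi = int(m["hi"])
        lo = int(m["lo"]) if m["lo"] is not None else hi
        for offset in range(hi - lo + 1):       # expand the range into individual bits
            if (value >> offset) & 1:
                enabled.add(f"{feat}[{lo + offset}]")
    return enabled
```

Expanding multi-bit values such as LUT initialization strings into individual bits keeps the result comparable with a per-bit database of the kind described above.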
5.1.2 Lattice Semiconductor and Microsemi.
Bitstreams for devices from Lattice Semiconductor and Microsemi have also been reverse engineered recently. For Lattice Semiconductor, Project Trellis [130] and Project IceStorm [30] have successfully reverse engineered the bitstreams of the ECP5 and iCE40 devices, respectively. In the case of Microsemi, its ProASIC3, IGLOO2, and FUSION devices were successfully reverse engineered in 2021 by Kim et al. [71].
5.1.3 Deep Learning for Bitstream Reverse Engineering.
Recently, Chen and Liu [27] showed how to recover function blocks from bitstreams using deep learning. They first transformed the bitstreams of FPGA designs into images suitable for deep learning processing and then applied a deep learning based object detection algorithm to these images.
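One simple way to turn a bitstream into a two-dimensional input for an image-based detector is sketched below: each byte becomes a grayscale pixel, and the byte stream is folded into rows of fixed width. This mapping, the default width, and the name `bitstream_to_image` are assumptions for illustration; the exact transformation used by Chen and Liu is not described here. Choosing the width to match the device’s configuration frame length would place one frame per row, so that bits at the same offset in consecutive frames align vertically, which may make repeated structure easier to detect.

```python
from typing import List


def bitstream_to_image(data: bytes, width: int = 64) -> List[List[int]]:
    """Fold raw bitstream bytes into rows of 0-255 pixel intensities (hypothetical mapping)."""
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))    # zero-pad the final row
        rows.append(row)
    return rows
```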
5.2 Side-Channel Attacks
SCAs are among the earliest and most prevalent forms of physical attack in the literature. They are passive attacks that exploit weaknesses in the physical implementation of electronic devices. In an SCA, one analyzes changes in power consumption [48, 109, 110, 111, 140, 166, 171, 172, 177], thermal signature [133], electromagnetic (EM) emanations [16, 63, 112, 170], photonic emanations [157], and timing [115] to extract secret information from a TOE. In the context of FPGA devices, attackers primarily use SCAs against cryptographic engines to retrieve their secret keys; however, some use cases have demonstrated how side-channel emissions can also be used to recover input images and parameters from convolutional neural networks (CNNs) and binarized neural networks implemented within FPGA devices [166, 170].
In practice, SCAs normally begin with a series of measurements taken over the chosen channel. Since the variation in a given channel caused by a device’s cryptographic operations is small compared with measurement noise, one normally averages many of these measurements to reduce the effects of noise. Once a sufficiently large number of measurements has been captured, an attacker can use several methods to recover the secret key, including simple power analysis [73], differential power analysis [72, 74], template attacks [26], correlation power analysis (CPA) [23], and mutual information analysis [44].
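To make the CPA step concrete, the sketch below attacks one key byte of a simulated AES first round. It assumes a Hamming-weight leakage model on the S-box output, with leakage values generated synthetically (`simulate_traces` is a hypothetical stand-in for oscilloscope measurements), and reduces each trace to a single sample at the leaking instant, whereas real attacks correlate against every time sample of each trace. For each of the 256 key guesses, the predicted leakage is correlated with the measurements, and the guess with the highest absolute Pearson correlation is returned.

```python
import math
import random
from typing import List, Sequence, Tuple


def _gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def _rotl8(x: int, s: int) -> int:
    return ((x << s) | (x >> (8 - s))) & 0xFF


def _build_sbox() -> List[int]:
    """AES S-box: multiplicative inverse in GF(2^8) followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if _gf_mul(x, y) == 1)
        sbox.append(inv ^ _rotl8(inv, 1) ^ _rotl8(inv, 2) ^ _rotl8(inv, 3) ^ _rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = _build_sbox()


def hamming_weight(v: int) -> int:
    return bin(v).count("1")


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0


def simulate_traces(key_byte: int, n: int, noise: float, seed: int = 0) -> Tuple[List[int], List[float]]:
    """Hypothetical measurements: HW(S-box output) plus Gaussian noise."""
    rng = random.Random(seed)
    plaintexts = [rng.randrange(256) for _ in range(n)]
    leaks = [hamming_weight(SBOX[p ^ key_byte]) + rng.gauss(0.0, noise) for p in plaintexts]
    return plaintexts, leaks


def cpa_key_byte(plaintexts: Sequence[int], leaks: Sequence[float]) -> int:
    """Return the key guess whose predicted leakage correlates best with the traces."""
    scores = []
    for guess in range(256):
        model = [hamming_weight(SBOX[p ^ guess]) for p in plaintexts]
        scores.append(abs(pearson(model, leaks)))
    return max(range(256), key=scores.__getitem__)
```

The S-box is computed rather than tabulated to keep the block short and self-checking; a full attack repeats `cpa_key_byte` for each of the 16 key bytes.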
5.2.1 SCAs on Cryptographic Algorithms.
Turning to practical examples of SCAs on FPGA devices, one of the first such attacks on an FPGA was carried out by Moradi et al. [109] in 2011. In this attack, they broke the DES-based bitstream encryption of the AMD-Xilinx Virtex-II by analyzing its power consumption with differential power analysis techniques. Moradi et al. also performed similar attacks on AES encryption using CPA [110, 111] in 2012 and 2013. In their 2013 attack, Moradi et al. targeted the Altera (Intel) Stratix II FPGA via its power supply. Using a digital oscilloscope and a custom programmer based on an ATmega256, they retrieved the full AES-128 key in less than 3 hours.
Turning to other side channels, Moradi and Schneider [112] also mounted successful EM-based SCAs on AMD-Xilinx FPGA devices. Using a process similar to their previous CPA attack but replacing power readings with EM readings, they broke the AES-256 bitstream encryption of the 5, 6, and 7 series AMD-Xilinx FPGA devices. Comparing their EM and power side-channel attacks, they found that the sensitivity of EM measurements to the probe’s position drove up the required number of traces. This effect became more pronounced as the process technology shrank from 65 nm in the 5 series to 28 nm in the 7 series. Nevertheless, the non-intrusive nature of EM-based SCA remained a distinct advantage, since it only requires placing the EM probe close to the TOE. In a related work, Iyer and Yilmaz [63] proposed in 2019 an adaptive acquisition protocol to help identify the optimal EM capture configuration. Tested on the AMD-Xilinx Artix-7 FPGA, the protocol reduced the required acquisition time by a factor of nearly 35.
Finally, recent SCA methods have turned to AI to significantly reduce the number of measurements required. Ramezanpour et al. [134] presented this approach in 2020, successfully extracting the key of an AES implementation within the fabric of an Artix-7 FPGA with fewer than 3,700 measurements. Their approach used unsupervised learning to extract the information required for the leakage model, allowing the attack to proceed without any prior knowledge of the device. In another approach, Wang and Dubrova [164] demonstrated how deep learning with a single neural network classifier can recover the key of an AES implementation within an Artix-7, requiring fewer than 430 measurements.
5.2.2 SCAs in Cloud Computing.
Although all of the attacks above require physical access to the TOE, the emerging trend of integrating FPGA devices within cloud computing instances is creating new opportunities for attackers. In particular, placing multiple tenants on a single device to share reconfigurable resources raises concerns over the leakage of sensitive information between isolated portions of the reconfigurable fabric through their shared power distribution network. With these applications in mind, recent works have shown that SCA techniques can be applied remotely [48, 133, 140, 177]. For instance, Gravellier et al. [48] used an AMD-Xilinx Zynq-7000 to show that they could infer the encryption key of both a software AES instance running on the CPU and a hardware implementation of the algorithm in the fabric. Such capabilities could easily result in a loss of confidentiality for unsuspecting cloud users.
5.2.3 SCAs on Neural Networks.
Turning to other applications of SCAs, Wei et al. [166] performed a power SCA to recover the pixel values of a CNN’s input image. Their attack on the AMD-Xilinx Spartan-6 FPGA recovered an image being processed in a classification task. In addition, Yu et al. [170] performed an EM-based SCA to retrieve the weight values of a binarized neural network. Using the AMD-Xilinx Zynq-7000 SoC FPGA, they showed that they could accurately recover the underlying model characteristics and build a substitute model from these values.
5.2.4 SCAs on Physically Unclonable Functions.
In 2020, Yu et al. [171] demonstrated how SCAs could further be used to infer the sequence of 1s and 0s produced by an FPGA’s physically unclonable function (PUF). Using an AMD-Xilinx Artix-7 FPGA as their target, they combined voltage-based SCA with deep learning to show that they could overcome PUF-based key provisioning and remote attestation measures.
5.2.5 SCAs on True Random Number Generators.
Another application of SCAs targets true random number generators (TRNGs). The aim of SCAs applied to TRNGs is usually to determine the oscillation frequency of TRNGs implemented with ring oscillators (ROs). Knowledge of this frequency can then be used to mount other attacks, such as FI attacks on the TRNG’s output. One such approach, using an EM-based SCA, was presented by Bayon et al. [16] in 2013, who showed that they could deduce the RO TRNG’s frequency with high accuracy. Building on the work of Bayon et al., Yu et al. [172] in 2021 combined voltage-based SCA, deep learning, and a bitstream modification attack to extract the TRNG’s output. Demonstrated on an AMD-Xilinx Artix-7 FPGA device, their attack achieved near-perfect accuracy.
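The frequency-recovery step can be illustrated with a simple spectral peak search. The sketch below assumes an EM trace sampled at a known rate in which the RO contributes one sinusoidal component buried in Gaussian noise (`simulate_em_trace` is a hypothetical stand-in for a probe measurement) and returns the frequency of the strongest non-DC bin of a plain discrete Fourier transform. It is not the specific method of Bayon et al.; its resolution is limited to fs/N, and a practical analysis would use an FFT library and windowing.

```python
import cmath
import math
import random
from typing import List, Sequence


def dominant_frequency(samples: Sequence[float], fs: float) -> float:
    """Return the frequency (Hz) of the largest non-DC DFT bin; resolution is fs/N."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]      # remove the DC offset
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(centered))
        if abs(acc) > best_mag:
            best_k, best_mag = k, abs(acc)
    return best_k * fs / n


def simulate_em_trace(f_ro: float, fs: float, n: int, noise: float, seed: int = 1) -> List[float]:
    """Hypothetical probe output: a single RO tone plus white Gaussian noise."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * f_ro * i / fs) + rng.gauss(0.0, noise) for i in range(n)]
```

For a simulated 174 MHz tone sampled at 1 GS/s over 500 samples, the estimate should land within one 2 MHz bin of the true frequency.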
5.3 FI Attacks
FI attacks are active attacks that seek to modify the behavior of the TOE. Ways of performing FIs include manipulating the TOE’s temperature, supply voltages, or clock signals, or injecting external EM pulses, white light, laser, X-ray, or ion beams into the TOE [13, 68].
In experimental implementations of FI attacks, faults injected into a TOE cause transistors to switch abnormally. These abnormal transitions can lead to instruction skips or corrupted data values. The literature divides fault attacks into three general sub-classes: algorithm modification, differential fault analysis (DFA), and safe error [181]. Attacks in the first sub-class, algorithm modification, seek to skip or modify a critical instruction to circumvent a security measure. The second sub-class, DFA, is likely one of the most prevalent: DFA injects faults into encryption and authentication mechanisms and compares correct and faulty outputs to retrieve their secret keys. The final sub-class, safe error, has a broader definition than the previous two and covers any fault attack that changes the expected behavior of a TOE.
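To illustrate how DFA turns faulty outputs into key information, the sketch below applies a classic single-bit fault model to one byte of the final AES round. The attacker is assumed to obtain a correct ciphertext byte c = S(x) XOR k and a faulty one c' = S(x XOR e) XOR k, where e flips exactly one bit of the unknown S-box input x. Because the last round has no MixColumns, a key guess is consistent only if inverting the S-box on c XOR k and c' XOR k yields inputs that differ in a single bit. The fault model, `simulate_fault_pairs`, and the other helper names are assumptions for illustration, not the techniques of the cited works.

```python
import random
from typing import Iterable, List, Set, Tuple


def _gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def _rotl8(x: int, s: int) -> int:
    return ((x << s) | (x >> (8 - s))) & 0xFF


def _build_sbox() -> List[int]:
    """AES S-box: GF(2^8) inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if _gf_mul(x, y) == 1)
        sbox.append(inv ^ _rotl8(inv, 1) ^ _rotl8(inv, 2) ^ _rotl8(inv, 3) ^ _rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = _build_sbox()
INV_SBOX = [0] * 256
for _i, _s in enumerate(SBOX):
    INV_SBOX[_s] = _i


def simulate_fault_pairs(key_byte: int, n: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Hypothetical (correct, faulty) ciphertext bytes under a random single-bit fault."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.randrange(256)                  # unknown last-round S-box input
        e = 1 << rng.randrange(8)               # single-bit fault
        pairs.append((SBOX[x] ^ key_byte, SBOX[x ^ e] ^ key_byte))
    return pairs


def dfa_candidates(pairs: Iterable[Tuple[int, int]]) -> Set[int]:
    """Keep key guesses for which every pair is explained by a one-bit input fault."""
    candidates = set(range(256))
    for c, c_faulty in pairs:
        candidates = {
            k for k in candidates
            if bin(INV_SBOX[c ^ k] ^ INV_SBOX[c_faulty ^ k]).count("1") == 1
        }
    return candidates
```

Each additional faulty pair removes most surviving wrong guesses, so only a handful of faults per byte are typically needed under this model.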
Different FI techniques yield different results. Characteristics that define FI techniques include control over fault location and control over fault timing [68]. Each ranges from precise control, to loose control, to no control at all. For instance, injecting faults by varying the device’s supply voltage offers no control over the fault’s physical location, whereas injecting faults via laser allows precise positional control. Table 4 summarizes the defining characteristics of each relevant FI technique.
5.3.1 FI Characterization Studies.
Among the works on FI, some researchers have sought to characterize how a given target behaves under FI. Such characterizations were conducted for electromagnetic fault injection (EMFI) by Zussa et al. [180] in 2014 and by Paquette et al. [126] in 2021, for voltage FI by O’Flynn [122] in 2016, and for laser FI by Selmke et al. [143], also in 2016.
5.3.2 FI Attacks on Cryptographic Algorithms.
Other researchers have sought to execute specific attacks on FPGA devices. Most of the attacks demonstrated in the literature target AES implementations within the FPGA [2, 22, 24, 33, 98, 124, 132, 143, 168, 181, 182]. For instance, in 2016, Selmke et al. [143] performed a laser FI on an AES core implemented within the AMD-Xilinx Spartan-6 FPGA device. Although the implementation featured a redundancy circuit, they used a two-laser setup to inject the same fault twice, once into each redundant computation, and thus induced exploitable faults that the redundancy did not detect.
5.3.3 FI Attacks on SoC FPGA Devices.
In 2016, Timmers and Spruyt [161] demonstrated an attack highly relevant to the discussion of SoC FPGA devices. Although this attack explicitly targets an ARM CPU, it also reveals an SoC FPGA attack vector. They showed that a voltage fault attack could skip instructions processed by the ARM CPU on the Zynq-7000. Such capabilities raise important concerns for bitstream security: if an attacker gains control of the CPUs by skipping an authentication check, they might be able to access the FPGA configuration module via the device controller module, even though security features such as encryption and debug port disabling are in place.
5.3.4 FI Attacks in Cloud Computing.
Researchers have also shown that FI attacks can be launched from hardware Trojans inserted within the fabric or from neighboring circuits in multi-tenant cloud computing platforms. Gnad et al. [47] presented such an attack in 2017, showing that ROs could be used as a voltage-based FI mechanism. They found that the sudden current draw caused by frequently activating the ROs affected the device’s normal operation. In their experiments, Gnad et al. validated the effects of such voltage-based FIs on the AMD-Xilinx Virtex-7, Kintex-7, and Zynq-7020 FPGA devices. On all three devices, the FI-induced errors ranged from complete system resets to minor malfunctions. Expanding on the work of Gnad et al., Krautter et al. [78] in 2018 formalized the hardware Trojan voltage-based FI attack as FPGAhammer. Using FPGAhammer, they carried out a DFA on an AES implementation within the Intel Cyclone V SoC FPGA, showing that they could inject timing faults via the RO-induced voltage drops. Their results showed that, using approximately 35% to 45% of the FPGA’s LUTs, they could recover 90% of the secret AES key.
Another remote FI technique was introduced by Alam et al. [4] in 2019. They showed that dual-port RAMs in FPGA devices allow concurrent writes, which can result in memory collisions when opposing values are written to the same address simultaneously. These collisions cause transient shorts that can be exploited to raise the temperature of the FPGA. The resulting temperature increase can induce timing violations and thus bit-flips in the FPGA device’s configuration memory.
5.3.5 FI Attacks on PUFs.
In 2015, Tajik et al. [156] demonstrated laser FIs on PUFs. In their experiment, the laser FIs were used to bypass specific countermeasures that protect PUFs against machine learning based attacks. In a second attack, they showed how they could stop the ROs in an RO PUF, thus reducing the entropy of its responses.
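Why stopping oscillators destroys entropy can be seen in a toy model of an RO PUF. The sketch assumes the common construction in which each response bit compares the frequencies of a pair of ROs whose small differences come from process variation (modeled here, as an assumption, by a 1% Gaussian spread around a nominal 200 MHz), and it models a stopped RO simply as one whose frequency is zero. All names and parameters are hypothetical, and the model does not reproduce Tajik et al.’s laser setup.

```python
import random
from typing import List, Sequence, Set, Tuple

NOMINAL_HZ = 200e6  # hypothetical nominal RO frequency


def make_device(n_ros: int, seed: int) -> List[float]:
    """Per-device RO frequencies with ~1% random process variation (assumed)."""
    rng = random.Random(seed)
    return [NOMINAL_HZ * (1 + rng.gauss(0.0, 0.01)) for _ in range(n_ros)]


def ro_puf_response(freqs: Sequence[float], pairs: Sequence[Tuple[int, int]]) -> List[int]:
    """One response bit per RO pair: 1 if the first RO is faster than the second."""
    return [1 if freqs[i] > freqs[j] else 0 for i, j in pairs]


def stop_oscillators(freqs: Sequence[float], stopped: Set[int]) -> List[float]:
    """Model a fault that halts the selected ROs (frequency forced to zero)."""
    return [0.0 if i in stopped else f for i, f in enumerate(freqs)]


PAIRS = [(i, i + 1) for i in range(0, 16, 2)]  # 8 disjoint pairs -> 8 response bits
```

Once the first RO of every pair is halted, every simulated device returns the all-zero response, so the response no longer carries any device-specific information.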
5.3.6 FI Attacks on TRNGs.
Other researchers, such as Bayon et al. [17] in 2012 and Martin et al. [95] in 2015, have evaluated the impact of FIs on TRNGs implemented within FPGA devices. One of the most recent such attacks was demonstrated by Madau et al. [90] in 2018, who used EMFI to show how an EM pulse could affect the output of a TRNG implemented within an AMD-Xilinx Spartan-6 FPGA.
5.3.7 FI Attacks on Neural Networks.
FI attacks are also possible on neural networks implemented within FPGA devices [83, 88, 178]. The most recent of these attacks, by Luo et al. [88] in 2021, used an AMD-Xilinx Zynq-7000 SoC FPGA to demonstrate how power glitching triggered by a specialized oscillating circuit could inject faults into a deep neural network (DNN) in a neighboring portion of the fabric.
5.4 Probing Attacks
In a probing attack, an attacker attempts to monitor a die’s internal signals directly. Several probing techniques exist to accomplish this, mainly divided into electrical and optical probing techniques [165]. Electrical probing techniques require direct contact with electrical paths within the die. Optical probing techniques, in contrast, either analyze photon emissions from transistors during switching activity or analyze light reflected from switching transistors illuminated by an external light source. Probing attacks are, for the most part, invasive; however, some non-invasive probing attacks can be found in the literature [157].
5.4.1 Electrical Probing.
Electrical probing attacks are typically invasive, requiring attackers to physically strip their target layer by layer to reverse engineer the electrical pathways within the chip [158]. This reverse engineering step generally requires an optical microscope or a more expensive scanning electron microscope [145]. Once the target pathways are identified, an attacker can use tools such as a focused ion beam system or a laser cutter to etch a hole and deposit the conducting material required for electrical probing.
Although FPGA and SoC FPGA devices are not immune to electrical probing attacks, examples of such attacks on FPGAs are not prevalent in the literature. To provide a practical example, we consider an attack demonstrated by Skorobogatov [146] in 2017. Skorobogatov targeted an 8-bit smartcard CPU core built in a 0.35-\(\mu\)m complementary metal-oxide-semiconductor process with three metal layers. Using an approach similar to the one described above, he successfully extracted the device’s entire memory space.
5.4.2 Optical Probing.
Optical probing can be somewhat less invasive than electrical probing. In optical probing, one can also take advantage of the chip’s backside to access critical signals buried deep within the chip. This approach is particularly effective on flip-chip packages, where the backside is readily accessible. In most cases, however, some decapsulation is still necessary to ensure that the light can reach the target area.
As an example of optical probing on FPGA devices, Tajik et al. [157] in 2017 mounted an entirely non-invasive attack on the AMD-Xilinx Kintex-7 FPGA. This particular chip is available in a flip-chip package and thus provides direct access to the silicon substrate from its backside. Building on the previous work of Lohrke et al. [84] in 2016, and without any modification to the device under test, they used a light source with a wavelength to which silicon is transparent to see through the Kintex-7’s die. They were able to locate and precisely map each bit of the bitstream as it exited the decryption engine. Using an internal clock of 33 MHz, they estimated a total acquisition time of 43 minutes for the whole bitstream, and they estimated that the lab work required to complete the attack ranges from a few hours to a few days. As a further example of an optical probing attack, in 2018, using the same techniques as Tajik et al., Lohrke et al. [85] showed how optical probing could extract the full 256-bit AES key directly from the Zynq UltraScale’s battery-backed RAM (BBRAM).
One of the most recent optical probing attacks on FPGAs was demonstrated by Krachenfels et al. [76] in 2019. They showed that an attack similar to those of Tajik et al. and Lohrke et al. can be performed with a lower-cost laser FI setup, albeit with a longer acquisition time.
Although the optical probing attack demonstrated by Tajik et al. exposes the vulnerability of flip-chip packages, we should note that most probing attacks will require some decapsulation effort in addition to the lengthy reverse engineering process. Furthermore, although Skorobogatov could quickly identify the data bus lines on the top metal layer, most secure chips will likely keep critical pathways deep within the sub-layers.
5.5 Hardware Trojans
During the attack surface identification stage of the IDDIL/ATC methodology, we saw how source code remains vulnerable to hardware Trojans during its development stage. A seemingly innocuous piece of code inserted at this stage could translate into a significant vulnerability once deployed. Developers should therefore take stringent measures to prevent such insertions.
5.5.1 Hardware Trojan Implementations.
Looking at the literature, we find several works demonstrating potential hardware Trojan implementations. Among these, Chakraborty et al. [25] proposed in 2013 a Trojan in which ROs are inserted into a design to reduce the device’s lifetime; Ahmed et al. [3] presented in 2021 a Trojan that leaks a target’s AES key as it is being processed within an FPGA; and Ye et al. [169] introduced in 2018 a Trojan that can control the image classification process of a CNN implemented within an FPGA.
Other works, including those of Swierczynski et al. in 2015 [151] and 2018 [150], and of Ngo et al. [117] and Moraitis and Dubrova [113] in 2020, have investigated how adversaries could directly manipulate the bitstreams of cryptographic implementations to weaken them and thus make key recovery possible. Their results for several cryptographic algorithms show that direct bitstream manipulation can weaken cryptographic implementations without requiring any reverse engineering.
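A minimal way to see why such manipulation can expose a key is a toy final cipher round in which a key byte is XORed onto the output of an S-box stored in a lookup table. If an attacker overwrites that table’s contents with a constant, the output no longer depends on the data and directly reveals the key byte. This is a hypothetical illustration under stated assumptions: the 8-bit S-box is a seeded random permutation standing in for a real one, and the functions are illustrative names, not a reproduction of the attacks in the cited works.

```python
import random
from typing import List


def make_sbox_lut(seed: int = 0) -> List[int]:
    """Hypothetical 8-bit S-box LUT: a seeded random permutation of 0..255."""
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return table


def final_round(x: int, key_byte: int, lut: List[int]) -> int:
    """Toy last round: substitute through the LUT, then add the round key byte."""
    return lut[x] ^ key_byte


def tamper_constant(lut: List[int], value: int = 0) -> List[int]:
    """Model a bitstream edit that overwrites every LUT entry with one constant."""
    return [value] * len(lut)
```

With the table forced to zero, every output byte equals the key byte, whereas the untouched table produces all 256 possible output values.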
Some researchers have examined how Trojans could impact the ARM TrustZone technology found within many SoC FPGA devices [18, 49]. For instance, Benhani et al. [19] used the AMD-Xilinx Zynq-7000 SoC FPGA device to show that malicious modifications to the FPGA design could jeopardize the isolation and segmentation put in place via TrustZone.
5.5.2 Hardware Trojan Insertions.
The literature has identified several points along the bitstream development chain as vulnerable to Trojan insertion. Among the proposed approaches, Zhang et al. [176] suggested in 2019 that Trojans could be inserted through a malicious FPGA design suite, and Ahmed et al. [3] in 2021 introduced a Trojan during the place-and-route step of bitstream generation. Both techniques bypassed all design rule checks, ensuring that the Trojans remained undetected until activated in the FPGA fabric.
5.6 Covert Channels
Covert channels closely resemble the previously discussed SCAs; they differ in whether the leakage of information is deliberate. In an SCA, attackers exploit information that is accidentally leaked from the device, whereas in a covert channel, information is deliberately transferred from one circuit or device to another.
5.6.1 Thermal-Based Covert Channels.
Iakymchuk et al. [57] demonstrated a covert channel on an FPGA device in 2011. They showed how two electrically isolated circuits on a single FPGA could communicate via a thermal-based covert channel built from specially designed ROs. The transmitter circuit used a set of 20 ROs to generate heat, while a counter connected to a single RO within the receiver detected changes in that RO’s frequency induced by variations in the die’s temperature. Iakymchuk et al. used this setup to transfer the secret key from an AES implementation to an unsecured portion of the FPGA at a rate of 0.5 bits per second, with an observed error rate of 5% to 13%. Similar thermal-based covert channels have also been presented by Masti et al. [96] in 2015, Bartolini et al. [15] in 2016, and Tian and Szefer [159] in 2019.
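To put these figures in perspective, the following worked arithmetic assumes a 128-bit AES key (an assumption; the key length is not stated above) and treats the reported error rate p as a per-bit probability:

```latex
\begin{aligned}
t &= \frac{128\ \text{bits}}{0.5\ \text{bits/s}} = 256\ \text{s} \approx 4.3\ \text{min} \\
\mathbb{E}[\text{bit errors}] &= 128\,p, \quad p \in [0.05,\ 0.13]
  \;\Rightarrow\; \mathbb{E}[\text{bit errors}] \in [6.4,\ 16.6]
\end{aligned}
```

With several erroneous bits expected per key, a receiver would likely need repeated transmissions or error-correcting codes before the leaked key could be used directly.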
5.6.2 Voltage-Based Covert Channels.
In 2018, Nguyen [118] published a thesis introducing a voltage-based covert channel. This covert channel showed many similarities to the previously introduced thermal covert channels; however, whereas those channels hid information in timing variations, Nguyen hid information in the voltage’s amplitude. In 2019, improving on Nguyen’s work, Gnad et al. [46] presented a second voltage-based covert channel that introduced modulation into the system. They showed that, by using less than 3% or 5% of the surface area of the AMD-Xilinx Kintex-7, they could transfer up to 8 Mbits per second via this covert channel while maintaining an error rate of 0.003%.
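Under the same assumption of a 128-bit key, and reading 8 Mbits as 8 × 10^6 bits, the figures reported by Gnad et al. translate into:

```latex
\begin{aligned}
t &= \frac{128\ \text{bits}}{8 \times 10^{6}\ \text{bits/s}} = 1.6 \times 10^{-5}\ \text{s} = 16\ \mu\text{s} \\
\text{bit errors per second} &= 8 \times 10^{6} \times 3 \times 10^{-5} = 240 \\
\frac{8 \times 10^{6}\ \text{bits/s}}{0.5\ \text{bits/s}} &= 1.6 \times 10^{7}
\end{aligned}
```

The voltage-based channel is thus roughly seven orders of magnitude faster than the thermal channel of Iakymchuk et al., with about 128 × 3 × 10^-5 ≈ 0.004 expected erroneous bits per 128-bit key.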
5.7 SoC FPGA Devices and Logical Attack Vectors
The complexity introduced by processing systems, peripheral interfaces, and even overengineered security features provides attackers with an extensive range of potential attack vectors. These attack vectors become increasingly relevant as heterogeneous systems are integrated to form SoC devices and as network connectivity for embedded systems increases. We have already described how FI attacks on the processing system of a Zynq-7000 could affect the security of the fabric; however, this is just one of many attacks that target SoC FPGA devices or exploit logical attack vectors.
5.7.1 Buffer Overflow Attacks.
We also presented several vulnerabilities of the Zynq-7000 and Stratix 10 in Section 4.2 [103, 104, 105, 107, 108]. Two of these vulnerabilities, CVE-2021-27208 [105] and CVE-2021-44850 [107], make a severe exploit possible on the Zynq-7000. Using these specific vulnerabilities, Schretlen [137] demonstrates exploits that can result in arbitrary code execution on the CPU.
5.7.2 Auto-Decryption Oracle.
Ender [42] demonstrated another logical attack on an AMD-Xilinx 7-series FPGA in 2020. Ender identified several weaknesses of AMD-Xilinx 7-series FPGAs that, when combined, allow a previously encrypted bitstream to be read out in plaintext, 32 bits at a time.
5.7.3 Breaking Secure Boot.
A more elaborate scheme by Jacob et al. [64] in 2017 showed that a hardware Trojan inserted into the bitstream of a Zynq-7000 FPGA could modify data within the system’s external SDRAM. In their proof-of-concept attack, Jacob et al. disguised a Trojan within a hardware module that resembled a cryptographic accelerator. They used this hardware Trojan to exploit a vulnerability in the secure boot process of the AMD-Xilinx Zynq-7000, forcing the device to execute a malicious kernel. After U-Boot was loaded into SDRAM, they demonstrated that, by modifying a few lines of the U-Boot image, they could instruct the device to load a non-encrypted malicious kernel. Furthermore, the device required no partition authentication before executing this malicious kernel.
5.8 Summary of Attack Vector Analysis
To conclude the review of FPGA attack vectors, we present a comparison of all previously discussed attacks. Table 5 shows the computation of all five factors of attack potential. SoC FPGA devices and logical attack vectors have been excluded from this comparison, as they tend to apply to specific circumstances and particular devices. Instead, we created Table 6 to specify the applicable targets and the configurations that are prerequisites for each attack.
In assigning the score shown in Table 5 for bitstream reverse engineering, we attributed more time, expertise, and knowledge to the identification stage of the attack, since that is where most of the work lies. Once the relationship between the bits and the RTL code is known for a given device, exploiting a given bitstream is trivial. Furthermore, this attack almost always follows one of the other attack vectors as a subsequent stage. Thus, one should compound the rating for bitstream reverse engineering with that of the vector used to extract the bitstream.
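The compounding described above can be sketched as follows. This is a minimal illustration, assuming the five Common Criteria attack-potential factors (elapsed time, expertise, knowledge, window of opportunity, equipment), a rating summed over the identification and exploitation phases, and a hypothetical rule that a chain of stages is rated as the sum of its stages. All numeric values are placeholders, not the scores of Table 5; a higher total means the attack demands more from the attacker.

```python
from dataclasses import dataclass

# The five Common Criteria attack-potential factors.
FACTORS = ("elapsed_time", "expertise", "knowledge", "window", "equipment")


@dataclass(frozen=True)
class PhaseRating:
    """Factor ratings for one phase of an attack (identification or exploitation)."""
    elapsed_time: int
    expertise: int
    knowledge: int
    window: int
    equipment: int

    def total(self) -> int:
        return sum(getattr(self, factor) for factor in FACTORS)


@dataclass(frozen=True)
class AttackRating:
    name: str
    identification: PhaseRating
    exploitation: PhaseRating

    def potential(self) -> int:
        # Sum of the factor ratings over both phases.
        return self.identification.total() + self.exploitation.total()


def compound(*stages: AttackRating) -> int:
    """Hypothetical simplification: a chain whose stages must all succeed is
    rated as the sum of the individual stages' attack potentials."""
    return sum(stage.potential() for stage in stages)


# All numbers below are hypothetical placeholders, not the values of Table 5.
extraction = AttackRating(
    "bitstream extraction",
    identification=PhaseRating(elapsed_time=4, expertise=3, knowledge=3, window=1, equipment=4),
    exploitation=PhaseRating(elapsed_time=1, expertise=3, knowledge=0, window=1, equipment=4),
)
# Costly identification, near-trivial exploitation once the bit-to-RTL mapping is known.
reverse_engineering = AttackRating(
    "bitstream reverse engineering",
    identification=PhaseRating(elapsed_time=10, expertise=6, knowledge=3, window=0, equipment=0),
    exploitation=PhaseRating(elapsed_time=0, expertise=3, knowledge=0, window=0, equipment=0),
)

print(extraction.potential())                     # 24
print(reverse_engineering.potential())            # 22
print(compound(extraction, reverse_engineering))  # 46
```

Summing ratings is only an approximation: a stricter evaluation would combine the underlying quantities (for example, total elapsed time) before mapping them to ratings.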
Turning to the score given to Trojan insertion, both for generic hardware Trojans in Table 5 and for the attack by Jacob et al. shown in Table 6, this score reflects a device whose bitstream has been encrypted. In such cases, the difficulty of the attack lies primarily in exploiting the window of opportunity, which must be well guarded and restricted. This restricted access is reflected in the knowledge factor, for which the Common Criteria Methodology [147] defines a rating of 4 as sensitive information limited by strict need-to-know and specific contracts. In this respect, inserting a Trojan along the development chain would require a sensitive level of access. However, if the bitstream is unencrypted, the process is relatively simple.
The attacks demonstrated by Schretlen [137] and by Ender [42] are complex attacks that require an in-depth understanding of the FPGA and SoC FPGA device. The main difference between them is that whereas Ender primarily connected the dots between manufacturer documentation and various known vulnerabilities, Schretlen extracted the sensitive and proprietary bootROM from the device and identified key vulnerabilities therein. Hence, the identification phase of Schretlen’s attack receives an unusually high knowledge factor rating. Finally, this attack has a relatively low attack potential for the exploitation stage, since the critical information is now public and a “plug-and-play” payload exists [137].
6 Applying Controls
So far, we have introduced the attack surface of SoC FPGA devices, presented tools to identify and assess threats from various attacks and vulnerabilities, and provided a thorough review of attack vectors that apply to FPGA and SoC FPGA devices. During this review, we used the tools introduced to assess each attack class. With this information, the next logical step of the IDDIL/ATC methodology defined in Figure 2 is to use this assessment, along with the metrics assigned to known vulnerabilities, to identify which controls need to be put in place.
The essential function of a control is to remove, counter, or mitigate a threat. Therefore, when seeking to apply controls, we need to identify which attack vectors or vulnerabilities pose the most significant threat to the system. To identify (i.e., select) the proper controls, the previously identified threats should first be categorized. A model such as the STRIDE-LM model proposed by Muckin and Fitch [114] and depicted in Table 7 can be used for this step.
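As a small sketch of this categorization step, the code below encodes the seven STRIDE-LM categories, each paired with the security property it threatens (our paraphrase; Table 7 is the reference), and inverts a threat-to-category mapping so that threats can be reviewed per category. The classification of the attack classes is an illustrative assumption, not an assessment taken from Section 5.

```python
from enum import Enum


class StrideLM(Enum):
    """STRIDE-LM threat categories, each paired with the security property
    it threatens (a paraphrase of the model)."""
    SPOOFING = "authentication"
    TAMPERING = "integrity"
    REPUDIATION = "non-repudiation"
    INFORMATION_DISCLOSURE = "confidentiality"
    DENIAL_OF_SERVICE = "availability"
    ELEVATION_OF_PRIVILEGE = "authorization"
    LATERAL_MOVEMENT = "segmentation"


# Illustrative classification of a few attack classes; assignments are assumptions.
THREATS = {
    "bitstream reverse engineering": {StrideLM.INFORMATION_DISCLOSURE},
    "side-channel attack": {StrideLM.INFORMATION_DISCLOSURE},
    "fault injection": {StrideLM.TAMPERING, StrideLM.DENIAL_OF_SERVICE},
    "hardware Trojan insertion": {StrideLM.TAMPERING},
    "covert channel": {StrideLM.INFORMATION_DISCLOSURE, StrideLM.LATERAL_MOVEMENT},
    "malicious bitstream upload": {StrideLM.SPOOFING, StrideLM.TAMPERING},
}


def group_by_category(threats: dict) -> dict:
    """Invert threat -> categories into category -> threats (in insertion order)."""
    grouped = {category: [] for category in StrideLM}
    for name, categories in threats.items():
        for category in categories:
            grouped[category].append(name)
    return grouped


for category, names in group_by_category(THREATS).items():
    if names:  # skip categories with no identified threats
        print(f"{category.name} (protect {category.value}): {', '.join(names)}")
```

Grouping by category makes it easy to see which security property most threats target and, therefore, which family of controls to prioritize.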
Using the STRIDE-LM model, we can sort the threats and ensure that controls are applied optimally. The controls that fit into these categories vary: some are provided by manufacturers and can be applied by system designers, whereas others have been proposed by researchers and might need to be implemented by FPGA manufacturers. Sections 6.1 and 6.2 review such controls as presented by FPGA manufacturers and the literature, respectively.
6.1 Manufacturer-Provided Controls
For many identified threats, manufacturer-provided security features will provide a reasonable level of protection. Comparing these features on Intel and AMD-Xilinx devices, we find multiple similarities [86, 127].
The most crucial security feature that Intel and AMD-Xilinx implement within their FPGA devices is AES encryption [119], used to encrypt their bitstreams and boot order files. Most FPGA attacks have focused on this particular security feature. Once an attacker obtains the bitstream in plaintext, the only obstacle between the attacker and a victim’s sensitive design is the bitstream’s complexity, which amounts to security through obscurity. Although FPGA manufacturers go to great lengths to keep their bitstream structure secret, multiple sources have shown that mapping a bitstream to a netlist or even to RTL code is not an impossible feat.
A secondary security feature implemented by both Intel and AMD-Xilinx is authentication. Authentication prevents an attacker from uploading a malicious bitstream (or malicious partitions in the case of SoC FPGA devices). Manufacturers tend to use HMAC, RSA, ECDSA, or some combination of the three [14].
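The principle behind HMAC-based authentication can be sketched with Python’s standard `hmac` module. This is a generic illustration, not a vendor format: the image layout (payload followed by a 32-byte HMAC-SHA256 tag), the function names, and the key handling are hypothetical. RSA or ECDSA schemes would replace the shared key with a public-key signature check.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def sign_image(key: bytes, image: bytes) -> bytes:
    """Append an HMAC-SHA256 tag to a (hypothetical) bitstream image."""
    return image + hmac.new(key, image, hashlib.sha256).digest()


def verify_and_strip(key: bytes, signed: bytes) -> bytes:
    """Return the payload only if its tag verifies; raise otherwise."""
    if len(signed) < TAG_LEN:
        raise ValueError("image too short to carry a tag")
    image, tag = signed[:-TAG_LEN], signed[-TAG_LEN:]
    expected = hmac.new(key, image, hashlib.sha256).digest()
    # compare_digest avoids revealing, through timing, how many tag bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: image rejected")
    return image


key = b"\x00" * 32  # placeholder; a real key would come from secure on-chip storage
signed = sign_image(key, b"example bitstream payload")
print(verify_and_strip(key, signed))  # b'example bitstream payload'

tampered = bytearray(signed)
tampered[0] ^= 0x01  # flip a single payload bit
try:
    verify_and_strip(key, bytes(tampered))
except ValueError as err:
    print(err)  # authentication failed: image rejected
```

A single flipped bit is enough to make the tag check fail, so a modified or attacker-built image is rejected before it is loaded.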
Last, FPGA manufacturers usually provide a set of eFuses on their devices to permanently alter certain functionalities. These can disable standard functionalities, including JTAG [58] access and bitstream readback. JTAG is helpful during the design and debugging of the system; however, it becomes a convenient point of entry for attackers once the system is fielded. Designers who wish to maintain post-deployment debug capabilities can instead disable JTAG’s most intrusive function: its ability to read back a programmed bitstream. On the Stratix 10, Intel indicates that this readback feature is permanently disabled [62].
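The one-way nature of eFuses, and the option of disabling only readback while keeping JTAG debug, can be modeled in a few lines. This is a toy sketch: the fuse names `DISABLE_JTAG` and `DISABLE_READBACK` and both classes are hypothetical, not any manufacturer’s fuse map or API.

```python
class EFuseBank:
    """Toy model of one-time-programmable eFuses: a fuse can be blown
    (0 -> 1) but, by design, there is no operation that restores it."""

    def __init__(self, names):
        self._blown = {name: False for name in names}

    def blow(self, name: str) -> None:
        if name not in self._blown:
            raise KeyError(f"unknown fuse: {name}")
        self._blown[name] = True  # irreversible

    def is_blown(self, name: str) -> bool:
        return self._blown[name]


class DebugPort:
    """Hypothetical JTAG front end whose capabilities are gated by fuses."""

    def __init__(self, fuses: EFuseBank):
        self.fuses = fuses

    def can_debug(self) -> bool:
        return not self.fuses.is_blown("DISABLE_JTAG")

    def can_read_back(self) -> bool:
        # Readback requires JTAG access and an intact readback fuse.
        return self.can_debug() and not self.fuses.is_blown("DISABLE_READBACK")


fuses = EFuseBank(["DISABLE_JTAG", "DISABLE_READBACK"])
port = DebugPort(fuses)
fuses.blow("DISABLE_READBACK")  # keep debug, remove the most intrusive function
print(port.can_debug(), port.can_read_back())  # True False
```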
6.2 Security Measures from the Literature
In some applications, manufacturer-provided controls might not adequately respond to a given threat. For instance, emerging applications in cloud computing and multi-tenant environments might pose threats that were not fully accounted for, or researchers might have uncovered new vulnerabilities in specific devices. In such cases, we can turn to the literature, where multiple researchers have recommended innovative tools and techniques to help apply controls.
6.2.1 Bitstream Reverse Engineering.
When it comes to securing bitstreams against reverse engineering, the first line of defense should be bitstream encryption. As such, improving cryptographic implementations within FPGA devices should be the main priority; however, where bitstream encryption is not practical or where the device of interest has known vulnerabilities affecting its cryptographic implementation, one should turn to bitstream obfuscation techniques.
Hoque et al. [54] proposed one such bitstream obfuscation method in 2019. Their method uses unused LUTs to obfuscate critical and non-critical areas of the FPGA design, rendering their functions and structural properties indiscernible. Hoque et al. also implement redundancies to prevent attackers from uncovering the system’s functionalities via targeted, rule-based, and random tampering.
6.2.2 Side-Channel Attacks.
The logical way to secure devices against SCAs is to reduce side-channel leakage. This reduction generally takes one of two forms: hiding or masking.
Hiding seeks to normalize the channel information that attackers could use to recover secrets. For instance, in timing attacks, where execution time or other temporal references are used to extract secret information, controls usually seek to remove the dependence on time by making all operations equally long [115]. For voltage-based SCAs, Le Masle et al. [81] propose a countermeasure that monitors power consumption to keep it constant.
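The timing form of hiding can be illustrated with a byte-string comparison. The first function below exits at the first mismatch, so its running time depends on how much of a secret an attacker has guessed correctly; the second always processes every byte. This is a sketch of the principle only: the Python interpreter gives no hard constant-time guarantee, so production software should use `hmac.compare_digest`, and an FPGA design would use a fixed-latency datapath.

```python
def leaky_equals(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: its running time reveals the matching prefix length."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # returns sooner the earlier the first mismatch occurs
    return True


def constant_time_equals(a: bytes, b: bytes) -> bool:
    """Hiding by equalizing work: every byte pair is always processed."""
    if len(a) != len(b):
        return False  # the length is treated as public information
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y  # accumulate differences without branching on secret data
    return diff == 0


print(constant_time_equals(b"secret", b"secret"))  # True
print(constant_time_equals(b"secret", b"secreT"))  # False
```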
Masking, on the other hand, seeks to obscure intermediate values by injecting randomness from TRNGs, removing the dependence of side-channel leakage on the secret information we wish to protect [29, 38]. Although this method has mainly been applied to protect secret keys in cryptographic implementations, a recent work by Dubey et al. [38] in 2020 demonstrates how masking can protect the weights of neural networks.
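A minimal sketch of first-order Boolean masking follows, assuming byte-sized values and using Python’s `secrets` module as a stand-in for a TRNG. Each secret is split into two shares whose XOR equals the secret. Because XOR is linear, a key addition can be computed share by share without ever recombining the secret; nonlinear operations such as S-boxes need additional machinery not shown here.

```python
import secrets


def mask(value: int) -> tuple:
    """Split a secret byte into two shares (value ^ m, m) with a fresh random mask."""
    m = secrets.randbits(8)  # stand-in for a TRNG output
    return value ^ m, m


def unmask(shares: tuple) -> int:
    a, b = shares
    return a ^ b


def masked_xor(x_shares: tuple, k_shares: tuple) -> tuple:
    """Linear operation applied share-wise: (x0 ^ k0) ^ (x1 ^ k1) == x ^ k."""
    return x_shares[0] ^ k_shares[0], x_shares[1] ^ k_shares[1]


data, key = 0x32, 0x2B
out = masked_xor(mask(data), mask(key))
print(hex(unmask(out)))  # 0x19, i.e., 0x32 ^ 0x2B
```

Since each share taken alone is uniformly random, a leakage model that observes only one share at a time learns nothing about `data` or `key`.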
A countermeasure to SCAs unique to FPGAs involves using partial reconfiguration to randomize lookup tables within the FPGA fabric. This approach was presented by Sasdrich et al. [139] in 2015, who loaded a new random S-Box configuration for every encryption run. Furthermore, noting the impact of placement and routing on side-channel leakage, other preventive countermeasures pay special attention to the placement of critical modules and the length of routed wires [87, 89, 141].
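The idea of loading a fresh randomized S-Box for each run can be approximated in software by table recomputation, a standard masking technique. This is a sketch of that analog, not the partial-reconfiguration design of Sasdrich et al.: the 4-bit S-Box of the PRESENT cipher is used only as a small example, and the table is a Python list where hardware would write LUT contents.

```python
import secrets

# 4-bit S-box of the PRESENT cipher, used here only as a small example table.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def fresh_masked_sbox():
    """Build a new randomized table T with T[x ^ m_in] = SBOX[x] ^ m_out.

    In an FPGA, this table would be written into LUTs before each encryption
    run; here it is simply a list returned together with its masks."""
    m_in, m_out = secrets.randbits(4), secrets.randbits(4)
    table = [0] * 16
    for x in range(16):
        table[x ^ m_in] = SBOX[x] ^ m_out
    return table, m_in, m_out


# One "encryption run": only the masked input and masked output are handled.
table, m_in, m_out = fresh_masked_sbox()
x = 0x7
y_masked = table[x ^ m_in]
print(y_masked ^ m_out == SBOX[x])  # True
```

Because the masks change on every run, the table contents that would be observable in the fabric differ from one encryption to the next, while the unmasked result stays correct.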
Last, other researchers have focused on developing tools and frameworks to evaluate side-channel emissions [41, 45, 67, 69, 70, 123].
6.2.3 FI Attacks.
Moving on to FI attacks, the most common approaches to securing against them involve passive solutions such as redundancy and active solutions such as glitch detectors.
Redundancy is a straightforward way to counter the effects of faults and can be implemented in several ways. First, a popular approach employs triple modular redundancy (TMR) designs. These solutions have been studied extensively and are implemented in most recent FPGA devices to secure sensitive pathways such as debug access circuits [8]. Another approach involves redundant algorithms, especially in cryptographic implementations [35, 93]. As with SCAs, partial reconfiguration also serves as a countermeasure against FI attacks. Mentens et al. [99] presented this approach in 2008, randomizing the location of the cryptographic engine to hinder FI attacks. For SoC FPGAs, software redundancy should also be implemented for critical security functions such as authentication [161]: bypassing a single check with a fault is hard, and bypassing two independent checks is much less likely.
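Both forms of redundancy can be sketched briefly. `majority3` is the bitwise majority function that a TMR voter computes, and `authenticate_redundantly` is a hypothetical helper illustrating duplicated software checks. In compiled code, the duplicated call must be protected from compiler optimizations that would merge the two evaluations; that concern is outside this Python sketch.

```python
def majority3(a: int, b: int, c: int) -> int:
    """Bitwise majority vote of three replica outputs, as computed by a TMR voter:
    each output bit takes the value held by at least two replicas."""
    return (a & b) | (a & c) | (b & c)


def authenticate_redundantly(check, image: bytes) -> bool:
    """Software redundancy: evaluate the same authentication check twice and
    require both results to be exactly True, so that a single injected fault
    that skips or corrupts one evaluation is not enough to accept the image."""
    first = check(image)
    second = check(image)
    return first is True and second is True


# A fault flips bit 2 of one replica; the voter still outputs the correct word.
print(bin(majority3(0b1010, 0b1010, 0b1110)))  # 0b1010

# The image is invalid, but a fault forces the first evaluation to True;
# the second, unfaulted evaluation still rejects it.
results = iter([True, False])
print(authenticate_redundantly(lambda _image: next(results), b"image"))  # False
```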
Active controls seek to detect the effects of FIs within the FPGA. Such solutions were proposed by He et al. in 2016 [51, 52] and in 2017 [53], who used high-frequency RO watchdogs to detect laser FIs within the FPGA fabric. Similarly, Shen et al. [144] in 2019 used a delay-sensing circuit within the FPGA for detection. Upon detection, they tried several response mechanisms that sought to delay the rising edge of the system’s clock, thereby mitigating the transient’s effect on the system. Other approaches that specifically attempt to identify and locate malicious tenants in multi-tenant applications were proposed by Provelengios et al. [131] in 2019 and Mirzargar et al. [101] in 2020.
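The detection logic of an RO-based watchdog can be reduced to comparing per-window cycle counts against a calibrated baseline. The sketch below is a generic illustration under assumed parameters (baseline, relative tolerance, and counts are all hypothetical) and does not reproduce the circuits of He et al.

```python
def ro_watchdog(counts, baseline: float, tolerance: float) -> list:
    """Return the indices of sampling windows whose RO cycle count deviates
    from the calibrated baseline by more than `tolerance` (relative)."""
    alarms = []
    for window, count in enumerate(counts):
        if abs(count - baseline) / baseline > tolerance:
            alarms.append(window)
    return alarms


# Hypothetical per-window counts; window 3 mimics a disturbance during an injection.
counts = [1000, 1002, 998, 940, 1001]
print(ro_watchdog(counts, baseline=1000, tolerance=0.02))  # [3]
```

The tolerance trades false alarms caused by normal temperature and voltage drift against sensitivity to smaller disturbances.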
6.2.4 Probing Attacks.
Where physical access is possible, an attacker with sufficient means will, in most cases, be able to extract information directly from a device via electrical or optical probing attacks. The best we can do is make it as hard as possible for an attacker to extract valuable information, and a few ways of doing so exist.
Tajik et al. [155] propose a countermeasure to their own optical probing attack: a PUF-based security monitor. PUFs rely on the physical characteristics of a given device to generate a unique signature or fingerprint. Tajik et al. propose using an RO PUF whose defining physical characteristics are known. Their experiments showed that attempts to optically probe the internal circuit would alter the characteristics of the PUF in a way that could be detected with high probability.
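A generic RO PUF derives one response bit per pair of ROs by comparing their frequencies; a monitor can then compare the current response against an enrolled one and raise an alarm when more bits flip than ordinary noise would explain. The sketch below assumes hypothetical frequencies, pairings, and noise threshold and is not the monitor design of Tajik et al.

```python
def ro_puf_response(freqs, pairs):
    """One response bit per RO pair: 1 if the first RO of the pair is faster."""
    return [1 if freqs[i] > freqs[j] else 0 for i, j in pairs]


def hamming_distance(a, b) -> int:
    return sum(x != y for x, y in zip(a, b))


def probing_suspected(enrolled, current, max_flips: int) -> bool:
    """Alarm when more bits flip than ordinary noise would explain."""
    return hamming_distance(enrolled, current) > max_flips


pairs = [(0, 1), (2, 3), (4, 5), (1, 2)]
# Hypothetical RO frequencies in MHz at enrollment and during a probing attempt.
enrolled = ro_puf_response([201.3, 199.8, 200.5, 202.1, 198.9, 200.9], pairs)
probed = ro_puf_response([199.0, 199.8, 203.0, 202.1, 201.5, 200.9], pairs)
print(enrolled, probed)                                # [1, 0, 0, 0] [0, 1, 1, 0]
print(probing_suspected(enrolled, probed, max_flips=1))  # True
```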
6.2.5 Hardware Trojans.
Securing a device against hardware Trojans is mostly about knowing and understanding what composes your design. The best and only foolproof way of securing against hardware Trojans is to control the entire development chain. Within such a controlled development chain, one can then put authentication and integrity controls in place using secret keys, watermarks, or PUFs to ensure that no changes are made to the design. However, this quickly becomes impractical as complexity increases and higher levels of abstraction must be implemented through third-party building blocks. Hence, the need for Trojan detection becomes apparent.
Most Trojan detection techniques found in the literature have focused on validating the authenticity of a given design based on its defining characteristics. For instance, Söll et al. [154] in 2014 used EM emissions to detect discrepancies between a legitimate design and its FPGA implementation. More recent works, such as that of Danesh et al. [32] in 2021, have taken an approach akin to that used in software security, stepping back into a design through bitstream reverse engineering. After decomposing the designs, they can use various techniques to identify hidden malicious circuits. Another work of interest, by Chithra et al. [28] in 2020, used machine learning to detect Trojans, showing that Trojans could be detected from temperature and voltage values obtained from different standard benchmarks of the AES encryption algorithm.
Furthermore, Krachenfels et al. [76] in 2021 proposed using laser-assisted optical probing to awaken dormant Trojans within an FPGA fabric. This technique could help uncover Trojans that have bypassed the EDA tools’ design rule checks.
Other security measures against Trojans include isolation and segmentation to limit the impact of malicious hardware within the FPGA fabric. Taking the attack by Jacob et al. described in Section 5.7.3 as an example, if the HPS had configured ARM’s TrustZone before the FPGA was programmed, this attack would not have been possible. However, as described in Section 5.5, TrustZone is not infallible. As such, some researchers have begun investigating trusted execution environments better adapted to FPGA devices [12, 136, 167]. For instance, Ren et al. [136] proposed a scheme that uses remote attestation to validate hardware accelerators programmed within FPGA devices, and showed how it could validate accelerators deployed on remote cloud infrastructures.
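The core of remote attestation is a challenge-response exchange that binds a fresh nonce to a measurement of what is actually loaded. The sketch below is a generic illustration, not the scheme of Ren et al.: it assumes a pre-shared HMAC key and uses a SHA-256 digest of the loaded design as the measurement, whereas practical schemes typically rely on asymmetric keys rooted in hardware.

```python
import hashlib
import hmac
import secrets


def measure(bitstream: bytes) -> bytes:
    """Measurement of the loaded accelerator: here simply a SHA-256 digest."""
    return hashlib.sha256(bitstream).digest()


def device_attest(device_key: bytes, nonce: bytes, bitstream: bytes) -> bytes:
    """Prover side: bind the verifier's fresh nonce to the current measurement."""
    return hmac.new(device_key, nonce + measure(bitstream), hashlib.sha256).digest()


def verifier_check(device_key: bytes, nonce: bytes, report: bytes,
                   expected_measurement: bytes) -> bool:
    """Verifier side: recompute the report expected for the approved design."""
    expected = hmac.new(device_key, nonce + expected_measurement, hashlib.sha256).digest()
    return hmac.compare_digest(report, expected)


key = secrets.token_bytes(32)         # hypothetical pre-shared device key
golden = measure(b"accelerator-v1")   # measurement of the approved design
nonce = secrets.token_bytes(16)       # fresh per request, prevents replay

print(verifier_check(key, nonce, device_attest(key, nonce, b"accelerator-v1"), golden))      # True
print(verifier_check(key, nonce, device_attest(key, nonce, b"accelerator-trojan"), golden))  # False
```

Because the nonce changes with every request, a report recorded from an earlier, legitimate session cannot be replayed to vouch for a modified accelerator.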
7 Future FPGA Research and Development
Cybersecurity is a continually evolving field of research. New vulnerabilities arise as new systems are developed, and new, unknown attack vectors emerge as new techniques are integrated. Consequently, identifying, prioritizing, and mitigating cybersecurity threats requires a thorough methodology to ensure that systems are designed securely. This need is especially pressing for FPGA and SoC FPGA devices in the Internet of Things and operational technology, where these devices must operate independently for extended periods. In such circumstances, physical security and expedient security updates cannot always be relied on. Furthermore, although traditional cybersecurity is well established for information systems, its methodologies do not necessarily translate well to embedded systems.
Focusing on the particularities of FPGA and SoC FPGA devices, this article has reviewed methodologies and techniques for secure embedded system design. First, we showed how knowledge of the attack surface can be leveraged to provide the necessary input to subsequent phases of the cybersecurity analysis. Second, we performed a detailed literature review of applicable attacks and vulnerabilities. Third, we used attack potential and the CVSS score to assess the threat they pose. Finally, we showed how controls can be selected to respond to identified threats by leveraging the STRIDE-LM model. Throughout this review, we have drawn on works focused on traditional cybersecurity and extended their approach to embedded systems. Although our approach provides a basis from which to build, it is up to the scientific community, developers, and manufacturers to take up and improve on this work.
Speaking of improvements, since our primary focus has been on reviewing the work of past cybersecurity researchers, drafting a cybersecurity methodology, and gathering and testing the necessary tools, we have only scratched the surface of what must be accomplished to fully define the cybersecurity approach. We note several areas of research and development that could help strengthen cybersecurity in FPGA and SoC FPGA devices:
• Cybersecurity assessments: Ideally, the assessments featured in Table 5 would be broken down by device, targeting specific assets. As such, further research is needed to test and evaluate these attack vectors as they apply to specific devices. Manufacturers should accomplish this via rigorous penetration-testing experiments to help designers choose the right device for their application. A more productive approach, however, would be for manufacturers to deliver open designs that cybersecurity researchers can scrutinize, evaluate, and improve. Many leaps in cybersecurity are made by curious individuals who will stop at nothing to find a loophole in a system. Such disclosures allow users and developers to adapt their systems and keep their sensitive information safe.
• Applied AI: Table 5 shows that hardware Trojans and covert channels share one of the lowest attack potentials, which makes them among the easiest attacks to mount. Although the obvious safeguard against Trojans is to control the entire development chain, this can quickly become impractical. Therefore, we must ensure that we can detect malicious circuits inserted inconspicuously within our FPGA designs. As FPGA designs will only increase in complexity, such detection could benefit greatly from applied AI techniques. Prospectively, AI-based controls against other attack techniques also need to be researched.
• Isolation and segmentation: Multiple attacks described in Section 5 relied on lateral movement or elevation of privilege. A notable example is the use of FPGA devices in cloud computing, where power distribution networks are exploited to perform side-channel and FI attacks and to establish covert channels. Furthermore, in complex systems such as SoC FPGA devices, the large attack surface gives rise to multiple entry points for attackers, who can then freely access the sensitive designs within the device’s configuration memory. To strengthen security, manufacturers need to prioritize isolation and segmentation practices in their devices.
8 Conclusion
This literature review has provided an up-to-date outlook on cybersecurity issues affecting FPGA and SoC FPGA devices and introduced a strategy to help developers effectively apply controls to their systems.
Based primarily on the IDDIL/ATC methodology presented by Muckin and Fitch [114], our strategy sought to define a generic threat model that SoC FPGA developers could adapt to specific architectures and operational environments. Having studied the architectures of the AMD-Xilinx Zynq-7000 and Intel Stratix 10, we observed multiple similarities between these two competing devices. However, the implementation differences that do exist can drastically affect how we apply controls. This factor becomes increasingly important when we transition from FPGA to SoC FPGA devices: one cannot overlook the added complexity of the HPS and the level of access it grants.
In tackling potential attack vectors, we reviewed in detail the work of researchers who have demonstrated both active and passive attacks that can break the confidentiality, integrity, and availability of FPGA and SoC FPGA devices. Although manufacturers provide passive security measures that can easily be applied to FPGA and SoC FPGA devices, these do not make systems foolproof. Physical security and active security measures will play a significant role in protecting devices from malicious actors. Developers must carefully study known cybersecurity issues and validate how they apply to their attack surface. From there, developers should apply controls based on the threat posed by identified attack vectors and threat actors.