# Multi-Tenant Cloud FPGA: A Survey on Security

MUHAMMED KAWSER AHMED\*, University of Florida, USA
JOEL MANDEBI, University of Florida, USA
SUJAN KUMAR SAHA, University of Florida, USA
CHRISTOPHE BOBDA, University of Florida, USA

With the exponentially increasing demand for performance and scalability in cloud applications and systems, data center architectures have evolved to integrate heterogeneous computing fabrics that combine CPUs, GPUs, and FPGAs. FPGAs differ from traditional processing platforms such as CPUs and GPUs in that they are reconfigurable at run-time, providing customized performance, flexibility, and acceleration. FPGAs can perform large-scale search optimization, acceleration, and signal processing tasks with advantages in power, latency, and processing speed. Many major public cloud providers, including Amazon, Huawei, Microsoft, and Alibaba, have already started integrating FPGA-based cloud acceleration services. While FPGAs in cloud applications enable customized acceleration with low power consumption, they also introduce new security challenges that still need to be reviewed. Allowing cloud users to reconfigure the hardware design after deployment could open backdoors for malicious attackers, potentially putting the cloud platform at risk. Because of these security risks, public cloud providers still do not offer multi-tenant FPGA services. This paper analyzes the security concerns of multi-tenant cloud FPGAs, gives a thorough description of the security problems associated with them, and discusses future challenges in this field of study.

CCS Concepts: • Hardware $\rightarrow$ Reconfigurable logic and FPGAs; • Security and privacy $\rightarrow$ Systems security.
Additional Key Words and Phrases: Cloud, datacenter, FPGA, virtualization, security, multi-tenant

Authors' addresses: Muhammed Kawser Ahmed, muhammed kawsera@ufl.edu, University of Florida, Gainesville, FL, USA, 32603; Joel Mandebi, jmandebimbongue@ufl.edu, University of Florida, Gainesville, FL, USA, 32603; Sujan Kumar Saha, sujansaha@ufl.edu, University of Florida, Gainesville, FL, USA, 32603; Christophe Bobda, cbobda@ufl.edu, University of Florida, Gainesville, FL, USA, 32603.

© 2022 Association for Computing Machinery.

#### 1 INTRODUCTION

Due to their performance, computation, and parallelism benefits over traditional accelerators such as GPUs, FPGAs are being integrated into cloud and data center platforms. Over the last few decades, the technology market has seen a rising demand for high-speed cloud computation. Commercial cloud providers have started using FPGAs in their clouds and data centers, permitting tenants to implement custom hardware accelerators on FPGA boards in the data center. The integration of FPGAs into the cloud followed Microsoft's publication of its work on Catapult in 2014 [35]. Since then, it has become a niche technology for cloud service platforms, and major cloud providers, e.g., Amazon [6], Alibaba [15], Baidu [4], and Tencent [16], have integrated FPGAs into their platforms. The global FPGA market is projected to reach a size of 9.1 billion by 2026, following a compound annual growth rate (CAGR) of 7% [9]. For computationally intensive workloads such as artificial intelligence, image and video processing, signal processing, big data analytics, and genomics, users can exploit FPGA acceleration in cloud platforms [103]. FPGAs offer unique advantages over traditional CPUs and GPUs in terms of computation and flexibility. We illustrate these advantages with four concrete examples:

1. Microsoft's Bing search engine experienced a 25 percent latency reduction and a 50 percent increase in throughput in its data centers [35].
2. Using the Amazon AWS FPGA F1 instance, the Edico Genome project [10] achieved a more than tenfold increase in speed and performance when analyzing genome sequences.
3. The introduction of Xilinx Versal FPGAs for real-time video compression and encoding in the cloud has significantly reduced operating cost by reducing the encoding bitrate by 20% [52].
4. According to a survey [18], FPGA accelerators in data centers yield 10x better speed and energy efficiency for state-of-the-art neural network implementations.

For maximum utilization, it has been proposed to share a single FPGA fabric among cloud users by leveraging the partial reconfiguration capability of FPGAs [38], [26], [33]. This crucial property allows any region of the FPGA to be reconfigured in a deployed environment; in particular, a part or region of the fabric can be reconfigured at runtime. This notion of sharing the FPGA fabric resources among multiple tenants or users is called multi-tenancy. FPGA sharing/partitioning can be established in two different ways: temporally and spatially.
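As a rough illustration of these two modes, the sketch below models an allocation as a (tenant, region, time slot) triple. It is a toy model under stated assumptions, not a description of any provider's scheduler: the names `Allocation`, `temporal_share`, and `spatial_share`, as well as the region labels, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    tenant: str     # tenant identifier (hypothetical label)
    region: str     # part of the fabric assigned to the tenant
    time_slot: int  # scheduling slot in which the allocation is active


def temporal_share(tenants, fabric="full-fabric"):
    """Temporal sharing: every tenant gets the entire fabric, one slot each."""
    return [Allocation(t, fabric, slot) for slot, t in enumerate(tenants)]


def spatial_share(tenants, regions):
    """Spatial sharing: tenants hold disjoint regions during the same slot."""
    if len(tenants) > len(regions):
        raise ValueError("not enough isolated regions for all tenants")
    return [Allocation(t, r, 0) for t, r in zip(tenants, regions)]


if __name__ == "__main__":
    print(temporal_share(["tenant-A", "tenant-B"]))
    print(spatial_share(["tenant-A", "tenant-B"], ["PR0", "PR1", "PR2"]))
```

In the temporal case the region is identical for all tenants and only the slot differs; in the spatial case the slot is identical and only the region differs, which is why co-located tenants share the fabric at the same time.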
In temporal sharing, the entire FPGA fabric is allocated to users/tenants over different scheduled time slots. In spatial sharing, the FPGA fabric is divided into multiple logically isolated regions that are allocated to different tenants simultaneously. This technique is also referred to as spatial multiplexing and is mostly investigated by academic researchers. Industrial cloud providers avoid spatial sharing due to security and privacy concerns.

Even though enabling FPGAs in clouds improves performance significantly, research indicates that multi-tenant FPGA platforms carry multiple security risks. Unlike CPUs and GPUs, FPGAs allow tenants to implement custom designs on the fabric, which opens multiple attack surfaces for attackers. Most of the common threats against cloud FPGAs are quite similar to the threats observed in server-based cloud computing. Besides these attacks, multi-tenancy enables attacks on the hardware surface, such as side-channel attacks [79], in which sensitive information is stolen from the hardware by invasive or non-invasive probing, and covert channels, in which attackers create a hidden channel between each other to transmit confidential information [93], [45]. Malicious attackers can also launch Denial-of-Service (DoS) attacks at any layer of the cloud system, including the user and infrastructure levels [96]. Another major attack is uploading corrupted or malicious bitstreams to the FPGA fabric, which can lead to shutdown or faulty results [49]. At the hardware level, unlike software-based attacks, cloud FPGA attacks mainly involve intervention in and manipulation of the hardware fabric, where a malicious design can lead to short-circuit faults, performance degradation, or shutdown [32]. Currently, major cloud FPGA vendors such as AWS already offer single-tenant bitstream deployment.
Multi-tenant cloud FPGAs, in contrast, are still under active research. In the multi-tenant setting, the security risk is more severe because a single FPGA is shared among different users, which exposes the hardware surface more widely. As a result, these security concerns should be a top priority for researchers, since multi-tenant cloud FPGAs will expand in the future under the immense pressure to migrate to cloud platforms.

Throughout the paper, we cover five fundamental concepts of multi-tenant cloud FPGA security. First, in Section 2, we introduce cloud FPGA multi-tenancy concepts and related background. Section 3 discusses the cloud FPGA infrastructure in detail, together with possible deployment models. Categorized threat models are introduced in Section 5, and currently established and proposed attacks are described in Section 6. Implemented countermeasures and mitigation approaches are summarized in Section 7. We also discuss some possible research directions for multi-tenant cloud FPGAs that have previously been implemented in FPGA SoCs but not yet in cloud FPGAs.

*Organization.* Section 2 provides a thorough introduction to the fundamental subjects of multi-tenant cloud FPGA research, such as FPGAs, cloud computing, virtualization, and OS support. Section 3 discusses the various methods for enabling FPGAs in the cloud. Section 4 summarizes the industrial development and current deployment models of cloud FPGAs. Section 5 categorizes the threat models in order to better identify the potential attack surfaces. Section 6 compiles a substantial body of research on multi-tenant FPGA attacks and illustrates the proposed building blocks of the attack circuits. Section 7 covers the various defenses and countermeasures against these attacks.
In Section 8, we express our opinions on virtualization security risks and future challenges.

#### 2 BACKGROUND

#### 2.1 FPGAs

Field-Programmable Gate Arrays (FPGAs) are re-programmable integrated circuits that contain an array of programmable logic blocks. FPGAs provide flexibility, reliability, and parallelism compared to traditional CPU architectures; unlike CPUs, they perform computation with massive parallelism. Fig. 1 illustrates an FPGA architecture, which consists of configurable logic blocks (CLBs) surrounded by programmable switch boxes (SBs) and data input/output (I/O) blocks. CLBs are the fundamental logic building blocks of an FPGA and can implement any arithmetic function in digital logic. CLBs are placed in an array of bus interconnects that are mainly controlled by programmable switch boxes. The switch boxes control the routing of wires around the CLBs and hence provide a re-configurable computing platform. Every CLB contains three fundamental digital logic blocks, namely lookup tables (LUTs), flip-flops, and multiplexers, which are used to implement arithmetic logic functions or to store data. The I/O blocks ensure bidirectional data communication between the FPGA board interface and connected peripherals and external devices.

Fig. 1. FPGA architecture has three fundamental digital logic components for performing Boolean arithmetic functions: 1) Configurable Logic Blocks (CLBs), 2) Switch boxes (SBx), and 3) Input/output (I/O) cells. Every individual CLB contains three major logic blocks: a Look-Up Table (LUT), a multiplexer (Mux), and a flip-flop (FF).

#### 2.2 FPGA Design Flow

Through the first 20 years of FPGA development, hardware description languages (HDLs) such as VHDL and Verilog were used and evolved to implement circuits in FPGAs. HDLs demand a deep understanding of the underlying digital hardware. However, the development of high-level synthesis (HLS) design tools and languages, e.g., Vivado HLS, LabVIEW, MATLAB, and C++, has added a new dimension to the FPGA design flow: these tools translate graphical blocks or high-level code into equivalent low-level hardware circuitry. Without a detailed understanding of low-level hardware design, a designer can use an HLS tool that compiles the high-level description into a low-level Register Transfer Level (RTL) abstraction. After the HDL code has been simulated and evaluated for behavioral accuracy and intended functionality, synthesis tools (Xilinx, Intel Quartus, Synopsys, etc.) in the FPGA design pipeline synthesize and translate it into a netlist. The netlist describes all FPGA block connections (LUTs and registers) and provides a complete route of component connections. The design is then transformed into a binary bitstream file and uploaded to the FPGA board by the tool. Designers often write test benches in an HDL, which wrap the design code and exercise the FPGA implementation by asserting inputs and verifying outputs. Generally, a designer can simulate and test the design in three different phases of the FPGA design flow: pre-synthesis, post-synthesis, and post-implementation. These simulation and verification phases often require more time than designing the HDL block itself.

Fig. 2. FPGA design flow diagram. After the desired hardware design is lowered from a high-level abstraction to low-level hardware circuitry, the hardware block design is simulated and synthesized using different synthesis tools (Xilinx, Intel, etc.). Next, the synthesis tools generate the netlist containing the mapped design's detailed route, and the design is finally uploaded to a physical FPGA board.

#### 2.3 Partial Reconfiguration

An FPGA fabric is made up of two separate layers: the configuration memory layer and the hardware logic layer.
Lookup tables (LUTs), flip-flops, memory blocks, and switches, as described in Section 2.1, are all parts of the hardware logic layer. The configuration memory layer stores a binary file with all the hardware circuitry information. All configuration information, such as the values of LUTs, flip-flops, and memory, the voltage levels of input/output pins, and the routing data for interconnections, is stored in this binary file. The whole functionality of the FPGA board can be reloaded by uploading a fresh bitstream file into this configuration memory. Bitstream transfers to the FPGA are often made over JTAG or USB.

Partial Reconfiguration (PR) enables the dynamic modification of a specific FPGA region while the rest of the FPGA continues to run without interruption. According to the functional specification, an FPGA fabric can be partitioned into two different regions: static and dynamic. In the dynamic region, a partial bitstream is loaded into the configuration memory without altering the functionality of the other region, which introduces flexibility and improves resource utilization. The static region, in contrast, changes only when the whole bitstream of the FPGA fabric is reconfigured and re-uploaded, which removes the previously configured bitstream.

#### 2.4 Cloud Computing

The term "cloud" refers to the *Internet*: it denotes an internet-based computing platform where various online services, including servers, storage, and websites, are supplied to clients' PCs via the internet. Cloud computing takes advantage of a cloud network's physical resources (software, servers, and layers) being accessible everywhere. These resources are shared with cloud customers by cloud providers [69]. For each customer, cloud services typically establish a virtual machine with its own addresses and storage.
The goal of virtualization is to separate the resources (hardware and software) into multiple virtual servers, each of which functions as a separate, independent server. One of the advantages of cloud computing is that these virtual servers are accessible over the internet from various connectivity platforms.

#### 2.5 Cloud Computing Architecture

One of the most extensive and understandable definitions of cloud computing is given by the National Institute of Standards and Technology (NIST). This definition characterizes the platform by three service models, four deployment models, and five key attributes [69]. The five key attributes are:

1. **Self-service on demand**: Computing resources (storage, processing ability, virtualization, etc.) can be accessed and used instantly without any human interaction with cloud service providers.
2. **Broad network access**: Computing resources can be accessed over the internet from heterogeneous devices, such as laptops, mobile phones, and IoT devices.
3. **Resource pooling**: Multiple users pool the cloud resources over the internet. This pooling is called multi-tenancy; for example, one physical server can host several virtual servers for different users.
4. **Rapid elasticity**: Additional space or any other cloud service can be requested automatically, which enables scalable provisioning. In a sense, this characteristic makes cloud computing resources appear infinite.
5. **Measured service**: The total resources used in the cloud system can be metered using standard metrics.

According to the NIST definition, each cloud provider offers users services at various levels of abstraction, or service models.
These are the most typical service models:

(1) **Software as a Service (SaaS)**: In the SaaS service model, consumers have only limited admin control over running and executing the provider's cloud applications and services on the cloud infrastructure. Cloud applications can be executed and accessed from various client devices over the internet. SaaS is the most popular service model; well-known SaaS products include Gmail, Slack, and Microsoft Office 365. A SaaS model can be very effective for achieving a low-cost and flexible management platform, while the public cloud provider manages software-layer execution in the background [89].

(2) **Platform as a Service (PaaS)**: Under the PaaS service model, the customer has the authority to deploy applications built with the libraries, services, and tools that the provider supports. The user does not manage or control the underlying architecture of the cloud platform. Users can employ a variety of PaaS platforms to develop their own applications. Examples of PaaS providers are Google App Engine, Amazon Web Services (AWS), Microsoft Azure, etc.

(3) **Infrastructure as a Service (IaaS)**: In contrast to the PaaS model, the IaaS model is a low-level abstraction service that gives users more freedom over the pooled resources on which their software is deployed [69]. Google Cloud, Amazon Web Services, and OpenStack are among the most widely used IaaS platforms.

OpenStack [20] is a leading open-source cloud management platform used to manage IaaS service elements (storage, networking, and I/O). OpenStack enables cloud users to easily add servers, storage, and networking components to their cloud. Many FPGA virtualization systems have been developed and deployed in clouds and data centers using the OpenStack framework [34].

Fig. 3. The three main cloud service models are SaaS, PaaS, and IaaS. The SaaS model allows limited admin control over running and executing applications on cloud infrastructure.
PaaS streamlines and reduces the cost of creating and delivering apps. IaaS is a cloud service paradigm that provides full control over cloud infrastructure.

#### 2.6 Virtualization

Virtualization technologies are now widely used and significant in the cloud computing and big data industries. They provide various advantages, including flexibility, independence, isolation, and the capacity to share resources. In general, virtualization is an effective method of dividing computing resources among several users through virtual abstractions. The Virtual Machine Monitor (VMM), or hypervisor, is the control software in virtualization technologies that oversees resource virtualization for all of the virtual machines on the underlying hardware system. Running different OSs in a virtual platform instance called a virtual machine (VM) is the most prevalent example of virtualization.

The term "FPGA virtualization" describes the abstraction of FPGA resources and their distribution among numerous users or tenants. In FPGA virtualization, an overlay architecture, also known as a hardware layer, is placed on top of the FPGA fabric. The goals of FPGA virtualization are multi-tenancy, isolation, flexibility, scalability, and performance.

2.6.1 SHELL. One of the most important parts of FPGA virtualization is defining the SHELL and ROLE abstraction, which is described as the **SHELL-ROLE Architecture (SRA)** in much of the literature [82]. Inside an FPGA fabric, the SHELL is a static region that contains pre-defined and pre-implemented deployments. Usually, the SHELL region contains three basic elements: (a) control logic, (b) a memory controller (DMA controller), and (c) a network controller (PCIe). The primary role of the SHELL region is to provide the necessary control logic and I/O so that data can be easily exchanged between the user and the FPGA fabric. The SHELL also isolates core system components and hence provides security guarantees.
The SHELL typically receives one-fourth of the DRAMs. One of the four DDR4 DRAM controllers used in Amazon EC2 F1 instances is implemented in the SHELL region.

Fig. 4. Architecture of the SHELL and ROLE regions. The SHELL area houses the three main shared FPGA fabric building blocks (PCIe, network, and memory), which are crucial for the tenants. The SHELL region is isolated and secured from the ROLE region. The re-configurable ROLE region houses the tenants.

2.6.2 ROLE. The ROLE section of the FPGA is the dynamic or re-configurable region, which the user can configure at runtime. The bitstream can be mapped onto the FPGA fabric either fully or partially. The ROLE region provides elasticity and flexibility, which helps achieve higher resource utilization and improved performance. Fig. 4 shows a SHELL-ROLE architecture, indicating the isolated regions for partial reconfiguration of the FPGA.

2.6.3 Operating System (OS) Support. FPGA virtualization requires a software stack or an OS to be deployed, since the abstraction of the FPGA fabric is generally provided by software stacks or an OS. Using them, tenants can create and deploy applications onto the FPGA fabric without a deep understanding of the underlying FPGA hardware. The OS is the most popular software stack for FPGA virtualization, since the concept of virtualization is derived from the OS. However, there is currently no well-established operating system that addresses OS abstraction for reconfigurable computing systems. The literature has proposed several architectures that provide OS virtualization in cloud FPGA platforms using Linux and Windows. Besides Linux and Windows, some specialized OSs have also been designed to exploit the reconfigurable nature of the FPGA fabric.
In contrast to software-based virtualization, where the virtual machine monitor starts several virtual machine (VM) instances, FPGA-based cloud virtualization does not abstract core logic resources like LUTs, block memory, or flip-flops into many virtual FPGA instances. Instead, FPGA cloud virtualization aims to abstract the FPGA hardware layer for emulating OS instructions. Custom OSs proposed for FPGA virtualization follow the paravirtualization method, in which mostly Linux kernels are modified to support FPGA hardware abstraction. FPGA-based operating systems are generally divided into two categories: 1) embedded processor OSs and 2) re-configurable CPUs. Embedded processor OSs developed for FPGA platforms normally run on the embedded processor integrated into the same FPGA SoC. Re-configurable CPUs are modified OSs that run directly on the FPGA hardware fabric and emulate OS instructions even though the underlying FPGA architecture differs from traditional CPUs.

*Embedded Processor OS.* HybridOS [66] is a re-configurable accelerator framework that was developed by modifying the Linux kernel and is implemented on the embedded processor inside a Xilinx Virtex II SoC.

*Re-configurable CPU.* BORPH [5] modifies the Linux kernel to execute an FPGA process as a user FPGA application. The modified Linux kernel can abstract the conventional CPU instruction platform in the FPGA fabric as a hardware process. In a manner similar to a traditional processor-based system, tenants can execute system calls and perform the necessary operating system functions. BORPH abstracts the memories and registers defined in the FPGA using the Unix operating system as a pipeline. Unlike BORPH, FUSE [61] leverages a modified kernel module to support FPGA accelerators in the form of tasks instead of processes. To decrease data transmission latency between hardware and software operations, it also makes use of a shared kernel memory paradigm.
ReconOS [23] introduces multi-threaded abstractions of software programs and a standardized interface for integrating custom hardware accelerators. Like FUSE and BORPH, ReconOS uses a modified Linux kernel to build this framework. At a high level, Feniks [102] abstracts the FPGA into two distinct regions, an OS region and a multi-application region, both of which are provided to software applications. The OS region contains software stacks and modules for connecting efficiently with the local DRAM of the FPGA, the host CPU and memory, servers, and cloud services. In addition, Feniks provides resource management and FPGA allocation using centralized cloud controllers that run on host CPUs. Like an OS, LEAP [44] provides a uniform, abstract interface to the underlying hardware resources and is similar to the other proposed architectures. AmorphOS [3] divides the FPGA region into small fixed-size zones, a.k.a. Morphlets, which provide the virtualization of the FPGA fabric. AmorphOS shares hardware tasks through either spatial sharing or time-sharing scheduling.

#### 3 CLOUD FPGA INFRASTRUCTURE AND DEPLOYMENT MODEL

Over the previous ten years, FPGAs have mostly been used in the verification phase of ASIC designs, where the ASIC design was implemented on an FPGA for validation and verification before it was actually produced. Additionally, FPGAs had some other applications in specialized markets and research programs. However, FPGAs are gaining popularity as an alternative to CPUs and GPUs due to their high-performance processing and parallelism. FPGA boards are available on the market today either as standalone devices that can be connected through PCIe ports or as part of the same System-on-Chip (SoC). Recent trends indicate that the integration of FPGAs in cloud platforms is growing exponentially, allowing tenants to design and implement their custom hardware accelerators. There are typically four basic methods for deploying FPGA boards in cloud data centers.
Fig. 5 shows the different FPGA deployment models in the cloud.

#### 3.1 Co-processor

In the first approach, the FPGA is considered a co-processor. The FPGA and CPU are located in the same node in a data center, and the FPGA can be accessed over the PCIe network. However, the total number of FPGA boards in the data center is proportional to the total number of CPUs, and an FPGA board cannot run independently.

Fig. 5. FPGA deployment models in the cloud. a) In the co-processor model, PCIe connections are used to connect FPGA boards to CPUs in data centers. b) In the SoC model, the FPGA and CPU are mounted on a chip die. c) The bump-in-the-wire concept uses FPGAs in the data centers, which tenants can access via NIC protocols.

Xilinx introduced the first CPU+FPGA integration for embedded devices in 2013, named the Zynq SoC [22]. This SoC platform integrates an ARM Cortex processor and an FPGA programmable block on the same die. Removing communication ports to the CPU reduces latency and increases overall performance. Communication between the FPGA and the CPU is carried over an AXI-based protocol. In 2014, a project led by Microsoft, named Catapult, implemented the idea of integrating FPGAs and CPUs in a data center [35]. In this project, Microsoft aimed at augmenting CPUs with an interconnected and configurable programmable FPGA block. The project was first used as a prototype in the Bing search engine to identify and accelerate computationally expensive operations in Bing's IndexServe engine. The experimental project was very successful and generated outstanding results, delivering a dramatic improvement in search latency and running Bing search algorithms 40 times faster than CPUs alone [39]. These results pushed Microsoft to extend the approach to its other web services. In 2015, Intel acquired Altera and its Xeon+FPGA deployment model, which integrates FPGAs alongside Xeon CPUs.
In the backend, Intel introduces partial reconfiguration of the bitstream, in which a static region of the FPGA is allocated. This static part is referred to as the blue bitstream; it can load the user bitstream (named the green bitstream) into the re-configurable region. This interaction is handled by an interface protocol called the Core Cache Interface (CCI-P). In 2016, Amazon announced its collaboration with Xilinx on accelerated cloud technology using FPGAs. This project is controlled by an AWS shell module, with the user logic configured dynamically in the FPGA hardware. Tenants are provided with an Amazon-designed software API shell to avoid any security damage. In recent years, Baidu [4], Huawei [21], Tencent [16], and Alibaba have also started integrating FPGAs and CPUs.

#### 3.2 Discrete

An FPGA board can also be deployed independently as a separate standalone component, which eliminates the necessity of deploying a CPU along with the FPGA board. In this discrete setup, the FPGA board is independent of the CPU and is connected directly to the network. For example, NARC [2] is a standalone FPGA board that is connected through the network and is capable of performing computationally heavy workloads and network tasks. Using the OpenStack platform, the Smart Applications on Virtualized Infrastructure (SAVI) project [7] deployed a cluster of discrete FPGA boards that can communicate privately in a closed network. In 2017, IBM announced its cloudFPGA project, which accommodates a total of 1024 FPGA boards in a data center rack [1]. IBM deploys these FPGA racks as standalone hardware resources and avoids the common approach of coupling FPGA boards with CPUs in data centers. The IBM standalone setup has shown improvements of 40x in latency and 5x in throughput [1].
#### 3.3 Bump-in-the-wire

The bump-in-the-wire model refers to a setup where FPGAs are placed on a server between the Network Interface Card (NIC) and the network. This allows FPGAs to communicate directly over the network and to process data packets received from different users through the internet. Bump-in-the-wire architectures achieve a dramatic reduction in latency because the number of layers in the communication path is reduced. Exposing the FPGA resources over the network has the unique benefit of offloading computation without interacting with CPUs and GPUs. Users can send packets directly to the servers; the packets are processed by the routers and later forwarded to the destination FPGA board using software-defined networking (SDN) protocols. Microsoft Catapult followed the bump-in-the-wire deployment model to connect its FPGA boards to the top-of-rack (TOR) switch.

#### 3.4 System-on-chip (SoC)

SoC FPGA devices integrate microprocessors with FPGA fabric into a single device. Consequently, they provide higher reliability as well as reduced latency and power consumption between the processor and the FPGA. They also include a rich set of I/O devices, on-chip memory blocks, logic arrays, DSP blocks, and high-speed transceivers. Currently, there are three families of SoC FPGAs available on the market, from Intel, Xilinx, and Microsemi [62]. The processors used in FPGA SoCs are fully dedicated "hardened" processor blocks. All three FPGA vendors integrate a full-featured ARM® processor and a memory hierarchy internally connected with the FPGA region. Integrating ARM processors and FPGA blocks on the same silicon die significantly reduces board space and fabrication cost. Communication between the two regions consumes substantially less power than using two isolated devices.

#### 4 INDUSTRIAL EVOLUTION OF PUBLIC CLOUD FPGA ACCELERATORS

In recent years, public cloud providers have been offering FPGA boards to users/tenants in their data centers.
Users or tenants normally access FPGA resources on a pay-per-use basis, and control is assigned to a user for a specific time slot. Tenants can speed up their applications by designing custom hardware accelerators and implementing them in their assigned region. Amazon AWS [6], Huawei Cloud [21], and Alibaba Cloud [15] have started renting out the Xilinx Virtex Ultrascale+ architecture on their cloud platforms. Xilinx Kintex Ultrascale is offered by Baidu [4] and Tencent Cloud [16]. Alibaba and OVH currently offer Intel Arria 10 FPGA boards on their clouds. Xilinx Alveo accelerators are available through the Nimbix cloud platform [19]. Normally, a cloud provider hosts several FPGA boards on its servers, and the server's hypervisor assigns each FPGA board to a single tenant. At present, the offerings from the major cloud providers share three main properties:

- Most providers permit users to upload any synthesized netlist to the FPGA board, subject to some limitations and design rule checks.
- Some providers offer a high-level synthesis (HLS) platform as an alternative to uploading an RTL netlist.
- All public cloud providers offering FPGA acceleration services follow the co-processor deployment model (Section 3) in their data center servers. Bump-in-the-wire and system-on-chip (SoC) deployments are not available today in public clouds.

Because most cloud providers today have architectures and naming conventions similar to those of AWS, we dedicate a subsection to the AWS EC2 F1 platform. Fig. 6 shows the AWS EC2 F1 architecture and deployment model.

# 4.1 Amazon Web Services (AWS) EC2 F1

Since 2016, AWS has offered FPGA-accelerated cloud services with single-tenant FPGA access. These FPGA-based cloud services, also known as the AWS F1 service, include three instance types: f1.2xlarge, f1.4xlarge, and f1.16xlarge.
In this naming convention, the number is twice the number of FPGAs in the instance. For example, f1.2xlarge refers to 1 FPGA board, f1.4xlarge to 2 FPGA boards, and f1.16xlarge to a total of 8 FPGA boards. Every virtual machine instance contains 8 virtual CPU cores, 122 GiB of DRAM, and 470 GB of SSD storage. In the AWS F1 instance architecture, each FPGA board has direct access to four DDR4 DRAM chips, for up to 64 GiB of RAM per FPGA board. A PCIe Gen 3 bus connects the FPGA card to the server. In AWS, users cannot upload their own hardware design directly to the FPGA. Instead, AWS uses Xilinx tools to create the bitstream itself. After all design rule checks (DRCs) pass, AWS generates the final bitstream (Amazon FPGA Image, AFI) [6].

Fig. 6. Amazon AWS EC2 F1 instance architecture. After bitstream checking, the generated netlist (Amazon FPGA Image) is loaded onto the FPGAs and the corresponding software program (Amazon Machine Image) is loaded onto the CPU.

#### 5 THREAT MODELS

We divide the threat models into four categories in order to better understand the potential risks to cloud FPGA users: (1) intra-tenant adversaries (Trojans), (2) inter-tenant (co-tenant) adversaries, (3) inter-node adversaries, and (4) malicious network traffic.

# 5.1 Intra Tenant Adversaries (Trojans)

The modern hardware design flow involves multiple design and verification stages. Even in the design stages, many combinations of software and hardware blocks (referred to as intellectual property, or "IP") are involved, each optimized to a different degree. Hardware IPs are heavily customized for the needs of fields such as signal processing, video and image encoding, quantum computing, and artificial intelligence. Also, no single company is responsible for developing all of the complex IP components.
Rather, hundreds of third-party IP designers and IP integrators are involved in the entire FPGA hardware design process. This complex design and integration process opens the door to several security attacks on a target FPGA fabric [92]. To cut the cost of designing complex IPs, tenants commonly license existing third-party IPs. Untrusted third-party IP cores or EDA tools are often integrated at different stages of the FPGA design life cycle, making designs susceptible to numerous attacks, such as hardware Trojan (HT) injection, IP piracy, cloning, and tampering [100].

# 5.2 Inter Tenants (co-tenant) Adversaries

In a multi-tenant spatial sharing model, users hold a share of a large pool of computation resources, which leads to numerous resource-sharing threats. Even though the allocated tenants are logically isolated, they are located on the same hardware fabric and share the same resources. These vulnerable resources include the shared DRAM, PCIe, and power distribution network (PDN). A malicious adversary can exploit this sharing to launch a variety of attacks. Crosstalk and covert-channel communication between logically isolated co-tenants have been explored extensively in [45], [46]. In extreme cases, a large power-hungry adversary tenant could bring down the entire FPGA board [29]. A malicious co-tenant could also cause temperature and voltage fluctuations, and could affect other users by launching Rowhammer attacks on shared DRAM [96].

# 5.3 Inter Nodes (co-nodes) Adversaries

Since FPGAs give multiple tenants access to shared DRAM, adversaries could launch Rowhammer-style attacks against co-located node FPGAs. A malicious tenant could attack the shell region of the co-located nodes and launch a denial-of-service attack that could block the host server. In co-located nodes, it is also possible to extract the FPGA process variation parameters by accessing shared DRAM [94].
In this work, a dedicated physical unclonable function (PUF) circuit created a fingerprint of the shared DRAM instance. As DRAM modules normally remain in place in data centers, acquiring the fingerprint of the shared DRAM module is equivalent to fingerprinting the attached FPGAs in co-located nodes. Recent work [11] shows that co-located node FPGAs can also be fingerprinted through PCIe bus contention.

#### 5.4 Malicious Network Traffic

Because FPGAs are connected to the cloud ecosystem through public network interfaces, they face unique security challenges when they process network packets directly from users. Bump-in-the-wire (or SmartNIC) deployment models carry multiple risks because they expose FPGAs over the public network. A malicious adversary can easily flood data center routers with inbound/outbound packets. In [84], a timing-based side-channel attack was introduced on an NoC platform to extract the keys of the AES algorithm. Using a Multiprocessor System-on-Chip (MPSoC) platform, a denial-of-service (DoS) attack was carried out on Internet-of-Things (IoT) connected network devices [37]. In the proposed attack, while the tenant application is executing on NoC-coupled IoT devices, the attacker floods the network with packets until the packet injection rate (PIR) crosses a threshold and causes a complete shutdown.

# **6 MULTI-TENANT CLOUD FPGA ATTACKS**

Because cloud FPGAs are installed in protected servers and data centers, physical attacks on them are rare. Attacks on multi-tenant cloud FPGAs are therefore usually non-intrusive. The three main types of attacks on multi-tenant FPGAs are: (1) extraction attacks, (2) fault injection attacks, and (3) denial-of-service (DoS) attacks.

#### 6.1 Extraction Attacks

6.1.1 Remote Side Channel attacks. Side-channel attacks on FPGAs have been explored extensively by security researchers.
Traditionally, side-channel attacks are carried out by probing voltage or electromagnetic emissions and analyzing the collected traces using either Simple Power Analysis (SPA) [?] or Differential Power Analysis (DPA) [79]. Instead of accessing the physical FPGA hardware, a malicious attacker can extract sensitive information by analyzing the dynamic power consumption, observed as voltage fluctuations, of a remote FPGA board. This power consumption information could leak the secret keys of IP cores and design blocks. In a multi-tenant platform, the attacker generally has no access to the physical hardware, which rules out all traditional electromagnetic side-channel attacks on the FPGA board. However, attackers have found ways to obtain the power consumption data of the FPGA fabric over a remote network connection. The attacker and victim are assumed to be located within the same FPGA fabric, so that the attacker has access to some of the LUTs in the remote FPGA. In general, delay sensors such as Ring Oscillator (RO) based sensors [51] or Time-to-Digital Converter (TDC) based sensors [87] can be built to measure the voltage fluctuation on the remote FPGA board.

Fig. 7. An example of a TDC sensor that measures delay using a series of buffers and latches [87].

**Time-to-Digital Converter (TDC) based Delay sensors.** In the FPGA fabric, the power distribution network (PDN) distributes the supply voltage to all components. PDN resources are accessible to all users when FPGA accelerators are shared in the cloud. Every modern IC is powered by a complicated PDN, a type of power mesh network made up of resistive, capacitive, and inductive components. Any voltage drop in the PDN can cause a dynamic change in current. Also, variation in operating conditions alters the supply current and voltage and hence affects the PDN.
By inserting a suitable sensor (hardware Trojan), this voltage fluctuation can be read and used to launch malicious remote side-channel attacks. This hardware Trojan can be inserted in both the pre-fabrication and post-fabrication stages of the IC development chain. The Trojan consists of a delay line in which a signal propagates through a chain of buffers. The delay line can be designed as a Time-to-Digital Converter (TDC), as shown in Fig. 7. The work in [87] shows that such a hardware Trojan can sense PDN variation events and extract sensitive information from the voltage drop fluctuations. In the proof-of-concept attack described in [87], an AES core running at 24 MHz on a Xilinx Spartan-6 FPGA was used as the victim. The experiment was run in two scenarios: in one, the sensor was positioned close to the victim AES logic, while in the other, it was placed farther from the AES core. In both scenarios, the attacker was able to obtain the AES key. In [47], Glamocanin et al. successfully mounted a remote power side-channel attack on a cryptographic accelerator running on AWS EC2 F1 instances. The attack circuit used in this work is a TDC-based delay sensor of the kind depicted in Fig. 7. As a case study, an AES-128 core was used, and the attack recovered all 16 bytes of the secret key from the AES-128 IP core. In current industry deployments, the FPGA is mounted as a data accelerator connected to the CPU via a PCIe bus. However, FPGA + CPU integration is expected to take the lead in server deployments in the coming years. While integrating the FPGA and CPU on the same system-on-chip (SoC) die (Subsection 3.4) could deliver many benefits in resource sharing and latency, it could also create security bottlenecks.
In the coming years, the rapid adoption of heterogeneous computing will probably push data centers and cloud providers to integrate more components on a single SoC die. In [8], Zhao et al. successfully launched a remote power side-channel attack from the FPGA fabric in a system-on-chip deployment model (Subsection 3.4). The attack used a Ring Oscillator (RO) circuit placed in the FPGA programmable logic to retrieve the secret keys of an RSA crypto module running on the CPU. This attack should be considered a serious threat to multi-tenant cloud security, and steps should be taken to mitigate this vulnerability.

**Ring Oscillator (RO) based Delay sensors.** By measuring the oscillation frequency $f_{RO}$ of an RO-based delay sensor circuit, an attacker can extract power consumption information. An RO-based delay circuit is created by cascading an odd number of inverters and feeding the output of the final inverter back to the first. Fig. 8 depicts the traditional combinational loop-based RO circuit. The oscillation frequency $f_{RO}$ depends on the number $n$ of inverters cascaded in the ring according to $f_{RO} = \frac{1}{2 t_p n}$, where $t_p$ is the propagation delay of a single inverter. Designers often integrate a sequential counter to count oscillation ticks. Gravellier et al. proposed a high-speed on-chip RO-based sensor circuit that measures runtime voltage fluctuations in order to mount remote side-channel attacks on FPGAs [51]. Using correlation power analysis (CPA), the proposed work successfully extracted the key of an AES encryption core operating at 50 MHz. This alternative high-speed RO circuit offers better spatial coverage and lower overhead than TDC-based sensors. The proposed circuit replaces the cascaded inverters and counters with sequential NAND gates and a Johnson Ring Counter (JRC).

Fig. 8.
Proposed [51] Ring Oscillator (RO) based delay sensor block for launching remote side-channel attacks on an AES encryption core. Cascaded inverters and counters are replaced with sequential NAND gates and a Johnson Ring Counter.

6.1.2 Crosstalk Attacks. Crosstalk between routing wires in FPGA configuration blocks was demonstrated in [45]. The article demonstrates that the delay of a long wire depends on the logical value (1 vs. 0) carried by nearby long wires. This delay could leak the information carried on the long wires and create a communication channel even though the wires are not physically connected. This information leakage could create a security vulnerability in SoC platforms, which often integrate IPs from multiple untrusted third parties. A malicious attacker could easily establish a hidden covert channel between distinct cores in a multi-user setup. This could be more severe on public cloud infrastructures, where FPGAs and CPUs are being integrated. The paper [46], an extension of the prior work [45], demonstrated the first covert channel between separate virtual machines with cloud FPGAs, reaching a data rate close to 20 bps with 99% accuracy. The covert channel between the FPGAs exploits PCIe contention, which opens the door to inferring information from co-located VM instances. The authors assert that the attack does not require particular circuits such as Ring Oscillators (ROs) or Time-to-Digital Converters (TDCs), which are typically subject to Amazon design rule checks (DRCs). Instead, the receiver compares its measured transfer bandwidth with a predetermined threshold value and infers PCIe stress when the threshold is crossed. For this study, three Amazon F1 f1.16xlarge instances (24 FPGAs in total) were rented for a 24-hour trial.
Today's cloud service providers, such as AWS, only enable "single-tenant" access to FPGAs due to security risks. In these settings, each FPGA is dedicated to the subscribing user for a predetermined amount of time, after which it may be assigned to another user. In the AWS F1 instance design, each FPGA board is connected to a server by an x16 PCIe bus, and each server has a total of 8 FPGA instances. According to a recent study using publicly available data on AWS F1 instances [45], the eight FPGA instances are evenly distributed across two Non-Uniform Memory Access (NUMA) nodes, as shown in Fig. 9. Four FPGAs are connected as PCIe devices to each NUMA node's Intel Xeon CPUs through an Intel QuickPath Interconnect (QPI) interface. Users on the AWS platform cannot directly manage PCIe transactions. However, any FPGA user can perform DMA transfers from the FPGA to the server, and many designs with little logic overhead can saturate the shared PCIe bus. The bus contention and interference caused by this saturation reveal the co-location of FPGA instances, and this is used to detect FPGA instances co-located in the same NUMA node. Prior to the attack, the two parties must make sure they are co-located in the same NUMA node. To confirm this, they send multiple handshake messages and wait for a handshake response. Once the response is received, the co-located FPGA instances are identified and used for subsequent communication. The attack uses two FPGAs at a time, a memory stressor and a memory tester [46]. The stressor continuously transmits a 1 bit over the PCIe bus, and the tester keeps measuring its own bandwidth throughout the transmission period. The receiver classifies each measurement as a hit or a miss by comparing the bandwidth with a threshold value.
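The thresholding step on the receiver side can be illustrated with a small simulation. The following Python sketch is only a toy model under stated assumptions: the bandwidth values, the bounded noise model, and the function names (`measure`, `decode`, `transmit`) are hypothetical and are not taken from [46], and we assume that a 0 bit is sent by leaving the bus idle.

```python
import random

# Hypothetical numbers chosen for illustration only; not measurements from [46].
IDLE_BW = 8.0        # tester bandwidth (GB/s) when the shared bus is quiet
CONTENDED_BW = 5.0   # tester bandwidth (GB/s) while the stressor saturates the bus
THRESHOLD = (IDLE_BW + CONTENDED_BW) / 2  # decision threshold between the two levels


def measure(bit, rng):
    """Model one tester measurement; a 1 bit means the stressor is active."""
    base = CONTENDED_BW if bit else IDLE_BW
    return base + rng.uniform(-0.5, 0.5)  # bounded measurement noise (assumption)


def decode(samples, threshold=THRESHOLD):
    """Read a sample below the threshold as a 1 (contention), otherwise as a 0."""
    return [1 if sample < threshold else 0 for sample in samples]


def transmit(bits, seed=0):
    """Send bits through the simulated channel and return the decoded bits."""
    rng = random.Random(seed)
    return decode([measure(bit, rng) for bit in bits])


message = [1, 0, 1, 1, 0, 0, 1, 0]
received = transmit(message)
print(received == message)
```

Because the simulated noise is smaller than the gap between the two bandwidth levels, every bit is decoded correctly in this model; on real hardware the achievable accuracy depends on how well the two levels separate.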
The results show that this approach can create a fast covert channel between any two FPGAs in either direction, at 200 bps with 100% accuracy.

Fig. 9. Single AWS F1 server, in which 8 FPGAs are divided between two NUMA nodes.

# 6.2 Fault Injection

In a fault-injection attack, a malevolent tenant purposefully injects a fault into the FPGA to jeopardize its security. This could lead to denial of service, the extraction of cryptographic secret keys, unauthorized authentication, or the leakage of secrets within the system. In general, attackers can employ a variety of physical tools and equipment to conduct fault-injection attacks, which can be non-invasive (such as clock glitching or voltage glitching), semi-invasive (such as local heating or laser), or invasive (such as focused ion beam) [103]. In the context of cloud FPGAs, only non-invasive attacks can be carried out, injecting faults by clock or voltage glitching [72].

Fig. 10. RC equivalent network of Power Distribution Network (PDN) block [32].

6.2.1 Voltage Attacks. In [32], the security of deep learning (DL) accelerators was assessed against voltage-based attacks on a multi-tenant platform. Asynchronous Ring Oscillator (RO) circuits have been used extensively in earlier research on voltage attacks. However, because this technique has been researched so extensively in academic circles, Amazon AWS EC2 already employs a protection mechanism against it.
A voltage attack is a form of fault-injection attack. An adversary might implement circuitry that draws a large current and causes voltage drops in the chip's power distribution network (PDN). In this way, a malicious tenant could induce timing violations and disrupt the normal functionality of the circuit. Fig. 10 shows a simplified diagram of the main components of a PDN block. An on-chip FPGA PDN circuit can be represented by an RC equivalent network. In this circuit, an onboard voltage regulator converts the board voltage level to the die voltage level. Decoupling and parallel capacitors are also present, and they help filter out undesirable voltage noise. The voltage drop of this PDN block is given by Equation 1.

$$V_{drop}(s) = I(s)Z_{PDN}(s) \tag{1}$$

where $I(s)$ is the drawn current and $Z_{PDN}(s)$ is the total impedance of the PDN block in the frequency domain. In steady state, the resistive component of $Z_{PDN}(s)$ is responsible for the *steady-state* drop (the **IR** drop), while rapid changes in the drawn current produce an additional transient drop. An effective malicious circuit could sharply raise the drawn current and cause a large transient voltage drop. As a result, it could increase the signal propagation time in a neighboring circuit on a multi-tenant platform. Using the vendor's clock-gating IP, the authors introduce two novel adversarial circuits that can cause large voltage drops in an FPGA fabric. Even though the attacker circuit is physically isolated, it can effectively launch integrity attacks on a DL accelerator running ImageNet classification at runtime [32]. These results indicate that current multi-tenant FPGAs are vulnerable to voltage attacks and that additional research is needed to defend against them. The first attacker circuit consists of four 4-input XOR gates (Fig. 11). Four toggle registers and one delayed output, which operate at a high switching frequency, serve as the inputs to the XOR gates.
The toggle rate of all gated flip-flops changes abruptly when the clock is enabled. The idea behind the attack is that voltage drops and timing errors can be introduced into the circuits if the activity of the clock-gated circuitry changes abruptly over a brief period. In the implementation, the adversary circuit uses the Intel clock control IP to gate the clock, toggling it at a high frequency. This causes voltage drops across the circuit. The second attacker circuit extends this attack to include all digital signal processing (DSP) blocks and block RAMs (BRAMs). The attacker circuit keeps writing patterns sequentially to every address of all BRAMs while the FPGA is executing a DL accelerator task. DSP blocks are simultaneously fed with random initial inputs. The findings indicate that this could lead to significant timing violations in the circuits and, eventually, to a significant performance hit for DL accelerators.

Fig. 11. Voltage attack circuit using clock-gated garbled XORs. The adversary circuit performs clock gating, switches frequency quickly, and causes a voltage drop throughout the circuit using the Intel clock control IP.

Alam et al. [25] show that concurrent write operations to the same address on dual-port RAMs can cause a severe voltage drop and temperature increase inside the chip. Simultaneous write operations to the same address introduce a memory collision, leading to a transient short circuit.

6.2.2 Ring Oscillator based fault injection attacks. Ring oscillators are an attractive choice for launching fault injection attacks, as they achieve very high switching frequencies with minimal design effort. Mahmoud et al. [70] leveraged a ring-oscillator-based power-hungry circuit to induce timing failures in a true random number generator crypto core used for generating secret keys.
Krautter et al. [67] proposed a high-frequency controlled RO circuit capable of recovering the encrypted message by inducing timing faults in the 9th round of the AES computation.

# 6.3 Denial of Service attacks (DoS)

DRAM attacks (Rowhammer). In modern heterogeneous FPGA systems, each FPGA board has its own attached DRAM, which can be shared across the SoC or data center. The attached DRAM can be accessed from the FPGA with unique privileges and without any further monitoring. This allows one user to attempt to modify the DRAM of another user. Such an attempt can go undetected on the CPU side, so a malicious user can access the shared DRAM and launch a Rowhammer attack [96]. This research shows that a malicious FPGA can hammer twice as fast as a typical CPU-based Rowhammer attack on the same system and can flip up to four times as many bits. Further, the attack is much more difficult to detect on the FPGA side, because the user has direct control of the FPGA's memory access operations without any intervention by the CPU. In academia, Rowhammer refers to a relatively new way of attacking shared DRAM and manipulating user data [71]. In this method, repeated read operations on a DRAM row induce unwanted electrical charge leakage that corrupts data in neighboring rows. Weissman et al. launched a classic CPU-based Rowhammer attack along with an FPGA-based JackHammer attack against the WolfCrypt RSA implementation. The attack successfully recovered the private keys used to secure SSL connections. The main reason for the attack's success is the lack of security controls and protections in DRAM allocation and access policies. This issue could be critical for multi-tenant cloud platforms in which co-tenants have direct hardware access to and control of the shared DRAM modules. However, such an attack has not yet been demonstrated on a multi-tenant platform.
#### 7 COUNTERMEASURES

To prevent the attacks described in Section 6, several methods have been proposed in the literature [103], [63]. We divide these methods into two major categories: (1) the tenant approach and (2) the cloud provider approach.

### 7.1 Tenant Approach

7.1.1 Masking and Hiding. Researchers have proposed many masking and hiding strategies for building circuits that are resilient to side-channel attacks. In the masking strategy, to prevent an attacker from obtaining the circuit's internal details, the base circuit is transformed into a larger secure circuit by applying a cryptographic algorithm [60]. This secure circuit is functionally equivalent to the base circuit but differs from it entirely in its internal LUT contents and Boolean functions. Even if the attacker can observe some internal details, the whole circuit cannot be extracted because the implementation scheme has changed. Hiding strategies aim to reduce the signal-to-noise ratio (SNR) during the computational stages of crypto cores, which malicious attackers could otherwise exploit to leak information from the IP core. Generally, SNR reduction is implemented either by adding noise or by reducing the strength of the signal in the power trace during computational phases [65], [98], [40].

7.1.2 IP Watermarking. In a multi-tenant environment, tenants can integrate IP watermarking into their hardware design by embedding a unique identifier in it [80]. IP watermarking can deter IP counterfeiting and reverse engineering threats [12]. By integrating an IP watermark, tenants can prove ownership of the hardware design and prevent illegal IP use by malicious cloud providers. Some watermarking techniques incur additional area overhead in routing paths.
IP watermarking methods can generally be classified into five groups: i) constraint-based watermarking, ii) digital signal processing (DSP)-based watermarking, iii) finite state machine (FSM)-based watermarking, iv) test structure-based watermarking, and v) side channel-based watermarking [12]. Constraint-based watermarking was introduced in [64], where the author's signature is mapped to a set of additional constraints imposed on the solution of a complex NP-hard problem. FSM-based watermarking adds FSM states at the behavioral level without altering the functionality of the hardware circuit. FSM watermarks are embedded in the hardware design by modifying the state transition graph [81]. Test structure-based watermarking does not fit the context of multi-tenant cloud FPGA security, because it depends on IC supply chain testing methodology and focuses on fabrication-based testing techniques. DSP-based watermarking can be introduced by slightly modifying the decibel (dB) requirements of DSP filters without compromising their operation [36].

7.1.3 Obfuscation. Hardware obfuscation, or camouflaging, is an approach for hiding tenant circuits from malicious attackers. In this method, the circuit's functionality is obscured by transforming it into a functionally equivalent form locked with secret cryptographic keys. The idea behind this approach is to modify the state transition functions of a circuit so that it generates the desired output only for the correct input patterns or keys. The same obfuscated circuit produces incorrect functional behavior when given wrong input patterns or keys [13]. With this method, tenants can protect their intellectual property (IP) from unauthorized manufacturing and cloning. It also helps prevent IP piracy [90]. In industry, the more common form of obfuscation is HDL encryption [17].
The HDL source code is encrypted and obfuscated, and the IP vendor provides valid authentication keys only to registered, licensed customers.

# 7.2 Cloud Providers Approach

7.2.1 Bitstream Antivirus. The encryption algorithms and key storage used by contemporary FPGAs from top vendors are listed in Table 1. The secret symmetric key is kept, encrypted, in onboard volatile memory powered by a coin-cell battery.

Table 1. Bitstream encryption in modern FPGAs

| Features | Virtex/Kintex Ultrascale+ | Zynq Ultrascale+ | Arria-10 GX/SX | Stratix-10 GX/SX | SmartFusion2 |
|----------------------|---------------------------|------------------|-----------------|------------------|---------------------------|
| Encryption Algorithm | AES-GCM 256 | AES-GCM 256 | AES-GCM 256 | AES-GCM 256 | AES-GCM 128/256 with ECDH |
| Key Storage | BBRAM eFuse | BBRAM eFuse | BBRAM eFuse | AES-GCM 256 | AES-GCM 128/256 with ECDH |

Tenants can encrypt their designs or regions using bitstream encryption cores provided by FPGA vendors. However, in the cloud FPGA platform, the tenant hardware design is not delivered to the public cloud provider as a bitstream. Amazon AWS EC2 F1 only accepts its own customized synthesized netlist [6]. This customization process includes reshaping reconfigurable regions, applying design rule checks, and verifying routing constraints for the tenant's IP blocks. Cloud providers can use bitstream checkers to enforce strict design rule checking on the tenant's bitstream and so prevent malicious side-channel and fault-injection attacks [4]. Bitstream checkers can identify malicious circuit structures, e.g., ROs or combinational loops, and prevent such bitstreams from being uploaded. However, the standard technique used by AWS and other major FPGA cloud providers is insufficient because it can only detect attacks that use LUT-based ROs; alternative designs, such as sequential RO circuits, can still evade bitstream verification [91].
The first bitstream checker, also known as an FPGA antivirus, was proposed by Gnad et al. [48]; it checks circuits for known malicious patterns. Similar to antivirus software, this bitstream checker looks for signatures of malicious logic that might lead to electrical-level attacks. Instead of matching exact circuits, it formulates the fundamental properties of malicious attacks and provides protection against fault injection and side-channel attacks. For fault injection attacks, the checker extracts combinational cycles and determines a threshold number of cycles that can be allowed to the designer without enabling crashes or fault attacks. As timing violations and TDCs can cause side-channel leakage, the checker also prevents side-channel attacks by confirming that the netlist has no timing violations. In [48], LUT-based ring oscillator designs are detected by scanning the bitstream netlist. However, this work cannot prevent alternative oscillator designs (e.g., glitch-based oscillators) [91]. FPGADEFENDER is an FPGA virus scanner that detects malicious non-RO-based designs by scanning the bitstream, covering configurable logic blocks (CLBs), block RAMs (BRAMs), and digital signal processing blocks (DSPs) [78]. Besides ring oscillators, the most severe malicious FPGA threats are self-oscillator (SO) based attacks. Unlike RO circuits, self-oscillators depend on an external clock and soft-logic feedback, and they can hardly be detected by the bitstream checkers of AWS and major FPGA vendors. FPGADEFENDER can detect these non-trivial self-oscillator threats. In [78], the authors designed several self-oscillating circuits using combinatorial feedback loops and asynchronous flip-flop modes, and scanned different non-RO-based oscillatory circuits. One limitation of this work is that it also flags non-malicious true random number generators (TRNGs).
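To make the idea of extracting combinational cycles concrete, the following Python sketch searches a toy netlist for feedback loops that contain no register. It is a minimal illustration and not the checker of [48] or FPGADEFENDER [78]: the netlist format (a dictionary mapping each cell name to its kind and fan-out), the cell names, and the function `combinational_loops` are hypothetical, and real checkers operate on vendor netlists or bitstreams.

```python
def combinational_loops(netlist):
    """Return the sets of combinational cells that form feedback loops.

    netlist maps cell name -> (kind, [fan-out cell names]). The kind "FF"
    marks a clocked register, which breaks a combinational path. This
    dictionary format is a hypothetical stand-in for a real netlist.
    """
    # Keep only the edges that run between combinational cells.
    graph = {
        name: [dst for dst in fanout if netlist[dst][0] != "FF"]
        for name, (kind, fanout) in netlist.items()
        if kind != "FF"
    }

    index, low, on_stack, stack, loops = {}, {}, set(), [], []

    def visit(node):
        # Tarjan's algorithm for strongly connected components (SCCs).
        index[node] = low[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for nxt in graph[node]:
            if nxt not in index:
                visit(nxt)
                low[node] = min(low[node], low[nxt])
            elif nxt in on_stack:
                low[node] = min(low[node], index[nxt])
        if low[node] == index[node]:
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            # A loop is an SCC with several cells, or one cell feeding itself.
            if len(component) > 1 or node in graph[node]:
                loops.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return loops


# Three inverters in a ring: a classic LUT-based ring oscillator.
ring_oscillator = {
    "inv0": ("LUT", ["inv1"]),
    "inv1": ("LUT", ["inv2"]),
    "inv2": ("LUT", ["inv0"]),
}
# The same kind of feedback, but routed through a register.
registered_loop = {
    "lut0": ("LUT", ["ff0"]),
    "ff0": ("FF", ["lut0"]),
}

print(combinational_loops(ring_oscillator))
print(combinational_loops(registered_loop))
```

In this toy model the three-inverter ring is reported as one loop, while the registered feedback path is not reported at all, which mirrors the limitation noted above: a purely structural search for combinational loops misses oscillators that route their feedback through sequential elements.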
7.2.2 Active Fence and side channel attack prevention. Krautter et al. [14] propose implanting a fence between the tenant region and the attacker circuit. This work aims to prevent side-channel attacks from neighboring attacker circuits by creating extra noise using chains of ROs, which reduces the signal-to-noise ratio (SNR) available to the attacker at the electrical level. The goal is an electrical-level defense against side-channel or voltage-based extraction attacks, where logical isolation methods are ineffective. Both heuristic and arbitrary approaches were followed to place the ring oscillators between malicious circuits and tenant regions. In addition to RO placement, two activation strategies were considered for efficient SNR reduction. In the first strategy, the number of activated ROs depends on a TRNG output. In the second, ROs are activated in proportion to the value of a voltage-fluctuation sensor. Besides this work, hiding and masking are popular countermeasures against side-channel attacks. However, as hiding and masking are more IP dependent, they are discussed under the Tenant Approach countermeasures (Section 7.1.1) and should remain the tenant's responsibility.

7.2.3 Access Control. In an FPGA-accelerated multi-tenant platform, the execution of a software application depends on both the processor and the FPGA fabric. The software part exploits the built-in operating system and sequential programming, while the FPGA fabric accelerates computation in hardware. To prevent unauthorized authentication and software access, a trusted cloud provider entity can enforce security rules and policies for its hardware regions and attached hardware peripherals. Mbongue et al. proposed a domain-isolation-based access control mechanism for multi-tenant cloud FPGAs [74].
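As a rough illustration of the provider-enforced rules mentioned above, access to FPGA regions can be modeled as a default-deny lookup. This is only a sketch under stated assumptions and is not the mechanism of [74]: the class names `AccessRequest` and `RegionPolicy`, the VM identifiers, the region indices, and the operation strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class AccessRequest:
    vm_id: str       # hypothetical identifier of the requesting VM
    region_id: int   # hypothetical index of the targeted FPGA region
    operation: str   # e.g. "read" or "write"


class RegionPolicy:
    """Default-deny table mapping (VM, region) to the allowed operations."""

    def __init__(self) -> None:
        self._rules: Dict[Tuple[str, int], Set[str]] = {}

    def grant(self, vm_id: str, region_id: int, *operations: str) -> None:
        # Add the given operations to the rule for this (VM, region) pair.
        self._rules.setdefault((vm_id, region_id), set()).update(operations)

    def is_allowed(self, request: AccessRequest) -> bool:
        # Anything not explicitly granted is refused.
        allowed = self._rules.get((request.vm_id, request.region_id), set())
        return request.operation in allowed


policy = RegionPolicy()
policy.grant("vm-a", 0, "read", "write")  # vm-a owns region 0
policy.grant("vm-b", 1, "read")           # vm-b may only read region 1

# vm-b trying to read vm-a's region is refused.
print(policy.is_allowed(AccessRequest("vm-b", 0, "read")))
```

The default-deny choice means that a VM with no matching rule, or a request for an operation that was never granted, is rejected without any special case in the code.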
The threat model in [74] considers malicious software on a virtual machine (VM) that tries to steal information by accessing the FPGA region assigned to another VM. The threat mitigation approach uses a hardware-software co-design method to enforce access control policies on shared FPGA regions. The authors propose a central software-based policy server, a hardware-based policy enforcer named "Hardware Mandatory Access Controller (HMAC)", a secure communication protocol between the policy server and the HMAC, and RO-TRNG-based key generation for secure authentication. However, this work does not consider side-channel attacks on shared cloud FPGAs. Access-control-based security rule checking in SoC-based embedded systems was discussed in [56][54].

7.2.4 Domain Isolation for cross-talk and covert channel attack prevention. Crosstalk attacks have been discussed thoroughly in Section 6.1.2. Logical domain isolation among tenants is the most effective method to prevent crosstalk and covert-channel attacks. Domain isolation can be carried out by either physical isolation or logical isolation. Logical isolation relies on dividing the FPGA into software-controlled secure regions and enforcing security mechanisms to provide a secure execution environment [101], [50], [99], [85], [95], [27]. The works in [75–77] propose a logical hardware isolation technique (Fig. 12) using FPGA virtualization without major performance loss. In the context of multi-tenant cloud FPGA security, this line of work also investigates the integration of access control with hardware isolation. For SoCs, hardware isolation strategies that protect IPs from tampering and crosstalk were presented in [43, 53] and further extended in [30, 31, 55] to shield hardware IPs using hardware sandboxes. Some of the prior research on domain isolation in hardware has focused on isolating FPGA accelerators on SoC platforms [57], [59], [86], [41], [97], [42], [28], [68], [83]. Huffmire et al.
propose a physical hardware isolation technique (Moats and Drawbridges) to isolate FPGA regions for different tenants [58]. Physical hardware isolation is established by creating a block of wires ("moats") around each core region. A core region can only communicate with other regions via a "drawbridge" tunnel. This tunneling method is applied by disabling all the CLB switchboxes near the isolated region.

Fig. 12. FLASK security architecture functionality [75]. The policy server enforces IP context matching and compares the IP identity with the access vector cache that has been previously saved in the secured storage server.

Fig. 13. Physical hardware isolation technique by establishing moats and drawbridges [58].

#### 8 VIRTUALIZATION SECURITY RISKS AND FUTURE CHALLENGES

In the multi-tenant cloud FPGA virtualization method, a host system is virtualized using VMs and a VMM. Vulnerabilities in this software-based virtualization layer could be used to compromise data. An untrusted IaaS provider has direct access and is able to bypass the virtualization layer. As a result, the FPGA must take care of its own security and cannot rely on host computer software. In Subsection 2.6.3 we introduced existing operating system support for FPGA virtualization. Virtualization in FPGAs is generally implemented using software or OS abstraction. In AmorphOS [3], space-shared multi-tenancy is introduced by dedicating I/O and memory bandwidth to a specific tenant (Morphlet). Although AmorphOS provides some protection similar to a software OS, it does not provide explicit protection against remote side-channel attacks [3]. BORPH [5] exploits and extends the Linux kernel to abstract OS-level instructions in the FPGA. Although this OS framework can enable an OS on FPGAs, its security primitives remain to be explored. Concepts such as secure system calls, swapping, and parallel file system access need to be refined.
Scaling the BORPH model to meet increased demand without compromising security should be a priority in future research. Due to the high overhead added by program image loading, Feniks OS mostly uses spatial sharing for multitasking rather than dynamic accelerator reloading (context switching) [102]. Although the authors claim that their custom accelerator templates prevent collusion between the OS and neighboring accelerators, this benefit was not demonstrated in the implementation. Moreover, Feniks does not provide any explicit protection or countermeasures against established cloud FPGA adversaries.

**Future Challenges.** FPGA acceleration in cloud platforms is an active area of exploration, and we believe it will continue to grow in the coming years. However, more attacks on multi-tenant platforms are also likely to be proposed. The current attacks described in Section 6 may become outdated and will need more precise and comprehensive definitions. Cloud providers bear greater responsibility and should provide strong security measures to protect tenants from malicious attackers. Existing logical and physical isolation methods [57], [59], [86], [97], [28], [83] should be carefully extended and applied to multi-tenant platforms to establish secure isolation of tenants. Future work should also explore concrete and specific virtualization/isolation mechanisms for spatial sharing of the underlying FPGA fabric. Bitstream checkers provided by cloud providers should also be able to detect non-RO-based side-channel and fault injection attacks without compromising the tenant's design. Major public cloud providers have yet to deploy a runtime monitoring system for active defense against known attacks in real time.
One of the core challenges is to provide secure access to the reserved, shared Shell region elements (memory, control logic, and PCIe buses) so that a tenant can access only its designated addresses and memory data. Any compromise of the Shell region abstraction would hamper the isolation of these core system components and could potentially enable Rowhammer attacks [96] and crosstalk attacks [46]. Existing detection mechanisms for fault injection and side-channel attacks still add significant overhead to the overall circuit, which needs to be addressed. In a multi-tenant platform, tenants still have to trust the major public cloud providers, as the providers must supply the main architectural support for security. Besides this, tenants should focus on protecting their critical designs with hiding and masking strategies, e.g., obfuscation, watermarking, and noise reduction methods [90], [12], [98]. Although some of these hiding, masking, and obfuscation strategies have not yet been extended to multi-tenant platforms and are currently implemented only on SoC platforms, we believe they have great potential for increasing the security of tenants' IP blocks and designs. Researchers are also likely to propose various low-cost, low-overhead side-channel countermeasures in the coming years. Tenants could also use their own cryptographic FPGA primitives for security purposes such as IP authentication, privacy and integrity checks, and establishing the root of trust. Most public cloud providers still restrict the use of RO-based combinational loops, which are the fundamental building block of many PUFs and TRNGs (e.g., Amazon EC2 F1 instances block RO-based designs). Alternatively, for true random number generators (TRNGs), tenants can use TRNG circuits that rely on metastability as a source of randomness [73], [88].
For secure authentication, tenants can construct different FPGA-based physical unclonable function (PUF) circuits [24]. Many cutting-edge PUF designs can be extended to the cloud FPGA context to provide proof of trust and a proof-of-execution fingerprint of the tenant.

#### 9 CONCLUSION

Research on the security of FPGA-accelerated clouds has evolved rapidly and should be explored comprehensively in the coming years. Our survey discusses recent work on security threats to multi-tenant cloud FPGA platforms and covers the state-of-the-art defenses proposed against existing threats. We survey different FPGA deployment models in cloud platforms and classify existing attacks. We also summarize our insights and provide guidelines for constructing and developing future security primitives in a multi-tenant context.

9.0.1 Acknowledgments. This work was funded by the National Science Foundation (NSF) under Grant CNS 2007320.

#### REFERENCES

- [1] [n.d.]. An FPGA Platform for Hyperscalers. https://www.researchgate.net/publication/319346199\_An\_FPGA\_Platform\_for\_Hyperscalers - [2] [n.d.]. NARC: Network-Attached Reconfigurable Computing for High-performance, Network-based Applications. https://www.researchgate.net/publication/265892078\_NARC\_NARC\_Network\_Network\_--Attached\_Reconfigurable\_Computing\_for\_Attached\_Reconfigurable\_Computing\_for\_High\_High\_--performance\_Network\_performance\_Network\_--based\_Applications\_based\_Applications - [3] [n.d.]. AmorphOS Google Search. https://www.google.com/search?client=firefox-b-1-d&q=AmorphOS - [4] [n.d.]. Baidu Deploys Xilinx FPGAs in New Public Cloud Acceleration Services. https://www.xilinx.com/news/press/2017/baidu-deploys-xilinx-fpgas-in-new-public-cloud-acceleration-services.html - [5] [n.d.]. BORPH: An Operating System for FPGA-Based Reconfigurable Computers | EECS at UC Berkeley.
https://www2.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-92.html - [6] [n.d.]. Developer Preview EC2 Instances (F1) with Programmable Hardware | AWS News Blog. https://aws.amazon.com/blogs/aws/developer-preview-ec2-instances-f1-with-programmable-hardware/ - [7] [n.d.]. Enabling Flexible Network FPGA Clusters in a Heterogeneous Cloud Data Center | Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. https://dl.acm.org/doi/10.1145/3020078.3021742 - [8] [n.d.]. FPGA-Based Remote Power Side-Channel Attacks. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8418606 - [9] [n.d.]. FPGA Market Size, Share and Trends Forecast to 2026 | MarketsandMarkets™. https://www.marketsandmarkets.com/Market-Reports/fpga-market-194123367.html - [10] [n.d.]. How DNAnexus and Edico Genome are Powering Precision Medicine on Amazon Web Services (AWS) | AWS Partner Network (APN) Blog. https://aws.amazon.com/blogs/apn/how-dnanexus-and-edico-genome-are-powering-precision-medicine-on-amazon-web-services-aws/ - [11] [n.d.]. IEEE Xplore Full-Text PDF:. https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9444054 - [12] [n.d.]. IEEE Xplore Full-Text PDF:. https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1213006 - [13] [n.d.]. IEEE Xplore Full-Text PDF:. https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5247148 - [14] [n.d.]. IEEE Xplore Full-Text PDF:. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8742262 - [15] [n.d.]. Instance family. https://www.alibabacloud.com/help/en/doc-detail/25378.html - [16] [n.d.]. Instance Types | Tencent Cloud. https://intl.cloud.tencent.com/document/product/213/11518#FX2 - [17] [n.d.]. Methodology for protection and Licensing of HDL IP. https://www.design-reuse.com/articles/12745/methodology-for-protection-and-licensing-of-hdl-ip.html - [18] [n.d.]. NGCodec Archives The Broadcast Knowledge. https://thebroadcastknowledge.com/tag/ngcodec/ - [19] [n.d.].
Nimbix Introduces Xilinx Alveo U50 Accelerator Cards on the Nimbix Cloud with Broad Application Support. https://www.hpcwire.com/off-the-wire/nimbix-introduces-xilinx-alveo-u50-accelerator-cards-on-the-nimbix-cloud-with-broad-application-support/ - [20] [n.d.]. Open Source Cloud Computing Infrastructure OpenStack. https://www.openstack.org/ - [21] [n.d.]. Xilinx Powers Huawei FPGA Accelerated Cloud Server. https://www.xilinx.com/news/press/2017/xilinx-powers-huawei-fpga-accelerated-cloud-server.html - [22] [n.d.]. Zynq-7000 SoC. https://www.xilinx.com/products/silicon-devices/soc/zynq-7000.html - [23] Andreas Agne, Markus Happe, Ariane Keller, Enno Lubbers, Bernhard Plattner, Marco Platzner, and Christian Plessl. 2014. ReconOS: An operating system approach for reconfigurable computing. *IEEE Micro* 34, 1 (2014), 60–71. https://doi.org/10.1109/MM.2013.110 - [24] Muhammed Kawser Ahmed, Venkata P. Yanambaka, Ahmed Abdelgawad, and Kumar Yelamarthi. 2020. Physical Unclonable Function Based Hardware Security for Resource Constraint IoT Devices. In 2020 IEEE 6th World Forum on Internet of Things (WF-IoT). 1–2. https://doi.org/10.1109/WF-IoT48130.2020.9221357 - [25] Md Mahbub Alam, Shahin Tajik, Fatemeh Ganji, Mark Tehranipoor, and Domenic Forte. 2019. RAM-Jam: Remote temperature and voltage fault attack on FPGAs using memory collisions. Proceedings - 2019 Workshop on Fault Diagnosis and Tolerance in Cryptography, FDTC 2019 (8 2019), 48–55. https://doi.org/10.1109/FDTC.2019.00015 - [26] Hussain Aljahdali, Abdulaziz Albatli, Peter Garraghan, Paul Townend, Lydia Lau, and Jie Xu. 2014. Multi-Tenancy in Cloud Computing. In Proceedings - IEEE 8th International Symposium on Service Oriented System Engineering, SOSE 2014. https://doi.org/10.1109/SOSE.2014.50 - [27] ARM. [n.d.]. TrustZone: SoC and CPU System-Wide Approach to Security. - [28] Abhishek Basak, Swarup Bhunia, and Sandip Ray. 2015.
A Flexible Architecture for Systematic Implementation of SoC Security Policies. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (Austin, TX, USA) (ICCAD '15). IEEE Press, Piscataway, NJ, USA, 536–543. http://dl.acm.org/citation.cfm?id=2840819.2840894 - [29] Christophe Bobda, Joel Mandebi Mbongue, Paul Chow, Mohammad Ewais, Naif Tarafdar, Juan Camilo Vega, Ken Eguro, Microsoft Dirk Koch, Suranga Handagala, Miriam Leeser, Martin Herbordt, Hafsah Shahzad, Peter Hofste, Ahmed Sanaullah, J Mandebi Mbongue, P Chow, M Ewais, N Tarafdar, J C Vega, M Leeser, M Herbordt, H Shahzad, A Sanaullah, Dirk Koch, Burkhard Ringlein, Jakub Szefer, and Russell Tessier. 2022. The Future of FPGA Acceleration in Datacenters and the Cloud. ACM Transactions on Reconfigurable Technology and Systems (TRETS) 15, 3 (2 2022), 1–42. https://doi.org/10.1145/3506713 - [30] Christophe Bobda, Joshua Mead, Taylor J. L. Whitaker, Charles A. Kamhoua, and Kevin A. Kwiat. 2017. Hardware Sandboxing: A Novel Defense Paradigm Against Hardware Trojans in Systems on Chip. In Applied Reconfigurable Computing - 13th International Symposium, ARC 2017, Delft, The Netherlands, April 3-7, 2017, Proceedings. 47–59. https://doi.org/10.1007/978-3-319-56258-2 - [31] Christophe Bobda, Taylor J. L. Whitaker, Charles A. Kamhoua, Kevin A. Kwiat, and Laurent Njilla. 2017. Synthesis of hardware sandboxes for Trojan mitigation in systems on chip. In 2017 IEEE International Symposium on Hardware Oriented Security and Trust, HOST 2017, McLean, VA, USA, May 1-5, 2017. 172. https://doi.org/10.1109/HST.2017.7951836 - [32] Andrew Boutros, Mathew Hall, Nicolas Papernot, and Vaughn Betz. [n.d.]. Neighbors From Hell: Voltage Attacks Against Deep Learning Accelerators on Multi-Tenant FPGAs. ([n. d.]). - [33] Stuart Byma, J. Gregory Steffan, Hadi Bannazadeh, Alberto Leon-Garcia, and Paul Chow. 2014. FPGAs in the cloud: Booting virtualized hardware accelerators with OpenStack. 
Proceedings - 2014 IEEE 22nd International Symposium on Field-Programmable Custom Computing Machines, FCCM 2014 (7 2014), 109-116. https://doi.org/10.1109/FCCM.2014.42 - [34] Stuart Byma, J. Gregory Steffan, Hadi Bannazadeh, Alberto Leon-Garcia, and Paul Chow. 2014. FPGAs in the cloud: Booting virtualized hardware accelerators with OpenStack. *Proceedings 2014 IEEE 22nd International Symposium on Field-Programmable Custom Computing Machines, FCCM 2014* (7 2014), 109–116. https://doi.org/10.1109/FCCM.2014.42 - [35] Adrian M Caulfield, Eric S Chung, Andrew Putnam, Hari Angepat, Jeremy Fowers, Michael Haselman, Stephen Heil, Matt Humphrey, Puneet Kaur, Joo-Young Kim, Daniel Lo, Todd Massengill, Kalin Ovtcharov, Michael Papamichael, Lisa Woods, Sitaram Lanka, Derek Chiou, and Doug Burger. [n.d.]. A Cloud-Scale Acceleration Architecture. ([n.d.]). - [36] Roy Chapman and Tariq S. Durrani. 2000. IP protection of DSP algorithms for system on chip implementation. IEEE Transactions on Signal Processing 48, 3 (2000), 854–861. https://doi.org/10.1109/78.824679 - [37] Cesar Giovanni Chaves, Siavoosh Payandeh Azad, Thomas Hollstein, and Johanna Sepúlveda. 2019. DoS Attack Detection and Path Collision Localization in NoC-Based MPSoC Architectures. Journal of Low Power Electronics and Applications 2019, Vol. 9, Page 7 9, 1 (2 2019), 7. https://doi.org/10.3390/JLPEA9010007 - [38] Fei Chen, Yi Shan, Yu Zhang, Yu Wang, Hubertus Franke, Xiaotao Chang, and Kun Wang. 2014. Enabling FPGAs in the cloud. (2 2014). https://doi.org/10.1145/2597917.2597929 - [39] Jongsok Choi, Ruolong Lian, Zhi Li, Andrew Canis, and Jason Anderson. 2018. Accelerating memcached on AWS cloud FPGAs. ACM International Conference Proceeding Series (6 2018). https://doi.org/10.1145/3241793.3241795 - [40] Jean Luc Danger, Sylvain Guilley, Shivam Bhasin, and Maxime Nassar. 2009. 
Overview of dual rail with precharge logic styles to thwart implementation-level attacks on hardware cryptoprocessors - New attacks and improved counter-measures. 3rd International Conference on Signals, Circuits and Systems, SCS 2009 (2009). https://doi.org/10. 1109/ICSCS.2009.5412599 - [41] S. Drzevitzky. 2010. Proof-Carrying Hardware: Runtime Formal Verification for Secure Dynamic Reconfiguration. In 2010 International Conference on Field Programmable Logic and Applications. 255–258. https://doi.org/10.1109/FPL. 2010.59 - [42] S. Drzevitzky, U. Kastens, and M. Platzner. 2009. Proof-Carrying Hardware: Towards Runtime Verification of Reconfigurable Modules. In 2009 International Conference on Reconfigurable Computing and FPGAs. 189–194. https://doi.org/10.1109/ReConFig.2009.31 - [43] Christophe Bobda Charles Kamhoua Festus Hategekimana, Adil Tbatou and Kevin Kwiat. 2015. Hardware Isolation Technique for IRC-based Botnets Detection. In *International Conference on ReConFigurable Computing and FPGAs (ReConFigós)*, Vol. 0. IEEE Computer Society, Cancun, Mexico. - [44] Kermin Fleming, Hsin-Jung Yang, Michael Adler, and Joel Emer. [n.d.]. The LEAP FPGA Operating System. ([n.d.]). - [45] Ilias Giechaskiel, Kasper B Rasmussen, and Ken Eguro. 2018. Leaky Wires: Information Leakage and Covert Communication Between FPGA Long Wires. (2018). https://doi.org/10.1145/XXXXXX.XXXXXX - [46] Ilias Giechaskiel, Shanquan Tian, and Jakub Szefer. [n.d.]. Cross-VM Information Leaks in FPGA-Accelerated Cloud Environments. ([n.d.]). https://ieeexplore.ieee.org/document/9702277 - [47] Ognjen Glamočanin, Louis Coulon, Francesco Regazzoni, and Mirjana Stojilovi´c Stojilovi´c. [n.d.]. Are Cloud FPGAs Really Vulnerable to Power Analysis Attacks? ([n.d.]). https://ieeexplore.ieee.org/document/9116481 - [48] Dennis R.E. Gnad, Sascha Rapp, Jonas Krautter, and Mehdi B. Tahoori. 2018. Checking for Electrical Level Security Threats in Bitstreams for Multi-Tenant FPGAs. 
Proceedings - 2018 International Conference on Field-Programmable Technology, FPT 2018 (12 2018), 289–292. https://doi.org/10.1109/FPT.2018.00055 - [49] Dennis R E Gnad, Fabian Oboril, and Mehdi B Tahoori. 2017. Voltage drop-based fault attacks on FPGAs using valid bitstreams. In 2017 27th International Conference on Field Programmable Logic and Applications (FPL). 1–7. https://doi.org/10.23919/FPL.2017.8056840 - [50] J. A. Goguen and J. Meseguer. 1982. Security Policies and Security Models. In Security and Privacy, 1982 IEEE Symposium on. 11–11. https://doi.org/10.1109/SP.1982.10014 - [51] Joseph Gravellier, Jean Max Dutertre, Yannick Teglia, and Philippe Loubet-Moundi. 2019. High-Speed Ring Oscillator based Sensors for Remote Side-Channel Attacks on FPGAs. In 2019 International Conference on Reconfigurable Computing and FPGAs, ReConFig 2019. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ ReConFig48160.2019.8994789 - [52] Kaiyuan Guo, Shulin Zeng, Jincheng Yu, Y U Wang, Huazhong Yang, and Yu Wang. 2017. A Survey of FPGA-Based Neural Network Inference Accelerator. ACM Trans. Reconng. Technol. Syst 9, 11 (2017). - [53] Festus Hategekimana, Pierre Nardin, and Christophe Bobda. 2016. Hardware/Software Isolation and Protection Architecture for Transparent Security Enforcement in Networked Devices. In IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2016, Pittsburgh, PA, USA, July 11-13, 2016. 140-145. https://doi.org/10.1109/ISVLSI.2016.32 - [54] F. Hategekimana, T. Whitaker, M. J. H. Pantho, and C. Bobda. 2017. Shielding non-trusted IPs in SoCs. In 2017 27th International Conference on Field Programmable Logic and Applications (FPL). 1–4. https://doi.org/10.23919/FPL.2017. 8056848 - [55] F. Hategekimana, T. Whitaker, M. J. H. Pantho, and C. Bobda. 2017. Shielding non-trusted IPs in SoCs. In 2017 27th International Conference on Field Programmable Logic and Applications (FPL). 1–4. https://doi.org/10.23919/FPL.2017. 
8056848 - [56] F. Hategekimana, T. J. L. Whitaker, M. J. H. Pantho, and C. Bobda. 2017. Secure integration of non-trusted IPs in SoCs. In 2017 Asian Hardware Oriented Security and Trust Symposium (AsianHOST). 103–108. https://doi.org/10.1109/ AsianHOST.2017.8354003 - [57] T. Huffmire, B. Brotherton, T. Sherwood, R. Kastner, T. Levin, T. D. Nguyen, and C. Irvine. 2008. Managing Security in FPGA-Based Embedded Systems. *IEEE Design Test of Computers* 25, 6 (Nov 2008), 590–598. https://doi.org/10.1109/ MDT.2008.166 - [58] T. Huffmire, B. Brotherton, G. Wang, T. Sherwood, R. Kastner, T. Levin, T. Nguyen, and C. Irvine. 2007. Moats and Drawbridges: An Isolation Primitive for Reconfigurable Hardware Based Systems. In 2007 IEEE Symposium on Security and Privacy (SP '07). 281–295. https://doi.org/10.1109/SP.2007.28 - [59] Ted Huffmire, Timothy Sherwood, Ryan Kastner, and Timothy Levin. 2008. Enforcing Memory Policy Specifications in Reconfigurable Hardware. Comput. Secur. 27, 5-6 (Oct. 2008), 197–215. https://doi.org/10.1016/j.cose.2008.05.002 - [60] Yuval Ishai, Amit Sahai, and David Wagner. [n.d.]. Private Circuits: Securing Hardware against Probing Attacks. ([n.d.]). - [61] Aws Ismail and Lesley Shannon. 2011. FUSE: Front-end user framework for O/S abstraction of hardware accelerators. Proceedings - IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2011 (2011), 170–177. https://doi.org/10.1109/FCCM.2011.48 - [62] Chenglu Jin, Vasudev Gohil, Ramesh Karri, and Jeyavijayan Rajendran. 2020. Security of Cloud FPGAs: A Survey. (5 2020). http://arxiv.org/abs/2005.04867 - [63] Chenglu Jin, Vasudev Gohil, Ramesh Karri, and Jeyavijayan Rajendran. 2020. Security of Cloud FPGAs: A Survey. (5 2020). http://arxiv.org/abs/2005.04867 - [64] Andrew B. Kahng, John Lach, William H. Mangione-Smith, Stefanus Mantik, Igor L. Markov, Miodrag Potkonjak, Paul Tucker, Huijuan Wang, and Gregory Wolfe. 2001. 
Constraint-based watermarking techniques for design IP protection. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 20, 10 (10 2001), 1236–1252. https://doi.org/10.1109/43.952740 - [65] Najeh Kamoun, Lilian Bossuet, and Adel Ghazel. 2009. Correlated power noise generator as a low cost DPA countermeasures to secure hardware AES cipher. 3rd International Conference on Signals, Circuits and Systems, SCS 2009 (2009). https://doi.org/10.1109/ICSCS.2009.5412604 - [66] John H. Kelm and Steven S. Lumetta. 2008. HybridOS: Runtime support for reconfigurable accelerators. ACM/SIGDA International Symposium on Field Programmable Gate Arrays - FPGA (2008), 212–221. https://doi.org/10.1145/1344671. 1344703 [67] Jonas Krautter, Dennis R.E. Gnad, and Mehdi B. Tahoori. 2018. FPGAhammer: Remote Voltage Fault Attacks on Shared FPGAs, suitable for DFA on AES. IACR Transactions on Cryptographic Hardware and Embedded Systems 2018, 3 (8 2018), 44–68. https://doi.org/10.13154/TCHES.V2018.I3.44-68 - [68] Xun Li, Vineeth Kashyap, Jason K. Oberg, Mohit Tiwari, Vasanth Ram Rajarathinam, Ryan Kastner, Timothy Sherwood, Ben Hardekopf, and Frederic T. Chong. 2014. Sapper: A Language for Hardware-level Security Policy Enforcement. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (Salt Lake City, Utah, USA) (ASPLOS '14). ACM, New York, NY, USA, 97–112. https://doi.org/10.1145/2541940. 2541947 - [69] Fang Liu, Jin Tong, Jian Mao, Robert Bohn, John Messina, Mark Badger, and Dawn Leaf. 2011. NIST Cloud Computing Reference Architecture. https://doi.org/10.6028/NIST.SP.500-292 - [70] Dina Mahmoud and Mirjana Stojilović. 2019. Timing Violation Induced Faults in Multi-Tenant FPGAs. Proceedings of the 2019 Design, Automation and Test in Europe Conference and Exhibition, DATE 2019 (5 2019), 1745–1750. https://doi.org/10.23919/DATE.2019.8715263 - [71] Dina G. 
Mahmoud, Vincent Lenders, and Mirjana Stojilović. 2022. Electrical-Level Attacks on CPUs, FPGAs, and GPUs: Survey and Implications in the Heterogeneous Era. ACM Computing Surveys (CSUR) 55, 3 (2 2022), 1–40. https://doi.org/10.1145/3498337 - [72] Mehrdad Majzoobi, Farinaz Koushanfar, and Srinivas Devadas. [n.d.]. FPGA-based True Random Number Generation using Circuit Metastability with Adaptive Feedback Control. ([n.d.]). - [73] Mehrdad Majzoobi, Farinaz Koushanfar, and Srinivas Devadas. 2011. FPGA-Based True Random Number Generation Using Circuit Metastability with Adaptive Feedback Control. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6917 LNCS (2011), 17–32. https://doi.org/ 10.1007/978-3-642-23951-9{\_}}2 - [74] Joel Mandebi Mbongue, Sujan Kumar Saha, and Christophe Bobda. 2021. Domain Isolation in FPGA-Accelerated Cloud and Data Center Applications. In Proceedings of the 2021 on Great Lakes Symposium on VLSI. 283–288. - [75] Joel Mandebi Mbongue, Festus Hategekimana, Danielle Tchuinkou Kwadjo, David Andrews, and Christophe Bobda. 2018. FPGAVirt: A Novel Virtualization Framework for FPGAs in the Cloud. In 11th IEEE International Conference on Cloud Computing, CLOUD 2018, San Francisco, CA, USA, July 2-7, 2018. 862–865. https://doi.org/10.1109/CLOUD.2018. 00122 - [76] Joel Mandebi Mbongue, Danielle Tchuinkou Kwadjo, and Christophe Bobda. 2018. FLexiTASK: A Flexible FPGA Overlay for Efficient Multitasking. In Proceedings of the 2018 on Great Lakes Symposium on VLSI, GLSVLSI 2018, Chicago, IL, USA, May 23-25, 2018. 483–486. https://doi.org/10.1145/3194554.3194644 - [77] Michael Metzner, Jesus Lizarraga, and Christophe Bobda. 2015. Architecture Virtualization for Run-Time Hardware Multithreading on Field Programmable Gate Arrays. In Applied Reconfigurable Computing - 11th International Symposium, ARC 2015, Bochum, Germany, April 13-17, 2015, Proceedings. 167–178. 
https://doi.org/10.1007/978-3-319-16214-0\_14 - [78] Tuan LA Minh, Kaspar Matas, Nikola Grunchevski, Khoa Dang Pham, Dirk Koch, Tuan Minh La, M La, K Matas, N Grunchevski, K D Pham, and D Koch. 2020. FPGADefender. ACM Transactions on Reconfigurable Technology and Systems (TRETS) 13, 3 (9 2020), 2020. https://doi.org/10.1145/3402937 - [79] Shayan Moini, Shanquan Tian, Jakub Szefer, Daniel Holcomb, and Russell Tessier. 2021. Power Side-Channel Attacks on BNN Accelerators in Remote FPGAs. - [80] N Nalla Anandakumar, M Sazadur Rahman, Mridha Md, Mashahedur Rahman, Rasheed Kibria, Upoma Das, Farimah Farahmandi, Fahim Rahman, and Mark M Tehranipoor. [n.d.]. FUTURE HARDWARE SECURITY RESEARCH SERIES 1 Rethinking Watermark: Providing Proof of IP Ownership in Modern SoCs. ([n.d.]). - [81] Arlindo L. Oliveira. 2001. Techniques for the creation of digital watermarks in sequential circuit designs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 20, 9 (9 2001), 1101–1117. https://doi.org/ 10.1109/43.945306 - [82] Masudul Hassan Quraishi, Erfan Bank Tavakoli, and Fengbo Ren. 2021. A Survey of System Architectures and Techniques for FPGA Virtualization. IEEE Transactions on Parallel and Distributed Systems 32, 9 (9 2021), 2216–2230. https://doi.org/10.1109/TPDS.2021.3063670 - [83] Sandip Ray and Yier Jin. 2015. Security Policy Enforcement in Modern SoC Designs. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (Austin, TX, USA) (ICCAD '15). IEEE Press, Piscataway, NJ, USA, 345–350. http://dl.acm.org/citation.cfm?id=2840819.2840868 - [84] Cezar Reinbrecht, Altamiro Susin, Lilian Bossuet, Georg Sigl, and Johanna Sepúlveda. [n.d.]. Side Channel Attack on NoC-based MPSoCs are practical: NoC Prime+Probe Attack. ([n.d.]). - [85] M. Sabt, M. Achemlal, and A. Bouabdallah. 2015. Trusted Execution Environment: What It is, and What It is Not. In Trustcom/BigDataSE/ISPA, 2015 IEEE, Vol. 1. 57–64. 
https://doi.org/10.1109/Trustcom.2015.357 - [86] Sujan Kumar Saha and Christophe Bobda. 2020. FPGA Accelerated Embedded System Security Through Hardware Isolation. In 2020 Asian Hardware Oriented Security and Trust Symposium (AsianHOST). IEEE, 1–6. - [87] Falk Schellenberg, Dennis R.E. Gnad, Amir Moradi, and Mehdi B. Tahoori. 2018. An inside job: Remote power analysis attacks on FPGAs. *Proceedings of the 2018 Design, Automation and Test in Europe Conference and Exhibition, DATE 2018* 2018-January (4 2018), 1111–1116. https://doi.org/10.23919/DATE.2018.8342177 - [88] R. Sivaraman, Sundararaman Rajagopalan, A. Sridevi, J. B.B. Rayappan, Moorthi Paramasivam Vijaya Annamalai, and Amirtharajan Rengarajan. 2020. Metastability-Induced TRNG Architecture on FPGA. Iranian Journal of Science and Technology - Transactions of Electrical Engineering 44, 1 (3 2020), 47–57. https://doi.org/10.1007/S40998-019-00234-2/TABLES/8 - [89] S Subashini and V Kavitha. 2011. A survey on security issues in service delivery models of cloud computing. *Journal of Network and Computer Applications* 34, 1 (2011), 1–11. https://doi.org/10.1016/j.jnca.2010.07.006 - [90] Rajat Subhra Chakraborty and Swarup Bhunia. [n.d.]. Hardware Protection and Authentication Through Netlist Level Obfuscation. ([n.d.]). - [91] T. Sugawara, K. Sakiyama, S. Nashimoto, D. Suzuki, and T. Nagatsuka. 2019. Oscillator without a combinatorial loop and its threat to FPGA in data centre. Electronics Letters 55, 11 (2019), 640–642. https://doi.org/10.1049/EL.2019.0163 - [92] Mohammad Tehranipoor and Cliff Wang. 2012. Introduction to hardware security and trust. *Introduction to Hardware Security and Trust* 9781441980809 (10 2012), 1–427. https://doi.org/10.1007/978-1-4419-8080-9 - [93] Shanquan Tian and Jakub Szefer. 2019. Temporal Thermal Covert Channels in Cloud FPGAs. In Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '19). 
Association for Computing Machinery, New York, NY, USA, 298–303. https://doi.org/10.1145/3289602.3293920
- [94] Shanquan Tian, Wenjie Xiong, Ilias Giechaskiel, Kasper Rasmussen, and Jakub Szefer. 2020. Fingerprinting cloud FPGA infrastructures. FPGA 2020 - 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (2 2020), 58–64. https://doi.org/10.1145/3373087.3375322
- [95] Victor Costan and Srinivas Devadas. 2016. Intel SGX Explained.
- [96] Zane Weissman, Thore Tiemann, Daniel Moghimi, Evan Custodio, Thomas Eisenbarth, and Berk Sunar. 2019. JackHammer: Efficient Rowhammer on Heterogeneous FPGA-CPU Platforms. (12 2019). http://arxiv.org/abs/1912.11523
- [97] T. Wiersema, S. Drzevitzky, and M. Platzner. 2014. Memory security in reconfigurable computers: Combining formal verification with monitoring. In 2014 International Conference on Field-Programmable Technology (FPT). 167–174. https://doi.org/10.1109/FPT.2014.7082771
- [98] Alexander Wild, Amir Moradi, and Tim Güneysu. 2018. GliFreD: Glitch-Free Duplication Towards Power-Equalized Circuits on FPGAs. *IEEE Trans. Comput.* 67, 3 (3 2018), 375–387. https://doi.org/10.1109/TC.2017.2651829
- [99] Xilinx. 2014. TrustZone Technology Support in Zynq-7000 All Programmable SoCs.
- [100] Mingfu Xue, Chongyan Gu, Weiqiang Liu, Shichao Yu, and Máire O'Neill. 2020. Ten years of hardware Trojans: a survey from the attacker's perspective. *IET Computers & Digital Techniques* 14, 6 (11 2020), 231–246. https://doi.org/10.1049/IET-CDT.2020.0041
- [101] B. Yee, D. Sehr, G. Dardyk, J. B. Chen, R. Muth, T. Ormandy, S. Okasaka, N. Narula, and N. Fullagar. 2009. Native Client: A Sandbox for Portable, Untrusted x86 Native Code. In 2009 30th IEEE Symposium on Security and Privacy. 79–93. https://doi.org/10.1109/SP.2009.25
- [102] Jiansong Zhang, Yongqiang Xiong, Ningyi Xu, Ran Shu, Bojie Li, Peng Cheng, Guo Chen, and Thomas Moscibroda. 2017. The Feniks FPGA Operating System for Cloud Computing. (2017).
https://doi.org/10.1145/3124680.3124743
- [103] Dimitris Zissis and Dimitrios Lekkas. 2012. Addressing cloud computing security issues. Future Generation Comp. Syst. 28 (2 2012), 583–592. https://doi.org/10.1016/j.future.2010.12.006