- poster, April 2024
A Comprehensive Evaluation of FPGA-Based Spatial Acceleration of LLMs
- Hongzheng Chen,
- Jiahao Zhang,
- Yixiao Du,
- Shaojie Xiang,
- Zichao Yue,
- Niansong Zhang,
- Yaohui Cai,
- Zhiru Zhang
FPGA '24: Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, April 2024, Page 185
https://doi.org/10.1145/3626202.3637600
Recent advancements in large language models (LLMs) have generated significant demands for efficient deployment in inference workloads. Most existing approaches rely on temporal architectures that reuse hardware units for different network layers and ...
- poster, April 2024
Automatic Hardware Pragma Insertion in High-Level Synthesis: A Non-Linear Programming Approach
FPGA '24: Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, April 2024, Page 184
https://doi.org/10.1145/3626202.3637593
High-Level Synthesis enables the rapid prototyping of hardware accelerators, by combining a high-level description of the functional behavior of a kernel with a set of micro-architecture optimizations as inputs. Such pragmas may describe the pipelining ...
- research-article, March 2023
High-level Synthesis for Domain Specific Computing
ISPD '23: Proceedings of the 2023 International Symposium on Physical Design, March 2023, Pages 211–219
https://doi.org/10.1145/3569052.3580027
This paper proposes a High-Level Synthesis (HLS) framework for domain-specific computing. The framework contains three key components: 1) ScaleHLS, a multi-level HLS compilation flow. Aimed to address the lack of expressiveness and hardware-dedicated ...
- short-paper, March 2022
Marrying WebRTC and DASH for interactive streaming
MHV '22: Proceedings of the 1st Mile-High Video Conference, March 2022, Page 98
https://doi.org/10.1145/3510450.3517296
WebRTC is a set of W3C and IETF standards that allows the delivery of real-time content to users, with an end-to-end latency of under half a second. Support for WebRTC is built into all modern browsers across desktop and mobile devices, and it allows ...
- research-article, February 2022
High-Performance Sparse Linear Algebra on HBM-Equipped FPGAs Using HLS: A Case Study on SpMV
FPGA '22: Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 2022, Pages 54–64
https://doi.org/10.1145/3490422.3502368
Sparse linear algebra operators are memory bound due to low compute to memory access ratio and irregular data access patterns. The exceptional bandwidth improvement provided by the emerging high-bandwidth memory (HBM) technologies, coupled with the ...
- research-article, February 2022, Best Paper
RapidStream: Parallel Physical Implementation of FPGA HLS Designs
- Licheng Guo,
- Pongstorn Maidee,
- Yun Zhou,
- Chris Lavin,
- Jie Wang,
- Yuze Chi,
- Weikang Qiao,
- Alireza Kaviani,
- Zhiru Zhang,
- Jason Cong
FPGA '22: Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 2022, Pages 1–12
https://doi.org/10.1145/3490422.3502361
FPGAs require a much longer compilation cycle than conventional computing platforms like CPUs. In this paper, we shorten the overall compilation time by co-optimizing the HLS compilation (C-to-RTL) and the back-end physical implementation (RTL-to-...
- research-article, February 2022
Accelerating SSSP for Power-Law Graphs
FPGA '22: Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 2022, Pages 190–200
https://doi.org/10.1145/3490422.3502358
The single-source shortest path (SSSP) problem is one of the most important and well-studied graph problems widely used in many application domains, such as road navigation, neural image reconstruction, and social network analysis. Although we have ...
- poster, February 2022
Synthesized Garbage Collection for FPGA Accelerators
FPGA '22: Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 2022, Page 53
https://doi.org/10.1145/3490422.3502341
Speed and ease of accelerator design is a growing need. High level programming languages have provided significant gains in the software world, but lag in the hardware realm. We present a hardware implementation of a garbage collector, which automates ...
- short-paper, October 2021
A Complete End to End Open Source Toolchain for the Versatile Video Coding (VVC) Standard
- Adam Wieckowski,
- Christian Lehmann,
- Benjamin Bross,
- Detlev Marpe,
- Thibaud Biatek,
- Mikael Raulet,
- Jean Le Feuvre
MM '21: Proceedings of the 29th ACM International Conference on Multimedia, October 2021, Pages 3795–3798
https://doi.org/10.1145/3474085.3478320
Versatile Video Coding (VVC) is the most recent international video coding standard jointly developed by ITU-T and ISO/IEC, which has been finalized in July 2020. VVC allows for significant bit-rate reductions around 50% for the same subjective video ...
- poster, February 2021
Clockwork: Resource-Efficient Static Scheduling for Multi-Rate Image Processing Applications on FPGAs
FPGA '21: The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 2021, Pages 145–146
https://doi.org/10.1145/3431920.3439457
Image processing algorithms can benefit tremendously from hardware acceleration. However, hardware accelerators for image processing algorithms look very different from the programs that image processing algorithm designers are accustomed to writing. ...
- research-article, February 2021, Best Paper
AutoBridge: Coupling Coarse-Grained Floorplanning and Pipelining for High-Frequency HLS Design on Multi-Die FPGAs
FPGA '21: The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 2021, Pages 81–92
https://doi.org/10.1145/3431920.3439289
Despite an increasing adoption of high-level synthesis (HLS) for its design productivity advantages, there remains a significant gap in the achievable clock frequency between an HLS-generated design and a handcrafted RTL one. A key factor that limits ...
- research-article, February 2021
Demystifying the Memory System of Modern Datacenter FPGAs for Software Programmers through Microbenchmarking
FPGA '21: The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 2021, Pages 105–115
https://doi.org/10.1145/3431920.3439284
With the public availability of FPGAs from major cloud service providers like AWS, Alibaba, and Nimbix, hardware and software developers can now easily access FPGA platforms. However, it is nontrivial to develop efficient FPGA accelerators, especially ...
- short-paper, June 2020
Pedal to the Bare Metal: Road Traffic Simulation on FPGAs Using High-Level Synthesis
SIGSIM-PADS '20: Proceedings of the 2020 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, June 2020, Pages 117–121
https://doi.org/10.1145/3384441.3395979
The performance of Agent-based Traffic Simulations (ABTS) has been shown to benefit tremendously from offloading to accelerators such as GPUs. In the search for the most suitable hardware platform, reconfigurable hardware is a natural choice. Some ...
- poster, February 2020
DBHI: A Tool for Decoupled Functional Hardware-Software Co-Design on SoCs
- Unai Martinez-Corral,
- Guillermo Callaghan,
- Konstantinos Iordanou,
- Cosmin Gorgovan,
- Koldo Basterretxea,
- Mikel Lujan
FPGA '20: Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 2020, Page 326
https://doi.org/10.1145/3373087.3375386
This paper presents a system-level co-simulation and co-verification workflow to ease the transition from a software-only procedure, executed in a General Purpose processor, to the integration of a custom hardware accelerator developed in a Hardware ...
- poster, February 2020
Studying the Potential of Automatic Optimizations in the Intel FPGA SDK for OpenCL
FPGA '20: Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 2020, Page 318
https://doi.org/10.1145/3373087.3375355
High Level Synthesis (HLS) tools, like the Intel FPGA SDK for OpenCL, improve hardware design productivity and enable efficient design space exploration, by providing simple program directives (pragmas) and/or API calls that allow hardware programmers ...
- poster, February 2020
Advanced Dataflow Programming using Actor Machines for High-Level Synthesis
FPGA '20: Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 2020, Page 310
https://doi.org/10.1145/3373087.3375330
The use of parallelism has increased drastically in recent years. Parallel platforms come in many forms: multi-core processors, embedded hybrid solutions such as multi-processor system-on-chip with reconfigurable logic, and cloud datacenters with multi-...
- poster, February 2020
DOMIS: Dual-Bank Optimal Micro-Architecture for Iterative Stencils
FPGA '20: Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 2020, Page 315
https://doi.org/10.1145/3373087.3375329
High-Level Synthesis (HLS) can achieve significant performance improvements through effective memory partitioning and meticulous data reuse. Many modern applications, such as medical imaging and convolutional layers in a CNN, mostly contain kernels ...
- research-article, February 2020
Flexible Communication Avoiding Matrix Multiplication on FPGA with High-Level Synthesis
FPGA '20: Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 2020, Pages 244–254
https://doi.org/10.1145/3373087.3375296
Data movement is the dominating factor affecting performance and energy in modern computing systems. Consequently, many algorithms have been developed to minimize the number of I/O operations for common computing patterns. Matrix multiplication is no ...
- poster, February 2019
Optimizing Order-Associative Kernel Computation with Joint Memory Banking and Data Reuse
FPGA '19: Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 2019, Pages 189–190
https://doi.org/10.1145/3289602.3293980
In this paper, we develop a joint strategy of memory banking and data reuse to specifically optimize the memory performance of any given order-associative and stencil-based computing kernel i.e., its iteration order can be reordered freely without ...
- poster, February 2019
Building FPGA State Machines from Sequential Code
FPGA '19: Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 2019, Page 186
https://doi.org/10.1145/3289602.3293965
State machines are commonly used and well understood for hardware. However, in some cases they can introduce complexity as the program can no longer be read sequentially. We propose an extension to the SME model, which retains the sequential program ...