- research-article, November 2024
FPGA-Based Sparse Matrix Multiplication Accelerators: From State-of-the-Art to Future Opportunities
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 4, Article No.: 59, Pages 1–37. https://doi.org/10.1145/3687480
Sparse matrix multiplication (SpMM) plays a critical role in high-performance computing applications, such as deep learning, image processing, and physical simulation. Field-Programmable Gate Arrays (FPGAs), with their configurable hardware resources, can ...
Application-Driven Exascale: The JUPITER Benchmark Suite
- Andreas Herten,
- Sebastian Achilles,
- Damian Alvarez,
- Jayesh Badwaik,
- Eric Behle,
- Mathis Bode,
- Thomas Breuer,
- Daniel Caviedes-Voullième,
- Mehdi Cherti,
- Adel Dabah,
- Salem El Sayed,
- Wolfgang Frings,
- Ana Gonzalez-Nicolas,
- Eric B. Gregory,
- Kaveh Haghighi Mood,
- Thorsten Hater,
- Jenia Jitsev,
- Chelsea Maria John,
- Jan H. Meinke,
- Catrin I. Meyer,
- Pavel Mezentsev,
- Jan-Oliver Mirus,
- Stepan Nassyr,
- Carolin Penke,
- Manoel Römmer,
- Ujjwal Sinha,
- Benedikt von St. Vieth,
- Olaf Stein,
- Estela Suarez,
- Dennis Willsch,
- Ilya Zhukov
SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, Article No.: 32, Pages 1–45. https://doi.org/10.1109/SC41406.2024.00038
Benchmarks are essential in the design of modern HPC installations, as they define key aspects of system components. Beyond synthetic workloads, it is crucial to include real applications that represent user requirements in benchmark suites, to ...
- research-article, October 2024
FPGA-assisted Design Space Exploration of Parameterized AI Accelerators: A Quickloop Approach
Journal of Systems Architecture: the EUROMICRO Journal (JOSA), Volume 155, Issue C. https://doi.org/10.1016/j.sysarc.2024.103260
Abstract: FPGAs facilitate prototyping and debugging, and recently accelerate full-stack simulations due to their rapid turnaround time (TAT). However, this TAT is restrictive in exhaustive design space explorations of parameterized RTL generators, especially ...
Highlights:
- Machine Learning.
- Accelerators.
- Systolic Array.
- Design Space Exploration.
- research-article, December 2024
UpDown: A Novel Architecture for Unlimited Memory Parallelism
MEMSYS '24: Proceedings of the International Symposium on Memory Systems, Pages 61–77. https://doi.org/10.1145/3695794.3695801
The emergence of HBM as a high-volume memory product has made memory bandwidths of 1.2 TB/s (1 stack) to 4.8 TB/s (4 stacks) feasible. Exploiting such bandwidths requires high memory-level parallelism, but the memory access mechanisms in today’s CPUs are ...
- research-article, September 2024
ADS-CNN: Adaptive Dataflow Scheduling for lightweight CNN accelerator on FPGAs
Future Generation Computer Systems (FGCS), Volume 158, Issue C, Pages 138–149. https://doi.org/10.1016/j.future.2024.04.038
Abstract: Lightweight convolutional neural networks (CNNs) enable lower inference latency and data traffic, facilitating deployment on resource-constrained edge devices such as field-programmable gate arrays (FPGAs). However, CNN inference requires access ...
- Article, August 2024
Elastic Filter Prune in Deep Neural Networks Using Modified Weighted Hybrid Criterion
Abstract: The deployment of Convolutional Neural Networks (CNNs) on edge devices has gradually become a hot topic in research and application. However, simply pursuing high-performance networks is no longer suitable for scenarios that require comprehensive ...
- poster, July 2024
Harnessing the Power of the Neocortex System: Open Call for Research Applications
PEARC '24: Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, Article No.: 107, Pages 1–3. https://doi.org/10.1145/3626203.3670622
Neocortex [4] is a National Science Foundation (NSF) [8] system for artificial intelligence workflows that integrates hundreds of thousands of cores coupled with high-speed on-chip memory, making it ideal for complex AI tasks that require high throughput ...
- research-article, June 2024
P-ReTI: Silicon Photonic Accelerator for Greener and Real-Time AI
GLSVLSI '24: Proceedings of the Great Lakes Symposium on VLSI 2024, Pages 766–769. https://doi.org/10.1145/3649476.3660376
Computing deep AI algorithms on traditional CPUs and GPUs brings several performance and energy pitfalls. Most of the emerging AI accelerators target only the inference phase of deep learning. There have been very limited attempts to design a full-...
- short-paper, June 2024
Enhancing Long Sequence Input Processing in FPGA-Based Transformer Accelerators through Attention Fusion
GLSVLSI '24: Proceedings of the Great Lakes Symposium on VLSI 2024, Pages 599–603. https://doi.org/10.1145/3649476.3658810
Attention-based transformers have achieved significant performance breakthroughs in natural language processing (NLP) and computer vision (CV) tasks. Meanwhile, the ever-increasing length of today’s input sequences puts much pressure on computing ...
- short-paper, June 2024
Integrated MAC-based Systolic Arrays: Design and Performance Evaluation
GLSVLSI '24: Proceedings of the Great Lakes Symposium on VLSI 2024, Pages 292–295. https://doi.org/10.1145/3649476.3658797
In the rapidly advancing landscape of computing, hardware accelerator designs are pivotal for satisfying high performance and low power demands. Systolic array (SA) architectures, tailored for general matrix multiplication (GEMM) operations, are ideal ...
- Article, August 2024
Accelerating WebAssembly Interpreters in Embedded Systems Through Hardware-Assisted Dispatching
Abstract: WebAssembly is a promising bytecode virtualization technology for embedded systems. WebAssembly interpreters for embedded systems demonstrate strong isolation and portability. However, they come with a significant performance penalty compared to direct ...
- Article, August 2024
nAIxt: A Light-Weight Processor Architecture for Efficient Computation of Neuron Models
Abstract: The simulation of biological neural networks holds immense promise for advancing both neuroscience and artificial intelligence. Due to its high complexity, it requires powerful computers. However, the high proportion of communication and routing ...
- research-article, April 2024
Diagnosis of Parkinson's Disease Using Convolutional Neural Network-Based Audio Signal Processing on FPGA
- Hamid Majidinia,
- Farzan Khatib,
- Seyyed Javad Seyyed Mahdavi Chabok,
- Hamid Reza Kobravi,
- Fariborz Rezaeitalab
Circuits, Systems, and Signal Processing (CSSP), Volume 43, Issue 7, Pages 4221–4238. https://doi.org/10.1007/s00034-024-02636-y
Abstract: This study proposes a new method for diagnosing Parkinson's disease using audio signals and FPGA-based convolutional neural networks. The proposed method involves training a convolutional neural network and using deep learning techniques to ...
- research-article, March 2024
Survey of convolutional neural network accelerators on field-programmable gate array platforms: architectures and optimization techniques
Journal of Real-Time Image Processing (SPJRTIP), Volume 21, Issue 3. https://doi.org/10.1007/s11554-024-01442-8
Abstract: With the recent advancements in high-performance computing, convolutional neural networks (CNNs) have achieved remarkable success in various vision tasks. However, along with improvements in model accuracy, the size and computational complexity of ...
- research-article, March 2024
Neural network accelerator with fast buffer design for computer vision
Journal of Real-Time Image Processing (SPJRTIP), Volume 21, Issue 2. https://doi.org/10.1007/s11554-024-01423-x
Abstract: Recently, neural networks with convolution computation have been widely used for image classification and recognition. For real-time implementation, a video buffer is required to store the image temporarily. However, traditional buffers like CLSB (...
- research-article, February 2024
Application-level Validation of Accelerator Designs Using a Formal Software/Hardware Interface
- Bo-Yuan Huang,
- Steven Lyubomirsky,
- Yi Li,
- Mike He,
- Gus Henry Smith,
- Thierry Tambe,
- Akash Gaonkar,
- Vishal Canumalla,
- Andrew Cheung,
- Gu-Yeon Wei,
- Aarti Gupta,
- Zachary Tatlock,
- Sharad Malik
ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 29, Issue 2, Article No.: 35, Pages 1–25. https://doi.org/10.1145/3639051
Ideally, accelerator development should be as easy as software development. Several recent design languages/tools are working toward this goal, but actually testing early designs on real applications end-to-end remains prohibitively difficult due to the ...
- research-article, December 2023
Flip: Data-centric Edge CGRA Accelerator
ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 29, Issue 1, Article No.: 22, Pages 1–25. https://doi.org/10.1145/3631118
Coarse-Grained Reconfigurable Arrays (CGRAs) are promising edge accelerators due to their outstanding balance of flexibility, performance, and energy efficiency. Classic CGRAs statically map compute operations onto the processing elements (PEs) and route the ...
- research-article, November 2023
Of Apples and Oranges: Fair Comparisons in Heterogenous Systems Evaluation
HotNets '23: Proceedings of the 22nd ACM Workshop on Hot Topics in Networks, Pages 1–8. https://doi.org/10.1145/3626111.3628186
Accelerators, such as GPUs, SmartNICs and FPGAs, are common components of research systems today. This paper focuses on the question of how to fairly compare these systems. This is challenging because it requires comparing systems that use different ...
Characterizing the Performance of Triangle Counting on Graphcore's IPU Architecture
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 1949–1957. https://doi.org/10.1145/3624062.3624608
In recent years, we have seen an emergence of novel spatial architectures to accelerate domain-specific workloads like Machine Learning. There is a need to investigate their performance characteristics for traditional HPC workloads for their tighter ...
- research-article, November 2023
SG-Float: Achieving Memory Access and Computing Power Reduction Using Self-Gating Float in CNNs
ACM Transactions on Embedded Computing Systems (TECS), Volume 22, Issue 6, Article No.: 101, Pages 1–22. https://doi.org/10.1145/3624582
Convolutional neural networks (CNNs) are essential for advancing the field of artificial intelligence. However, since these networks are highly demanding in terms of memory and computation, implementing CNNs can be challenging. To make CNNs more ...