- research-article, November 2024
FPGA-Based Sparse Matrix Multiplication Accelerators: From State-of-the-Art to Future Opportunities
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 4, Article No.: 59, Pages 1–37. https://doi.org/10.1145/3687480
Sparse matrix multiplication (SpMM) plays a critical role in high-performance computing applications, such as deep learning, image processing, and physical simulation. Field-Programmable Gate Arrays (FPGAs), with their configurable hardware resources, can ...
Application-Driven Exascale: The JUPITER Benchmark Suite
- Andreas Herten,
- Sebastian Achilles,
- Damian Alvarez,
- Jayesh Badwaik,
- Eric Behle,
- Mathis Bode,
- Thomas Breuer,
- Daniel Caviedes-Voullième,
- Mehdi Cherti,
- Adel Dabah,
- Salem El Sayed,
- Wolfgang Frings,
- Ana Gonzalez-Nicolas,
- Eric B. Gregory,
- Kaveh Haghighi Mood,
- Thorsten Hater,
- Jenia Jitsev,
- Chelsea Maria John,
- Jan H. Meinke,
- Catrin I. Meyer,
- Pavel Mezentsev,
- Jan-Oliver Mirus,
- Stepan Nassyr,
- Carolin Penke,
- Manoel Römmer,
- Ujjwal Sinha,
- Benedikt von St. Vieth,
- Olaf Stein,
- Estela Suarez,
- Dennis Willsch,
- Ilya Zhukov
SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, Article No.: 32, Pages 1–45. https://doi.org/10.1109/SC41406.2024.00038
Benchmarks are essential in the design of modern HPC installations, as they define key aspects of system components. Beyond synthetic workloads, it is crucial to include real applications that represent user requirements in benchmark suites, to ...
- research-article, October 2024
FPGA-assisted Design Space Exploration of Parameterized AI Accelerators: A Quickloop Approach
Journal of Systems Architecture: the EUROMICRO Journal (JOSA), Volume 155, Issue C. https://doi.org/10.1016/j.sysarc.2024.103260
Abstract: FPGAs facilitate prototyping and debugging, and recently accelerate full-stack simulations due to their rapid turnaround time (TAT). However, this TAT is restrictive in exhaustive design space explorations of parameterized RTL generators, especially ...
Highlights:
- Machine Learning.
- Accelerators.
- Systolic Array.
- Design Space Exploration.
- research-article, December 2024
UpDown: A Novel Architecture for Unlimited Memory Parallelism
MEMSYS '24: Proceedings of the International Symposium on Memory Systems, Pages 61–77. https://doi.org/10.1145/3695794.3695801
The emergence of HBM as a high-volume memory product has made memory bandwidths of 1.2 TB/s (1 stack) to 4.8 TB/s (4 stacks) feasible. Exploiting such bandwidths requires high memory-level parallelism, but the memory access mechanisms in today’s CPUs are ...
- research-article, September 2024
ADS-CNN: Adaptive Dataflow Scheduling for lightweight CNN accelerator on FPGAs
Future Generation Computer Systems (FGCS), Volume 158, Issue C, Pages 138–149. https://doi.org/10.1016/j.future.2024.04.038
Abstract: Lightweight convolutional neural networks (CNNs) enable lower inference latency and data traffic, facilitating deployment on resource-constrained edge devices such as field-programmable gate arrays (FPGAs). However, CNN inference requires access ...
- Article, August 2024
Elastic Filter Prune in Deep Neural Networks Using Modified Weighted Hybrid Criterion
Abstract: The deployment of Convolutional Neural Networks (CNNs) on edge devices has gradually become a hot topic in research and application. However, simply pursuing high-performance networks is no longer suitable for scenarios that require comprehensive ...
- poster, July 2024
Harnessing the Power of the Neocortex System: Open Call for Research Applications
PEARC '24: Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, Article No.: 107, Pages 1–3. https://doi.org/10.1145/3626203.3670622
Neocortex [4] is a National Science Foundation (NSF) [8] system for artificial intelligence workflows that integrates hundreds of thousands of cores coupled with high-speed on-chip memory, making it ideal for complex AI tasks that require high throughput ...
- research-article, June 2024
P-ReTI: Silicon Photonic Accelerator for Greener and Real-Time AI
GLSVLSI '24: Proceedings of the Great Lakes Symposium on VLSI 2024, Pages 766–769. https://doi.org/10.1145/3649476.3660376
Computing deep AI algorithms on traditional CPUs and GPUs brings several performance and energy pitfalls. Most of the emerging AI accelerators target only the inference phase of deep learning. There have been very limited attempts to design a full-...
- short-paper, June 2024
Enhancing Long Sequence Input Processing in FPGA-Based Transformer Accelerators through Attention Fusion
GLSVLSI '24: Proceedings of the Great Lakes Symposium on VLSI 2024, Pages 599–603. https://doi.org/10.1145/3649476.3658810
Attention-based transformers have achieved significant performance breakthroughs in natural language processing (NLP) and computer vision (CV) tasks. Meanwhile, the ever-increasing length of today’s input sequences puts much pressure on computing ...
- short-paper, June 2024
Integrated MAC-based Systolic Arrays: Design and Performance Evaluation
GLSVLSI '24: Proceedings of the Great Lakes Symposium on VLSI 2024, Pages 292–295. https://doi.org/10.1145/3649476.3658797
In the rapidly advancing landscape of computing, hardware accelerator designs are pivotal for satisfying high performance and low power demands. Systolic array (SA) architectures, tailored for general matrix multiplication (GEMM) operations, are ideal ...
- Article, August 2024
Accelerating WebAssembly Interpreters in Embedded Systems Through Hardware-Assisted Dispatching
Abstract: WebAssembly is a promising bytecode virtualization technology for embedded systems. WebAssembly interpreters for embedded systems demonstrate strong isolation and portability. However, they come with a significant performance penalty compared to direct ...
- Article, August 2024
nAIxt: A Light-Weight Processor Architecture for Efficient Computation of Neuron Models
Abstract: The simulation of biological neural networks holds immense promise for advancing both neuroscience and artificial intelligence. Due to its high complexity, it requires powerful computers. However, the high proportion of communication and routing ...
- research-article, April 2024
Diagnosis of Parkinson's Disease Using Convolutional Neural Network-Based Audio Signal Processing on FPGA
- Hamid Majidinia,
- Farzan Khatib,
- Seyyed Javad Seyyed Mahdavi Chabok,
- Hamid Reza Kobravi,
- Fariborz Rezaeitalab
Circuits, Systems, and Signal Processing (CSSP), Volume 43, Issue 7, Pages 4221–4238. https://doi.org/10.1007/s00034-024-02636-y
Abstract: This study proposes a new method for diagnosing Parkinson's disease using audio signals and FPGA-based convolutional neural networks. The proposed method involves training a convolutional neural network and using deep learning techniques to ...
- research-article, March 2024
Survey of convolutional neural network accelerators on field-programmable gate array platforms: architectures and optimization techniques
Journal of Real-Time Image Processing (SPJRTIP), Volume 21, Issue 3. https://doi.org/10.1007/s11554-024-01442-8
Abstract: With the recent advancements in high-performance computing, convolutional neural networks (CNNs) have achieved remarkable success in various vision tasks. However, along with improvements in model accuracy, the size and computational complexity of ...
- research-article, March 2024
Neural network accelerator with fast buffer design for computer vision
Journal of Real-Time Image Processing (SPJRTIP), Volume 21, Issue 2. https://doi.org/10.1007/s11554-024-01423-x
Abstract: Recently, neural networks with convolution computation have been widely used for image classification and recognition. For real-time implementation, a video buffer is required to store the image temporarily. However, traditional buffers like CLSB (...
- research-article, February 2024
Application-level Validation of Accelerator Designs Using a Formal Software/Hardware Interface
- Bo-Yuan Huang,
- Steven Lyubomirsky,
- Yi Li,
- Mike He,
- Gus Henry Smith,
- Thierry Tambe,
- Akash Gaonkar,
- Vishal Canumalla,
- Andrew Cheung,
- Gu-Yeon Wei,
- Aarti Gupta,
- Zachary Tatlock,
- Sharad Malik
ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 29, Issue 2, Article No.: 35, Pages 1–25. https://doi.org/10.1145/3639051
Ideally, accelerator development should be as easy as software development. Several recent design languages/tools are working toward this goal, but actually testing early designs on real applications end-to-end remains prohibitively difficult due to the ...
- research-article, December 2023
Flip: Data-centric Edge CGRA Accelerator
ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 29, Issue 1, Article No.: 22, Pages 1–25. https://doi.org/10.1145/3631118
Coarse-Grained Reconfigurable Arrays (CGRAs) are promising edge accelerators due to their outstanding balance of flexibility, performance, and energy efficiency. Classic CGRAs statically map compute operations onto the processing elements (PEs) and route the ...
- research-article, November 2023
Of Apples and Oranges: Fair Comparisons in Heterogenous Systems Evaluation
HotNets '23: Proceedings of the 22nd ACM Workshop on Hot Topics in Networks, Pages 1–8. https://doi.org/10.1145/3626111.3628186
Accelerators, such as GPUs, SmartNICs and FPGAs, are common components of research systems today. This paper focuses on the question of how to fairly compare these systems. This is challenging because it requires comparing systems that use different ...
Characterizing the Performance of Triangle Counting on Graphcore's IPU Architecture
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 1949–1957. https://doi.org/10.1145/3624062.3624608
In recent years, we have seen an emergence of novel spatial architectures to accelerate domain-specific workloads like Machine Learning. There is a need to investigate their performance characteristics for traditional HPC workloads for their tighter ...
- research-article, November 2023
SG-Float: Achieving Memory Access and Computing Power Reduction Using Self-Gating Float in CNNs
ACM Transactions on Embedded Computing Systems (TECS), Volume 22, Issue 6, Article No.: 101, Pages 1–22. https://doi.org/10.1145/3624582
Convolutional neural networks (CNNs) are essential for advancing the field of artificial intelligence. However, since these networks are highly demanding in terms of memory and computation, implementing CNNs can be challenging. To make CNNs more ...