- research-article, December 2024 (Just Accepted)
- survey, December 2024
Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey
ACM Computing Surveys (CSUR), Volume 57, Issue 4, Article No.: 91, Pages 1–35, https://doi.org/10.1145/3703453
Deep reinforcement learning has led to dramatic breakthroughs in the field of artificial intelligence for the past few years. As the amount of rollout experience data and the size of neural networks for deep reinforcement learning have grown continuously, ...
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, Article No.: 40, Pages 1–19, https://doi.org/10.1109/SC41406.2024.00046
Inference of Large Language Models (LLMs) across computer clusters has become a focal point of research in recent times, with many acceleration techniques taking inspiration from CPU speculative execution. These techniques reduce bottlenecks associated ...
- research-article, November 2024
Invited: New Solutions on LLM Acceleration, Optimization, and Application
- Yingbing Huang,
- Lily Jiaxin Wan,
- Hanchen Ye,
- Manvi Jha,
- Jinghua Wang,
- Yuhong Li,
- Xiaofan Zhang,
- Deming Chen
DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference, Article No.: 369, Pages 1–4, https://doi.org/10.1145/3649329.3663517
Large Language Models (LLMs) have revolutionized a wide range of applications with their strong human-like understanding and creativity. Due to the continuously growing model size and complexity, LLM training and deployment have shown significant ...
- short-paper, June 2024
Accelerating Boolean Constraint Propagation for Efficient SAT-Solving on FPGAs
GLSVLSI '24: Proceedings of the Great Lakes Symposium on VLSI 2024, Pages 305–309, https://doi.org/10.1145/3649476.3658808
We present a hardware-accelerated SAT solver targeting processor/Field Programmable Gate Arrays (FPGA) SoCs. Our solution accelerates the most expensive subroutine of the Davis-Putnam-Logemann-Loveland (DPLL) algorithm, Boolean Constraint Propagation (...
- short-paper, August 2024
Acceleration of Ultrasound Neurostimulation Using Mixed-Precision Arithmetic
HPDC '24: Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, Pages 370–372, https://doi.org/10.1145/3625549.3658823
Ultrasound neurostimulation, a technique that modulates the brain's electrical activity, has emerged as a significant secondary treatment option for cases resistant to pharmacological interventions. The therapy is achievable through the application of a ...
- research-article, March 2024
XVDPU: A High-Performance CNN Accelerator on the Versal Platform Powered by the AI Engine
- Xijie Jia,
- Yu Zhang,
- Guangdong Liu,
- Xinlin Yang,
- Tianyu Zhang,
- Jia Zheng,
- Dongdong Xu,
- Zhuohuan Liu,
- Mengke Liu,
- Xiaoyang Yan,
- Hong Wang,
- Rongzhang Zheng,
- Li Wang,
- Dong Li,
- Satyaprakash Pareek,
- Jian Weng,
- Lu Tian,
- Dongliang Xie,
- Hong Luo,
- Yi Shan
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 2, Article No.: 20, Pages 1–24, https://doi.org/10.1145/3617836
Today, convolutional neural networks (CNNs) are widely used in computer vision applications. However, the trends of higher accuracy and higher resolution generate larger networks. The requirements of computation or I/O are the key bottlenecks. In this ...
- research-article, March 2024
Analyzing Operation Efficiency of a City Transportation System by the U-Statistics Methods. II. Optimization of the Interactive Evaluation Methods
Cybernetics and Systems Analysis (KLU-CASA), Volume 60, Issue 2, Pages 268–275, https://doi.org/10.1007/s10559-024-00667-6
The authors formalize the technique of interactive evaluation of the operation efficiency of the motor vehicle system of a large city based on U-statistics methods. To optimize this technique, the authors propose efficient algorithmic ...
- research-article, January 2024
Computing Acceleration to Genome-Wide Association Study Based on CPU/FPGA Heterogeneous System
ACM SIGAPP Applied Computing Review (SIGAPP), Volume 23, Issue 4, Pages 16–26, https://doi.org/10.1145/3642964.3642966
Genome Wide Association Study (GWAS) reveals the influence of single nucleotide polymorphisms (SNP) and other genetic markers on the complex genetic disease traits, making a significant contribution to the prevention and treatment of genetic diseases. ...
- research-article, April 2024
Tennis players' hitting action recognition method based on multimodal data
International Journal of Biometrics (IJOB), Volume 16, Issue 3-4, Pages 317–336, https://doi.org/10.1504/ijbm.2024.138223
In order to improve the recognition accuracy of hitting movements, a tennis player hitting movement recognition method based on multimodal data is proposed. First, we collect acceleration modal data of hitting movements and extract acceleration ...
- research-article, November 2023
Monotone Inclusions, Acceleration, and Closed-Loop Control
Mathematics of Operations Research (MOOR), Volume 48, Issue 4, Pages 2353–2382, https://doi.org/10.1287/moor.2022.1343
We propose and analyze a new dynamical system with a closed-loop control law in a Hilbert space H, aiming to shed light on the acceleration phenomenon for monotone inclusion problems, which unifies a broad class of optimization, saddle point, and ...
- Article, November 2023
Real Acceleration of Communication Process in Distributed Algorithms with Compression
Modern applied optimization problems become more and more complex every day. Due to this fact, distributed algorithms that can speed up the process of solving an optimization problem through parallelization are of great importance. The main ...
- research-article, July 2023
Simplicity done right for SIMDified query processing on CPU and FPGA
SiMoD '23: Proceedings of the 1st Workshop on Simplicity in Management of Data, Article No.: 3, Pages 1–5, https://doi.org/10.1145/3596225.3596229
We present a simple but effective solution idea to port SIMDified query processing code to Intel® FPGA cards for acceleration. The main advantage of our approach is the seamless integration with existing SIMD abstraction libraries originally developed ...
- research-article, June 2023
Efficient and Effective Algorithms for Generalized Densest Subgraph Discovery
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 2, Article No.: 169, Pages 1–27, https://doi.org/10.1145/3589314
The densest subgraph problem (DSP) is of great significance due to its wide applications in different domains. Meanwhile, diverse requirements in various applications lead to different density variants for DSP. Unfortunately, existing DSP algorithms ...
- research-article, May 2023
Scalable High-Performance Architecture for Evolving Recommender System
EuroMLSys '23: Proceedings of the 3rd Workshop on Machine Learning and Systems, Pages 154–162, https://doi.org/10.1145/3578356.3592594
Recommender systems are expected to scale to the requirement of the large number of recommendations made to the customers and to keep the latency of recommendations within a stringent limit. Such requirements make architecting a recommender system a ...
- research-article, February 2023
Exploiting Data Parallelism in Graph-Based Simultaneous Localization and Mapping: A Case Study with GPU Accelerations
HPCAsia '23: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, Pages 126–139, https://doi.org/10.1145/3578178.3578237
Graph-based simultaneous localization and mapping (G-SLAM) is an intuitive SLAM implementation where graphs are used to represent poses, landmarks and sensor measurements when a mobile robot builds a map of the environment and locates itself in it. ...
- research-article, February 2023
ENCORE: Efficient Architecture Verification Framework with FPGA Acceleration
FPGA '23: Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Pages 209–219, https://doi.org/10.1145/3543622.3573187
Verification typically consumes the majority of the time in the hardware development cycle. Primarily this is because multiple iterations to debug hardware using software simulation is extremely time-consuming. While FPGAs can be utilised to accelerate ...
- research-article, January 2023
Accelerating Convolutional Neural Networks in Frequency Domain via Kernel-Sharing Approach
ASPDAC '23: Proceedings of the 28th Asia and South Pacific Design Automation Conference, Pages 733–738, https://doi.org/10.1145/3566097.3567862
Convolutional neural networks (CNNs) are typically computationally heavy. Fast algorithms such as fast Fourier transforms (FFTs), are promising in significantly reducing computation complexity by replacing convolutions with frequency-domain element-wise ...
- research-article, March 2024
Evaluation Methods and Differences between Three Dimensional And Two Dimensional Movies by Physiological Measurements Using A Commercial 6-Axis Sensor
- Hideyuki Kanematsu,
- Dana M. Barry,
- Nobuyuki Ogawa,
- Kuniaki Yajima,
- Katsuko T. Nakahira,
- Shin-nosuke Suzuki,
- Takehito Kato,
- Tatsuya Shirai,
- Masashi Kawaguchi,
- Michiko Yoshitake
Procedia Computer Science (PROCS), Volume 225, Issue C, Pages 4631–4639, https://doi.org/10.1016/j.procs.2023.10.461
The term "metaverse" has become a common word in recent years. This term, a virtual space on the Internet, is gaining importance in various fields in today's era of remarkable progress toward the fusion of virtual space and real space using ...