Theory of computation

Searched The ACM Guide to Computing Literature (3,846,589 records)

Showing 1 - 20 of 54 Results

research-article
Open Access
February 2025
JUST ACCEPTED
Comprehensive Evaluation and Opportunity Discovery for Deterministic Concurrency Control
ACM Transactions on Architecture and Code Optimization (TACO), Just Accepted https://doi.org/10.1145/3715126
Deterministic concurrency control (DCC) guarantees that the same input transactions produce the same serializable result. It offers benefits in both distributed databases and blockchain systems. Dozens of DCC algorithms have emerged in the past decade. ...
Metrics: Total Citations 0; Total Downloads 7; Last 12 Months 7; Last 6 weeks 7
research-article
Open Access
November 2024
An Optimized GPU Implementation for GIST Descriptor
ACM Transactions on Architecture and Code Optimization (TACO), Volume 21, Issue 4, Article No.: 78, Pages 1–24, https://doi.org/10.1145/3689339
The GIST descriptor is a classic feature descriptor primarily used for scene categorization and recognition tasks. It drives a bank of Gabor filters, which respond to edges and textures at various scales and orientations to capture the spatial structures ...
Metrics: Total Citations 0; Total Downloads 552; Last 12 Months 552; Last 6 weeks 134
research-article
Open Access
November 2024
JUST ACCEPTED
RT-GNN: Accelerating Sparse Graph Neural Networks by Tensor-CUDA Kernel Fusion
ACM Transactions on Architecture and Code Optimization (TACO), Just Accepted https://doi.org/10.1145/3702001
Graph Neural Networks (GNNs) have achieved remarkable successes in various graph-based learning tasks, thanks to their ability to leverage advanced GPUs. However, GNNs currently face challenges arising from the concurrent use of advanced Tensor Cores (TCs)...
Metrics: Total Citations 0; Total Downloads 302; Last 12 Months 302; Last 6 weeks 74
research-article
Open Access
November 2024
JUST ACCEPTED
ApSpGEMM: Accelerating Large-scale SpGEMM with Heterogeneous Collaboration and Adaptive Panel
ACM Transactions on Architecture and Code Optimization (TACO), Just Accepted https://doi.org/10.1145/3703352
The Sparse General Matrix-Matrix multiplication (SpGEMM) is a fundamental component for many applications, such as algebraic multigrid methods (AMG), graphic processing, and deep learning. However, the unbearable latency of computing high-dimensional, ...
Metrics: Total Citations 0; Total Downloads 296; Last 12 Months 296; Last 6 weeks 89
research-article
Open Access
November 2024
JUST ACCEPTED
Characterizing and Understanding HGNN Training on GPUs
ACM Transactions on Architecture and Code Optimization (TACO), Just Accepted https://doi.org/10.1145/3703356
Owing to their remarkable representation capabilities for heterogeneous graph data, Heterogeneous Graph Neural Networks (HGNNs) have been widely adopted in many critical real-world domains such as recommendation systems and medical analysis. Prior to ...
Metrics: Total Citations 0; Total Downloads 175; Last 12 Months 175; Last 6 weeks 57
research-article
Open Access
October 2024
JUST ACCEPTED
MemoriaNova: Optimizing Memory-Aware Model Inference for Edge Computing
ACM Transactions on Architecture and Code Optimization (TACO), Just Accepted https://doi.org/10.1145/3701997
In recent years, deploying deep learning models on edge devices has become pervasive, driven by the increasing demand for intelligent edge computing solutions across various industries. From industrial automation to intelligent surveillance and healthcare,...
Metrics: Total Citations 0; Total Downloads 197; Last 12 Months 197; Last 6 weeks 63
research-article
Open Access
September 2024
GraphSER: Distance-Aware Stream-Based Edge Repartition for Many-Core Systems
- Junkaixuan Li,
- Yi Kang
ACM Transactions on Architecture and Code Optimization (TACO), Volume 21, Issue 3, Article No.: 48, Pages 1–25, https://doi.org/10.1145/3661998
With the explosive growth of graph data, distributed graph processing has become popular, and many graph hardware accelerators use distributed frameworks. Graph partitioning is foundational in distributed graph processing. However, dynamic changes in graph ...
Metrics: Total Citations 0; Total Downloads 667; Last 12 Months 667; Last 6 weeks 123
research-article
Open Access
September 2024
An Example of Parallel Merkle Tree Traversal: Post-Quantum Leighton-Micali Signature on the GPU
ACM Transactions on Architecture and Code Optimization (TACO), Volume 21, Issue 3, Article No.: 44, Pages 1–25, https://doi.org/10.1145/3659209
The hash-based signature (HBS) is the most conservative and time-consuming among many post-quantum cryptography (PQC) algorithms. Two HBSs, LMS and XMSS, are the only PQC algorithms standardised by the National Institute of Standards and Technology (NIST) ...
Metrics: Total Citations 2; Total Downloads 968; Last 12 Months 968; Last 6 weeks 133
research-article
Open Access
May 2024
TEA+: A Novel Temporal Graph Random Walk Engine with Hybrid Storage Architecture
ACM Transactions on Architecture and Code Optimization (TACO), Volume 21, Issue 2, Article No.: 37, Pages 1–26, https://doi.org/10.1145/3652604
Many real-world networks are characterized by being temporal and dynamic, wherein the temporal information signifies the changes in connections, such as the addition or removal of links between nodes. Employing random walks on these temporal networks is a ...
Metrics: Total Citations 0; Total Downloads 987; Last 12 Months 987; Last 6 weeks 124
research-article
Open Access
March 2024
Cost-aware Service Placement and Scheduling in the Edge-Cloud Continuum
- Samuel Rac,
- Mats Brorsson
ACM Transactions on Architecture and Code Optimization (TACO), Volume 21, Issue 2, Article No.: 29, Pages 1–24, https://doi.org/10.1145/3640823
The edge to data center computing continuum is the aggregation of computing resources located anywhere between the network edge (e.g., close to 5G antennas), and servers in traditional data centers. Kubernetes is the de facto standard for the ...
Metrics: Total Citations 4; Total Downloads 1,709; Last 12 Months 1,648; Last 6 weeks 219
research-article
Open Access
February 2024
ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-efficient Genome Analysis
ACM Transactions on Architecture and Code Optimization (TACO), Volume 21, Issue 1, Article No.: 19, Pages 1–29, https://doi.org/10.1145/3632950
Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures, where states ...
Metrics: Total Citations 3; Total Downloads 1,704; Last 12 Months 1,626; Last 6 weeks 207
Supplementary Material: 3632950.supp
research-article
Open Access
December 2023
Autovesk: Automatic Vectorized Code Generation from Unstructured Static Kernels Using Graph Transformations
ACM Transactions on Architecture and Code Optimization (TACO), Volume 21, Issue 1, Article No.: 4, Pages 1–25, https://doi.org/10.1145/3631709
Leveraging the SIMD capability of modern CPU architectures is mandatory to take full advantage of their increased performance. To exploit this capability, binary executables must be vectorized, either manually by developers or automatically by a tool. For ...
Metrics: Total Citations 2; Total Downloads 1,821; Last 12 Months 1,579; Last 6 weeks 124
research-article
Open Access
December 2023
PARALiA: A Performance Aware Runtime for Auto-tuning Linear Algebra on Heterogeneous Systems
ACM Transactions on Architecture and Code Optimization (TACO), Volume 20, Issue 4, Article No.: 52, Pages 1–25, https://doi.org/10.1145/3624569
Dense linear algebra operations appear very frequently in high-performance computing (HPC) applications, rendering their performance crucial to achieve optimal scalability. As many modern HPC clusters contain multi-GPU nodes, BLAS operations are ...
Metrics: Total Citations 1; Total Downloads 1,230; Last 12 Months 815; Last 6 weeks 78
research-article
Open Access
March 2023
Source Matching and Rewriting for MLIR Using String-Based Automata
ACM Transactions on Architecture and Code Optimization (TACO), Volume 20, Issue 2, Article No.: 22, Pages 1–26, https://doi.org/10.1145/3571283
A typical compiler flow relies on a uni-directional sequence of translation/optimization steps that lower the program abstract representation, making it hard to preserve higher-level program information across each transformation step. On the other hand, ...
Metrics: Total Citations 2; Total Downloads 3,627; Last 12 Months 1,595; Last 6 weeks 134
research-article
Open Access
December 2022
XEngine: Optimal Tensor Rematerialization for Neural Networks in Heterogeneous Environments
ACM Transactions on Architecture and Code Optimization (TACO), Volume 20, Issue 1, Article No.: 17, Pages 1–25, https://doi.org/10.1145/3568956
Memory efficiency is crucial in training deep learning networks on resource-restricted devices. During backpropagation, forward tensors are used to calculate gradients. Despite the option of keeping those dependencies in memory until they are reused in ...
Metrics: Total Citations 0; Total Downloads 1,765; Last 12 Months 501; Last 6 weeks 69
research-article
Open Access
March 2022
CARL: Compiler Assigned Reference Leasing
ACM Transactions on Architecture and Code Optimization (TACO), Volume 19, Issue 1, Article No.: 15, Pages 1–28, https://doi.org/10.1145/3498730
Data movement is a common performance bottleneck, and its chief remedy is caching. Traditional cache management is transparent to the workload: data that should be kept in cache are determined by the recency information only, while the program information,...
Metrics: Total Citations 1; Total Downloads 1,378; Last 12 Months 477; Last 6 weeks 73
research-article
Open Access
March 2022
Weaving Synchronous Reactions into the Fabric of SSA-form Compilers
ACM Transactions on Architecture and Code Optimization (TACO), Volume 19, Issue 2, Article No.: 22, Pages 1–25, https://doi.org/10.1145/3506706
We investigate the programming of reactive systems combining closed-loop control with performance-intensive components such as Machine Learning (ML). Reactive control systems are often safety-critical and associated with real-time execution requirements, ...
Metrics: Total Citations 1; Total Downloads 1,787; Last 12 Months 532; Last 6 weeks 72
research-article
Open Access
September 2021
SortCache: Intelligent Cache Management for Accelerating Sparse Data Workloads
ACM Transactions on Architecture and Code Optimization (TACO), Volume 18, Issue 4, Article No.: 56, Pages 1–24, https://doi.org/10.1145/3473332
Sparse data applications have irregular access patterns that stymie modern memory architectures. Although hyper-sparse workloads have received considerable attention in the past, moderately-sparse workloads prevalent in machine learning applications, ...
Metrics: Total Citations 4; Total Downloads 882; Last 12 Months 224; Last 6 weeks 31
research-article
Open Access
July 2021
All-gather Algorithms Resilient to Imbalanced Process Arrival Patterns
- Jerzy Proficz
ACM Transactions on Architecture and Code Optimization (TACO), Volume 18, Issue 4, Article No.: 41, Pages 1–22, https://doi.org/10.1145/3460122
Two novel algorithms for the all-gather operation resilient to imbalanced process arrival patterns (PATs) are presented. The first one, Background Disseminated Ring (BDR), is based on the regular parallel ring algorithm often supplied in MPI ...
Metrics: Total Citations 1; Total Downloads 681; Last 12 Months 283; Last 6 weeks 37
research-article
Open Access
June 2021
KernelFaRer: Replacing Native-Code Idioms with High-Performance Library Calls
ACM Transactions on Architecture and Code Optimization (TACO), Volume 18, Issue 3, Article No.: 38, Pages 1–22, https://doi.org/10.1145/3459010
Well-crafted libraries deliver much higher performance than code generated by sophisticated application programmers using advanced optimizing compilers. When a code pattern for which a well-tuned library implementation exists is found in the source code ...
Metrics: Total Citations 9; Total Downloads 1,573; Last 12 Months 436; Last 6 weeks 42