- research-article, February 2024
Scalability Limitations of Processing-in-Memory using Real System Evaluations
- Gilbert Jonatan,
- Haeyoon Cho,
- Hyojun Son,
- Xiangyu Wu,
- Neal Livesay,
- Evelio Mora,
- Kaustubh Shivdikar,
- José L. Abellán,
- Ajay Joshi,
- David Kaeli,
- John Kim
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 8, Issue 1, Article No.: 5, Pages 1–28. https://doi.org/10.1145/3639046
Processing-in-memory (PIM), where the compute is moved closer to the memory or the data, has been widely explored to accelerate emerging workloads. Recently, different PIM-based systems have been announced by memory vendors to minimize data movement and ...
- research-article, February 2024
H3DM: A High-bandwidth High-capacity Hybrid 3D Memory Design for GPUs
- Negar Akbarzadeh,
- Sina Darabi,
- Atiyeh Gheibi-Fetrat,
- Amir Mirzaei,
- Mohammad Sadrosadati,
- Hamid Sarbazi-Azad
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 8, Issue 1, Article No.: 12, Pages 1–28. https://doi.org/10.1145/3639038
Graphics Processing Units (GPUs) are widely used for modern applications with huge data sizes. However, the performance benefit of GPUs is limited by their memory capacity and bandwidth. Although GPU vendors improve memory capacity and bandwidth using 3D ...
- research-article, March 2023
Asynchronous Automata Processing on GPUs
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 7, Issue 1, Article No.: 27, Pages 1–27. https://doi.org/10.1145/3579453
Finite-state automata serve as compute kernels for many application domains such as pattern matching and data analytics. Existing approaches on GPUs exploit three levels of parallelism in automata processing tasks: 1) input stream level, 2) automaton-...
- research-article, December 2022
Characterizing the Performance of Accelerated Jetson Edge Devices for Training Deep Learning Models
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 6, Issue 3, Article No.: 44, Pages 1–26. https://doi.org/10.1145/3570604
Deep Neural Networks (DNNs) have had a significant impact on domains like autonomous vehicles and smart cities through low-latency inferencing on edge computing devices close to the data source. However, DNN training on the edge is poorly explored. ...
- research-article, February 2022
NURA: A Framework for Supporting Non-Uniform Resource Accesses in GPUs
- Sina Darabi,
- Negin Mahani,
- Hazhir Baxishi,
- Ehsan Yousefzadeh-Asl-Miandoab,
- Mohammad Sadrosadati,
- Hamid Sarbazi-Azad
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 6, Issue 1, Article No.: 16, Pages 1–27. https://doi.org/10.1145/3508036
Multi-application execution in Graphics Processing Units (GPUs), a promising way to utilize GPU resources, is still challenging. Some pieces of prior work (e.g., spatial multitasking) have limited opportunity to improve resource utilization, while other ...
- research-article, February 2022
Memory Space Recycling
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 6, Issue 1, Article No.: 14, Pages 1–24. https://doi.org/10.1145/3508034
Many program codes from different application domains process very large amounts of data, making their cache memory behavior critical for high performance. Most of the existing work targeting cache memory hierarchies focuses on improving data access ...
- research-article, June 2021
Mix and Match: Reorganizing Tasks for Enhancing Data Locality
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 5, Issue 2, Article No.: 20, Pages 1–24. https://doi.org/10.1145/3460087
Application programs that exhibit strong locality of reference lead to minimized cache misses and better performance in different architectures. However, to maximize the performance of multithreaded applications running on emerging manycore systems, data ...
- research-article, February 2021
SUGAR: Speeding Up GPGPU Application Resilience Estimation with Input Sizing
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 5, Issue 1, Article No.: 1, Pages 1–29. https://doi.org/10.1145/3447375
As Graphics Processing Units (GPUs) are becoming a de facto solution for accelerating a wide range of applications, their reliable operation is becoming increasingly important. One of the major challenges in the domain of GPU reliability is to accurately ...
- research-article, June 2019
Architecture-Aware Approximate Computing
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 3, Issue 2, Article No.: 38, Pages 1–24. https://doi.org/10.1145/3341617.3326153
Deliberate use of approximate computing has been an active research area recently. Observing that many application programs from different domains can live with less-than-perfect accuracy, existing techniques try to trade off program output accuracy with ...
- research-article, December 2018
Computing with Near Data
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 2, Issue 3, Article No.: 42, Pages 1–30. https://doi.org/10.1145/3287321
One cost that plays a significant role in shaping the overall performance of both single-threaded and multi-threaded applications in modern computing systems is the cost of moving data between compute elements and storage elements. Traditional approaches ...
- research-article, December 2018
Quantifying Data Locality in Dynamic Parallelism in GPUs
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 2, Issue 3, Article No.: 39, Pages 1–24. https://doi.org/10.1145/3287318
GPUs are becoming prevalent in various domains of computing and are widely used for streaming (regular) applications. However, they are highly inefficient when executing irregular applications with unstructured inputs due to load imbalance. Dynamic ...
- research-article, December 2017
Towards Optimality in Parallel Scheduling
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 1, Issue 2, Article No.: 40, Pages 1–30. https://doi.org/10.1145/3154499
To keep pace with Moore's law, chip designers have focused on increasing the number of cores per chip rather than single core performance. In turn, modern jobs are often designed to run on any number of cores. However, to effectively leverage these multi-...