- research-article, February 2024
Scalability Limitations of Processing-in-Memory using Real System Evaluations
- Gilbert Jonatan,
- Haeyoon Cho,
- Hyojun Son,
- Xiangyu Wu,
- Neal Livesay,
- Evelio Mora,
- Kaustubh Shivdikar,
- José L. Abellán,
- Ajay Joshi,
- David Kaeli,
- John Kim
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 8, Issue 1, Article No.: 5, Pages 1–28. https://doi.org/10.1145/3639046
Processing-in-memory (PIM), where the compute is moved closer to the memory or the data, has been widely explored to accelerate emerging workloads. Recently, different PIM-based systems have been announced by memory vendors to minimize data movement and ...
- research-article, February 2024
H3DM: A High-bandwidth High-capacity Hybrid 3D Memory Design for GPUs
- Negar Akbarzadeh,
- Sina Darabi,
- Atiyeh Gheibi-Fetrat,
- Amir Mirzaei,
- Mohammad Sadrosadati,
- Hamid Sarbazi-Azad
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 8, Issue 1, Article No.: 12, Pages 1–28. https://doi.org/10.1145/3639038
Graphics Processing Units (GPUs) are widely used for modern applications with huge data sizes. However, the performance benefit of GPUs is limited by their memory capacity and bandwidth. Although GPU vendors improve memory capacity and bandwidth using 3D ...
- research-article, March 2023
Asynchronous Automata Processing on GPUs
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 7, Issue 1, Article No.: 27, Pages 1–27. https://doi.org/10.1145/3579453
Finite-state automata serve as compute kernels for many application domains such as pattern matching and data analytics. Existing approaches on GPUs exploit three levels of parallelism in automata processing tasks: 1) input stream level, 2) automaton-...
- research-article, December 2022
Characterizing the Performance of Accelerated Jetson Edge Devices for Training Deep Learning Models
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 6, Issue 3, Article No.: 44, Pages 1–26. https://doi.org/10.1145/3570604
Deep Neural Networks (DNNs) have had a significant impact on domains like autonomous vehicles and smart cities through low-latency inferencing on edge computing devices close to the data source. However, DNN training on the edge is poorly explored. ...
- research-article, February 2022
NURA: A Framework for Supporting Non-Uniform Resource Accesses in GPUs
- Sina Darabi,
- Negin Mahani,
- Hazhir Baxishi,
- Ehsan Yousefzadeh-Asl-Miandoab,
- Mohammad Sadrosadati,
- Hamid Sarbazi-Azad
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 6, Issue 1, Article No.: 16, Pages 1–27. https://doi.org/10.1145/3508036
Multi-application execution in Graphics Processing Units (GPUs), a promising way to utilize GPU resources, is still challenging. Some pieces of prior work (e.g., spatial multitasking) have limited opportunity to improve resource utilization, while other ...
- research-article, February 2022
Memory Space Recycling
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 6, Issue 1, Article No.: 14, Pages 1–24. https://doi.org/10.1145/3508034
Many program codes from different application domains process very large amounts of data, making their cache memory behavior critical for high performance. Most of the existing work targeting cache memory hierarchies focuses on improving data access ...
- research-article, June 2021
Mix and Match: Reorganizing Tasks for Enhancing Data Locality
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 5, Issue 2, Article No.: 20, Pages 1–24. https://doi.org/10.1145/3460087
Application programs that exhibit strong locality of reference lead to minimized cache misses and better performance in different architectures. However, to maximize the performance of multithreaded applications running on emerging manycore systems, data ...
- research-article, February 2021
SUGAR: Speeding Up GPGPU Application Resilience Estimation with Input Sizing
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 5, Issue 1, Article No.: 1, Pages 1–29. https://doi.org/10.1145/3447375
As Graphics Processing Units (GPUs) are becoming a de facto solution for accelerating a wide range of applications, their reliable operation is becoming increasingly important. One of the major challenges in the domain of GPU reliability is to accurately ...
- research-article, June 2019
Architecture-Aware Approximate Computing
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 3, Issue 2, Article No.: 38, Pages 1–24. https://doi.org/10.1145/3341617.3326153
Deliberate use of approximate computing has been an active research area recently. Observing that many application programs from different domains can live with less-than-perfect accuracy, existing techniques try to trade off program output accuracy with ...
- research-article, December 2018
Computing with Near Data
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 2, Issue 3, Article No.: 42, Pages 1–30. https://doi.org/10.1145/3287321
One cost that plays a significant role in shaping the overall performance of both single-threaded and multi-threaded applications in modern computing systems is the cost of moving data between compute elements and storage elements. Traditional approaches ...
- research-article, December 2018
Quantifying Data Locality in Dynamic Parallelism in GPUs
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 2, Issue 3, Article No.: 39, Pages 1–24. https://doi.org/10.1145/3287318
GPUs are becoming prevalent in various domains of computing and are widely used for streaming (regular) applications. However, they are highly inefficient when executing irregular applications with unstructured inputs due to load imbalance. Dynamic ...
- research-article, December 2017
Towards Optimality in Parallel Scheduling
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 1, Issue 2, Article No.: 40, Pages 1–30. https://doi.org/10.1145/3154499
To keep pace with Moore's law, chip designers have focused on increasing the number of cores per chip rather than single core performance. In turn, modern jobs are often designed to run on any number of cores. However, to effectively leverage these multi-...