- research-article, July 2024
Partial Solution Based Constraint Solving Cache in Symbolic Execution
Proceedings of the ACM on Software Engineering (PACMSE), Volume 1, Issue FSE, Article No.: 110, Pages 2493–2514. https://doi.org/10.1145/3660817
Constraint solving is one of the main challenges for symbolic execution. Caching is an effective mechanism to reduce the number of solver invocations in symbolic execution and is adopted by many mainstream symbolic execution engines. However, caching ...
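The general caching idea this line of work builds on can be shown with a toy sketch (this is not the paper's partial-solution technique): memoize solver results keyed on a canonicalized constraint set, so a path condition already seen — even with its constraints in a different order — skips the solver. The `fake_solver` below is an illustrative stand-in, not a real SMT solver.

```python
# Toy constraint-solving cache for a symbolic executor (illustrative
# sketch only, not the partial-solution scheme from the paper).
class ConstraintCache:
    def __init__(self, solver):
        self.solver = solver          # callable: frozenset[str] -> bool (SAT?)
        self.cache = {}               # canonical constraint set -> result
        self.hits = 0
        self.misses = 0

    def _canonicalize(self, constraints):
        # Order-insensitive key: the same path condition reached via a
        # different branch order should still hit the cache.
        return frozenset(constraints)

    def is_sat(self, constraints):
        key = self._canonicalize(constraints)
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        result = self.solver(key)
        self.cache[key] = result
        return result

# Stand-in "solver": any set containing the contradictory pair
# "x > 0" and "x < 0" is declared unsatisfiable.
def fake_solver(constraints):
    return not {"x > 0", "x < 0"} <= constraints

cache = ConstraintCache(fake_solver)
print(cache.is_sat({"x > 0", "y == 1"}))   # True  (miss)
print(cache.is_sat({"y == 1", "x > 0"}))   # True  (hit: same set)
print(cache.is_sat({"x > 0", "x < 0"}))    # False (miss)
print(cache.hits, cache.misses)            # 1 2
```

Real engines (e.g., KLEE's counterexample cache) additionally exploit subset/superset relations between constraint sets rather than exact-match keys only.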
- research-article, April 2024
Volley: Accelerating Write-Read Orders in Disaggregated Storage
EuroSys '24: Proceedings of the Nineteenth European Conference on Computer Systems, April 2024, Pages 657–673. https://doi.org/10.1145/3627703.3650090
Modern data centers deploy disaggregated storage systems (e.g., NVMe over Fabrics, NVMe-oF) for fine-grained resource elasticity and high resource utilization. A client-side writeback cache is used to absorb writes and buffer frequently accessed data, ...
- research-article, April 2024, Just Accepted
Intermediate Address Space: virtual memory optimization of heterogeneous architectures for cache-resident workloads
ACM Transactions on Architecture and Code Optimization (TACO), Just Accepted. https://doi.org/10.1145/3659207
The increasing demand for computing power and the emergence of heterogeneous computing architectures have driven the exploration of innovative techniques to address current limitations in both the compute and memory subsystems. One such solution is the ...
- research-article, December 2023
Flash-Based Solid-State Storage Reduces LDPC Read Retry Scheme
CSAE '23: Proceedings of the 7th International Conference on Computer Science and Application Engineering, October 2023, Article No.: 32, Pages 1–6. https://doi.org/10.1145/3627915.3628024
Flash memory, with its high performance and low power consumption, has been widely applied in various fields. The increasing storage density of flash memory has led to a gradual increase in Bit Error Rate (BER). Given the users' requirement for data ...
- research-article, December 2023
Hardware Support for Constant-Time Programming
MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, October 2023, Pages 856–870. https://doi.org/10.1145/3613424.3623796
Side-channel attacks are one of the rising security concerns in modern computing platforms. Observing this, researchers have proposed both hardware-based and software-based strategies to mitigate side-channel attacks, targeting not only on-chip caches ...
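For context on what "constant-time programming" means at the software level (this paper proposes hardware support; the sketch below is a purely software-side illustration): the code's branches and memory accesses must not depend on secret data. A classic example is branch-free byte-string comparison:

```python
# Constant-time comparison: the loop shape and the accesses performed do
# not depend on where the inputs first differ (software illustration only;
# the paper concerns hardware support for such guarantees).
def ct_equal(a: bytes, b: bytes) -> bool:
    if len(a) != len(b):              # lengths are typically public
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y                 # accumulate differences branch-free
    return diff == 0

print(ct_equal(b"secret", b"secret"))   # True
print(ct_equal(b"secret", b"secreT"))   # False
```

In production Python code, the standard library's `hmac.compare_digest` provides this behavior.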
- research-article, December 2023
Utopia: Fast and Efficient Address Translation via Hybrid Restrictive & Flexible Virtual-to-Physical Address Mappings
- Konstantinos Kanellopoulos,
- Rahul Bera,
- Kosta Stojiljkovic,
- F. Nisa Bostanci,
- Can Firtina,
- Rachata Ausavarungnirun,
- Rakesh Kumar,
- Nastaran Hajinazar,
- Mohammad Sadrosadati,
- Nandita Vijaykumar,
- Onur Mutlu
MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, October 2023, Pages 1196–1212. https://doi.org/10.1145/3613424.3623789
Conventional virtual memory (VM) frameworks enable a virtual address to flexibly map to any physical address. This flexibility necessitates large data structures to store virtual-to-physical mappings, which leads to high address translation latency and ...
- research-article, December 2023
Victima: Drastically Increasing Address Translation Reach by Leveraging Underutilized Cache Resources
- Konstantinos Kanellopoulos,
- Hong Chul Nam,
- Nisa Bostanci,
- Rahul Bera,
- Mohammad Sadrosadati,
- Rakesh Kumar,
- Davide Basilio Bartolini,
- Onur Mutlu
MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, October 2023, Pages 1178–1195. https://doi.org/10.1145/3613424.3614276
Address translation is a performance bottleneck in data-intensive workloads due to large datasets and irregular access patterns that lead to frequent high-latency page table walks (PTWs). PTWs can be reduced by using (i) large hardware TLBs or (ii) ...
- research-article, December 2023
Uncore Encore: Covert Channels Exploiting Uncore Frequency Scaling
MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, October 2023, Pages 843–855. https://doi.org/10.1145/3613424.3614259
Modern processors dynamically adjust clock frequencies and voltages to reduce energy consumption. Recent Intel processors separate the uncore frequency from the core frequency, using Uncore Frequency Scaling (UFS) to adapt the uncore frequency to ...
- research-article, December 2023
CLIP: Load Criticality based Data Prefetching for Bandwidth-constrained Many-core Systems
MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, October 2023, Pages 714–727. https://doi.org/10.1145/3613424.3614245
Hardware prefetching is a latency-hiding technique that hides the costly off-chip DRAM accesses. However, state-of-the-art prefetchers fail to deliver performance improvement in the case of many-core systems with constrained DRAM bandwidth. For SPEC ...
- research-article, June 2023
Lightweight Register File Caching in Collector Units for GPUs
GPGPU '23: Proceedings of the 15th Workshop on General Purpose Processing Using GPU, February 2023, Pages 27–33. https://doi.org/10.1145/3589236.3589245
Modern GPUs benefit from a sizable Register File (RF) to provide fine-grained thread switching. As the RF is huge and accessed frequently, it consumes a considerable share of the dynamic energy of the GPU. Designing a large, high-throughput RF with low ...
- research-article, May 2023
High Speed Multi-channel Data Cache Design Based on DDR3 SDRAM
AIPR '22: Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition, September 2022, Pages 193–196. https://doi.org/10.1145/3573942.3573972
With the rapid development of microelectronics technology, data volumes continue to grow and data-processing speed requirements keep rising. To meet today's data caching needs and solve a series ...
- research-article, March 2023
Design and Implementation of A Cache Adapted to the LoongArch Architecture
EITCE '22: Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering, October 2022, Pages 99–104. https://doi.org/10.1145/3573428.3573446
A cache adapted to the LoongArch architecture is designed and implemented on an FPGA platform. A simple, performance-balanced Cache module is designed and implemented. The design of the read and write data paths of the Cache module is given, and ...
- research-article, March 2023
ACTION: Adaptive Cache Block Migration in Distributed Cache Architectures
ACM Transactions on Architecture and Code Optimization (TACO), Volume 20, Issue 2, Article No.: 25, Pages 1–19. https://doi.org/10.1145/3572911
Chip multiprocessors (CMPs) with more cores have more traffic to the last-level cache (LLC). Without a corresponding increase in LLC bandwidth, such traffic cannot be sustained, resulting in performance degradation. Previous research focused on data ...
- research-article, June 2022
MemSweeper: virtualizing cluster memory management for high memory utilization and isolation
ISMM 2022: Proceedings of the 2022 ACM SIGPLAN International Symposium on Memory Management, June 2022, Pages 15–28. https://doi.org/10.1145/3520263.3534651
Memory caches are critical components of modern web services that improve response times and reduce the load on backend databases. In multi-tenant clouds, several instances of caches compete for memory. The current state-of-the-art is to statically ...
- research-article, August 2021
A Large-scale Analysis of Hundreds of In-memory Key-value Cache Clusters at Twitter
ACM Transactions on Storage (TOS), Volume 17, Issue 3, Article No.: 17, Pages 1–35. https://doi.org/10.1145/3468521
Modern web services use in-memory caching extensively to increase throughput and reduce latency. There have been several workload analyses of production systems that have fueled research in improving the effectiveness of in-memory caching systems. However, ...
- research-article, February 2021
On Cache Limits for Dataflow Applications and Related Efficient Memory Management Strategies
DASIP '21: Workshop on Design and Architectures for Signal and Image Processing (14th edition), January 2021, Pages 68–76. https://doi.org/10.1145/3441110.3441573
The dataflow paradigm frees the designer to focus on the functionality of an application, independently of the underlying architecture executing it. While mapping the dataflow computational part to the cores seems obvious, the memory aspects do not ...
- research-article, January 2021
SAC: A Stream Aware Write Cache Scheme for Multi-Streamed Solid State Drives
ASPDAC '21: Proceedings of the 26th Asia and South Pacific Design Automation Conference, January 2021, Pages 645–650. https://doi.org/10.1145/3394885.3431520
This work found that state-of-the-art multi-streamed SSDs are used inefficiently due to two issues. First, the write cache inside SSDs is not aware of data from different streams, which induces conflicts among streams. Second, the current stream ...
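The stream-conflict problem described here can be illustrated with a hypothetical sketch (not SAC's actual policy): partition the write cache per stream ID, so buffered writes from one stream cannot evict another stream's data.

```python
from collections import OrderedDict

# Hypothetical sketch of a stream-aware write cache: one small FIFO
# partition per stream ID, so one stream's writes cannot evict another
# stream's buffered pages. Not the SAC scheme itself.
class StreamAwareWriteCache:
    def __init__(self, slots_per_stream=2):
        self.slots_per_stream = slots_per_stream
        self.partitions = {}          # stream_id -> OrderedDict(lba -> data)
        self.evicted = []             # (stream_id, lba) pairs flushed to flash

    def write(self, stream_id, lba, data):
        part = self.partitions.setdefault(stream_id, OrderedDict())
        if lba in part:
            part.move_to_end(lba)     # overwrite is absorbed in the cache
        elif len(part) >= self.slots_per_stream:
            old_lba, _ = part.popitem(last=False)   # FIFO eviction
            self.evicted.append((stream_id, old_lba))
        part[lba] = data

cache = StreamAwareWriteCache(slots_per_stream=2)
cache.write(0, 100, "a")
cache.write(1, 200, "b")
cache.write(0, 101, "c")
cache.write(0, 102, "d")              # evicts stream 0's lba 100 only
print(cache.evicted)                  # [(0, 100)]
print(list(cache.partitions[1]))      # [200]  (stream 1 untouched)
```

A stream-oblivious cache of the same total size could instead have evicted stream 1's page here, which is exactly the cross-stream interference the abstract points to.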
- research-article, July 2020
Efficient Cache Strategy for Face Recognition System
ICBDC '20: Proceedings of the 5th International Conference on Big Data and Computing, May 2020, Pages 108–113. https://doi.org/10.1145/3404687.3404698
Recently, the need to deploy real-time deep learning applications, such as face recognition systems, on embedded devices has been increasing. At the system level, caching is one of the most effective ways to reduce response time. However, ...
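As a minimal illustration of the system-level caching idea mentioned here (not the paper's strategy), one can place an LRU cache in front of an expensive inference call; `slow_recognize` below is an illustrative stand-in for model inference.

```python
from collections import OrderedDict

# Minimal LRU cache in front of an expensive "inference" call
# (illustrative only; slow_recognize stands in for a real model).
class LRUResponseCache:
    def __init__(self, capacity, compute):
        self.capacity = capacity
        self.compute = compute        # expensive function being cached
        self.store = OrderedDict()
        self.calls = 0                # how often we actually computed

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)     # mark as most recently used
            return self.store[key]
        value = self.compute(key)
        self.calls += 1
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used
        return value

def slow_recognize(face_id):          # stand-in for model inference
    return f"person-{face_id}"

cache = LRUResponseCache(capacity=2, compute=slow_recognize)
cache.get("a"); cache.get("b"); cache.get("a")   # "a" refreshed
cache.get("c")                                   # evicts "b"
print(sorted(cache.store))            # ['a', 'c']
print(cache.calls)                    # 3
```

For simple cases, `functools.lru_cache` from the standard library provides the same behavior without a hand-rolled class.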
- research-article, June 2019
A None-Sparse Inference Accelerator that Distills and Reuses the Computation Redundancy in CNNs
DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019, June 2019, Article No.: 202, Pages 1–6. https://doi.org/10.1145/3316781.3317749
Prior research on energy-efficient Convolutional Neural Network (CNN) inference accelerators has mostly focused on exploiting model sparsity, i.e., zero patterns in weights and activations, to reduce on-chip storage and computation overhead. In this ...
- research-article, June 2019
A General Cache Framework for Efficient Generation of Timing Critical Paths
DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019, June 2019, Article No.: 108, Pages 1–6. https://doi.org/10.1145/3316781.3317744
The recent TAU 2018 contest sought novel ideas for efficient generation of timing reports. When the timing graph is updated, users query different forms of timing reports subsequently and sequentially. This process is computationally ...