- research-article, June 2023
FLORIA: A Fast and Featherlight Approach for Predicting Cache Performance
ICS '23: Proceedings of the 37th ACM International Conference on Supercomputing, June 2023, Pages 25–36. https://doi.org/10.1145/3577193.3593740
The cache Miss Ratio Curve (MRC) serves a variety of purposes such as cache partitioning, application profiling and code tuning. In this work, we propose a new metric, called cache miss distribution, that describes cache miss behavior over cache sets, ...
- research-article, June 2023
Analysis of Shared Cache Interference in Multi-Core Systems using Event-Arrival Curves
RTNS '23: Proceedings of the 31st International Conference on Real-Time Networks and Systems, June 2023, Pages 23–33. https://doi.org/10.1145/3575757.3593643
Caches are used to bridge the gap between main memory and the significantly faster processor cores. In multi-core architectures, the last-level cache is often shared between cores. However, sharing a cache causes inter-core interference to emerge. ...
- research-article, November 2021
A lightweight code storage container for the Eclipse OMR ahead-of-time compiler
CASCON '21: Proceedings of the 31st Annual International Conference on Computer Science and Software Engineering, November 2021, Pages 93–103
Over the years, Ahead-of-Time (AOT) compilation has drawn significant attention in the research community due to its ability to accelerate the startup of a runtime system. AOT compilation involves compiling, persisting and re-using existing compiled ...
- poster, September 2020
Exploiting DRAM bank mapping and HugePages for effective denial-of-service attacks on shared cache in multicore
HotSoS '20: Proceedings of the 7th Symposium on Hot Topics in the Science of Security, September 2020, Article No.: 24, Pages 1–2. https://doi.org/10.1145/3384217.3386394
In this paper, we propose memory-aware cache DoS attacks that can induce more effective cache blocking by taking advantage of information of the underlying memory hardware. Like prior cache DoS attacks, our new attacks also generate lots of cache misses ...
- extended-abstract, July 2020
Green Paging and Parallel Paging
SPAA '20: Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures, July 2020, Pages 493–495. https://doi.org/10.1145/3350755.3400231
We study two fundamental variants of the classic paging problem: green paging and parallel paging. In green paging one can choose the exact memory capacity in use at any given instant, between a maximum of k and a minimum of k/p pages; the goal is to ...
- article, December 2018
HSCS: a hybrid shared cache scheduling scheme for multiprogrammed workloads
Frontiers of Computer Science: Selected Publications from Chinese Universities (FCS), Volume 12, Issue 6, December 2018, Pages 1090–1104. https://doi.org/10.1007/s11704-017-6349-5
The traditional dynamic random-access memory (DRAM) storage medium can be integrated on chips via modern emerging 3D-stacking technology to architect a DRAM shared cache in multicore systems. Compared with static random-access memory (SRAM), DRAM is ...
- research-article, December 2017
Predictable Shared Cache Management for Multi-Core Real-Time Virtualization
ACM Transactions on Embedded Computing Systems (TECS), Volume 17, Issue 1, Article No.: 22, Pages 1–27. https://doi.org/10.1145/3092946
Real-time virtualization has gained much attention for the consolidation of multiple real-time systems onto a single hardware platform while ensuring timing predictability. However, a shared last-level cache (LLC) on modern multi-core platforms can ...
- survey, May 2017
A Survey of Techniques for Cache Partitioning in Multicore Processors
ACM Computing Surveys (CSUR), Volume 50, Issue 2, Article No.: 27, Pages 1–39. https://doi.org/10.1145/3062394
As the number of on-chip cores and memory demands of applications increase, judicious management of cache resources has become not merely attractive but imperative. Cache partitioning, that is, dividing cache space between applications based on their ...
- research-article, October 2015
Rapid analysis of interprocessor communications on heterogeneous system architectures via parallel cache emulation
RACS '15: Proceedings of the 2015 Conference on Research in Adaptive and Convergent Systems, October 2015, Pages 418–423. https://doi.org/10.1145/2811411.2811496
The recently proposed heterogeneous system architecture (HSA) specifications enable shared-memory-based interprocessor communications between CPU cores and GPU cores via a flat coherent address space and memory-based signals to reduce explicit data copy ...
- research-article, April 2015
An ETL optimization framework using partitioning and parallelization
SAC '15: Proceedings of the 30th Annual ACM Symposium on Applied Computing, April 2015, Pages 1015–1022. https://doi.org/10.1145/2695664.2695846
Extract-Transform-Load (ETL) handles large amounts of data and manages workload through dataflows. ETL dataflows are widely regarded as complex and expensive operations in terms of time and system resources. In order to minimize the time and the ...
- research-article, June 2014
LAWS: locality-aware work-stealing for multi-socket multi-core architectures
ICS '14: Proceedings of the 28th ACM International Conference on Supercomputing, June 2014, Pages 3–12. https://doi.org/10.1145/2597652.2597665
Modern mainstream powerful computers adopt Multi-Socket Multi-Core (MSMC) CPU architecture and NUMA-based memory architecture. While traditional work-stealing schedulers are designed for single-socket architectures, they incur severe shared cache misses ...
- research-article, April 2014
Cache matching: thread scheduling to maximize data reuse
HPC '14: Proceedings of the High Performance Computing Symposium, April 2014, Article No.: 7, Pages 1–8
Datacenters today often execute multiple data-intensive threads concurrently. To improve the latency of threads accessing slow external storage, data is often cached in memory. The way in which the cache is shared between concurrent threads has a ...
- research-article, April 2014
A Unified WCET Analysis Framework for Multicore Platforms
ACM Transactions on Embedded Computing Systems (TECS), Volume 13, Issue 4s, Article No.: 124, Pages 1–29. https://doi.org/10.1145/2584654
With the advent of multicore architectures, worst-case execution time (WCET) analysis has become an increasingly difficult problem. In this article, we propose a unified WCET analysis framework for multicore processors featuring both shared cache and ...
- research-article, March 2013
To hardware prefetch or not to prefetch?: a virtualized environment study and core binding approach
ASPLOS '13: Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2013, Pages 357–368. https://doi.org/10.1145/2451116.2451155
Most hardware and software vendors suggest disabling hardware prefetching in virtualized environments. They claim that prefetching is detrimental to application performance due to inaccurate prediction caused by workload diversity and VM interference on ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 41, Issue 1, March 2013
ACM SIGPLAN Notices: Volume 48, Issue 4, April 2013
- Article, October 2012
An Improved Multi-core Shared Cache Replacement Algorithm
DCABES '12: Proceedings of the 2012 11th International Symposium on Distributed Computing and Applications to Business, Engineering & Science, October 2012, Pages 13–17. https://doi.org/10.1109/DCABES.2012.39
Many multi-core processors employ a large last-level cache (LLC) shared among the multiple cores. Past research has demonstrated that traditional LRU and its approximation can lead to poor performance and unfairness when the multiple cores compete for ...
- Article, September 2012
A Locality-based Performance Model for Load-and-Compute Style Computation
CLUSTER '12: Proceedings of the 2012 IEEE International Conference on Cluster Computing, September 2012, Pages 566–571. https://doi.org/10.1109/CLUSTER.2012.25
The increasing speed gap between the processor and memory is usually the critical bottleneck in achieving high performance. Hardware caches, programming models, algorithms and data structures have been introduced and proposed to exploit localities on ...
- Article, July 2012
Using an Analytical Model of Shared Caches for Selecting the Optimal Parallelization Scheme
ISPA '12: Proceedings of the 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications, July 2012, Pages 588–594. https://doi.org/10.1109/ISPA.2012.88
Multicores are now the norm. Their cache hierarchy often has a last-level shared cache. The performance of this shared cache during the execution of multithreaded applications depends on the parallelization scheme followed. For example, critical ...
- Article, June 2012
A Dynamic Cache Partitioning Mechanism under Virtualization Environment
TRUSTCOM '12: Proceedings of the 2012 IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications, June 2012, Pages 1907–1911. https://doi.org/10.1109/TrustCom.2012.25
Cache sharing among multiple computing units on chip is common in today's multi-core processors, and a lot of research has focused on the effective management of shared cache. A software management method called page coloring is commonly used to divide ...
- Article, May 2012
Competitive Cache Replacement Strategies for Shared Cache Environments
IPDPS '12: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, May 2012, Pages 215–226. https://doi.org/10.1109/IPDPS.2012.29
We investigate cache replacement algorithms (CRAs) at a cache shared by several processes under different multicore environments. For a single shared cache, our main result is the first CRA, GLOBAL-MAXIMA, for fixed interleaving under shared full ...
- Article, April 2012
A Unified WCET Analysis Framework for Multi-core Platforms
RTAS '12: Proceedings of the 2012 IEEE 18th Real-Time and Embedded Technology and Applications Symposium, April 2012, Pages 99–108. https://doi.org/10.1109/RTAS.2012.26
With the advent of multi-core architectures, worst-case execution time (WCET) analysis has become an increasingly difficult problem. In this paper, we propose a unified WCET analysis framework for multi-core processors featuring both shared cache and ...