- research-article, June 2024
NUCAlloc: Fine-Grained Block Placement in Hashed Last-Level NUCA Caches
ICS '24: Proceedings of the 38th ACM International Conference on Supercomputing, Pages 85–97, https://doi.org/10.1145/3650200.3656604
Modern last-level caches are partitioned into slices that are spread across the chip, giving rise to varying access latencies dictated by the physical location of the accessing core and the cache slice being accessed. Although prior work has shown that ...
- research-article, June 2017
Jenga: Software-Defined Cache Hierarchies
ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture, Pages 652–665, https://doi.org/10.1145/3079856.3080214
Caches are traditionally organized as a rigid hierarchy, with multiple levels of progressively larger and slower memories. Hierarchy allows a simple, fixed design to benefit a wide range of applications, since working sets settle at the smallest (i.e., ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 45, Issue 2
- research-article, May 2016
A Framework for Block Placement, Migration, and Fast Searching in Tiled-DNUCA Architecture
ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 22, Issue 1, Article No.: 4, Pages 1–26, https://doi.org/10.1145/2907946
Multicore processors have proliferated across several domains ranging from small-scale embedded systems to large data centers, making tiled CMPs (TCMPs) the essential next-generation scalable architecture. NUCA architectures help in managing the capacity and ...
- research-article, April 2015
Dynamic associativity management using utility based way-sharing
SAC '15: Proceedings of the 30th Annual ACM Symposium on Applied Computing, Pages 1919–1924, https://doi.org/10.1145/2695664.2695783
The non-uniform distribution of memory accesses of today's applications affects the performance of cache memories. Due to such non-uniform accesses, some sets of large caches are used heavily while other sets are used lightly. This paper ...
- Article, September 2014
Adaptive V-Set Cache for Multi-core Processors
MCSOC '14: Proceedings of the 2014 IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs, Pages 297–302, https://doi.org/10.1109/MCSoC.2014.48
Development in VLSI design allows multi- to many-cores to be integrated on a single microprocessor chip. This increase in the core count per chip makes it more critical to design an efficient memory sub-system, especially the Last Level Cache (LLC). The ...
- research-article, June 2014
Revisiting LP-NUCA Energy Consumption: Cache Access Policies and Adaptive Block Dropping
Darío Suárez Gracia, Alexandra Ferrerón, Luis Montesano Del Campo, Teresa Monreal Arnal, Víctor Viñals Yúfera
ACM Transactions on Architecture and Code Optimization (TACO), Volume 11, Issue 2, Article No.: 19, Pages 1–26, https://doi.org/10.1145/2632217
Cache working-set adaptation is key as embedded systems move to multiprocessor and Simultaneous Multithreaded Architectures (SMT) because interthread pollution harms system performance and battery life. Light-Power NUCA (LP-NUCA) is a working-set ...
- article, April 2014
A workload independent energy reduction strategy for D-NUCA caches
The Journal of Supercomputing (JSCO), Volume 68, Issue 1, Pages 157–182, https://doi.org/10.1007/s11227-013-1033-5
Wire delays and leakage energy consumption are both growing problems in the design of large on chip caches built in deep submicron technologies. D-NUCA caches (Dynamic-Nonuniform Cache Architecture) exploit an aggressive subbanking of the cache and a ...
- research-article, March 2014
Exploiting replication to improve performances of NUCA-based CMP systems
ACM Transactions on Embedded Computing Systems (TECS), Volume 13, Issue 3s, Article No.: 117, Pages 1–23, https://doi.org/10.1145/2566568
Improvements in semiconductor nanotechnology made chip multiprocessors the reference architecture for high-performance microprocessors. CMPs usually adopt large Last-Level Caches (LLC) shared among cores and private L1 caches, whose performances depend ...
- erratum, February 2014
Errata to "Process Variation-Aware Nonuniform Cache Management in a 3D Die-Stacked Multicore Processor"
IEEE Transactions on Computers (ITCO), Volume 63, Issue 2, Pages 525–526, https://doi.org/10.1109/TC.2014.5
In the above-named article that appeared in ibid., vol. 62, no. 11, pp. 2252-2265, 2013, a production error occurred which resulted in the misalignment of Fig. 13, Fig. 14, Fig. 15, Fig. 16, Fig. 17, and Fig. 18 with their captions, starting from Fig. ...
- research-article, November 2013
Process Variation-Aware Nonuniform Cache Management in a 3D Die-Stacked Multicore Processor
IEEE Transactions on Computers (ITCO), Volume 62, Issue 11, Pages 2252–2265, https://doi.org/10.1109/TC.2012.129
Process variations in integrated circuits have significant impact on their performance, leakage, and stability. This is particularly evident in large, regular, and dense structures such as DRAMs. DRAMs are built using minimized transistors with ...
- research-article, October 2013
Jigsaw: scalable software-defined caches
PACT '13: Proceedings of the 22nd international conference on Parallel architectures and compilation techniques, Pages 213–224
Shared last-level caches, widely used in chip-multiprocessors (CMPs), face two fundamental limitations. First, the latency and energy of shared caches degrade as the system scales up. Second, when multiple workloads share the CMP, they suffer from ...
- Article, August 2012
Adaptive Stackable 3D Cache Architecture for Manycores
ISVLSI '12: Proceedings of the 2012 IEEE Computer Society Annual Symposium on VLSI, Pages 39–44, https://doi.org/10.1109/ISVLSI.2012.36
With the emergence of many-core architectures, the need for on-chip memories such as caches grows faster than the number of cores. Moreover, the bandwidth to off-chip memories is saturating. Big memory caches can alleviate the pressure to off-chip ...
- research-article, April 2012
Performance/Thermal-Aware Design of 3D-Stacked L2 Caches for CMPs
ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 17, Issue 2, Article No.: 13, Pages 1–20, https://doi.org/10.1145/2159542.2159545
Three-dimensional (3D) stacking technology enables integration of more memory on top of chip multiprocessors (CMPs). As the number of cores and the capacity of on-chip memory increase, the Non-Uniform Cache Architecture (NUCA) becomes more attractive. ...
- research-article, January 2012
The migration prefetcher: Anticipating data promotion in dynamic NUCA caches
ACM Transactions on Architecture and Code Optimization (TACO), Volume 8, Issue 4, Article No.: 45, Pages 1–20, https://doi.org/10.1145/2086696.2086724
The exponential increase in multicore processor (CMP) cache sizes, accompanied by growing on-chip wire delays, makes it difficult to implement traditional caches with a single, uniform access latency. Non-Uniform Cache Architecture (NUCA) designs have been ...
- research-article, December 2011
A data layout optimization framework for NUCA-based multicores
MICRO-44: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 489–500, https://doi.org/10.1145/2155620.2155677
Future multicore architectures are likely to include a large number of cores connected using an on-chip network with Non-uniform Cache Access (NUCA). In such architectures, whether a data request is satisfied from a local cache or a remote cache can ...
- Article, October 2011
Beforehand Migration on D-NUCA Caches
PACT '11: Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, Pages 197–198, https://doi.org/10.1109/PACT.2011.38
Determining the best placement for data in the NUCA cache at any particular moment during program execution is crucial for exploiting the benefits that this architecture provides. Dynamic NUCA (D-NUCA) allows data to be mapped to multiple banks within ...
- Article, August 2011
Energy Behaviour of NUCA Caches in CMPs
DSD '11: Proceedings of the 2011 14th Euromicro Conference on Digital System Design, Pages 746–753, https://doi.org/10.1109/DSD.2011.99
Advances in semiconductor technology now make it possible to design Chip Multiprocessor Systems equipped with huge on-chip Last Level Caches. Due to the wire delay problem, the use of traditional cache memories with a uniform access time would ...
- Article, December 2010
Address Remapping for Static NUCA in NoC-Based Degradable Chip-Multiprocessors
PRDC '10: Proceedings of the 2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing, Pages 70–76, https://doi.org/10.1109/PRDC.2010.33
Large scale Chip-Multiprocessors (CMPs) generally employ Network-on-Chip (NoC) to connect the last level cache (LLC), which is generally organized as distributed NUCA (non-uniform cache access) arrays for scalability and efficiency. On the other hand, ...
- Article, September 2010
Re-NUCA: Boosting CMP Performance Through Block Replication
DSD '10: Proceedings of the 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools, Pages 199–206, https://doi.org/10.1109/DSD.2010.41
Chip Multiprocessor (CMP) systems have become the reference architecture for designing micro-processors, thanks to the improvements in semiconductor nanotechnology that have continuously provided a growing number of faster and smaller per-chip ...
- article, August 2010
Way adaptable D-NUCA caches
International Journal of High Performance Systems Architecture (IJHPSA), Volume 2, Issue 3/4, Pages 215–228, https://doi.org/10.1504/IJHPSA.2010.034542
Non-uniform cache architecture (NUCA) aims to limit the wire-delay problem typical of large on-chip last level caches: by partitioning a large cache into several banks, with the latency of each one depending on its physical location and by employing a ...