Keyword: NUCA : Search

research-article

NUCAlloc: Fine-Grained Block Placement in Hashed Last-Level NUCA Caches

ICS '24: Proceedings of the 38th ACM International Conference on Supercomputing, Pages 85–97, https://doi.org/10.1145/3650200.3656604

Modern last-level caches are partitioned into slices that are spread across the chip, giving rise to varying access latencies dictated by the physical location of the accessing core and the cache slice being accessed. Although prior work has shown that ...

research-article

Public Access

Jenga: Software-Defined Cache Hierarchies

ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture, Pages 652–665, https://doi.org/10.1145/3079856.3080214

Caches are traditionally organized as a rigid hierarchy, with multiple levels of progressively larger and slower memories. Hierarchy allows a simple, fixed design to benefit a wide range of applications, since working sets settle at the smallest (i.e., ...

Also Published in:

ACM SIGARCH Computer Architecture News: Volume 45 Issue 2

research-article

A Framework for Block Placement, Migration, and Fast Searching in Tiled-DNUCA Architecture

ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 22, Issue 1, Article No.: 4, Pages 1–26, https://doi.org/10.1145/2907946

Multicore processors have proliferated across several domains ranging from small-scale embedded systems to large data centers, making tiled CMPs (TCMPs) the essential next-generation scalable architecture. NUCA architectures help in managing the capacity and ...

research-article

Dynamic associativity management using utility based way-sharing

SAC '15: Proceedings of the 30th Annual ACM Symposium on Applied Computing, Pages 1919–1924, https://doi.org/10.1145/2695664.2695783

The non-uniform distribution of memory accesses in today's applications affects the performance of cache memories. Due to such non-uniform accesses, some sets of large caches are used heavily while other sets are used lightly. This paper ...

Article

Adaptive V-Set Cache for Multi-core Processors

Ali A. El Moursy

MCSOC '14: Proceedings of the 2014 IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs, Pages 297–302, https://doi.org/10.1109/MCSoC.2014.48

Development in VLSI design allows multi- to many-cores to be integrated on a single microprocessor chip. This increase in the core count per chip makes it more critical to design an efficient memory sub-system, especially the Last Level Cache (LLC). The ...

research-article

Open Access

Revisiting LP-NUCA Energy Consumption: Cache Access Policies and Adaptive Block Dropping

ACM Transactions on Architecture and Code Optimization (TACO), Volume 11, Issue 2, Article No.: 19, Pages 1–26, https://doi.org/10.1145/2632217

Cache working-set adaptation is key as embedded systems move to multiprocessor and Simultaneous Multithreaded Architectures (SMT) because interthread pollution harms system performance and battery life. Light-Power NUCA (LP-NUCA) is a working-set ...

article

A workload independent energy reduction strategy for D-NUCA caches

The Journal of Supercomputing (JSCO), Volume 68, Issue 1, Pages 157–182, https://doi.org/10.1007/s11227-013-1033-5

Wire delays and leakage energy consumption are both growing problems in the design of large on chip caches built in deep submicron technologies. D-NUCA caches (Dynamic-Nonuniform Cache Architecture) exploit an aggressive subbanking of the cache and a ...

research-article

Exploiting replication to improve performances of NUCA-based CMP systems

ACM Transactions on Embedded Computing Systems (TECS), Volume 13, Issue 3s, Article No.: 117, Pages 1–23, https://doi.org/10.1145/2566568

Improvements in semiconductor nanotechnology made chip multiprocessors the reference architecture for high-performance microprocessors. CMPs usually adopt large Last-Level Caches (LLC) shared among cores and private L1 caches, whose performances depend ...

erratum

Errata to "Process Variation-Aware Nonuniform Cache Management in a 3D Die-Stacked Multicore Processor"

IEEE Transactions on Computers (ITCO), Volume 63, Issue 2, Pages 525–526, https://doi.org/10.1109/TC.2014.5

In the above-named article that appeared in ibid., vol. 62, no. 11, pp. 2252-2265, 2013, a production error occurred which resulted in the misalignment of Fig. 13, Fig. 14, Fig. 15, Fig. 16, Fig. 17, and Fig. 18 with their captions, starting from Fig. ...

research-article

Process Variation-Aware Nonuniform Cache Management in a 3D Die-Stacked Multicore Processor

IEEE Transactions on Computers (ITCO), Volume 62, Issue 11, Pages 2252–2265, https://doi.org/10.1109/TC.2012.129

Process variations in integrated circuits have significant impact on their performance, leakage, and stability. This is particularly evident in large, regular, and dense structures such as DRAMs. DRAMs are built using minimized transistors with ...

research-article

Jigsaw: scalable software-defined caches

PACT '13: Proceedings of the 22nd international conference on Parallel architectures and compilation techniques, Pages 213–224

Shared last-level caches, widely used in chip-multiprocessors (CMPs), face two fundamental limitations. First, the latency and energy of shared caches degrade as the system scales up. Second, when multiple workloads share the CMP, they suffer from ...

Article

Adaptive Stackable 3D Cache Architecture for Manycores

ISVLSI '12: Proceedings of the 2012 IEEE Computer Society Annual Symposium on VLSI, Pages 39–44, https://doi.org/10.1109/ISVLSI.2012.36

With the emergence of many-core architectures, the need for on-chip memories such as caches grows faster than the number of cores. Moreover, the bandwidth to off-chip memories is saturating. Big memory caches can alleviate the pressure to off-chip ...

research-article

Performance/Thermal-Aware Design of 3D-Stacked L2 Caches for CMPs

ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 17, Issue 2, Article No.: 13, Pages 1–20, https://doi.org/10.1145/2159542.2159545

Three-dimensional (3D) stacking technology enables integration of more memory on top of chip multiprocessors (CMPs). As the number of cores and the capacity of on-chip memory increase, the Non-Uniform Cache Architecture (NUCA) becomes more attractive. ...

research-article

Open Access

The migration prefetcher: Anticipating data promotion in dynamic NUCA caches

ACM Transactions on Architecture and Code Optimization (TACO), Volume 8, Issue 4, Article No.: 45, Pages 1–20, https://doi.org/10.1145/2086696.2086724

The exponential increase in multicore processor (CMP) cache sizes, accompanied by growing on-chip wire delays, makes it difficult to implement traditional caches with a single, uniform access latency. Non-Uniform Cache Architecture (NUCA) designs have been ...

research-article

A data layout optimization framework for NUCA-based multicores

MICRO-44: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 489–500, https://doi.org/10.1145/2155620.2155677

Future multicore architectures are likely to include a large number of cores connected using an on-chip network with Non-uniform Cache Access (NUCA). In such architectures, whether a data request is satisfied from a local cache or a remote cache can ...

Article

Beforehand Migration on D-NUCA Caches

PACT '11: Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, Pages 197–198, https://doi.org/10.1109/PACT.2011.38

Determining the best placement for data in the NUCA cache at any particular moment during program execution is crucial for exploiting the benefits that this architecture provides. Dynamic NUCA (D-NUCA) allows data to be mapped to multiple banks within ...

Article

Energy Behaviour of NUCA Caches in CMPs

DSD '11: Proceedings of the 2011 14th Euromicro Conference on Digital System Design, Pages 746–753, https://doi.org/10.1109/DSD.2011.99

Advances in semiconductor technology now make it possible to design Chip Multiprocessor Systems equipped with huge on-chip Last Level Caches. Due to the wire delay problem, the use of traditional cache memories with a uniform access time would ...

Article

Address Remapping for Static NUCA in NoC-Based Degradable Chip-Multiprocessors

PRDC '10: Proceedings of the 2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing, Pages 70–76, https://doi.org/10.1109/PRDC.2010.33

Large scale Chip-Multiprocessors (CMPs) generally employ Network-on-Chip (NoC) to connect the last level cache (LLC), which is generally organized as distributed NUCA (non-uniform cache access) arrays for scalability and efficiency. On the other hand, ...

Article

Re-NUCA: Boosting CMP Performance Through Block Replication

DSD '10: Proceedings of the 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools, Pages 199–206, https://doi.org/10.1109/DSD.2010.41

Chip Multiprocessor (CMP) systems have become the reference architecture for designing micro-processors, thanks to the improvements in semiconductor nanotechnology that have continuously provided a growing number of faster and smaller per-chip ...

article

Way adaptable D-NUCA caches

International Journal of High Performance Systems Architecture (IJHPSA), Volume 2, Issue 3/4, Pages 215–228, https://doi.org/10.1504/IJHPSA.2010.034542

Non-uniform cache architecture (NUCA) aims to limit the wire-delay problem typical of large on-chip last level caches: by partitioning a large cache into several banks, with the latency of each one depending on its physical location and by employing a ...
