Keyword: chip multiprocessors : Search

research-article

Public Access

SB-Fetch: synchronization aware hardware prefetching for chip multiprocessors

ICS '20: Proceedings of the 34th ACM International Conference on Supercomputing, June 2020, Article No.: 15, Pages 1–12, https://doi.org/10.1145/3392717.3392735

Shared-memory, multi-threaded applications often require programmers to insert thread synchronization primitives (i.e. locks, barriers, and condition variables) in critical sections to synchronize data access between processes. Scaling performance ...

research-article

Odd-even based adaptive two-way routing in mesh NoCs for hotspot mitigation

ICDCN '19: Proceedings of the 20th International Conference on Distributed Computing and Networking, January 2019, Pages 248–252, https://doi.org/10.1145/3288599.3288611

Network-on-Chip has been adopted as a profitable framework for communication in on-chip multiprocessors. Congestion management using adaptive routing techniques has become a major research focus in recent years. Hotspots are congested cores in multi-core systems,...

research-article

Public Access

Identifying Power-Efficient Multicore Cache Hierarchies via Reuse Distance Analysis

ACM Transactions on Computer Systems (TOCS), Volume 34, Issue 1, Article No.: 3, Pages 1–30, https://doi.org/10.1145/2851503

To enable performance improvements in a power-efficient manner, computer architects have been building CPUs that exploit greater amounts of thread-level parallelism. A key consideration in such CPUs is properly designing the on-chip cache hierarchy. ...

research-article

Hierarchical Clustering for On-Chip Networks

AISTECS '16: Proceedings of the 1st International Workshop on Advanced Interconnect Solutions and Technologies for Emerging Computing Systems, January 2016, Article No.: 2, Pages 1–6, https://doi.org/10.1145/2857058.2857064

Hierarchy and communication locality are a must for many-core systems. As systems scale to dozens or hundreds of cores, we simply cannot afford the power consumption and latency of random communication that spans the entire chip. Existing hierarchical ...

research-article

Open Access

Sensible Energy Accounting with Abstract Metering for Multicore Systems

ACM Transactions on Architecture and Code Optimization (TACO), Volume 12, Issue 4, Article No.: 60, Pages 1–26, https://doi.org/10.1145/2842616

Chip multicore processors (CMPs) are the preferred processing platform across different domains such as data centers, real-time systems, and mobile devices. In all those domains, energy is arguably the most expensive resource in a computing system. ...

Article

Cache Balancer: Access Rate and Pain Based Resource Management for Chip Multiprocessors

CANDAR '14: Proceedings of the 2014 Second International Symposium on Computing and Networking, December 2014, Pages 453–456, https://doi.org/10.1109/CANDAR.2014.81

This paper presents a runtime resource management scheme named Cache Balancer that improves the utilization of on-chip shared caches and reduces access latencies in chip multiprocessor systems. Cache Balancer incorporates an access rate based memory ...

research-article

Open Access

Hardware support for accurate per-task energy metering in multicore systems

ACM Transactions on Architecture and Code Optimization (TACO), Volume 10, Issue 4, Article No.: 34, Pages 1–27, https://doi.org/10.1145/2541228.2555291

Accurately determining the energy consumed by each task in a system will become of prominent importance in future multicore-based systems because it offers several benefits, including (i) better application energy/performance optimizations, (ii) ...

research-article

Open Access

Temporal-based multilevel correlating inclusive cache replacement

ACM Transactions on Architecture and Code Optimization (TACO), Volume 10, Issue 4, Article No.: 33, Pages 1–24, https://doi.org/10.1145/2541228.2555290

Inclusive caches have been widely used in Chip Multiprocessors (CMPs) to simplify cache coherence. However, they have poor performance compared with noninclusive caches not only because of the limited capacity of the entire cache hierarchy but also due ...

research-article

Load-balanced pipeline parallelism

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, November 2013, Article No.: 14, Pages 1–12, https://doi.org/10.1145/2503210.2503295

Accelerating a single thread in current parallel systems remains a challenging problem, because sequential threads do not naturally take advantage of the additional cores. Recent work shows that automatic extraction of pipeline parallelism is an ...

research-article

A fast and scalable multidimensional multiple-choice knapsack heuristic

ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 18, Issue 4, Article No.: 51, Pages 1–32, https://doi.org/10.1145/2541012.2541014

Many combinatorial optimization problems in the embedded systems and design automation domains involve decision making in multidimensional spaces. The multidimensional multiple-choice knapsack problem (MMKP) is among the most challenging of the ...

research-article

SMT-centric power-aware thread placement in chip multiprocessors

PACT '13: Proceedings of the 22nd international conference on Parallel architectures and compilation techniques, October 2013, Pages 167–176

In Simultaneous Multi-Threading (SMT) chip multiprocessors (CMPs), thread placement is performed today in a largely power-unaware manner. For example, consolidation of active threads into fewer cores exposes opportunities for power savings that have not ...

research-article

Directory based cache coherence verification logic in CMPs cache system

MES '13: Proceedings of the First International Workshop on Many-core Embedded Systems, June 2013, Pages 33–40, https://doi.org/10.1145/2489068.2489073

This work reports a high-speed protocol verification logic for Chip Multiprocessors (CMPs) realizing a directory-based cache coherence system. A special class of cellular automata (CA), referred to as single length cycle 2-attractor CA (TACA), has been ...

research-article

Efficient Reuse Distance Analysis of Multicore Scaling for Loop-Based Parallel Programs

ACM Transactions on Computer Systems (TOCS), Volume 31, Issue 1, Article No.: 1, Pages 1–37, https://doi.org/10.1145/2427631.2427632

Reuse Distance (RD) analysis is a powerful memory analysis tool that can potentially help architects study multicore processor scaling. One key obstacle, however, is that multicore RD analysis requires measuring Concurrent Reuse Distance (CRD) and ...

research-article

Coalition threading: combining traditional and non-traditional parallelism to maximize scalability

PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques, September 2012, Pages 273–282, https://doi.org/10.1145/2370816.2370857

Non-traditional parallelism provides parallel speedup for a single thread without the need to manually divide and coordinate computation. This paper describes coalition threading, a technique that seeks the ideal combination of traditional and non-...

research-article

Mitigating the Effects of Process Variation in Ultra-low Voltage Chip Multiprocessors using Dual Supply Voltages and Half-Speed Units

IEEE Computer Architecture Letters (ICAL), Volume 11, Issue 2, July 2012, Pages 45–48, https://doi.org/10.1109/L-CA.2011.36

Energy efficiency is a primary concern for microprocessor designers. One very effective approach to improving processor energy efficiency is to lower its supply voltage to very near the transistor threshold voltage. This reduces power consumption ...

research-article

Locality & utility co-optimization for practical capacity management of shared last level caches

ICS '12: Proceedings of the 26th ACM international conference on Supercomputing, June 2012, Pages 279–290, https://doi.org/10.1145/2304576.2304615

Shared last-level caches (SLLCs) on chip-multiprocessors play an important role in bridging the performance gap between processing cores and main memory. Although there are already many proposals targeted at overcoming the weaknesses of the least-...

Article

Architectural Support for Exploiting Fine Grain Parallelism

HPCC '12: Proceedings of the 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems, June 2012, Pages 61–70, https://doi.org/10.1109/HPCC.2012.19

The advent of multi-core processors, particularly with projections that numbers of cores will continue to increase, has focused attention on parallel programming. It is widely recognized that current programming techniques, including those that are used ...

research-article

Identifying optimal multicore cache hierarchies for loop-based parallel programs via reuse distance analysis

MSPC '12: Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, June 2012, Pages 2–11, https://doi.org/10.1145/2247684.2247687

Understanding multicore memory behavior is crucial, but can be challenging due to the complex cache hierarchies employed in modern CPUs. In today's hierarchies, performance is determined by complicated thread interactions, such as interference in shared ...

Article

Analytical Performance Modeling of Hierarchical Interconnect Fabrics

NOCS '12: Proceedings of the 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip, May 2012, Pages 107–114, https://doi.org/10.1109/NOCS.2012.20

The continuous scaling of nanoelectronics is increasing the complexity of chip multiprocessors (CMPs) and exacerbating the memory wall problem. As CMPs become more complex, the memory subsystem is organized into more hierarchical structures to better ...

research-article

Balancing Performance and Cost in CMP Interconnection Networks

IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 23, Issue 3, March 2012, Pages 452–459, https://doi.org/10.1109/TPDS.2011.173

This paper presents an innovative router design, called Rotary Router, which successfully addresses CMP cost/performance constraints. The router structure is based on two independent rings, which force packets to circulate either clockwise or ...
