Keyword: memory wall : Search

research-article

Page Size Aware Cache Prefetching

MICRO '22: Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 956–974, https://doi.org/10.1109/MICRO56248.2022.00070

The increase in working set sizes of contemporary applications outpaces the growth in cache sizes, resulting in frequent main memory accesses that deteriorate system performance due to the disparity between processor and memory speeds. Prefetching ...

research-article

ASSASIN: Architecture Support for Stream Computing to Accelerate Computational Storage

MICRO '22: Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 354–368, https://doi.org/10.1109/MICRO56248.2022.00035

Computational storage adds computing to storage devices, providing potential benefits in offload, data-reduction, and lower energy. Successful computational SSD architectures should match growing flash bandwidth, which in turn requires high SSD DRAM ...

research-article

Open Access

CAKE: matrix multiplication using constant-bandwidth blocks

SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 85, Pages 1–14, https://doi.org/10.1145/3458817.3476166

We offer a novel approach to matrix-matrix multiplication computation on computing platforms with memory hierarchies. Constant-bandwidth (CB) blocks improve computation throughput for architectures limited by external memory bandwidth. Configuring the ...

research-article

Computing Utilization Enhancement for Chiplet-based Homogeneous Processing-in-Memory Deep Learning Processors

GLSVLSI '21: Proceedings of the 2021 Great Lakes Symposium on VLSI, Pages 241–246, https://doi.org/10.1145/3453688.3461499

This paper presents a design strategy of chiplet-based processing-in-memory systems for deep neural network applications. Monolithic silicon chips are area and power limited, failing to catch the recent rapid growth of deep learning algorithms. The paper ...

research-article

FeFET-based low-power bitwise logic-in-memory with direct write-back and data-adaptive dynamic sensing interface

ISLPED '20: Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design, Pages 127–132, https://doi.org/10.1145/3370748.3406572

Compute-in-memory (CiM) is a promising method for mitigating the memory wall problem in data-intensive applications. The proposed bitwise logic-in-memory (BLiM) is targeted at data intensive applications, such as database, data encryption. This work ...

research-article

A bandwidth accurate, flexible and rapid simulating multi-HMC modeling tool

MEMSYS '17: Proceedings of the International Symposium on Memory Systems, Pages 71–82, https://doi.org/10.1145/3132402.3132403

Derived by the demand for ever increasing computing performance, a steadily widening performance gap between memory and processor architectures has emerged. While attempting to mitigate the effects for processing systems that already face the exascale ...

article

Core Module Optimizing PDE Sparse Matrix Models with HPCG Example

Jennings

Supercomputing Frontiers and Innovations: an International Journal (SCFI), Volume 4, Issue 2, Pages 54–70, https://doi.org/10.14529/jsfi170205

This paper introduces a fundamentally new computer architecture for supercomputers. The core module is application compatible with an existing superscalar microprocessor, with minimized energy use, and is optimized for local sparse matrix operations. ...

article

The Simultaneous Transmit And Receive STAR Message Protocol

Jennings

Supercomputing Frontiers and Innovations: an International Journal (SCFI), Volume 4, Issue 2, Pages 38–53, https://doi.org/10.14529/jsfi170204

The STAR protocol is proposed, which solves three inherent problems with MPI, a well known security problem caused by data memory access faults, and the following four exascale communication problems. Exascale systems must efficiently save the state of ...

research-article

Data-Centric Computing Frontiers: A Survey On Processing-In-Memory

MEMSYS '16: Proceedings of the Second International Symposium on Memory Systems, Pages 295–308, https://doi.org/10.1145/2989081.2989087

A major shift from compute-centric to data-centric computing systems can be perceived, as novel big data workloads like cognitive computing and machine learning strongly enforce embarrassingly parallel and highly efficient processor architectures. With ...

research-article

Filtered runahead execution with a runahead buffer

MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture, Pages 358–369, https://doi.org/10.1145/2830772.2830812

Runahead execution dynamically expands the instruction window of an out of order processor to generate memory level parallelism (MLP) while the core would otherwise be stalled. Unfortunately, runahead has the disadvantage of requiring the front-end to ...

research-article

Scalable Multicore k-NN Search via Subspace Clustering for Filtering

IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 26, Issue 12, Pages 3449–3460, https://doi.org/10.1109/TPDS.2014.2372755

k Nearest Neighbors (k-NN) search is a widely used category of algorithms with applications in domains such as computer vision and machine learning. Despite the desire to process increasing amounts of high-dimensional data within these domains, k-NN ...

research-article

C²-bound: a capacity and concurrency driven analytical model for many-core design

SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 48, Pages 1–11, https://doi.org/10.1145/2807591.2807641

In this paper, we propose C²-Bound, a data-driven analytical model, that incorporates both memory capacity and data access concurrency factors to optimize many-core design. C²-Bound is characterized by combining the newly proposed latency model, ...

short-paper

Data filtering for scalable high-dimensional k-NN search on multicore systems

HPDC '14: Proceedings of the 23rd international symposium on High-performance parallel and distributed computing, Pages 305–310, https://doi.org/10.1145/2600212.2600710

K Nearest Neighbors (k-NN) search is a widely used category of algorithms with applications in domains such as computer vision and machine learning. With the rapidly increasing amount of data available, and their high dimensionality, k-NN algorithms ...

Article

Universal Numerical Encoder and Profiler Reduces Computing's Memory Wall with Software, FPGA, and SoC Implementations

Al Wegener

DCC '13: Proceedings of the 2013 Data Compression Conference, Page 528, https://doi.org/10.1109/DCC.2013.107

Numerical computations have accelerated significantly since 2005 thanks to two complementary, silicon-enabled trends: multi-core processing and single instruction, multiple data (SIMD) accelerators. Unfortunately, due to fundamental limitations of ...

keynote

Blue Gene/Q: design for sustained multi-petaflop computing

Michael Gschwind

ICS '12: Proceedings of the 26th ACM international conference on Supercomputing, Pages 245–246, https://doi.org/10.1145/2304576.2304609

The Blue Gene/Q system represents the third generation of optimized high-performance computing Blue Gene solution servers and provides a platform for continued growth in HPC performance and capability. Blue Gene/Q started with a new design of the ...

Article

A Study of the Memory Wall within the Jacobi Iteration Method

HPCC '12: Proceedings of the 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems, Pages 964–969, https://doi.org/10.1109/HPCC.2012.140

In recent years, a great number of applications have been implemented on the CMP and achieved good performance. The success of the parallelism of a great deal of applications on CMP shows a bright future of the development of the CMP. However, some ...

Article

Cache Accurate Time Skewing in Iterative Stencil Computations

ICPP '11: Proceedings of the 2011 International Conference on Parallel Processing, Pages 571–581, https://doi.org/10.1109/ICPP.2011.47

We present a time skewing algorithm that breaks the memory wall for certain iterative stencil computations. A stencil computation, even with constant weights, is a completely memory-bound algorithm. For example, for a large 3D domain of $500^3$ doubles ...

research-article

Pinned to the walls: impact of packaging and application properties on the memory and power walls

ISLPED '11: Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design, Pages 51–56

This article presents a study of the impact of packaging on the memory and power walls, in the context of application properties. The analysis is supported by characterizations of 130 hardware designs spanning 30 years, along with both ...

research-article

Cache injection for parallel applications

HPDC '11: Proceedings of the 20th international symposium on High performance distributed computing, Pages 15–26, https://doi.org/10.1145/1996130.1996135

For two decades, the memory wall has affected many applications in their ability to benefit from improvements in processor speed. Cache injection addresses this disparity for I/O by writing data into a processor's cache directly from the I/O bus. This ...

Article

PAC-PLRU: A Cache Replacement Policy to Salvage Discarded Predictions from Hardware Prefetchers

CCGRID '11: Proceedings of the 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Pages 265–274, https://doi.org/10.1109/CCGrid.2011.27

Cache replacement policy plays an important role in guaranteeing the availability of cache blocks, reducing miss rates, and improving applications' overall performance. However, recent research efforts on improving replacement policies require either ...
