- research-article, June 2019
A stochastic-computing based deep learning framework using adiabatic quantum-flux-parametron superconducting technology
- Ruizhe Cai,
- Ao Ren,
- Olivia Chen,
- Ning Liu,
- Caiwen Ding,
- Xuehai Qian,
- Jie Han,
- Wenhui Luo,
- Nobuyuki Yoshikawa,
- Yanzhi Wang
ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture, pages 567–578, https://doi.org/10.1145/3307650.3322270
The recently developed Adiabatic Quantum-Flux-Parametron (AQFP) superconducting technology achieves the highest energy efficiency among superconducting logic families, potentially a 10^4–10^5 gain compared with state-of-the-art CMOS. In 2016,...
- research-article, June 2019
CoNDA: efficient cache coherence support for near-data accelerators
- Amirali Boroumand,
- Saugata Ghose,
- Minesh Patel,
- Hasan Hassan,
- Brandon Lucia,
- Rachata Ausavarungnirun,
- Kevin Hsieh,
- Nastaran Hajinazar,
- Krishna T. Malladi,
- Hongzhong Zheng,
- Onur Mutlu
ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture, pages 629–642, https://doi.org/10.1145/3307650.3322266
Specialized on-chip accelerators are widely used to improve the energy efficiency of computing systems. Recent advances in memory technology have enabled near-data accelerators (NDAs), which reside off-chip close to main memory and can yield further ...
- research-article, June 2019
Energy-efficient video processing for virtual reality
ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture, pages 91–103, https://doi.org/10.1145/3307650.3322264
Virtual reality (VR) has huge potential to enable radically new applications, and spherical panoramic video processing is one of the backbone techniques behind them. However, current VR systems reuse the techniques designed for processing conventional ...
- research-article, June 2019
Duality cache for data parallel acceleration
ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture, pages 397–410, https://doi.org/10.1145/3307650.3322257
Duality Cache is an in-cache computation architecture that enables general-purpose data-parallel applications to run on caches. This paper presents a holistic approach to building the Duality Cache system stack, with techniques for performing in-cache floating ...
- research-article, June 2019
Laconic deep learning inference acceleration
- Sayeh Sharify,
- Alberto Delmas Lascorz,
- Mostafa Mahmoud,
- Milos Nikolic,
- Kevin Siu,
- Dylan Malone Stuart,
- Zissis Poulos,
- Andreas Moshovos
ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture, pages 304–317, https://doi.org/10.1145/3307650.3322255
We present a method for transparently identifying ineffectual computations during inference with Deep Learning models. Specifically, by decomposing multiplications down to the bit level, the amount of work needed by multiplications during inference can ...
- research-article, June 2019
SCU: a GPU stream compaction unit for graph processing
ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture, pages 424–435, https://doi.org/10.1145/3307650.3322254
Graph processing algorithms are key in many emerging applications in areas such as machine learning and data analytics. Although the processing of large-scale graphs exhibits a high degree of parallelism, the memory access patterns tend to be highly ...
- research-article, June 2019
Scalable interconnects for reconfigurable spatial architectures
ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture, pages 615–628, https://doi.org/10.1145/3307650.3322249
Recent years have seen the increased adoption of Coarse-Grained Reconfigurable Architectures (CGRAs) as flexible, energy-efficient compute accelerators. Obtaining performance using spatial architectures while supporting diverse applications requires a ...
- research-article, June 2019
AsmDB: understanding and mitigating front-end stalls in warehouse-scale computers
- Grant Ayers,
- Nayana Prasad Nagendra,
- David I. August,
- Hyoun Kyu Cho,
- Svilen Kanev,
- Christos Kozyrakis,
- Trivikram Krishnamurthy,
- Heiner Litz,
- Tipp Moseley,
- Parthasarathy Ranganathan
ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture, pages 462–473, https://doi.org/10.1145/3307650.3322234
The large instruction working sets of private and public cloud workloads lead to frequent instruction cache misses and costs in the millions of dollars. While prior work has identified the growing importance of this problem, to date, there has been ...
- research-article, June 2019
Designing vertical processors in monolithic 3D
ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture, pages 643–656, https://doi.org/10.1145/3307650.3322233
A processor laid out vertically in stacked layers can benefit from reduced wire delays, low energy consumption, and a small footprint. Such a design can be enabled by Monolithic 3D (M3D), a technology that provides short wire lengths, good thermal ...
- research-article, June 2019
TWiCe: preventing row-hammering by exploiting time window counters
ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture, pages 385–396, https://doi.org/10.1145/3307650.3322232
Computer systems using DRAM are exposed to row-hammer (RH) attacks, which can flip data in a DRAM row without directly accessing that row, by frequently activating its adjacent rows. There have been a number of proposals to prevent RH, but they either ...
- research-article, June 2019
CROW: a low-cost substrate for improving DRAM performance, energy efficiency, and reliability
- Hasan Hassan,
- Minesh Patel,
- Jeremie S. Kim,
- A. Giray Yaglikci,
- Nandita Vijaykumar,
- Nika Mansouri Ghiasi,
- Saugata Ghose,
- Onur Mutlu
ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture, pages 129–142, https://doi.org/10.1145/3307650.3322231
DRAM has been the dominant technology for architecting main memory for decades. Recent trends in multi-core system design and large-dataset applications have amplified the role of DRAM as a critical system bottleneck. We propose Copy-Row DRAM (CROW), a ...
- research-article, June 2019
Cambricon-F: machine learning computers with fractal von Neumann architecture
ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture, pages 788–801, https://doi.org/10.1145/3307650.3322226
Machine learning techniques are pervasive tools for emerging commercial applications, and many dedicated machine learning computers of different scales have been deployed in embedded devices, servers, and data centers. Currently, most machine learning ...
- research-article, June 2019
Efficient metadata management for irregular data prefetching
ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture, pages 449–461, https://doi.org/10.1145/3307650.3322225
Temporal prefetchers have the potential to prefetch arbitrary memory access patterns, but they require large amounts of metadata that must typically be stored in DRAM. In 2013, the Irregular Stream Buffer (ISB) showed how this metadata could be cached ...
- research-article, June 2019
Linebacker: preserving victim cache lines in idle register files of GPUs
ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture, pages 183–196, https://doi.org/10.1145/3307650.3322222
Modern GPUs suffer from cache contention due to the limited cache size that is shared across tens of concurrently running warps. To increase the per-warp cache size, prior techniques proposed warp throttling, which limits the number of active warps. Warp ...
- research-article, June 2019
Cryogenic computer architecture modeling with memory-side case studies
ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture, pages 774–787, https://doi.org/10.1145/3307650.3322219
Modern computer architectures suffer from a lack of architectural innovations, mainly due to the power wall and the memory wall. That is, architectural innovations become infeasible because they can prohibitively increase power consumption and their ...
- research-article, June 2019
MnnFast: a fast and scalable system architecture for memory-augmented neural networks
ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture, pages 250–263, https://doi.org/10.1145/3307650.3322214
Memory-augmented neural networks are attracting growing attention from researchers because they can make inferences using the previous history stored in memory. Among these memory-augmented neural networks, memory networks in particular are known for their huge ...
- research-article, June 2019
Perceptron-based prefetch filtering
ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture, pages 1–13, https://doi.org/10.1145/3307650.3322207
Hardware prefetching is an effective technique for hiding cache miss latencies in modern processor designs. Prefetcher performance can be characterized by two main metrics that are generally at odds with one another: coverage, the fraction of baseline ...