
Showing 1–13 of 13 results for author: Boroumand, A

Searching in archive cs.
  1. arXiv:2209.08938  [pdf, other]

    cs.AR cs.DC cs.LG

    Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

    Authors: Geraldo F. Oliveira, Juan Gómez-Luna, Saugata Ghose, Amirali Boroumand, Onur Mutlu

    Abstract: Neural networks (NNs) are growing in importance and complexity. A neural network's performance (and energy efficiency) can be bound either by computation or memory resources. The processing-in-memory (PIM) paradigm, where computation is placed near or within memory arrays, is a viable solution to accelerate memory-bound NNs. However, PIM architectures vary in form, where different PIM approaches l…

    Submitted 27 March, 2023; v1 submitted 19 September, 2022; originally announced September 2022.

    Comments: This is an extended and updated version of a paper published in IEEE Micro, pp. 1-14, 29 Aug. 2022. arXiv admin note: text overlap with arXiv:2109.14320

  2. arXiv:2205.14664  [pdf, other]

    cs.AR cs.AI cs.DB cs.DC cs.LG

    Heterogeneous Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in Machine Learning and Databases

    Authors: Geraldo F. Oliveira, Amirali Boroumand, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu

    Abstract: Today's computing systems require moving data back-and-forth between computing resources (e.g., CPUs, GPUs, accelerators) and off-chip main memory so that computation can take place on the data. Unfortunately, this data movement is a major bottleneck for system performance and energy consumption. One promising execution paradigm that alleviates the data movement bottleneck in modern and emerging a…

    Submitted 29 May, 2022; originally announced May 2022.

  3. arXiv:2204.11275  [pdf, other]

    cs.AR cs.DB

    Enabling High-Performance and Energy-Efficient Hybrid Transactional/Analytical Databases with Hardware/Software Cooperation

    Authors: Amirali Boroumand, Saugata Ghose, Geraldo F. Oliveira, Onur Mutlu

    Abstract: A growth in data volume, combined with increasing demand for real-time analysis (using the most recent data), has resulted in the emergence of database systems that concurrently support transactions and data analytics. These hybrid transactional and analytical processing (HTAP) database systems can support real-time data analysis without the high costs of synchronizing across separate single-purpo…

    Submitted 24 April, 2022; originally announced April 2022.

    Comments: Accepted to ICDE 2022. arXiv admin note: substantial text overlap with arXiv:2103.00798

  4. arXiv:2111.02325  [pdf, other]

    cs.AR cs.PF

    Extending Memory Capacity in Consumer Devices with Emerging Non-Volatile Memory: An Experimental Study

    Authors: Geraldo F. Oliveira, Saugata Ghose, Juan Gómez-Luna, Amirali Boroumand, Alexis Savery, Sonny Rao, Salman Qazi, Gwendal Grignou, Rahul Thakur, Eric Shiu, Onur Mutlu

    Abstract: The number and diversity of consumer devices are growing rapidly, alongside their target applications' memory consumption. Unfortunately, DRAM scalability is becoming a limiting factor to the available memory capacity in consumer devices. As a potential solution, manufacturers have introduced emerging non-volatile memories (NVMs) into the market, which can be used to increase the memory capacity o…

    Submitted 19 September, 2023; v1 submitted 3 November, 2021; originally announced November 2021.

    Comments: This paper has been accepted by IEEE Access

  5. arXiv:2109.14320  [pdf, other]

    cs.AR cs.LG

    Google Neural Network Models for Edge Devices: Analyzing and Mitigating Machine Learning Inference Bottlenecks

    Authors: Amirali Boroumand, Saugata Ghose, Berkin Akin, Ravi Narayanaswami, Geraldo F. Oliveira, Xiaoyu Ma, Eric Shiu, Onur Mutlu

    Abstract: Emerging edge computing platforms often contain machine learning (ML) accelerators that can accelerate inference for a wide range of neural network (NN) models. These models are designed to fit within the limited area and energy constraints of the edge computing platforms, each targeting various applications (e.g., face detection, speech recognition, translation, image captioning, video analytics)…

    Submitted 29 September, 2021; originally announced September 2021.

    Comments: This work appears at the 30th International Conference on Parallel Architectures and Compilation Techniques (PACT 2021). arXiv admin note: substantial text overlap with arXiv:2103.00768

  6. arXiv:2103.00798  [pdf, other]

    cs.AR cs.DB

    Polynesia: Enabling Effective Hybrid Transactional/Analytical Databases with Specialized Hardware/Software Co-Design

    Authors: Amirali Boroumand, Saugata Ghose, Geraldo F. Oliveira, Onur Mutlu

    Abstract: An exponential growth in data volume, combined with increasing demand for real-time analysis (i.e., using the most recent data), has resulted in the emergence of database systems that concurrently support transactions and data analytics. These hybrid transactional and analytical processing (HTAP) database systems can support real-time data analysis without the high costs of synchronizing across se…

    Submitted 1 March, 2021; originally announced March 2021.

  7. arXiv:2103.00768  [pdf, other]

    cs.AR cs.LG

    Mitigating Edge Machine Learning Inference Bottlenecks: An Empirical Study on Accelerating Google Edge Models

    Authors: Amirali Boroumand, Saugata Ghose, Berkin Akin, Ravi Narayanaswami, Geraldo F. Oliveira, Xiaoyu Ma, Eric Shiu, Onur Mutlu

    Abstract: As the need for edge computing grows, many modern consumer devices now contain edge machine learning (ML) accelerators that can compute a wide range of neural network (NN) models while still fitting within tight resource constraints. We analyze a commercial Edge TPU using 24 Google edge NN models (including CNNs, LSTMs, transducers, and RCNNs), and find that the accelerator suffers from three shor…

    Submitted 1 March, 2021; originally announced March 2021.

  8. arXiv:2009.07692  [pdf, other]

    cs.AR q-bio.GN

    GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis

    Authors: Damla Senol Cali, Gurpreet S. Kalsi, Zülal Bingöl, Can Firtina, Lavanya Subramanian, Jeremie S. Kim, Rachata Ausavarungnirun, Mohammed Alser, Juan Gomez-Luna, Amirali Boroumand, Anant Nori, Allison Scibisz, Sreenivas Subramoney, Can Alkan, Saugata Ghose, Onur Mutlu

    Abstract: Genome sequence analysis has enabled significant advancements in medical and scientific areas such as personalized medicine, outbreak tracing, and the understanding of evolution. Unfortunately, it is currently bottlenecked by the computational power and memory bandwidth limitations of existing systems, as many of the steps in genome sequence analysis must process a large amount of data. A major co…

    Submitted 16 September, 2020; originally announced September 2020.

    Comments: To appear in MICRO 2020

  9. arXiv:1907.12947  [pdf]

    cs.DC cs.AR

    A Workload and Programming Ease Driven Perspective of Processing-in-Memory

    Authors: Saugata Ghose, Amirali Boroumand, Jeremie S. Kim, Juan Gómez-Luna, Onur Mutlu

    Abstract: Many modern and emerging applications must process increasingly large volumes of data. Unfortunately, prevalent computing paradigms are not designed to efficiently handle such large-scale data: the energy and performance costs to move this data between the memory subsystem and the CPU now dominate the total costs of computation. This forces system architects and designers to fundamentally rethink…

    Submitted 26 July, 2019; originally announced July 2019.

  10. arXiv:1802.00320  [pdf, other]

    cs.AR

    Enabling the Adoption of Processing-in-Memory: Challenges, Mechanisms, Future Research Directions

    Authors: Saugata Ghose, Kevin Hsieh, Amirali Boroumand, Rachata Ausavarungnirun, Onur Mutlu

    Abstract: Poor DRAM technology scaling over the course of many years has caused DRAM-based main memory to increasingly become a larger system bottleneck. A major reason for the bottleneck is that data stored within DRAM must be moved across a pin-limited memory channel to the CPU before any computation can take place. This requires a high latency and energy overhead, and the data often cannot benefit from c…

    Submitted 1 February, 2018; originally announced February 2018.

  11. arXiv:1706.08870  [pdf, other]

    cs.AR

    Using ECC DRAM to Adaptively Increase Memory Capacity

    Authors: Yixin Luo, Saugata Ghose, Tianshi Li, Sriram Govindan, Bikash Sharma, Bryan Kelly, Amirali Boroumand, Onur Mutlu

    Abstract: Modern DRAM modules are often equipped with hardware error correction capabilities, especially for DRAM deployed in large-scale data centers, as process technology scaling has increased the susceptibility of these devices to errors. To provide fast error detection and correction, error-correcting codes (ECC) are placed on an additional DRAM chip in a DRAM module. This additional chip expands the r…

    Submitted 28 June, 2017; v1 submitted 27 June, 2017; originally announced June 2017.

  12. arXiv:1706.03162  [pdf, other]

    cs.AR

    LazyPIM: Efficient Support for Cache Coherence in Processing-in-Memory Architectures

    Authors: Amirali Boroumand, Saugata Ghose, Minesh Patel, Hasan Hassan, Brandon Lucia, Nastaran Hajinazar, Kevin Hsieh, Krishna T. Malladi, Hongzhong Zheng, Onur Mutlu

    Abstract: Processing-in-memory (PIM) architectures have seen an increase in popularity recently, as the high internal bandwidth available within 3D-stacked memory provides greater incentive to move some computation into the logic layer of the memory. To maintain program correctness, the portions of a program that are executed in memory must remain coherent with the portions of the program that continue to e…

    Submitted 9 June, 2017; originally announced June 2017.

  13. arXiv:1611.09988  [pdf, other]

    cs.AR

    Buddy-RAM: Improving the Performance and Efficiency of Bulk Bitwise Operations Using DRAM

    Authors: Vivek Seshadri, Donghyuk Lee, Thomas Mullins, Hasan Hassan, Amirali Boroumand, Jeremie Kim, Michael A. Kozuch, Onur Mutlu, Phillip B. Gibbons, Todd C. Mowry

    Abstract: Bitwise operations are an important component of modern day programming. Many widely-used data structures (e.g., bitmap indices in databases) rely on fast bitwise operations on large bit vectors to achieve high performance. Unfortunately, in existing systems, regardless of the underlying architecture (e.g., CPU, GPU, FPGA), the throughput of such bulk bitwise operations is limited by the available…

    Submitted 29 November, 2016; originally announced November 2016.

    Comments: arXiv admin note: text overlap with arXiv:1605.06483