Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–34 of 34 results for author: Sadrosadati, M

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.19113  [pdf, other

    cs.AR cs.DC q-bio.GN

    MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing

    Authors: Nika Mansouri Ghiasi, Mohammad Sadrosadati, Harun Mustafa, Arvid Gollwitzer, Can Firtina, Julien Eudine, Haiyu Mao, Joël Lindegger, Meryem Banu Cavlak, Mohammed Alser, Jisung Park, Onur Mutlu

    Abstract: Metagenomics has led to significant advances in many fields. Metagenomic analysis commonly involves the key tasks of determining the species present in a sample and their relative abundances. These tasks require searching large metagenomic databases. Metagenomic analysis suffers from significant data movement overhead due to moving large amounts of low-reuse data from the storage system. In-storag… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: To appear in ISCA 2024. arXiv admin note: substantial text overlap with arXiv:2311.12527

  2. arXiv:2406.18786  [pdf, other

    cs.AR

    Constable: Improving Performance and Power Efficiency by Safely Eliminating Load Instruction Execution

    Authors: Rahul Bera, Adithya Ranganathan, Joydeep Rakshit, Sujit Mahto, Anant V. Nori, Jayesh Gaur, Ataberk Olgun, Konstantinos Kanellopoulos, Mohammad Sadrosadati, Sreenivas Subramoney, Onur Mutlu

    Abstract: Load instructions often limit instruction-level parallelism (ILP) in modern processors due to data and resource dependences they cause. Prior techniques like Load Value Prediction (LVP) and Memory Renaming (MRN) mitigate load data dependence by predicting the data value of a load instruction. However, they fail to mitigate load resource dependence as the predicted load instruction gets executed no… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: To appear in the proceedings of 51st International Symposium on Computer Architecture (ISCA)

  3. arXiv:2406.16153  [pdf, other

    cs.AR cs.CR

    RowPress Vulnerability in Modern DRAM Chips

    Authors: Haocong Luo, Ataberk Olgun, A. Giray Yağlıkçı, Yahya Can Tuğrul, Steve Rhyner, Meryem Banu Cavlak, Joël Lindegger, Mohammad Sadrosadati, Onur Mutlu

    Abstract: Memory isolation is a critical property for system reliability, security, and safety. We demonstrate RowPress, a DRAM read disturbance phenomenon different from the well-known RowHammer. RowPress induces bitflips by keeping a DRAM row open for a long period of time instead of repeatedly opening and closing the row. We experimentally characterize RowPress bitflips, showing their widespread existenc… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: To Appear in IEEE MICRO Top Picks Special Issue (July-August 2024). arXiv admin note: substantial text overlap with arXiv:2306.17061

  4. arXiv:2406.13080  [pdf, other

    cs.AR cs.CR

    An Experimental Characterization of Combined RowHammer and RowPress Read Disturbance in Modern DRAM Chips

    Authors: Haocong Luo, Ismail Emir Yüksel, Ataberk Olgun, A. Giray Yağlıkçı, Mohammad Sadrosadati, Onur Mutlu

    Abstract: DRAM read disturbance can break memory isolation, a fundamental property to ensure system robustness (i.e., reliability, security, safety). RowHammer and RowPress are two different DRAM read disturbance phenomena. RowHammer induces bitflips in physically adjacent victim DRAM rows by repeatedly opening and closing an aggressor DRAM row, while RowPress induces bitflips by keeping an aggressor DRAM r… ▽ More

    Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: To appear at DSN Disrupt 2024 (June 2024)

  5. arXiv:2405.06081  [pdf, other

    cs.AR cs.DC

    Simultaneous Many-Row Activation in Off-the-Shelf DRAM Chips: Experimental Characterization and Analysis

    Authors: Ismail Emir Yuksel, Yahya Can Tugrul, F. Nisa Bostanci, Geraldo F. Oliveira, A. Giray Yaglikci, Ataberk Olgun, Melina Soysal, Haocong Luo, Juan Gómez-Luna, Mohammad Sadrosadati, Onur Mutlu

    Abstract: We experimentally analyze the computational capability of commercial off-the-shelf (COTS) DRAM chips and the robustness of these capabilities under various timing delays between DRAM commands, data patterns, temperature, and voltage levels. We extensively characterize 120 COTS DDR4 chips from two major manufacturers. We highlight four key results of our study. First, COTS DRAM chips are capable of… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: To appear in DSN 2024

  6. arXiv:2405.03967  [pdf, other

    cs.LG cs.AI cs.AR

    SwiftRL: Towards Efficient Reinforcement Learning on Real Processing-In-Memory Systems

    Authors: Kailash Gogineni, Sai Santosh Dayapule, Juan Gómez-Luna, Karthikeya Gogineni, Peng Wei, Tian Lan, Mohammad Sadrosadati, Onur Mutlu, Guru Venkataramani

    Abstract: Reinforcement Learning (RL) trains agents to learn optimal behavior by maximizing reward signals from experience datasets. However, RL training often faces memory limitations, leading to execution latencies and prolonged training times. To overcome this, SwiftRL explores Processing-In-Memory (PIM) architectures to accelerate RL workloads. We achieve near-linear performance scaling by implementing… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

  7. arXiv:2404.11284  [pdf, other

    cs.CR cs.AR

    Amplifying Main Memory-Based Timing Covert and Side Channels using Processing-in-Memory Operations

    Authors: Konstantinos Kanellopoulos, F. Nisa Bostanci, Ataberk Olgun, A. Giray Yaglikci, Ismail Emir Yuksel, Nika Mansouri Ghiasi, Zulal Bingol, Mohammad Sadrosadati, Onur Mutlu

    Abstract: The adoption of processing-in-memory (PiM) architectures has been gaining momentum because they provide high performance and low energy consumption by alleviating the data movement bottleneck. Yet, the security of such architectures has not been thoroughly explored. The adoption of PiM solutions provides a new way to directly access main memory, which can be potentially exploited by malicious user… ▽ More

    Submitted 22 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  8. arXiv:2404.07164  [pdf, other

    cs.AR cs.AI cs.DC cs.LG

    Analysis of Distributed Optimization Algorithms on a Real Processing-In-Memory System

    Authors: Steve Rhyner, Haocong Luo, Juan Gómez-Luna, Mohammad Sadrosadati, Jiawei Jiang, Ataberk Olgun, Harshita Gupta, Ce Zhang, Onur Mutlu

    Abstract: Machine Learning (ML) training on large-scale datasets is a very expensive and time-consuming workload. Processor-centric architectures (e.g., CPU, GPU) commonly used for modern ML training workloads are limited by the data movement bottleneck, i.e., due to repeatedly accessing the training dataset. As a result, processor-centric systems suffer from performance degradation and high energy consumpt… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

  9. arXiv:2402.18769  [pdf, other

    cs.CR cs.AR

    CoMeT: Count-Min-Sketch-based Row Tracking to Mitigate RowHammer at Low Cost

    Authors: F. Nisa Bostanci, Ismail Emir Yuksel, Ataberk Olgun, Konstantinos Kanellopoulos, Yahya Can Tugrul, A. Giray Yaglikci, Mohammad Sadrosadati, Onur Mutlu

    Abstract: We propose a new RowHammer mitigation mechanism, CoMeT, that prevents RowHammer bitflips with low area, performance, and energy costs in DRAM-based systems at very low RowHammer thresholds. The key idea of CoMeT is to use low-cost and scalable hash-based counters to track DRAM row activations. CoMeT uses the Count-Min Sketch technique that maps each DRAM row to a group of counters, as uniquely as… ▽ More

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: To appear at HPCA 2024

  10. arXiv:2402.18736  [pdf, other

    cs.AR cs.DC

    Functionally-Complete Boolean Logic in Real DRAM Chips: Experimental Characterization and Analysis

    Authors: Ismail Emir Yuksel, Yahya Can Tugrul, Ataberk Olgun, F. Nisa Bostanci, A. Giray Yaglikci, Geraldo F. Oliveira, Haocong Luo, Juan Gómez-Luna, Mohammad Sadrosadati, Onur Mutlu

    Abstract: Processing-using-DRAM (PuD) is an emerging paradigm that leverages the analog operational properties of DRAM circuitry to enable massively parallel in-DRAM computation. PuD has the potential to reduce or eliminate costly data movement between processing elements and main memory. Prior works experimentally demonstrate three-input MAJ (MAJ3) and two-input AND and OR operations in commercial off-the-… ▽ More

    Submitted 21 April, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: A shorter version of this work is to appear at the 30th IEEE International Symposium on High-Performance Computer Architecture (HPCA-30), 2024

  11. arXiv:2402.16731  [pdf, other

    cs.AR cs.DC cs.LG cs.PF

    Accelerating Graph Neural Networks on Real Processing-In-Memory Systems

    Authors: Christina Giannoula, Peiming Yang, Ivan Fernandez Vega, Jiacheng Yang, Yu Xin Li, Juan Gomez Luna, Mohammad Sadrosadati, Onur Mutlu, Gennady Pekhimenko

    Abstract: Graph Neural Networks (GNNs) are emerging ML models to analyze graph-structure data. Graph Neural Network (GNN) execution involves both compute-intensive and memory-intensive kernels, the latter dominates the total time, being significantly bottlenecked by data movement between memory and processors. Processing-In-Memory (PIM) systems can alleviate this data movement bottleneck by placing simple p… ▽ More

    Submitted 25 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  12. arXiv:2312.02880  [pdf, other

    cs.AR cs.DC

    PULSAR: Simultaneous Many-Row Activation for Reliable and High-Performance Computing in Off-the-Shelf DRAM Chips

    Authors: Ismail Emir Yuksel, Yahya Can Tugrul, F. Nisa Bostanci, Abdullah Giray Yaglikci, Ataberk Olgun, Geraldo F. Oliveira, Melina Soysal, Haocong Luo, Juan Gomez Luna, Mohammad Sadrosadati, Onur Mutlu

    Abstract: Data movement between the processor and the main memory is a first-order obstacle against improving performance and energy efficiency in modern systems. To address this obstacle, Processing-using-Memory (PuM) is a promising approach where bulk-bitwise operations are performed leveraging intrinsic analog properties within the DRAM array and massive parallelism across DRAM columns. Unfortunately, 1)… ▽ More

    Submitted 18 March, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

  13. arXiv:2311.12527  [pdf, other

    cs.AR q-bio.GN q-bio.QM

    MetaStore: High-Performance Metagenomic Analysis via In-Storage Computing

    Authors: Nika Mansouri Ghiasi, Mohammad Sadrosadati, Harun Mustafa, Arvid Gollwitzer, Can Firtina, Julien Eudine, Haiyu Ma, Joël Lindegger, Meryem Banu Cavlak, Mohammed Alser, Jisung Park, Onur Mutlu

    Abstract: Metagenomics has led to significant advancements in many fields. Metagenomic analysis commonly involves the key tasks of determining the species present in a sample and their relative abundances. These tasks require searching large metagenomic databases containing information on different species' genomes. Metagenomic analysis suffers from significant data movement overhead due to moving large amo… ▽ More

    Submitted 21 November, 2023; originally announced November 2023.

  14. Victima: Drastically Increasing Address Translation Reach by Leveraging Underutilized Cache Resources

    Authors: Konstantinos Kanellopoulos, Hong Chul Nam, F. Nisa Bostanci, Rahul Bera, Mohammad Sadrosadati, Rakesh Kumar, Davide-Basilio Bartolini, Onur Mutlu

    Abstract: Address translation is a performance bottleneck in data-intensive workloads due to large datasets and irregular access patterns that lead to frequent high-latency page table walks (PTWs). PTWs can be reduced by using (i) large hardware TLBs or (ii) large software-managed TLBs. Unfortunately, both solutions have significant drawbacks: increased access latency, power and area (for hardware TLBs), an… ▽ More

    Submitted 5 January, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: To appear in 56th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023

    ACM Class: C.0

  15. arXiv:2306.17061  [pdf, other

    cs.CR cs.AR

    RowPress: Amplifying Read Disturbance in Modern DRAM Chips

    Authors: Haocong Luo, Ataberk Olgun, A. Giray Yağlıkçı, Yahya Can Tuğrul, Steve Rhyner, Meryem Banu Cavlak, Joël Lindegger, Mohammad Sadrosadati, Onur Mutlu

    Abstract: Memory isolation is critical for system reliability, security, and safety. Unfortunately, read disturbance can break memory isolation in modern DRAM chips. For example, RowHammer is a well-studied read-disturb phenomenon where repeatedly opening and closing (i.e., hammering) a DRAM row many times causes bitflips in physically nearby rows. This paper experimentally demonstrates and analyzes anoth… ▽ More

    Submitted 28 March, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

    Comments: Extended version of the paper "RowPress: Amplifying Read Disturbance in Modern DRAM Chips" at the 50th Annual International Symposium on Computer Architecture (ISCA), 2023

  16. Venice: Improving Solid-State Drive Parallelism at Low Cost via Conflict-Free Accesses

    Authors: Rakesh Nadig, Mohammad Sadrosadati, Haiyu Mao, Nika Mansouri Ghiasi, Arash Tavakkol, Jisung Park, Hamid Sarbazi-Azad, Juan Gómez Luna, Onur Mutlu

    Abstract: The performance and capacity of solid-state drives (SSDs) are continuously improving to meet the increasing demands of modern data-intensive applications. Unfortunately, communication between the SSD controller and memory chips (e.g., 2D/3D NAND flash chips) is a critical performance bottleneck for many applications. SSDs use a multi-channel shared bus architecture where multiple memory chips conn… ▽ More

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: To appear in Proceedings of the 50th Annual International Symposium on Computer Architecture (ISCA), 2023

  17. arXiv:2304.01951  [pdf, other

    cs.MS cs.AR cs.DC cs.LG

    TransPimLib: A Library for Efficient Transcendental Functions on Processing-in-Memory Systems

    Authors: Maurus Item, Juan Gómez-Luna, Yuxin Guo, Geraldo F. Oliveira, Mohammad Sadrosadati, Onur Mutlu

    Abstract: Processing-in-memory (PIM) promises to alleviate the data movement bottleneck in modern computing systems. However, current real-world PIM systems have the inherent disadvantage that their hardware is more constrained than in conventional processors (CPU, GPU), due to the difficulty and cost of building processing elements near or inside the memory. As a result, general-purpose PIM architectures s… ▽ More

    Submitted 5 September, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: Our open-source software is available at https://github.com/CMU-SAFARI/transpimlib

  18. arXiv:2212.06292  [pdf, other

    cs.AR cs.DC

    ALP: Alleviating CPU-Memory Data Movement Overheads in Memory-Centric Systems

    Authors: Nika Mansouri Ghiasi, Nandita Vijaykumar, Geraldo F. Oliveira, Lois Orosa, Ivan Fernandez, Mohammad Sadrosadati, Konstantinos Kanellopoulos, Nastaran Hajinazar, Juan Gómez Luna, Onur Mutlu

    Abstract: Partitioning applications between NDP and host CPU cores causes inter-segment data movement overhead, which is caused by moving data generated from one segment (e.g., instructions, functions) and used in consecutive segments. Prior works take two approaches to this problem. The first class of works maps segments to NDP or host cores based on the properties of each segment, neglecting the inter-seg… ▽ More

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: To appear in IEEE TETC

  19. arXiv:2212.04953  [pdf, other

    q-bio.GN cs.AI cs.LG

    TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering

    Authors: Meryem Banu Cavlak, Gagandeep Singh, Mohammed Alser, Can Firtina, Joël Lindegger, Mohammad Sadrosadati, Nika Mansouri Ghiasi, Can Alkan, Onur Mutlu

    Abstract: Basecalling is an essential step in nanopore sequencing analysis where the raw signals of nanopore sequencers are converted into nucleotide sequences, i.e., reads. State-of-the-art basecallers employ complex deep learning models to achieve high basecalling accuracy. This makes basecalling computationally-inefficient and memory-hungry; bottlenecking the entire genome analysis pipeline. However, for… ▽ More

    Submitted 14 September, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

  20. Utopia: Fast and Efficient Address Translation via Hybrid Restrictive & Flexible Virtual-to-Physical Address Mappings

    Authors: Konstantinos Kanellopoulos, Rahul Bera, Kosta Stojiljkovic, Nisa Bostanci, Can Firtina, Rachata Ausavarungnirun, Rakesh Kumar, Nastaran Hajinazar, Mohammad Sadrosadati, Nandita Vijaykumar, Onur Mutlu

    Abstract: Conventional virtual memory (VM) frameworks enable a virtual address to flexibly map to any physical address. This flexibility necessitates large data structures to store virtual-to-physical mappings, which leads to high address translation latency and large translation-induced interference in the memory hierarchy. On the other hand, restricting the address mapping so that a virtual address can on… ▽ More

    Submitted 6 October, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: To appear in 56th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023

    ACM Class: C.0

  21. arXiv:2210.08508  [pdf, other

    cs.AR cs.DC

    RevaMp3D: Architecting the Processor Core and Cache Hierarchy for Systems with Monolithically-Integrated Logic and Memory

    Authors: Nika Mansouri Ghiasi, Mohammad Sadrosadati, Geraldo F. Oliveira, Konstantinos Kanellopoulos, Rachata Ausavarungnirun, Juan Gómez Luna, Aditya Manglik, João Ferreira, Jeremie S. Kim, Christina Giannoula, Nandita Vijaykumar, Jisung Park, Onur Mutlu

    Abstract: Recent nano-technological advances enable the Monolithic 3D (M3D) integration of multiple memory and logic layers in a single chip with fine-grained connections. M3D technology leads to significantly higher main memory bandwidth and shorter latency than existing 3D-stacked systems. We show for a variety of workloads on a state-of-the-art M3D system that the performance and energy bottlenecks shift… ▽ More

    Submitted 16 October, 2022; originally announced October 2022.

  22. arXiv:2209.10914  [pdf, other

    cs.AR

    Morpheus: Extending the Last Level Cache Capacity in GPU Systems Using Idle GPU Core Resources

    Authors: Sina Darabi, Mohammad Sadrosadati, Joël Lindegger, Negar Akbarzadeh, Mohammad Hosseini, Jisung Park, Juan Gómez-Luna, Hamid Sarbazi-Azad, Onur Mutlu

    Abstract: Graphics Processing Units (GPUs) are widely-used accelerators for data-parallel applications. In many GPU applications, GPU memory bandwidth bottlenecks performance, causing underutilization of GPU cores. Hence, disabling many cores does not affect the performance of memory-bound workloads. While simply power-gating unused GPU cores would save energy, prior works attempt to better utilize GPU core… ▽ More

    Submitted 6 April, 2023; v1 submitted 22 September, 2022; originally announced September 2022.

    Comments: To appear in 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022

  23. arXiv:2209.08600  [pdf, other

    cs.AR cs.DS q-bio.GN

    GenPIP: In-Memory Acceleration of Genome Analysis via Tight Integration of Basecalling and Read Mapping

    Authors: Haiyu Mao, Mohammed Alser, Mohammad Sadrosadati, Can Firtina, Akanksha Baranwal, Damla Senol Cali, Aditya Manglik, Nour Almadhoun Alserr, Onur Mutlu

    Abstract: Nanopore sequencing is a widely-used high-throughput genome sequencing technology that can sequence long fragments of a genome into raw electrical signals at low cost. Nanopore sequencing requires two computationally-costly processing steps for accurate downstream genome analysis. The first step, basecalling, translates the raw electrical signals into nucleotide bases (i.e., A, C, G, T). The secon… ▽ More

    Submitted 17 December, 2023; v1 submitted 18 September, 2022; originally announced September 2022.

    Comments: 17 pages, 13 figures

  24. arXiv:2209.05566  [pdf, other

    cs.AR cs.DC

    Flash-Cosmos: In-Flash Bulk Bitwise Operations Using Inherent Computation Capability of NAND Flash Memory

    Authors: Jisung Park, Roknoddin Azizi, Geraldo F. Oliveira, Mohammad Sadrosadati, Rakesh Nadig, David Novo, Juan Gómez-Luna, Myungsuk Kim, Onur Mutlu

    Abstract: Bulk bitwise operations, i.e., bitwise operations on large bit vectors, are prevalent in a wide range of important application domains, including databases, graph processing, genome analysis, cryptography, and hyper-dimensional computing. In conventional systems, the performance and energy efficiency of bulk bitwise operations are bottlenecked by data movement between the compute units and the mem… ▽ More

    Submitted 12 September, 2022; originally announced September 2022.

    Comments: To appear in 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022

  25. arXiv:2209.00188  [pdf, other

    cs.AR cs.LG

    Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction

    Authors: Rahul Bera, Konstantinos Kanellopoulos, Shankar Balachandran, David Novo, Ataberk Olgun, Mohammad Sadrosadati, Onur Mutlu

    Abstract: Long-latency load requests continue to limit the performance of high-performance processors. To increase the latency tolerance of a processor, architects have primarily relied on two key techniques: sophisticated data prefetchers and large on-chip caches. In this work, we show that: 1) even a sophisticated state-of-the-art prefetcher can only predict half of the off-chip load requests on average a… ▽ More

    Submitted 30 September, 2022; v1 submitted 31 August, 2022; originally announced September 2022.

    Comments: To appear in 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022

    ACM Class: B.3.2; C.0

  26. arXiv:2106.05632  [pdf, other

    cs.AR cs.CR

    CODIC: A Low-Cost Substrate for Enabling Custom In-DRAM Functionalities and Optimizations

    Authors: Lois Orosa, Yaohua Wang, Mohammad Sadrosadati, Jeremie S. Kim, Minesh Patel, Ivan Puddu, Haocong Luo, Kaveh Razavi, Juan Gómez-Luna, Hasan Hassan, Nika Mansouri-Ghiasi, Saugata Ghose, Onur Mutlu

    Abstract: DRAM is the dominant main memory technology used in modern computing systems. Computing systems implement a memory controller that interfaces with DRAM via DRAM commands. DRAM executes the given commands using internal components (e.g., access transistors, sense amplifiers) that are orchestrated by DRAM internal timings, which are fixed foreach DRAM command. Unfortunately, the use of fixed interna… ▽ More

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: Extended version of an ISCA 2021 paper

    ACM Class: B.3; K.6.5

  27. arXiv:2105.03725  [pdf, other

    cs.AR cs.DC cs.PF

    DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks

    Authors: Geraldo F. Oliveira, Juan Gómez-Luna, Lois Orosa, Saugata Ghose, Nandita Vijaykumar, Ivan Fernandez, Mohammad Sadrosadati, Onur Mutlu

    Abstract: Data movement between the CPU and main memory is a first-order obstacle against improving performance, scalability, and energy efficiency in modern systems. Computer systems employ a range of techniques to reduce overheads tied to data movement, spanning from traditional mechanisms (e.g., deep multi-level cache hierarchies, aggressive hardware prefetchers) to emerging techniques such as Near-Data… ▽ More

    Submitted 6 April, 2023; v1 submitted 8 May, 2021; originally announced May 2021.

    Comments: Our open source software is available at https://github.com/CMU-SAFARI/DAMOV

  28. pLUTo: Enabling Massively Parallel Computation in DRAM via Lookup Tables

    Authors: João Dinis Ferreira, Gabriel Falcao, Juan Gómez-Luna, Mohammed Alser, Lois Orosa, Mohammad Sadrosadati, Jeremie S. Kim, Geraldo F. Oliveira, Taha Shahroodi, Anant Nori, Onur Mutlu

    Abstract: Data movement between the main memory and the processor is a key contributor to execution time and energy consumption in memory-intensive applications. This data movement bottleneck can be alleviated using Processing-in-Memory (PiM). One category of PiM is Processing-using-Memory (PuM), in which computation takes place inside the memory array by exploiting intrinsic analog properties of the memory… ▽ More

    Submitted 3 October, 2022; v1 submitted 15 April, 2021; originally announced April 2021.

    ACM Class: B.3.1; C.1.3

    Journal ref: IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022, 900-919

  29. arXiv:2010.09330  [pdf, other

    cs.AR

    Enabling High-Capacity, Latency-Tolerant, and Highly-Concurrent GPU Register Files via Software/Hardware Cooperation

    Authors: Mohammad Sadrosadati, Amirhossein Mirhosseini, Ali Hajiabadi, Seyed Borna Ehsani, Hajar Falahati, Hamid Sarbazi-Azad, Mario Drumond, Babak Falsafi, Rachata Ausavarungnirun, Onur Mutlu

    Abstract: Graphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high power consumption, and large silicon area provisioning. Prior work proposes hierarchical register file to reduce the register file power consumption by caching reg… ▽ More

    Submitted 19 October, 2020; originally announced October 2020.

    Comments: To Appear in ACM Transactions on Computer Systems (TOCS)

  30. arXiv:2009.08437  [pdf, other

    cs.AR

    FIGARO: Improving System Performance via Fine-Grained In-DRAM Data Relocation and Caching

    Authors: Yaohua Wang, Lois Orosa, Xiangjun Peng, Yang Guo, Saugata Ghose, Minesh Patel, Jeremie S. Kim, Juan Gómez Luna, Mohammad Sadrosadati, Nika Mansouri Ghiasi, Onur Mutlu

    Abstract: DRAM Main memory is a performance bottleneck for many applications due to the high access latency. In-DRAM caches work to mitigate this latency by augmenting regular-latency DRAM with small-but-fast regions of DRAM that serve as a cache for the data held in the regular-latency region of DRAM. While an effective in-DRAM cache can allow a large fraction of memory requests to be served from a fast DR… ▽ More

    Submitted 17 September, 2020; originally announced September 2020.

    Comments: To appear in the MICRO 2020 conference proceedings

  31. NOM: Network-On-Memory for Inter-Bank Data Transfer in Highly-Banked Memories

    Authors: Seyyed Hossein SeyyedAghaei Rezaei, Mehdi Modarressi, Rachata Ausavarungnirun, Mohammad Sadrosadati, Onur Mutlu, Masoud Daneshtalab

    Abstract: Data copy is a widely-used memory operation in many programs and operating system services. In conventional computers, data copy is often carried out by two separate read and write transactions that pass data back and forth between the DRAM chip and the processor chip. Some prior mechanisms propose to avoid this unnecessary data movement by using the shared internal bus in the DRAM chip to directl… ▽ More

    Submitted 21 April, 2020; originally announced April 2020.

    Comments: To appear in IEEE Computer Architecture Letters in 2020

    Report number: CAL-2019-12-0121

  32. arXiv:1902.07344  [pdf, other

    cs.CR

    Dataplant: Enhancing System Security with Low-Cost In-DRAM Value Generation Primitives

    Authors: Lois Orosa, Yaohua Wang, Ivan Puddu, Mohammad Sadrosadati, Kaveh Razavi, Juan Gómez-Luna, Hasan Hassan, Nika Mansouri-Ghiasi, Arash Tavakkol, Minesh Patel, Jeremie Kim, Vivek Seshadri, Uksong Kang, Saugata Ghose, Rodolfo Azevedo, Onur Mutlu

    Abstract: DRAM manufacturers have been prioritizing memory capacity, yield, and bandwidth for years, while trying to keep the design complexity as simple as possible. DRAM chips do not carry out any computation or other important functions, such as security. Processors implement most of the existing security mechanisms that protect the system against security threats, because 1) executing security mechanism… ▽ More

    Submitted 5 November, 2019; v1 submitted 19 February, 2019; originally announced February 2019.

  33. arXiv:1812.11473  [pdf, other

    cs.LG stat.ML

    ORIGAMI: A Heterogeneous Split Architecture for In-Memory Acceleration of Learning

    Authors: Hajar Falahati, Pejman Lotfi-Kamran, Mohammad Sadrosadati, Hamid Sarbazi-Azad

    Abstract: Memory bandwidth bottleneck is a major challenges in processing machine learning (ML) algorithms. In-memory acceleration has potential to address this problem; however, it needs to address two challenges. First, in-memory accelerator should be general enough to support a large set of different ML algorithms. Second, it should be efficient enough to utilize bandwidth while meeting limited power and… ▽ More

    Submitted 9 January, 2019; v1 submitted 30 December, 2018; originally announced December 2018.

    Comments: 11 pages, 9 figures

    MSC Class: 68M01

  34. arXiv:1810.09360  [pdf, other

    cs.DC

    Enabling Efficient RDMA-based Synchronous Mirroring of Persistent Memory Transactions

    Authors: Arash Tavakkol, Aasheesh Kolli, Stanko Novakovic, Kaveh Razavi, Juan Gomez-Luna, Hasan Hassan, Claude Barthels, Yaohua Wang, Mohammad Sadrosadati, Saugata Ghose, Ankit Singla, Pratap Subrahmanyam, Onur Mutlu

    Abstract: Synchronous Mirroring (SM) is a standard approach to building highly-available and fault-tolerant enterprise storage systems. SM ensures strong data consistency by maintaining multiple exact data replicas and synchronously propagating every update to all of them. Such strong consistency provides fault tolerance guarantees and a simple programming model coveted by enterprise system designers. For c… ▽ More

    Submitted 22 October, 2018; originally announced October 2018.