2012 International Conference on Communications, Devices and Intelligent Systems (CODIS), 2012
Abstract: In order to accurately model high-frequency effects, inductance must be taken into consideration. Earlier, only the delay caused by the presence of gates was considered an important issue, but now, with decreasing feature size and increasing complexity, on-chip interconnect delay has acquired prominence for incremental performance-driven layout synthesis. We develop a novel analytical delay model for RLCG interconnect lines that, in addition to preserving the effectiveness of previous RLCG interconnect models, improves ...
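The paper's analytical RLCG model is not reproduced in this abstract; as background, the classic Elmore delay of a uniform RC ladder is the simplest interconnect delay estimate that RLC/RLCG models refine. A minimal sketch (parameter values are illustrative, not from the paper):

```python
def elmore_delay(r_per_seg, c_per_seg, n_segments):
    """Elmore delay of a uniform RC ladder: sum over nodes of
    (total upstream resistance) * (node capacitance)."""
    delay = 0.0
    upstream_r = 0.0
    for _ in range(n_segments):
        upstream_r += r_per_seg          # resistance from driver to this node
        delay += upstream_r * c_per_seg  # each node's C sees all upstream R
    return delay
```

For a uniform line this reduces to r·c·n(n+1)/2, which is why distributed interconnect delay grows quadratically with length unless repeaters or more accurate RLC(G) models are used.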
Proceedings of the 52nd Annual Design Automation Conference, 2015
Vertical monolayer heterojunction FETs based on transition metal dichalcogenides (TMDCFETs) and planar black phosphorus FETs (BPFETs) have demonstrated excellent sub-threshold swing, high ION/IOFF, and high scalability, making them attractive candidates for post-CMOS memory design. This paper explores TMDCFET and BPFET SRAM design by combining atomistic self-consistent device modeling with SRAM circuit design and simulation. Our simulations show that at low operating voltages, TMDCFET and BPFET SRAMs exhibit significant advantages in static power, dynamic read/write noise margin, and read/write delay over both nominal and read/write-assisted 16nm CMOS SRAMs.
Data confidentiality attacks utilizing memory access patterns threaten exposure of data in modern main memories. Oblivious RAM (ORAM) is an effective cryptographic primitive developed to thwart access-pattern-based attacks in DRAM-based systems. However, in emerging non-volatile memory (NVM) systems, the increased writes due to encryption of multiple data blocks on every Path ORAM (state-of-the-art efficient ORAM) access impose significant energy, lifetime, and performance overheads. LEO (Low overhead Encryption ORAM) is an efficient Path ORAM encryption architecture that addresses the high write overheads of ORAM integration in NVMs, while providing security equivalent to the baseline Path ORAM. LEO reduces NVM cell writes by securely decreasing the number of block encryptions during the write phase of a Path ORAM access. LEO uses a secure, two-level counter mode encryption framework that opportunistically eliminates re-encryption of unmodified blocks, reducing NVM writes. Our evaluations show that on average, LEO decreases NVM energy by 60 percent, improves lifetime by 1.51×, and increases performance by 9 percent over the baseline Path ORAM.
2018 IEEE 36th International Conference on Computer Design (ICCD), 2018
Modern memory systems are susceptible to data confidentiality attacks that leverage memory access pattern information to obtain secret data. Oblivious RAM (ORAM) is a secure cryptographic construct that effectively thwarts access-pattern-based attacks. However, in Path ORAM (state-of-the-art efficient ORAM for main memories) and its variants, each memory request (read or write) is transformed to an ORAM access, which is a sequence of read and write operations, increasing the latency of the memory requests and degrading system performance. In practice, the ORAM access for a read request is on the critical path of program execution, blocked by ORAM accesses for older write requests. Although modern memory controllers (MCs) realize read prioritization through write buffering, the ORAM access translation of each memory request to multiple memory read and write operations results in frequent MC write buffer overflow, decreasing its efficiency. ReadPRO (Read Prioritization) scheduling in ...
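The read-prioritization-with-write-buffering idea described above can be sketched as a toy memory controller scheduler. This is not the ReadPRO design (the class name, buffer size, and watermark are assumptions for illustration): reads are serviced first, and buffered writes are drained only when the write buffer approaches capacity.

```python
from collections import deque

class SimpleScheduler:
    """Toy MC scheduler: prioritize reads; drain writes at a high watermark."""

    def __init__(self, write_buffer_size=8, high_watermark=6):
        self.reads = deque()
        self.writes = deque()
        self.capacity = write_buffer_size
        self.high = high_watermark

    def enqueue(self, op, addr):
        (self.reads if op == "read" else self.writes).append(addr)

    def next_op(self):
        # Drain writes if the buffer is nearly full, or if no reads are pending;
        # otherwise keep reads off the critical path by servicing them first.
        if len(self.writes) >= self.high or (not self.reads and self.writes):
            return ("write", self.writes.popleft())
        if self.reads:
            return ("read", self.reads.popleft())
        return None
```

The abstract's point is that ORAM inflates each request into many reads and writes, so a fixed-size write buffer overflows far more often than in a non-ORAM system, defeating this simple prioritization.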
Deep Neural Networks (DNN) are used in a variety of applications and services. With the evolving nature of DNNs, the race to build optimal hardware (both in datacenter and edge) continues. General purpose multi-core CPUs offer unique attractive advantages for DNN inference at both datacenter [60] and edge [71]. Most of the CPU pipeline design complexity is targeted towards optimizing general-purpose single thread performance, and is overkill for relatively simpler, but still hugely important, data parallel DNN inference workloads. Addressing this disparity efficiently can enable both raw performance scaling and overall performance/Watt improvements for multi-core CPU DNN inference. We present REDUCT, where we build innovative solutions that bypass traditional CPU resources which impact DNN inference power and limit its performance. Fundamentally, REDUCT’s “Keep it close” policy enables consecutive pieces of work to be executed close to each other. REDUCT enables instruction delivery...
Scaling off-chip bandwidth is challenging due to fundamental limitations, such as a fixed pin count and plateauing signaling rates. Recently, vendors have turned to 2.5D and 3D stacking to closely integrate system components. Interestingly, these technologies can integrate a logic layer under multiple memory dies, enabling computing capability inside a memory stack. This trend in stacking is making processing-in-memory (PIM) architectures commercially viable. In this work, we investigate the suitability of offloading kernels in scientific applications onto 3D stacked PIM architectures. We evaluate several hardware constraints resulting from the stacked structure. We perform extensive simulation experiments and in-depth analysis to quantify the impact of application locality in TLBs, data caches, and memory stacks. Our results also identify design optimization areas in software and hardware for HPC scientific applications.
The wide adoption of cloud computing has established integrity and confidentiality of data in memory as a first order design concern in modern computing systems. Data integrity is ensured by Merkle Tree (MT) memory authentication. However, in the context of emerging non-volatile memories (NVMs), the MT memory authentication related increase in cell writes and memory accesses imposes significant energy, lifetime, and performance overheads. This dissertation presents ASSURE, an Authentication Scheme for SecURE energy-efficient NVMs. ASSURE integrates (i) smart message authentication codes with (ii) multi-root MTs to decrease MT reads and writes, while also reducing the number of cell writes on each MT write. Whereas data confidentiality is effectively ensured by encryption, the memory access patterns can be exploited as a side-channel to obtain confidential data. Oblivious RAM (ORAM) is a secure cryptographic construct that effectively thwarts access-pattern-based attacks. How...
Whereas emerging non-volatile memories (NVMs) are low power, dense, scalable alternatives to DRAM, the high latency and low endurance of these NVMs limit the feasibility of NVM-only memory systems. Smart hybrid memories (SHMs) that integrate NVM, DRAM, and on-module processor logic are an efficient means to bridge the latency and endurance gaps between NVM-only and DRAM-only memory systems. However, these SHMs are vulnerable to data confidentiality and integrity attacks that can be executed on the unsecure NVM, DRAM, and/or memory buses. STASH is the first comprehensive end-to-end SecuriTy Architecture for SHMs that integrates (i) counter mode encryption for data confidentiality, (ii) low overhead page-level Merkle Tree (MT) authentication for data integrity, (iii) recovery-compatible MT updates to withstand power/system failures, and (iv) page-migration-friendly security meta-data management. For security guarantees equivalent to the closest state-of-the-art security solution extensible to SHMs, STASH reduces memory overhead by 12.7×, increases system performance by 65%, and improves NVM lifetime by 2.5×.
Deep Neural Network (DNN) inference is emerging as the fundamental bedrock for a multitude of utilities and services. CPUs continue to scale up their raw compute capabilities for DNN inference along with mature high performance libraries to extract optimal performance. While general purpose CPUs offer unique attractive advantages for DNN inference at both datacenter and edge, they have primarily evolved to optimize single thread performance. For highly parallel, throughput-oriented DNN inference, this results in inefficiencies in both power and performance, impacting both raw performance scaling and overall performance/watt. We present Proximu$, where we systematically tackle the root inefficiencies in power and performance scaling for CPU DNN inference. Performance scales efficiently by distributing light-weight tensor compute near all caches in a multi-level cache hierarchy. This maximizes the cumulative utilization of the existing bandwidth resources in the system and minimize...
2012 World Congress on Information and Communication Technologies, 2012
Abstract: In this paper, a global heuristic search evolutionary optimization technique called craziness-based particle swarm optimization (CRPSO) is used for the design of an 8th-order infinite impulse response (IIR) band-stop (BS) digital filter. The proposed CRPSO-based approach closely mimics the particle's behaviour in a swarm, which results in better exploration and exploitation in multidimensional search space. Performance of the proposed optimization technique is compared with some well accepted evolutionary algorithms such ...
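The CRPSO idea above can be sketched on a toy 1-D objective; the paper applies it to IIR band-stop filter coefficient design, which is not reproduced here. The "craziness" term randomly perturbs a particle's velocity with small probability to escape local optima. All coefficients (inertia 0.7, acceleration 1.5, craziness probability 0.1) are illustrative assumptions, not the paper's values.

```python
import random

def crpso_minimize(f, lo, hi, n_particles=20, iters=200, p_craziness=0.1, seed=0):
    """Minimize f on [lo, hi] with craziness-based PSO (1-D sketch)."""
    rng = random.Random(seed)
    x = [rng.uniform(lo, hi) for _ in range(n_particles)]
    v = [0.0] * n_particles
    pbest = x[:]                      # per-particle best positions so far
    gbest = min(x, key=f)             # global best position so far
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            v[i] = (0.7 * v[i]
                    + 1.5 * r1 * (pbest[i] - x[i])
                    + 1.5 * r2 * (gbest - x[i]))
            if rng.random() < p_craziness:
                # Craziness: replace velocity with a small random kick.
                v[i] = 0.1 * rng.uniform(-(hi - lo), hi - lo)
            x[i] = min(hi, max(lo, x[i] + v[i]))
            if f(x[i]) < f(pbest[i]):
                pbest[i] = x[i]
                if f(x[i]) < f(gbest):
                    gbest = x[i]
    return gbest
```

In the filter-design setting, each particle would instead be a vector of IIR coefficients and f would be an error measure between the designed and ideal band-stop responses.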
Data tampering threatens data integrity in emerging non-volatile memories (NVMs). Whereas Merkle Tree (MT) memory authentication is effective in thwarting data tampering attacks, it drastically increases cell writes and memory accesses, adversely impacting NVM energy, lifetime, and system performance (instructions per cycle (IPC)). We propose ASSURE, a low overhead, high performance Authentication Scheme for SecURE energy-efficient NVMs. ASSURE synergistically integrates (i) smart message authentication codes (SMACs), which eliminate redundant cell writes by enabling MAC computation of only modified words on memory writes, with (ii) multi-root MTs (MMTs), which reduce MT reads/writes by constructing either high performance static MMTs (SMMTs) or low overhead dynamic MMTs (DMMTs) over frequently accessed memory regions. Our full-system simulations of the SPEC CPU2006 benchmarks on a triple-level cell (TLC) resistive RAM (RRAM) architecture show that on average, SMMT ASSURE (DMMT ASSURE) reduces NVM energy by 59% (55%), increases memory lifetime by 2.36× (2.11×), and improves IPC by 11% (10%), over state-of-the-art MT memory authentication.
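A minimal sketch of the baseline Merkle Tree memory authentication that ASSURE optimizes (this is the generic construct, not the SMAC/MMT design): leaf hashes cover data blocks, internal nodes hash their children, and any tampered block changes the root.

```python
import hashlib

def merkle_root(blocks):
    """Root hash of a Merkle tree over a list of byte-string data blocks."""
    level = [hashlib.sha256(b).digest() for b in blocks]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last node on odd levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]
```

Verifying or updating one block touches only the hashes on its root path, so every memory write forces a chain of hash updates up to the root; splitting the tree into multiple smaller roots over hot regions, as the abstract describes, shortens those chains.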
Data confidentiality attacks utilizing memory access patterns threaten exposure of data in modern main memories. Oblivious RAM (ORAM) is an effective cryptographic primitive developed to thwart access-pattern-based attacks in DRAM-based systems. However, in emerging non-volatile memory (NVM) systems, the increased writes due to encryption of multiple data blocks on every Path ORAM (state-of-the-art efficient ORAM) access impose significant energy, lifetime, and performance overheads. LEO (Low overhead Encryption ORAM) is an efficient Path ORAM encryption architecture that addresses the high write overheads of ORAM integration in NVMs, while providing security equivalent to the baseline Path ORAM. LEO reduces NVM cell writes by securely decreasing the number of block encryptions during the write phase of a Path ORAM access. LEO uses a secure, two-level counter mode encryption framework that opportunistically eliminates re-encryption of unmodified blocks, reducing NVM writes. Our evaluations show that on average, LEO decreases NVM energy by 60%, improves lifetime by 1.51×, and increases performance by 9% over the baseline Path ORAM.
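The core intuition behind skipping re-encryption of unmodified blocks can be sketched with counter-mode encryption: the keystream depends on (key, address, counter), so a block whose plaintext did not change can keep its old ciphertext and counter instead of being rewritten. This is a hedged illustration of the idea, not LEO's two-level design, and a real system would use AES; SHA-256 stands in as the keystream generator here.

```python
import hashlib

def keystream(key, addr, counter, length):
    """Derive a keystream from (key, address, counter); SHA-256 as a stand-in PRF."""
    out, block = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + addr.to_bytes(8, "big")
                              + counter.to_bytes(8, "big")
                              + block.to_bytes(4, "big")).digest()
        block += 1
    return out[:length]

def ctr_encrypt(key, addr, counter, data):
    """Counter-mode XOR; applying it twice with the same counter decrypts."""
    ks = keystream(key, addr, counter, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))

def write_path(key, path):
    """Re-encrypt only dirty blocks (with a bumped counter); clean blocks keep
    their old ciphertext and counter, saving NVM cell writes. Returns the
    number of blocks actually rewritten."""
    writes = 0
    for blk in path:                  # blk: dict with addr/ctr/pt/ct/dirty
        if blk["dirty"]:
            blk["ctr"] += 1
            blk["ct"] = ctr_encrypt(key, blk["addr"], blk["ctr"], blk["pt"])
            blk["dirty"] = False
            writes += 1
    return writes
```

The security caveat the abstract alludes to is that counters must never repeat for the same address, which is why skipping re-encryption safely requires the careful counter management LEO provides.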
Monolayer heterojunction FETs based on vertical heterogeneous transition metal dichalcogenides (TMDCFETs) and planar black phosphorus FETs (BPFETs) have demonstrated excellent subthreshold swing, high I_ON/I_OFF, and high scalability, making them attractive candidates for post-CMOS memory design. This article explores TMDCFET and BPFET SRAM design by combining atomistic self-consistent device modeling with SRAM circuit design and simulation. We perform detailed evaluations of the TMDCFET/BPFET SRAMs at a single bitcell and at SRAM array level. Our simulations show that at low operating voltages, TMDCFET/BPFET SRAMs exhibit significant advantages in static power, dynamic read/write noise margin, and read/write delay over nominal 16nm CMOS SRAMs at both bitcell and array-level implementations. We also analyze the effect of process variations on the performance of TMDCFET/BPFET SRAMs. Our simulations demonstrate that TMDCFET/BPFET SRAMs exhibit high tolerance to process variations, which is desirable for low operating voltages. ACM Reference Format: Joydeep Rakshit, Kartik Mohanram, Runlai Wan, Kai Tak Lam, and Jing Guo. 2017. Monolayer transistor SRAMs: Toward low-power, denser memory systems.
Data persistence in emerging non-volatile memories (NVMs) poses a multitude of security vulnerabilities, motivating main memory encryption for data security. However, practical encryption algorithms demonstrate strong diffusion characteristics that increase cell flips, resulting in increased write energy/latency and reduced lifetime of NVMs. State-of-the-art security solutions have focused on reducing the encryption penalty (increased write energy/latency and reduced memory lifetime) in single-level cell (SLC) NVMs; however, the realization of low encryption penalty solutions for multi-/triple-level cell (MLC/TLC) secure NVMs remains an open area of research. This work synergistically integrates zero-based partial writes with XOR-based energy masking to realize Smartly EnCRypted Energy efficienT, i.e., SECRET MLC/TLC NVMs, without compromising the security of the underlying encryption technique. Our simulations on an MLC (TLC) resistive RAM (RRAM) architecture across SPEC CPU2006 workloads demonstrate that for 6.25% (7.84%) memory overhead, SECRET reduces write energy by 80% (63%), latency by 37% (49%), and improves memory lifetime by 63% (56%) over conventional advanced encryption standard-based (AES-based) counter mode encryption.
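The XOR-based energy masking intuition can be sketched in a few lines (an illustration of the principle, not SECRET's actual mask selection or partial-write machinery, and the single-byte masks are an assumption for brevity): since NVM write energy scales with the number of changed cells, XOR-ing the new ciphertext with a mask chosen to minimize bit flips against the old stored data reduces write cost. The chosen mask must be stored as metadata so reads can invert it.

```python
def bit_flips(old, new):
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(o ^ n).count("1") for o, n in zip(old, new))

def best_mask(old, new, candidate_masks):
    """Pick the candidate byte mask whose masked write flips the fewest cells."""
    def masked(data, m):
        return bytes(b ^ m for b in data)
    return min(candidate_masks, key=lambda m: bit_flips(old, masked(new, m)))
```

Because the mask is applied after encryption and removed before decryption, the ciphertext the attacker sees is unchanged in distribution, which is how such masking can avoid weakening the underlying encryption.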
Papers by Joydeep Rakshit