A fast half-subtractor using 8T static random access memory for in-memory computation
Corresponding Author:
Nagaraja Shylashree
Department of Electronics and Communications, RV College of Engineering
Bangalore 560059, Karnataka, India
Email: shylashreen@rvce.edu.in
1. INTRODUCTION
The von Neumann architecture on which present-day computing systems are based [1] places the
arithmetic and logic unit (ALU) and the memory unit far from each other, which leads to high power
consumption when data are transferred from the memory unit to the ALU, as shown in Figure 1. To
overcome this von Neumann bottleneck, computation in memory (CIM) has been introduced. In a CIM
system, the ALU and the memory unit are placed close together, which reduces the power consumed [2]-[4].
The CIM architecture is shown in Figure 2; it conserves the energy otherwise spent on data transfer.
Numerous papers in the literature focus on complementary metal–oxide–semiconductor (CMOS) based
static random access memory (SRAM) for CIM. 6T SRAMs have been used to implement the CIM
methodology [5], but the 6T SRAM suffers from read disturbance failure [6]. To overcome read
disturbance failure, 8T SRAM cells have been used effectively [7]. Boolean functions such as NOR and
NAND have been implemented in this way [8], [9]. Similarly, using these approaches, a fast adder has been
implemented in which the number of logic gates is reduced in order to reduce the delay of the circuit [10].
Different sensing schemes have been proposed, but the latch-based sense amplifier has proven to
be the best for detecting small changes on the bit line of the 8T SRAM [11]. 8T SRAMs have also been
used to compute dot products [12]. Some works discuss spin-transfer torque magnetic RAM and its
computation architecture [13]. The prospects of this emerging technology have been discussed in a few
papers [14]. Even though the noise margin of 6T SRAM cells is better, 8T SRAMs are considered because
they provide separate paths for read and write [15].
The primary contribution of this work is a fast half subtractor implemented with the SRAM array
for in-memory computation, in which transmission gate-based read circuitry in the 8T SRAM cell reduces
both the delay and the power consumed. The paper is organized as follows: section 2 covers the
conventional and transmission gate-based 8T SRAM cells. Section 3 covers the mapping of the
half-subtractor NOR circuit into the SRAM array and its implementation. Section 4 discusses the results
and analysis, while section 5 contains the conclusion and future scope.
2. METHOD
The method is divided into two sections. Section 2.1 details the implementation of the 8T SRAM
cell and shows the read and write paths along with the sense amplifier design. Section 2.2 explains the
implementation of the 2×5 SRAM memory array equipped with a half subtractor. The NOR netlist used for
the half subtractor consists of five NOR gates; it subtracts two 1-bit binary numbers (A, B) and provides
two 1-bit outputs (difference and borrow). The conventional design used pass transistors for the read
circuitry [8]. In the proposed work, the read circuitry is replaced with a transmission gate. The half
subtractor is realized using NOR gates, and those gates are mapped into the SRAM array to obtain the
arithmetic circuit and CIM cell co-design. Simulation was then carried out to compare the results with the
conventional design. The circuit implementation for functionality, delay, and power analysis is done using
the Cadence Virtuoso tool in 180 nm CMOS technology.
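The five-gate NOR realization described above can be checked at the logic level with a short script. The following is a minimal Python sketch and not the netlist of the paper: the gate ordering (n1 to n5) is an assumption, chosen so that the second gate produces the borrow and the fifth gate produces the difference, and the function names are hypothetical.

```python
from itertools import product


def nor(x: int, y: int) -> int:
    """Two-input NOR on 0/1 values."""
    return 0 if (x or y) else 1


def half_subtractor_nor(a: int, b: int) -> tuple[int, int]:
    """Return (difference, borrow) for A - B using five NOR gates.

    The gate ordering is an assumed one; the netlist in the paper may differ.
    """
    n1 = nor(a, b)    # NOT(A OR B)
    n2 = nor(a, n1)   # (NOT A) AND B  -> borrow
    n3 = nor(b, n1)   # A AND (NOT B)
    n4 = nor(n2, n3)  # A XNOR B
    n5 = nor(n4, n4)  # A XOR B        -> difference
    return n5, n2


if __name__ == "__main__":
    # Print the full truth table for the two 1-bit inputs.
    for a, b in product((0, 1), repeat=2):
        diff, borrow = half_subtractor_nor(a, b)
        print(f"A={a} B={b} -> difference={diff} borrow={borrow}")
```

Under this assumed ordering, the difference equals A XOR B and the borrow equals (NOT A) AND B for all four input combinations.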
Int J Reconfigurable & Embedded Syst, Vol. 14, No. 1, March 2025: 273-281
Int J Reconfigurable & Embedded Syst ISSN: 2089-4864 275
For a write operation, the write word line is driven high, VDD is applied to the write bit line, and
the write bit line bar is connected to ground. This stores 1 at Q and 0 at Qbar of the SRAM memory unit.
Bit 0 is written in the same way with the two bit-line levels reversed. The read bit line is precharged
before the read operation begins. The transistors in the read circuitry are used during the read
operation [3], [16]-[24].
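The write sequence above can be summarized in a small behavioural model. This is a logic-level Python sketch only, with no electrical detail; the class name and the signal names `wbl` and `wblb` (write bit line and its complement) are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class WritePort:
    """Behavioural model of the cell's write path (logic levels only)."""

    q: int = 0  # stored bit; Qbar is its complement

    @property
    def qbar(self) -> int:
        return 1 - self.q

    def write(self, wwl: int, wbl: int, wblb: int) -> None:
        """Apply levels to the write word line and the two write bit lines."""
        if wwl != 1:
            return  # word line low: access devices are off, the cell holds its value
        if wbl == wblb:
            raise ValueError("write bit lines must be complementary")
        self.q = wbl  # wbl=1, wblb=0 stores 1; the reversed levels store 0
```

For example, `write(1, 1, 0)` stores 1 at Q, a later `write(0, 0, 1)` leaves the cell unchanged because the word line is low, and `write(1, 0, 1)` stores 0.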
The read word line must be ON for the stored value to appear at the read bit line output. If the read
circuitry is replaced by a transmission gate, the total area is reduced and the delay of the read operation
decreases significantly. The transmission gate is activated only when the word line of the read circuitry is
ON, which reduces the power and delay of the circuit. Figure 4 shows the design of the 8T SRAM cell
whose read circuitry is replaced by a transmission gate. Sense amplifiers are used to detect even the smallest
difference in the bit line voltage read from the SRAM. On comparing the results and speed of the sense
amplifiers, the latch-based sense amplifier is best suited to the proposed design [25]. The positive feedback
present in the latch-based sense amplifier makes it very fast. The latch-based sense amplifier design is
shown in Figure 5.
The INN input of the sense amplifier must be connected to the reference voltage (Vref). Vref must be
at its maximum, VDD, when the input combination is (11); it should be half of VDD when the input
combination is (01) or (10); and it should be at its minimum when the input is (00). The INP terminal of the
latch-based sense amplifier is connected to the bit line (RQ) of the SRAM shown in Figure 2. The output
is taken from the terminal SO, and the SON terminal gives the complement of the SO output. Five such
sense amplifiers are used.
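The reference-voltage rule and the SO/SON outputs described above can be sketched as an idealized comparator. This is a rough Python illustration under stated assumptions: the "minimum" Vref is taken as 0 V, the latch is reduced to a single comparison with no regeneration dynamics, and both function names are hypothetical.

```python
def vref_for_inputs(a: int, b: int, vdd: float) -> float:
    """Reference level on INN for the input pair (a, b), following the text.

    (1,1) -> VDD, (0,1) or (1,0) -> VDD/2, (0,0) -> minimum (assumed 0 V here).
    """
    return vdd * (a + b) / 2


def sense(inp: float, inn: float) -> tuple[int, int]:
    """Idealized latch decision: SO is 1 when INP is above INN; SON is its complement."""
    so = 1 if inp > inn else 0
    return so, 1 - so
```

As a usage example with an assumed supply of 1.8 V (the supply value is not stated in the text), `sense(1.8, vref_for_inputs(0, 0, 1.8))` returns `(1, 0)` for a bit line that stays at VDD, while `sense(0.0, vref_for_inputs(0, 1, 1.8))` returns `(0, 1)` for a fully discharged bit line.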
The half subtractor can be realized using only NOR gates, as shown in Figure 6. The half-subtractor
netlist is mapped into the SRAM array to perform the arithmetic circuit and memory cell co-design. An 8T
SRAM cell consists of a 6T SRAM cell and an extra read port formed by transistors P3 and N5, as in
Figure 4. The write operation is similar to that of the 6T SRAM cell, but the read operation is different. For
a read operation, RL must be kept high while WWL is kept low. Cells A and B in Figure 7 represent the 6T
SRAM cell part; the extra transmission gate-based read circuitry is connected to perform the read operation,
and the bits read are reflected on the RBL line. When both inputs are logic ‘0’, cells A and B store ‘0’.
During the operation, RBL is precharged to VDD. Since both inputs are ‘0’, the RBL does not change and
retains the VDD voltage, which is sensed by the sense amplifier, and the output is observed to be ‘1’. If the
content of either cell is changed from ‘0’ to ‘1’, the RBL starts discharging and the output observed at the
sense amplifier is ‘0’. Hence, NOR can be implemented in the 8T bit-cell with minimal overhead if both
read word lines are activated simultaneously.
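The NOR behaviour of the shared read bit line can be expressed as a short logic-level model. The sketch below is illustrative Python and ignores all analog effects such as partial discharge and leakage; the function and parameter names are assumptions.

```python
def read_bit_line(stored_bits, rwl_enables):
    """Logic level left on a precharged RBL after the selected cells are read.

    stored_bits: Q values of the cells sharing the read bit line.
    rwl_enables: 1 where that cell's read word line is activated.
    """
    rbl = 1  # RBL starts precharged to VDD
    for q, rwl in zip(stored_bits, rwl_enables):
        if rwl and q:
            rbl = 0  # an enabled cell holding 1 discharges the line
    return rbl
```

With both read word lines activated, `read_bit_line([a, b], [1, 1])` stays at 1 only when both cells hold 0, which is the NOR of the two stored bits.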
Using the designs and approach shown in Figures 6 and 7, the complete circuit was implemented as
shown in Figure 8. The two bits A and B are stored in the cells. The RBL lines are precharged to VDD
through the precharge transistors. Each of the NOR gates is mapped into the two-row, five-column SRAM
array. Since each NOR gate is mapped serially into a separate column and each NOR gate has a different
delay, the precharge circuit is controlled to produce the desired half-subtractor result: the RBL of the
subsequent column precharges when the input from the previous column becomes available.
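The column-by-column sequencing described above can be illustrated with a simple timing model in which a column is evaluated only after both of its inputs are valid. This is a hedged Python sketch: the column-to-input mapping in `NETLIST` is an assumed gate ordering, not taken from the paper's figures, and the delay values passed in are placeholders in arbitrary units rather than measured results.

```python
# Assumed column order: each column NORs two signals that are already available.
NETLIST = {
    "n1": ("A", "B"),
    "n2": ("A", "n1"),   # borrow (assumed position)
    "n3": ("B", "n1"),
    "n4": ("n2", "n3"),
    "n5": ("n4", "n4"),  # difference (assumed position)
}


def output_ready_times(column_delay):
    """Time at which each column's sensed output becomes valid.

    column_delay maps a column name to its evaluation delay (arbitrary units).
    A column starts evaluating once both of its inputs are valid.
    """
    ready = {"A": 0.0, "B": 0.0}  # stored operands are valid from the start
    for column, (x, y) in NETLIST.items():
        start = max(ready[x], ready[y])
        ready[column] = start + column_delay[column]
    return ready
```

For instance, with placeholder delays of 1, 2, 3, 1, and 1 units for columns n1 to n5, the borrow column becomes valid at 3 units and the difference column at 6 units, which shows why unequal column delays require the precharge of each column to be timed individually.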
To obtain the half-subtractor result for all input combinations, the NOR netlist mapped SRAM array
was simulated in the Cadence Virtuoso tool using 180 nm CMOS technology. The outputs of sense
amplifiers 2 and 5 give the required half-subtractor outputs, borrow and difference, respectively. The
integration of the half subtractor and the SRAM array optimizes the overall performance, making the CIM
subtractor well suited for high-speed, low-power computational tasks.
From the response, it can be observed that the half-subtractor output waveform was achieved. A and
B represent the inputs stored in SRAM cells A and B, and the corresponding difference and borrow outputs
are shown. There is some delay during both the read and write operations. For the conventional 8T SRAM
CIM design proposed in [7], the read delay, power, and energy are found to be 9.756 ps, 417.4 nW, and
4.069 zJ, respectively. With the proposed methodology, the corresponding values are 4.605 ps, 406.5 nW,
and 1.871 zJ. The readings are tabulated in Tables 1 and 2. Similarly, for the write operation, the write
delay, power, and energy of the existing design [7] are found to be 62.85 ps, 1.579 nW, and 0.099 zJ, while
the proposed design gives 60.07 ps, 1.459 nW, and 0.088 zJ. On comparison, the proposed design shows a
more noticeable reduction for the read operation than for the write operation. This is mainly due to the use
of transmission gate-based read circuitry in the 8T SRAM design.
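The relative improvements implied by these readings can be recomputed directly from the quoted values. The Python sketch below uses only the numbers given in the text; the helper names are hypothetical, and the power-delay product is included purely as an arithmetic cross-check, since the text does not state how the energy figures were obtained.

```python
def pct_reduction(conventional: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `conventional`, in percent."""
    return (conventional - proposed) / conventional * 100.0


def power_delay_product_aj(power_nw: float, delay_ps: float) -> float:
    """Power-delay product in attojoules (1 nW x 1 ps = 1e-21 J = 1e-3 aJ)."""
    return power_nw * delay_ps * 1e-3


# Values quoted in the text: (conventional design [7], proposed design)
READINGS = {
    "read delay (ps)": (9.756, 4.605),
    "read power (nW)": (417.4, 406.5),
    "write delay (ps)": (62.85, 60.07),
    "write power (nW)": (1.579, 1.459),
}

if __name__ == "__main__":
    for name, (conv, prop) in READINGS.items():
        print(f"{name}: {pct_reduction(conv, prop):.1f}% reduction")
```

With the quoted figures, this gives reductions of about 52.8% and 4.4% in read and write delay and about 2.6% and 7.6% in read and write power. The power-delay products come out near 4.07 and 1.87 for the read operation and near 0.099 and 0.088 for the write operation when expressed in attojoules, which is numerically close to the quoted energy figures.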
Additionally, the static noise margin (SNM) of the transmission gate-based SRAM cell is 209.585 mV,
whereas that of the conventional SRAM is 192.79 mV, an increase of about 9% in the noise margin. The
greater the noise margin, the better the design. Hence, the half subtractor was implemented, and it can
serve as both ALU and memory array in a computing system, which gives it wider application. The
transmission gate-based read circuitry incorporated with the CIM significantly reduces the delay.
Optimizing the power and delay improves overall efficiency, allowing the system to perform more
effectively while consuming less energy. These factors together enhance the design’s performance, making
it faster and more efficient.
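The SNM increase quoted above follows directly from the two reported values. As a worked check using only those numbers (the subscripts are labels introduced here for readability):

```latex
\[
\frac{\mathrm{SNM}_{\text{proposed}} - \mathrm{SNM}_{\text{conventional}}}{\mathrm{SNM}_{\text{conventional}}}
= \frac{209.585\,\mathrm{mV} - 192.79\,\mathrm{mV}}{192.79\,\mathrm{mV}}
= \frac{16.795}{192.79}
\approx 0.087 \approx 9\%
\]
```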
4. CONCLUSION
In very large scale integration (VLSI) design, power and delay optimization are fundamental
challenges that drive both innovation and practical implementations, especially in the development of
energy-efficient and compact devices. The choice of state-of-the-art transmission gate-based read circuitry
for the 8T SRAM cell has enhanced the performance of computation in-memory in terms of both area and
delay. The latch-based sense amplifier worked well for detecting and amplifying small variations in the bit
line signal, and it turned out to be the fastest of the sense amplifiers considered because of its significant
positive feedback. The complete half-subtractor functionality has been realized using only NOR gates,
which are mapped into the SRAM array for in-memory computation. This design has been implemented
using Cadence Virtuoso in 180 nm technology. The read delay of the proposed work is reduced by 53%
compared to the conventional design. Similarly, the write delay is reduced by 4.4%. The power consumed
by the circuit is also reduced by 3% and 8.6% during read and write operations, respectively. Further, the
design can be extended to other circuits such as multipliers, dividers, and other combinational circuits that
can be realized using CIM, and ultimately the von Neumann architecture could be replaced by the
in-memory computation architecture.
REFERENCES
[1] S. Jeloka, N. B. Akesh, D. Sylvester, and D. Blaauw, “A 28 nm configurable memory (TCAM/BCAM/SRAM) using push-rule 6T
Bit Cell enabling logic-in-memory,” IEEE Journal of Solid-State Circuits, vol. 51, no. 4, pp. 1009–1021, Apr. 2016, doi:
10.1109/JSSC.2016.2515510.
[2] N. Shylashree, Y. D. Vahvale, N. Praveena, and A. S. Mamatha, “Design and implementation of 64-bit SRAM and CAM on
Cadence and open-source environment,” International Journal of Circuits, Systems and Signal Processing, vol. 15, pp. 586–594,
Jul. 2021, doi: 10.46300/9106.2021.15.65.
[3] P. P. Ravichandiran and P. D. Franzon, “A review of 3D-dynamic random-access memory based near-memory computation,” in
2021 IEEE International 3D Systems Integration Conference (3DIC), IEEE, Oct. 2021, pp. 1–6. doi:
10.1109/3DIC52383.2021.9687615.
[4] A. Bhaskar, “Design and analysis of low power SRAM cells,” in 2017 Innovations in Power and Advanced Computing
Technologies (i-PACT), IEEE, Apr. 2017, pp. 1–5. doi: 10.1109/IPACT.2017.8244888.
[5] J. Zhang, Z. Wang, and N. Verma, “In-memory computation of a machine-learning classifier in a standard 6T SRAM array,”
IEEE Journal of Solid-State Circuits, vol. 52, no. 4, pp. 915–924, Apr. 2017, doi: 10.1109/JSSC.2016.2642198.
A fast half-subtractor using 8T static random access memory for in-memory … (Deepika Prabhakar)
[6] V. Aswini, S. Musala, and A. Srinivasulu, “Transmission gate-based 8T SRAM cell for biomedical applications,” in 2021 12th
International Symposium on Advanced Topics in Electrical Engineering (ATEE), IEEE, Mar. 2021, pp. 1–7. doi:
10.1109/ATEE52255.2021.9425314.
[7] A. K. Rajput and M. Pattanaik, “Implementation of boolean and arithmetic functions with 8T SRAM cell for In-memory
computation,” in 2020 International Conference for Emerging Technology (INCET), IEEE, Jun. 2020, pp. 1–5. doi:
10.1109/INCET49848.2020.9154137.
[8] J. Han and Y. Kim, “A fast half adder using 8T SRAM for computation-in-memory,” in 2021 IEEE International Conference on
Consumer Electronics-Asia, ICCE-Asia 2021, 2021, doi: 10.1109/ICCE-Asia53811.2021.9641964.
[9] N. Verma et al., “In-memory computing: advances and prospects,” IEEE Solid-State Circuits Magazine, vol. 11, no. 3, pp. 43–55,
2019, doi: 10.1109/MSSC.2019.2922889.
[10] B. Wicht, T. Nirschl, and D. Schmitt-Landsiedel, “Yield and speed optimization of a latch-type voltage sense amplifier,” IEEE
Journal of Solid-State Circuits, vol. 39, no. 7, pp. 1148–1158, Jul. 2004, doi: 10.1109/JSSC.2004.829399.
[11] A. Jaiswal, I. Chakraborty, A. Agrawal, and K. Roy, “8T SRAM cell as a multibit dot-product engine for beyond Von Neumann
computing,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 27, no. 11, pp. 2556–2567, Nov. 2019, doi:
10.1109/TVLSI.2019.2929245.
[12] S. Jain, A. Ranjan, K. Roy, and A. Raghunathan, “Computing in memory with spin-transfer torque magnetic RAM,” IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, vol. 26, no. 3, pp. 470–483, Mar. 2018, doi:
10.1109/TVLSI.2017.2776954.
[13] A. Agrawal, A. Jaiswal, C. Lee, and K. Roy, “X-SRAM: enabling in-memory boolean computations in CMOS static random
access memories,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 65, no. 12, pp. 4219–4232, Dec. 2018, doi:
10.1109/TCSI.2018.2848999.
[14] J. Mu and B. Kim, “A 65nm logic-compatible embedded and flash memory for in-memory computation of artificial neural
networks,” in 2020 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, Oct. 2020, pp. 1–4. doi:
10.1109/ISCAS45731.2020.9181104.
[15] A. Manna and V. S. K. Bhaaskaran, “Improved read noise margin characteristics for single bit line SRAM cell using adiabatically
operated word line,” in 2017 International Conference on Nextgen Electronic Technologies: Silicon to Software (ICNETS2),
IEEE, Mar. 2017, pp. 385–393. doi: 10.1109/ICNETS2.2017.8067965.
[16] L. Ammoura, M. L. Flottes, P. Girard, and A. Virazel, “Preliminary defect analysis of 8T SRAM cells for in-memory computing
architectures,” in 2021 16th International Conference on Design & Technology of Integrated Systems in Nanoscale Era (DTIS),
IEEE, Jun. 2021, pp. 1–4. doi: 10.1109/DTIS53253.2021.9505101.
[17] M. Kutila, A. Paasio, and T. Lehtonen, “Comparison of 130 nm technology 6T and 8T SRAM cell designs for Near-Threshold
operation,” in 2014 IEEE 57th International Midwest Symposium on Circuits and Systems (MWSCAS), IEEE, Aug. 2014, pp.
925–928. doi: 10.1109/MWSCAS.2014.6908567.
[18] W. Choi, J. Park, and G. Kang, “Dynamic stability estimation for latch-type voltage sense amplifier,” in 2014 International SoC
Design Conference (ISOCC), IEEE, Nov. 2014, pp. 218–219. doi: 10.1109/ISOCC.2014.7087614.
[19] P. Athe and S. Dasgupta, “A comparative study of 6T, 8T and 9T decanano SRAM cell,” in 2009 IEEE Symposium on Industrial
Electronics & Applications, IEEE, Oct. 2009, pp. 889–894. doi: 10.1109/ISIEA.2009.5356318.
[20] T. Nguyen, K. Ngo, N. Trinh, B. Bui, L. Tran, and H. Trang, “Efficient TCAM design based on dual port SRAM on FPGA,”
Indonesian Journal of Electrical Engineering and Computer Science, vol. 22, no. 1, pp. 104–112, 2021, doi:
10.11591/ijeecs.v21.i4.pp104-112.
[21] B.-D. Yang and L.-S. Kim, “A low-power SRAM using hierarchical bit line and local sense amplifiers,” IEEE Journal of Solid-
State Circuits, vol. 40, no. 6, pp. 1366–1376, Jun. 2005, doi: 10.1109/JSSC.2005.848032.
[22] Y. Tsiatouhas, Y. Moisiadis, A. Arapoyanni, and A. Chrisanthopoulos, “Comparative study of different current mode sense
amplifiers in submicron CMOS technology,” IEE Proceedings - Circuits, Devices and Systems, vol. 149, no. 3, pp. 154–158, Jun.
2002, doi: 10.1049/ip-cds:20020425.
[23] T. Na, S.-H. Woo, J. Kim, H. Jeong, and S.-O. Jung, “Comparative study of various latch-type sense amplifiers,” IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, no. 2, pp. 425–429, Feb. 2014, doi:
10.1109/TVLSI.2013.2239320.
[24] J. Mu and B. Kim, “A 65nm logic-compatible embedded and flash memory for in-memory computation of artificial neural
networks,” in 2020 IEEE International Symposium on Circuits and Systems (ISCAS), Seville, Spain, 2020, pp. 1–4, doi:
10.1109/ISCAS45731.2020.9181104.
[25] A. Agrawal, A. Jaiswal, C. Lee, and K. Roy, “X-SRAM: enabling in-memory boolean computations in CMOS static random
access memories,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 65, no. 12, pp. 4219–4232, Dec. 2018, doi:
10.1109/TCSI.2018.2848999.