Power Optimization of Binary Division Based On FPG
Power Optimization of Binary Division Based On FPG
Power Optimization of Binary Division Based On FPG
Corresponding Author:
Fadi T. Nasser
Department of Electrical Engineering
University of Technology
Baghdad, Iraq
Email: eee.19.23@grad.uotechnology.edu.iq
1. INTRODUCTION
In recent decades, the modern digital system design with low power has been the primary concern of
very large scale integrated (VLSI) designers. Since the VLSI technology was introduced, designers have paid
great attention to cost, area, and speed. Later, the continuous requirements for high performance and low power
digital systems led to an increase in density and a decrease in size of the integrated circuits (IC) under Moor’s
law. Primarily, this law is a global predictor for the growth of the entire semiconductor industry [1], [2].
It can be understood from Moore’s law that the number of transistors in a chip will double every
eighteen months [3]. The increases in the density of IC’s produce a significant increase in power dissipation.
Moreover, the advent of portable systems in recent years, which operate on batteries, has led to longer battery
life. Therefore, in the contemporary era, the VLSI designers aimed to reduce the power dissipation by
creating and using new techniques to reduce the power these systems consumed. Generally, the significant
advantages of power optimization are increased system reliability, battery life efficiency, noise immunity,
lower system cooling and packaging cost, and demand for portable systems [4]. The total power dissipated in
a VLSI circuit is the sum of dynamic power and leakage or static power [5].
The dominant power in digital circuits and systems is dynamic power consumption. The reason to
reduce the dynamic power is due to the ability to apply the dynamic power reduction techniques, and it is
easy to handle the structure of logic elements for the digital circuits and systems. Additionally, it is not
dependent on technology. At the same time, static power is dependent on technology and concerns about the
manufacturing design’s intellectual property (IP), such as the size of the transistor, channel length, and width
of the gate oxide [6]. Therefore, this work focuses on reducing the dynamic power dissipation rather than
static power, which is outside the scope of this paper.
The design of complex digital systems comprises various data processing units and mathematical
operations, such as addition, subtraction, multiplication, and division. This paper is concerned with division,
which is considered the heart of most computational digital systems such as digital signal processing (DSP),
image processing, communication systems, artificial intelligence, quantum computing, and the internet of
things (IoT) [7], [8]. Based on the conversion method, the division algorithms can be classified as follows:
digit recurrence, functional iteration, very high radix, a lookup table (LUT), and variable latency.
Division is the most challenging and complicated operation among all the mathematical operations
because of its sequential operation [9]. Therefore, it is more costly in propagation delay, area, and power
consumption than other mathematical operations. Thus, many studies of division techniques have been
implemented to reduce the power dissipation in divider circuitry at structural and algorithmic levels. In [10],
the authors suggest a 32-bit unsigned integer divider based on a recursive non-restoring division algorithm.
The proposed design achieves an optimization of 82.9% in dynamic-power dissipation relative to the
sequential divider. Hashemi, Bahar and Reda proposes a dynamic, low-power, low-error divider. The design
in the standalone case can save the total power dissipation up to 70% with an average error of 3.08 % [11].
The researchers in [12] propose a hybrid division called Pre-scaling, Series expansion, and Tylor
expansion (PST) division, in which the design consists of three algorithms. The PST design optimises the
dynamic power and total-power dissipation by 67% and 9.7%, respectively, when compared to IP core
division. The implementation of an 8-bit dividend by a 4-bit divisor of Vedic division is proposed by Kishor
and Bhaaskaran in [13].
This design shows a reduction of about 52% in total power dissipation in comparison with
conventional dividers (division). The researchers in [14], propose an 8-bit dividend by a 4-bit divisor of
binary dividers, which saves dynamic power by about 27% in comparison with conventional dividers that use
the repetitive subtraction method. In this design, the Vedic division algorithm is used to eliminate the
recursion, which significantly reduces dynamic power and area overhead. BhanuTej implements a 32-bit
dividend by a 16-bit divisor of binary Vedic division.
This proposed design is applied on 180 nm and 32 nm field-programmable gate array (FPGA)
platforms. The obtained dynamic power and the total power saved are 90% and 86%, respectively, with
respect to traditional division [15]. The above divider designs were verified and implemented using different
platforms, such as FPGA, application-specific integrated circuits (ASIC) and general-purpose processors
(GPP).
In this paper, FPGA is introduced as a platform for implementing the proposed division-algorithm
designs. The following characteristics are the reasons why FPGA was chosen over the other platforms: it is
high density and high performance and does not have costly multicore processors. Moreover, FPGA can
operate in parallelism and obtain orders-of-magnitude speedup, unlike GPP, in which they operate
sequentially [16]-[18]. FPGA also has the advantages of software flexibility, hardware speed, re-
programmability, minimal time to market and is ideal for prototyping designs in which the hardware testing
and verification are performed quickly on the chip. Also, errors in design can be fixed without any extra
hardware costs [19].
The main goal of this work is to decrease the dynamic-power consumption in digit-recurrence
division, mainly for restoring division and non-restoring division algorithms. Consequently, many suggested
systems of dividers are presented, and new optimization techniques are utilized to these suggested systems to
decrease the dynamic power consumption and increase the execution time. The very hardware descriptions
language (VHDL) technique is one of these techniques, this technique reduces the dynamic power by
reducing switching activities. This technique also transforms the design into its basic elements to consume
less power. The number of reduced cycles achieves high-speed performance.
This paper is organized as follows: the second section introduces the sources of power dissipation,
while the third section describes techniques for dynamic power optimization. As a result of this, the fourth
section explains the types of division algorithms used in this work, while section five describes the suggested
system designs in detail. The simulation results are presented and discussed in the sixth section, and the
conclusion is given in the seventh section.
Where 𝑃𝑠𝑎𝑡𝑖𝑡𝑐 is the static power dissipation, 𝐼𝑠𝑡𝑎𝑡𝑖𝑐 is the summation of all seven leakage current
mechanisms, and 𝑉𝐷𝐷 is the power supply voltage.
The other type of power dissipation is dynamic. Dynamic-power dissipation is the power consumed
during a device’s operation. In other words, it is the power consumed when a complementary metal oxide
semiconductor (CMOS) device is in an active mode. This power has three major types. The first is dissipated
power due to charging and discharging the capacitance known as switching power. The second is short-
circuit power, which is the result of a crowbar flowing through a lapse of time when both 𝑃-𝑀𝑂𝑆 (P-channel
MOS) and 𝑁-𝑀𝑂𝑆 (N-channel MOS) transistors are simultaneously turned ON.
The last one is glitching power, which occurs when input signals arrive at different times to a single logic
block causing racing. Therefore, several intermediate transitions occur before the logic block output stabilizes. The
calculation of three types of dynamic power dissipation can be given in the following [18]-[23],
2
𝑃𝑠𝑤𝑖𝑡𝑐ℎ = 𝛼 𝐶𝐿 𝑉𝐷𝐷 𝑓𝑝 (2)
𝛽 3 𝑉𝑡 3
𝑃𝑠.𝑐 = 𝑉 (1 − 2 ) 𝑡𝑟𝑓 𝑓𝑝 (3)
12 𝐷𝐷 𝑉𝐷𝐷
1
Pgltich = CL VDD (VDD − Vt ) (4)
2
The total dynamic power dissipation is given by the (5) [23], [24],
The average power dissipation can be given in the (6) [23], [24],
Indonesian J Elec Eng & Comp Sci, Vol. 24, No. 3, December 2021: 1354 - 1366
Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752 1357
By substituting ((1), (2)-(3)) in (4), the total power dissipation is summarized in the (7) [23], [24],
β 3 Vt 3 1
2
Ptotal = (Ileakage × VDD)+( α CL VDD fp )+( V (1 −2 ) t rf fp )+( CL VDD (VDD − Vt )) (7)
12 DD VDD 2
where 𝑃𝑠𝑤𝑖𝑡𝑐ℎ is the switching power, 𝛼 known as switching activity, 𝐶𝐿 is the total load capacitances of all
transistors, 𝑓𝑝 known as the frequency of input signal, 𝑃𝑆.𝐶 is the short circuit current, 𝛽 is the current gain of the
MOS transistor, and 𝑡𝑟𝑓 is the rising and falling time, 𝑃𝑔𝑙𝑡𝑖𝑐ℎ is the glitching power and 𝑉𝑡ℎ is the threshold voltage.
4. DIVISION ALGORITHM
Arithmetic operations, such as addition, subtraction, multiplication and division, are fundamental
building blocks in digital systems [7]. This paper is focused on the division operation. Division is considered
an essential operation in many digital systems, such as signal processing, rendering systems, artificial
intelligence, graphics compression [32]. Division is the most complicated and the highest-cost arithmetic
operation. Unlike addition and multiplication, division does not possess associative or commutative
properties [7], [33]. This makes it very difficult to implement division in digital systems.
Division algorithms can be classified into five classes: digit recurrence, functional iteration, very
high radix, lookup table and variable latency [32]. This work studies digit-recurrence division. Despite being
the first and oldest division class, digit-recurrence division is characterised by its high accuracy in
comparison to the other classes. It calculates the quotient by iteratively subtracting the divisor from the
dividend until the resulting quantity of the subtraction is less than the divisor.
Two algorithms come under this class: restoring and non-restoring division algorithms. These
iterative algorithms are implemented sequentially and have a long latency, a significant area overhead and
consume a large amount of power compared to the other mathematical operations [34]. Therefore, many
reduction approaches can reduce the number of division cycles, thus reducing power dissipation [35].
The restoring division consists of P, A, B and Q registers, which represent the n-bit accumulator
register, the n-bit dividend register, the n-bit divisor register and the quotient register, respectively. The P and
A registers will be concatenated to be the PA register (in which P is the high content and A is the low
content), taking into account an additional bit that should be concatenated to the most significant bit (MSB)
of the PA register for the shifting operation. The steps of restoring the division algorithm are described as
follows [7], [36]:
1) Initialize the 𝑃 𝑟𝑒𝑔𝑖𝑠𝑡𝑒𝑟 with zeros, the 𝐴 𝑟𝑒𝑔𝑖𝑠𝑡𝑒𝑟 with the value of the dividend, the 𝐵 𝑟𝑒𝑔𝑖𝑠𝑡𝑒𝑟 with
the value of divisor, and the counter is the number of bits in the dividend.
2) Shift the concatenated 𝑃𝐴 𝑟𝑒𝑔𝑖𝑠𝑡𝑒𝑟 one-bit position to the left.
3) Perform the subtraction for the content of the𝐵 𝑟𝑒𝑔𝑖𝑠𝑡𝑒𝑟 from the high content of the 𝑃𝐴 𝑟𝑒𝑔𝑖𝑠𝑡𝑒𝑟.
4) Two cases are obtained from the subtraction :
Case 1: If the result is positive (i.e., the MSB of 𝑃𝐴 𝑟𝑒𝑔𝑖𝑠𝑡𝑒𝑟 = 0), then the 𝑄0 is set to (1).
Case 2: if the result is negative (i.e., the MSB of 𝑃𝐴 𝑟𝑒𝑔𝑖𝑠𝑡𝑒𝑟 = 1), then the 𝑄0 is set to (0).
5) Decrement the counter by one.
6) Repeat the steps from 2 to 5 until the counter become 0.
7) Finally, the quotient will be in the 𝑄 𝑟𝑒𝑔𝑖𝑠𝑡𝑒𝑟, and the remainder will be in the 𝑃 𝑟𝑒𝑔𝑖𝑠𝑡𝑒𝑟.
Indonesian J Elec Eng & Comp Sci, Vol. 24, No. 3, December 2021: 1354 - 1366
Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752 1359
5.2. Restoring division algorithm with external loop based on the VHDL approach (proposed 2)
The proposed design of the restoring division algorithm is implemented using the VHDL code.
The VHDL code is written and verified by the ISE14.7 package and imported to XSG using the Xilinx black
box. In this design, the pulse generator simulink block is used to generate the external loop.
The external loop is considered the external clock used to operate the division operation correctly, as
shown in Figure 3. The flowchart of the proposed design that illustrated Figure 4 shows the procedure of the
restoring division algorithm. The first step of this algorithm is initializing the high content of the A register,
and the content of the Q register with zeros (i.e., A1[31:16]=0 and Q[31:0]=0), and the low content of the A
register and B register with the absolute values of the dividend and the divisor, respectively (i.e.,
A[15:0]=|dvd| and B[15:0]=|dvr|). After the initialization mentioned above, the CLK (clock edge) is checked.
If the CLK - positive edge, then the A register and the Q register are shifted left one bit, the divisor
subtracted from the high content of the A register (i.e., T=A1[31:16]-B[15:0]), and Q[0] is extracted by
taking the complement of the MSB of the T register.
There are two possibilities for Q[0], if Q[0]=1, then restore the content of the T register to the high
content of the A register (i.e., A[31:16]=T), else do nothing and proceed. After the counter is incremented by
one, check the counter if 𝐶 < 64, then repeat the steps of left shifting. Else, proceed to the next step, which is
checking the sign bit. If S=1, then take 2’s complement of the Q register; otherwise, end the algorithm. Due
to the external loop, each step of this algorithm is performed in one cycle. Therefore, the result is obtained
after 64 cycles. The proposed restoring division algorithm flowchart is illustrated in Figure 4.
UFix_5_0
++ a
Bool
ab
b
Counter
Relational
UFix_5_0
0
Constant2 sel
BitBasher
Absolute d1 A
double Fix_16_0
-10 In
Mux
dvd (A)
Dividend
Mux1
Q[0]
sel Sy stem
Shift1 Generator
X << 1 d0
Fix_32_0 Fix_32_0
BitBasher3
d1
A1 A2 b
Fix_16_0 a
b a Fix_32_0 c
BitBasher1 A1 (High)
a b UFix_32_16 UFix_32_16
Fix_16_0 UFix_1_0 UFix_1_0 a d -1 q
a-b b a not c z
double Fix_16_0 UFix_16_0
3 In a |a| b T=(A1 - dvr) BitBasher4
BitBasher2 Inverter Q[0]= ~T[N-1]
dvr (B) Q
Divisor AddSub
Absolute1
UFix_1_0
b a
xor UFix_1_0
BitBasher5 sel
UFix_1_0
b a Sign bit
Fix_33_16 double -3.333
d0 Out
BitBasher6
Logical Gateway Out
Fix_33_16 Display
x(-1) d1
Negate Mux2
Figure 3. Restoring division algorithm with external loop using VHDL approach
5.3. Restoring division algorithm with internal loop based on the VHDL approach (proposed 3)
In this section, the proposed design is based on the same algorithm that used in the earlier design.
The primer difference of this design is the use of the internal loop instead of using the external loop, which
means the pulse generator block is eliminated, and the design will depend only on the internal clock.
The advantage of this internal clock it can perform all the steps of the restoring division algorithm in one
cycle instead of performing each step in one cycle as in the previous design.
Indonesian J Elec Eng & Comp Sci, Vol. 24, No. 3, December 2021: 1354 - 1366
Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752 1361
Figure 5 shows the flowchart of the proposed design. As demonstrated from the figure, the same
procedure of the previous algorithm is applied to this design. But with one difference, which is the decision box for
checking the clock edge, is eliminated. This decision box represents the behavioural operation of the pulse
generator block (the external loop). Instead of that, the left-shifting operation for the registers is directly performed.
Figure 4. Flowchart of the proposed restoring algorithm with external loop based in the VHDL approach
Figure 5. Flowchart of the proposed restoring division algorithm with internal loop based on the VHDL
approach
UFix_5_0
++ a
Bool
ab
b
Counter
Relational
UFix_5_0
0
Constant2
sel
Mux Sy stem
Generator
Shift1 BitBasher2
A1(High) A2(Low) b
X << 1 a
Fix_32_0 Fix_32_0 c
A1 (High)
Fix_16_0
b a
Q[0]
BitBasher1 UFix_1_0
b a a
a
Fix_16_0 sel b UFix_32_16 UFix_32_16
UFix_1_0 a d -1 q
a+b BitBasher3 c z
double Fix_16_0 UFix_16_0 (~a) b
3 In a |a| b Fix_16_0 UFix_1_0
d0 b a b BitBasher5
dvr (B) AddSub Q
Divisor
Q[0]=~A1[2N-1] XOR B[N-1]
Absolute1 BitBasher4
Expression
a d1
Fix_16_0
a-b
b Mux1
AddSub1
UFix_1_0
b a
xor UFix_1_0
BitBasher6 sel
UFix_1_0 Sign bit
b a
Fix_33_16 double -3.333
d0 Out
BitBasher7
Logical Gateway Out
Fix_33_16 Display
x(-1) d1
Negate Mux2
After the register (A) block has been initialised with a dividend value for the content of A2 (low)
and zeros for the content of A1 (high) content, A1 (high) is subtracted from or add to the absolute value of
the dividend by the addsub and the Addsub1 blocks, respectively. The Xilinx Mux1 block is used to select
between the addition or subtraction operation depending on Q[0]. When Q[0] = 0, then the A1 (high) is
added to the divisor. And when Q[0] = 1, then A1 is subtracted.
The output of the Mux1 block, which represents the new content of A1 (high), will be concatenated
with the low content of the register (A) block (i.e. A2 (low)) and then shift the contents of the (A) register
one-bit position to the left. Likewise, the counter is decremented by one. Bitbasher3 and bitbasher4 blocks
are used to slice the MSBs of the register (A) block and divisor, respectively. To achieve the conditions for
the addition or subtraction operation, which will represent the least significant bit of the Q register, Q[0], the
Expression Xilinx block is used for this step of the algorithm. In this step, the value of the register (A) block
is not restored, as in the restoring algorithm, which is the significant difference between the two algorithms.
The Xilinx bitbasher5 block is used to concatenate the Q[0] with the other initialised bits of the Q register
since this operation is represented by shifting the Q register to the left. This algorithm takes 35 cycles to
obtain the final result of the division.
5.5. Non-restoring algorithm with external loop based on the VHDL approach (proposed 5)
The suggested system design of the non-restoring division algorithm is implemented using VHDL
code. The VHDL code is written and verified using the ISE 14.7 package and exported to system generator
by using Xilinx black box block. In this design, a pulse generator is used as the main clock of the system to
run and process the division operation correctly. The procedure of the suggested Non-restoring algorithm
flowchart is shown in Figure 5. In this algorithm, the pulse generator is used in the suggested model of
restoring division mentioned in section (0). The result of this design can be obtained after 32 cycles. This
extensive execution time is due to the counter's count, which is equal to the sum of the divisor and dividend
bits, as well as clock initialization. Alternatively, counter=2×bits of dividend. Where in this case the number
of bits=2×16=32=the number of counts.
Indonesian J Elec Eng & Comp Sci, Vol. 24, No. 3, December 2021: 1354 - 1366
Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752 1363
5.6. Non-restoring algorithm with internal loop based on the VHDL approach (proposed 6)
The VHDL approach is used to implement the non-restoring division in this part. The verification of
the VHDL code is done using the ISE 14.7 package. This algorithm is identical to the previous design, except
the pulse generator is eliminated. The internal clock or internal loop is used to run this division operation.
The demonstration of the suggested non-restoring algorithm is given in Figure 8. As can be noticed from the
flow char, after the step of determining the inputs and initializing the registers, the left shifting operation is
directly performed. Alternatively, the decision box that detects the clock edge in the previous design is
eliminated. Therefore, this step is considered significantly different from the earlier.
Despite the proposed design, non-restoring division with an external loop based on the VHDL approach
has less resource utilization in terms of the number of LUTs (247), slices (159) and I/OB (input/output bounded)
buffers blocks (65), as illustrated in Figure 9. Simultaneously, the design had the highest dynamic-power
dissipation. The high consumption is due to two reasons: first, the design has the highest flip-flops (85), which is
Indonesian J Elec Eng & Comp Sci, Vol. 24, No. 3, December 2021: 1354 - 1366
Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752 1365
the primary reason for the increase in switching activities, causing the dynamic power to dissipate according to the
dynamic-power equation. Second, using an external loop increases the execution time to 64 cycles in this proposed
design; this leads to an increase in critical-path delays, thus increasing the power dissipation.
While the proposed design of non-restoring division with an internal loop based on the VHDL
approach utilises high-logic resources, it consumes less dynamic power than the others. This reduction in
power is due to the VHDL approach, which transforms the suggested system design into basic element
components. Furthermore, the execution time for this proposed division algorithm is less than one cycle. This
means there is no latency (delay), which leads to minimization of the critical paths. Further, the placing and
routing phases handle internal optimization. The switching activities are reduced or eliminated in this design
because there are no flip-flops. This is considered an additional reason why the dynamic power is reduced.
The second comparison of this work is illustrated in Table 2. This table shows the comparisons
between the proposed design, with the best dynamic-power optimization, and the previous works. Different
techniques and approaches to reduce dynamic-power dissipation from the earlier works are presented in this
table. Compared to the previous works, the best result of dynamic-power optimization (93.66%) is obtained
when using an internal-loop division based on the VHDL approach.
7. CONCLUSION
In this work, various division algorithms of 16 bit by 16 bit have been suggested, where two of them
are realized using Hard FPGA blocks, and four designs were executed using the VHDL technique.
The performance analysis regarding to the dynamic power dissipation shows that the suggested system designs
that contain flip-flops have the highest dynamic power. At the same time, the proposed designs with zero flip-
flops and a lot of LUTs and slices have the lowest dynamic power. This reduction is due to decreasing or
eliminating the switching activities in these designs and low critical path delays. From the experimental results,
the efficient technique to obtain the higher power savings is using the VHDL approach, where the power saving
of (93.66%) for the dynamic power is found in the proposed design of non-restoring division with an internal
loop. In which it is the highest dynamic power optimization relative to the previous works.
REFERENCES
[1] V. Natarajan, A. K. Nagarajan, N. Pandian and V. G. Savithri, "Low Power Design Methodology," IntechOpen,
2018, doi: 10.5772/intechopen.73729.
[2] A. P. Kumar, B. Aditya, G. Sony, C. Prasanna and A. Satish, “Estimation of power and delay in CMOS circuits
using LCT,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 14, pp. 990-998,
2019, doi: 10.11591/ijeecs.v14.i2.pp990-998.
[3] G. E. Moore, “Cramming more components onto integrated circuits,” IEEE Solid-State Circuits Soc. Newsl., vol.
11, no. 3, pp. 33-35, 2009, doi: 10.1109/n-ssc.2006.4785860.
[4] S. P. Mohanty, N. Ranganathan, E. Kougianos and P. Patra, "Low-power high-level synthesis for nanoscale CMOS
circuits," Springer Science & Business Media, 2008, doi: 10.1007/978-0-387-76474-0.
[5] A. K. Dwivedi and A. Islam, “Nonvolatile and robust design of content addressable memory cell using magnetic tunnel
junction at nanoscale regime,” IEEE Trans. Magn., vol. 51, no. 12, pp. 1-13, 2015, doi: 10.1109/TMAG.2015.2454477.
[6] T. L. Floyd, "Digital Fundamentals," 11th edition, Pearson Education India, 2010.
[7] U. S. Patankar and A. Koel, “Review of Basic Classes of Dividers Based on Division Algorithm,” IEEE Access,
vol. 9, pp. 23035-23069, 2021, doi: 10.1109/ACCESS.2021.3055735.
[8] T. T. Hasan, M. H. Jasim and I. A. Hashim, “FPGA Design and Hardware Implementation of Heart Disease
Diagnosis System Based on NVG-RAM Classifier,” in 2018 Third Scientific Conference of Electrical Engineering
(SCEE), 2018, pp. 33-38, doi: 10.1109/SCEE.2018.8684125.
[9] N. Aggarwal, K. Asooja, S. S. Verma and S. Negi, “An improvement in the restoring division algorithm (needy
restoring division algorithm),” in 2009 2nd IEEE International Conference on Computer Science and Information
Technology, 2009, pp. 246-249, doi: 10.1109/ICCSIT.2009.5234956.
[10] K. Hill and J. E. Stine, “An Efficient Implementation of Radix-4 Integer Division Using Scaling,” in 2020 IEEE
63rd International Midwest Symposium on Circuits and Systems (MWSCAS), 2020, pp. 1092-1095, doi:
10.1109/MWSCAS48704.2020.9184631.
[11] S. Hashemi, R. I. Bahar and S. Reda, “A low-power dynamic divider for approximate applications,” in Proceedings
- Design Automation Conference, Jun. 2016, no. 105, doi: 10.1145/2897937.2897965.
[12] J. Liu, M. Chang and C. K. Cheng, “An iterative division algorithm for FPGAs,” in ACM/SIGDA International
Symposium on Field Programmable Gate Arrays - FPGA, 2006, pp. 83-89, doi: 10.1145/1117201.1117213.
[13] D. Kishor and V. Bhaaskaran, “Low power divider using vedic mathematics,” in 2014 International Conference on
Advances in Computing, Communications and Informatics (ICACCI), 2014, pp. 575-580, doi:
10.1109/ICACCI.2014.6968436.
[14] R. Senapati, B. Bhoi and M. P.- On, “Novel binary divider architecture for high speed VLSI applications,” in 2013
IEEE Conference on Information & Communication Technologies, 2013, pp. 675-679, doi:
10.1109/CICT.2013.6558180.
[15] S. BhanuTej, “Vedic divider-A high performance computing algorithm for VLSI applications,” in 2013
International conference on Circuits, Controls and Communications (CCUBE), 2013, pp. 1-5, doi:
10.1109/CCUBE.2013.6718577.
[16] K. Paulsson, M. Hübner and J. Becker, “Dynamic power optimization by exploiting self-reconfiguration in Xilinx
Spartan 3-based systems,” Microprocess. Microsyst., vol. 33, no. 1, pp. 46-52, 2009, doi:
10.1016/j.micpro.2008.08.006.
[17] C. V. S. Chaitanya, C. Sundaresan, P. R. Venkateswaran and K. Prasad, “Asic design of low power-delay product
carry pre-computation based multiplier,” Indonesian Journal of Electrical Engineering and Computer Science
(IJEECS), vol. 13, no. 2, pp. 845-852, 2019, doi: 10.11591/ijeecs.v13.i2.pp845-852.
[18] A. Muttaqin, Z. Abidin, R. A. Setyawan and I. A. Zahra, “Development of advanced automated test equipment for
digital system by using FPGA,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS),
vol. 15, no. 2, pp. 661-670, 2019, doi: 10.11591/ijeecs.v15.i2.pp661-670.
[19] I. Kuon and J. Rose, “Measuring the gap between FPGAs and ASICs,” IEEE Trans. Comput. Des. Integr. circuits
Syst., vol. 26, no. 2, pp. 203-215, 2007, doi: 10.1109/TCAD.2006.884574.
[20] A. Pal, “Sources of Power Dissipation,” in Low-Power VLSI Circuits and Systems, pp.141-173, Springer, New
Delhi, 2015, doi: 10.1007/978-81-322-1937-8_6.
[21] N. P. Kumar, B. S. Charles and V. Sumalatha, “A Review on Leakage Power Reduction Techniques at 45nm
Technology,” Mater. Today Proc., vol. 2, no. 9, pp. 4569-4574, 2015, doi: 10.1016/j.matpr.2015.10.074.
[22] R. Y. Reetu, “Dynamic power reduction of VLSI circuits: a review,” in Int. J. Adv. Res. Electron. Commun. Eng,
vol. 7, no. 3, pp. 245-259, 2018.
[23] S. Alluri, B. R. Naik, N. S. S. Reddy and M. V. Ramanaiah, “Performance Analysis of VLSI Circuits in 45 nm
Technology,” in Advances in Decision Sciences, Image Processing, Security and Computer Vision, Springer, 2020,
pp. 281-289, doi: 10.1007/978-3-030-24318-0_34.
[24] M. Arora, "The art of hardware architecture: Design methods and techniques for digital circuits," Springer Science
& Business Media, 2011, doi: 10.1007/978-1-4614-0397-5.
[25] A. Pal, "Low-Power VLSI circuits and systems," Springer, 2014, doi: 10.1007/978-81-322-1937-8.
[26] S. Devadas and S. Malik, “A survey of optimization techniques targeting low power VLSI circuits,” in Proceedings
of the 32nd annual ACM/IEEE Design Automation Conference, 1995, pp. 242-247, doi: 10.1145/217474.217536.
[27] M. Alidina, J. Monteiro, S. Devadas, A. Ghosh and M. Papaefthymiou, “Precomputation-based sequential logic
optimization for low power,” IEEE Trans. Very Large Scale Integr. Syst., vol. 2, no. 4, pp. 426-436, 1994, doi:
10.1109/92.335011.
[28] S. Mitra and D. Das, “A Comprehensive Review of Applications of Don’t Care Bit Filling Techniques for Test
Power Reduction in Digital VLSI Systems,” Indonesian Journal of Electrical Engineering and Computer Science
(IJEECS), vol. 12, no. 3, pp. 941-949, 2018.
[29] E. Macii, M. Pedram, and F. Somenzi, “High-level power modeling, estimation, and optimization,” IEEE Trans.
Comput. Des. Integr. circuits Syst., vol. 17, no. 11, pp. 1061-1079, 1998, doi: 10.1109/43.736181.
[30] S. Ravi, G. Lakshminarayana and N. K. Jha, “TAO: Regular expression based high-level testability analysis and
optimization,” in Proceedings International Test Conference 1998 (IEEE Cat. No. 98CH36270), 1998, pp. 331-
340, doi: 10.1109/TEST.1998.743171.
[31] V. Tiwari, S. Malik and P. Ashar, “Guarded evaluation: Pushing power management to logic synthesis/design,”
IEEE Trans. Comput. Des. Integr. Circuits Syst., vol. 17, no. 10, pp. 1051-1060, 1998, doi: 10.1109/43.728924.
[32] U. S. Patankar, M. E. Flores and A. Koel, “Division algorithms-From Past to Present Chance to Improve Area Time
and Complexity for Digital Applications,” in 2020 IEEE Latin America Electron Devices Conference (LAEDC),
2020, pp. 1-4, doi: 10.1109/LAEDC49063.2020.9073050.
[33] E. Matthews, A. Lu, Z. Fang and L. Shannon, “Rethinking integer divider design for FPGA-based soft-processors,”
in 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines
(FCCM), 2019, pp. 289-297, doi: 10.1109/FCCM.2019.00046.
[34] K. Kataria and S. Patel, “Design Of High Performance Digital Divider,” in 2020 IEEE VLSI DEVICE CIRCUIT
AND SYSTEM (VLSI DCS), 2020, pp. 1-6, doi: 10.1109/VLSIDCS47293.2020.9179903.
[35] A. Parashar, G. Aggarwal, R. Dang, P. Dalmia and N. Pandey, “Fast Combinational Architecture for a Vedic
Divider,” in 2017 14th IEEE India Council International Conference (INDICON), 2017, pp. 1-5, doi:
10.1109/INDICON.2017.8487598.
[36] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs. Oxford University Press, New York, 2010.
Indonesian J Elec Eng & Comp Sci, Vol. 24, No. 3, December 2021: 1354 - 1366