
IMPLEMENTATION OF ALU USING POWER
OPTIMIZING TECHNIQUES
Main report submitted in partial fulfilment of the requirement for the award of
the degree of
MASTER OF TECHNOLOGY
IN
VLSI
Submitted by
USAVARTHI SRAVANA JYOTHI
(Regd. No: 321206540004)
Under the Esteemed Guidance of
Prof. P. RAJESH KUMAR, M.E., Ph.D.
HEAD OF THE DEPARTMENT
Department of Electronics & Communication Engineering

Andhra University College of Engineering
Andhra University
Visakhapatnam-530003
2021-2023
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
ANDHRA UNIVERSITY COLLEGE OF ENGINEERING

VISAKHAPATNAM – 530003
2021 - 2023
CERTIFICATE
This is to certify that the project report entitled “IMPLEMENTATION OF ALU
USING POWER OPTIMIZATION TECHNIQUES” is a bona fide record of work
done by Usavarthi Sravana Jyothi, bearing Regd. No. 321206540004,
submitted in partial fulfilment of the requirement for the award of the degree of
Master of Technology in VLSI in the Department of Electronics and
Communication Engineering, Andhra University College of Engineering (A),
Visakhapatnam, during the year 2021-2023.
PROJECT GUIDE
Prof. P. RAJESH KUMAR, M.E., Ph.D.
Department of ECE
A. U. College of Engineering
Andhra University
Visakhapatnam

HEAD OF THE DEPARTMENT
Prof. P. RAJESH KUMAR, M.E., Ph.D.
Department of ECE
A. U. College of Engineering
Andhra University
Visakhapatnam
DECLARATION
I hereby declare that the project work entitled “IMPLEMENTATION OF
ALU USING POWER OPTIMIZATION TECHNIQUES” is a bona fide work
done and submitted by USAVARTHI SRAVANA JYOTHI, bearing Regd. No.
321206540004, under the esteemed guidance of Prof. P. RAJESH KUMAR, in
partial fulfilment of the requirements for the award of the degree of Master of Technology in
Electronics and Communication Engineering with specialization in VLSI at
Andhra University College of Engineering, Andhra University. I further declare
that, to the best of my knowledge, the project does not contain any part of work
which has been submitted for the award of any degree either in this university
or in any other university without proper citation.
Date:
Signature
U. SRAVANA JYOTHI
321206540004
M.TECH – VLSI
ECE DEPARTMENT
AUCE
ACKNOWLEDGEMENT
I express my sincere thanks to Prof. P. Rajesh Kumar, Head of the Department, Department of
Electronics and Communication Engineering, for his valuable guidance, supervision, advice,
encouragement and support in completing the project work.
I would like to express my gratitude to M. Nageswar Rao, Scientist – E, Department of Instrumentation,
NSTL, for his valuable guidance, advice and support in the successful completion of the project.
I express my sincere thanks to Senior Prof. G. Sasibhushana Rao, Principal, Andhra University College
of Engineering, for promotion of learning opportunity, co-operation and encouragement for the
project.
I express my sincere thanks to Prof. M. Satya Anuradha, Professor & Chairperson, Board of Studies,
Department of Electronics and Communication for administration support, academic guidance,
approval and authorization, resource allocation and project work completion.
I would like to express my gratitude to Prof. P.V. Sri Devi, Department of Electronics and Communication
Engineering, for her feedback and review in successful completion of the project.
I express my sincere thanks to Dr. S. Aruna, Associate Professor, Department of Electronics and
Communication Engineering for her co-operation and encouragement of innovation on this project.
I also express my sincere thanks to Prof. V. Malleswara Rao, Prof. G. Rajeswara Rao and Prof. K.
Chiranjeevi, Adjunct Professors, Department of Electronics and Communication Engineering, Andhra
University College of Engineering, for their technical expertise, suggestions, cooperation and
encouragement on this project.
I also express my sincere thanks to Research Scholars and Office Staff, Department of Electronics and
Communication Engineering, Andhra University College of Engineering, for their cooperation and
encouragement on this project.
Regards
U. Sravana Jyothi
Regd. No: – 321206540004
ABSTRACT
Configurable ALUs (Arithmetic Logic Units) are essential components in modern
processors, designed to deliver high performance while keeping power
consumption to a minimum. To achieve optimal power efficiency in a
configurable ALU, blend-off techniques are employed. These techniques involve
deactivating unused functionality within the ALU to lower power consumption.
To implement blend-off techniques effectively, the ALU is designed with distinct
functional units for each operation it can perform. These functional units can be
selectively enabled or disabled based on the specific operation being executed.
This approach has the potential to yield significant reductions in power
consumption.
In this project, an 8-bit ALU was developed using Verilog within the Xilinx
Vivado 2018.2 environment. This ALU was subjected to simulation and synthesis,
ensuring its proper functionality and performance. The next step involves
implementing clock gating techniques to further decrease power consumption,
enhancing the ALU's overall efficiency.
CONTENTS
CHAPTER 1: INTRODUCTION
1.1 INTRODUCTION
1.2 ALU ARCHITECTURE
1.3 DESIGN OF 8-BIT ALU
CHAPTER 2: LITERATURE REVIEW
CHAPTER 3: EXISTING METHOD
3.1 INTRODUCTION
3.2 CLOCK GATING TECHNIQUES
3.2.1 Basic AND Gate Clock Gating
3.2.2 Latch-Based Clock Gating Circuit
3.2.3 Flip-Flop-Based Clock Gating Technique
3.2.4 Synthesis-Based Clock Gating Technique
CHAPTER 4: PROPOSED METHOD
4.1 INTRODUCTION
4.2 Booth's Algorithm
4.3 Carry Look-Ahead Adder
4.4 Vedic Multiplier
4.3.1 Advantages of Vedic Multiplier
CHAPTER 5: ADVANTAGES AND APPLICATIONS
5.1 ADVANTAGES
5.2 APPLICATIONS
CHAPTER 6: XILINX VIVADO AND VERILOG
6.1 HISTORY OF VERILOG
6.2 INTRODUCTION
6.3 DESIGN STYLES
CHAPTER 7: RESULTS
CHAPTER 8: CONCLUSION AND FUTURE SCOPE
8.1 CONCLUSION
8.2 FUTURE SCOPE
REFERENCE
LIST OF FIGURES
Fig. 1: Dimensions to optimize VLSI chip
Fig. 2: ALU with clock gating and operation selection
Fig. 3: Latch free clock gating technique
Fig. 4: Latch based clock gating technique
Fig. 5: Flop-based clock gating technique
Fig. 6: Synthesis-based clock gating technique
Fig. 7: Carry look ahead adder
Fig. 8: 8-bit Vedic multiplier
LIST OF TABLES
TABLE I: ALU OPERATIONS BASED ON SELECTION LINES
TABLE II: SUMMARY OF VARIOUS CLOCK GATING TECHNIQUES
TABLE III: OUTPUTS OF GATING TECHNIQUES

CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION
In the earlier days of VLSI design, designers were mainly concerned with circuit area,
performance, reliability and cost, and power consumption was only a minor consideration.
Nowadays, power is given equal importance to area and speed. As technology scales down,
short-circuit and leakage power are becoming comparable to dynamic power dissipation.
Identifying and reducing the various leakage paths and the switching activity of components
is therefore essential for estimating and reducing power consumption in high-speed and
low-power applications. A lot of research is being conducted on low-power design approaches,
and clock gating is one of the most prominent of them. An ALU contains many arithmetic and
logic modules, but at any given time only one of them is required to compute. In a conventional
ALU design, however, all the modules are executed for every operation and a MUX selects the
output of the required module. For example, if an ALU has 16 modules and the addition of two
numbers is required, all 16 modules operate and then only the output of the adder module is
selected. Executing the modules that are not required for the operation is a waste of power.
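The waste described above can be illustrated with a small behavioural model. This is only a sketch in Python, not the Verilog design of this project: the four module names and the `conventional_alu` / `selective_alu` helpers are assumptions made for illustration, and a real ALU would have more modules.

```python
# Behavioural sketch: a conventional ALU evaluates every module and a MUX
# picks one result, while a selective ALU evaluates only the needed module.
# Module names and the 4-module size are illustrative assumptions.

MASK = 0xFF  # keep every result 8 bits wide

MODULES = {
    "ADD": lambda a, b: (a + b) & MASK,
    "SUB": lambda a, b: (a - b) & MASK,
    "AND": lambda a, b: a & b,
    "XOR": lambda a, b: a ^ b,
}

def conventional_alu(op, a, b):
    """Evaluate all modules, then select one output (the MUX)."""
    results = {name: fn(a, b) for name, fn in MODULES.items()}
    return results[op], len(results)  # (selected result, modules evaluated)

def selective_alu(op, a, b):
    """Evaluate only the module named by the opcode."""
    return MODULES[op](a, b), 1

print(conventional_alu("ADD", 200, 100))
print(selective_alu("ADD", 200, 100))
```

Both functions return the same result, but the first one activates every module to obtain it. The count of evaluated modules is only a rough proxy for switching activity, not a power figure.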
In a synchronous design, a clock signal is needed to synchronize all the modules and
signals. The clock signal does not carry any information of its own and is used only for
synchronization. The power consumed by the clock network increases as the circuit size
and the operating frequency increase. In most cases, clock power is about
20-30% of the total power consumption.
Clock gating is a technique that can be used to save both unnecessary clock power and logic
power. With clock gating, the modules that are not required for the current operation can be
disabled. Synchronous digital circuits are designed to operate on every rising edge or
falling edge of the clock. By holding the clock at a single state (either high or low, so
that it does not toggle) for the modules that are not required at that instant, power can be
saved. The flip-flops in the disabled portions of the circuit then do not switch states, and
since switching states requires power, the switching power of those portions approaches zero.
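The effect of holding the clock steady for idle modules can be sketched by counting how many clock edges reach each module's flip-flops. The model below is a simplified assumption in Python rather than the gating circuit used in this work: the module names are hypothetical, one opcode is issued per clock cycle, and the gated clock is modelled as the clock ANDed with a one-hot enable.

```python
# Behavioural sketch of clock gating: a module's registers see a clock edge
# only when its enable is asserted (gated_clk = clk AND enable).
# Module names are hypothetical; one opcode is issued per clock cycle.

MODULES = ("ADD", "SUB", "AND", "OR")

def clock_edges_per_module(opcodes, gated):
    """Return how many active clock edges each module receives."""
    edges = {name: 0 for name in MODULES}
    for op in opcodes:                      # one clock cycle per opcode
        for name in MODULES:
            enable = (name == op) if gated else True
            if enable:                      # edge passes through the gate
                edges[name] += 1
    return edges

program = ["ADD", "ADD", "SUB", "AND"]
print(clock_edges_per_module(program, gated=False))
print(clock_edges_per_module(program, gated=True))
```

Without gating every module is clocked in every cycle. With gating only the selected module is clocked, so the total number of edges delivered falls from 16 to 4 for this four-cycle program.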
Clock gating is not free from trade-offs: extra gates have to be added to the
circuit to decide when a module is to be disabled. Another requirement of
clock gating is that the modules must have some kind of enable input that decides when they
are gated. Clock gating is most useful where there are many modules that are not
required to operate all the time.
Reduction of power dissipation is an essential design issue in VLSI circuits. A few
decades ago, designers mostly focused on optimizing area, delay and testability. As
technology scales down, more leakage and power dissipation are seen in the chip. In order to
reduce power dissipation and leakage power while scaling, optimization
techniques such as clock gating and voltage scaling need to be adopted.
Nowadays, designers focus on four dimensions when building any application:
area, delay, testability and power. For any application, consumers
expect light weight, quick response and a device that does not get hot. For example, consumers
expect a mobile phone to be lightweight, to support multiple operations and to respond quickly.
To support multiple operations, multiple ICs need to be integrated into one chip. This causes
more power consumption, heats up the phone and also increases the area. To reduce area, the
technology is scaled down, but this increases power dissipation and the chip may get hot. To
maintain a cool system, quick response and small area, low-power techniques are needed.
Fig. 1: Dimensions to optimize VLSI chip.
In previous decades, the main challenges for the VLSI designer were
area, performance and cost, with power consumption a lesser concern. Recently, however, this
has begun to change and, increasingly, power consumption is being given comparable weight to
area and speed considerations. The approaches for reducing power consumption differ from
application to application and from circuit to circuit. In the area of micro-powered,
battery-operated portable applications such as mobile phones, the goal is to keep the battery
lifetime and weight reasonable and the packaging cost low. Scaling of CMOS devices has enabled
the semiconductor industry to meet its demand for higher performance and higher integration
densities. However, as the feature size becomes smaller, the shorter channel
lengths result in increased sub-threshold leakage current through a transistor when it is
in the off state. Another reason for the increased sub-threshold leakage current is that
transistors cannot be turned off completely. Hence, leakage power dissipation has become a
significant portion of the total power consumption in silicon technologies. The three major
design parameters are power, speed and area. In CMOS VLSI circuits, power dissipation is mainly
due to three components: dynamic, static and short-circuit. Several optimization
techniques have been proposed for leakage current reduction. One important goal in CMOS
VLSI circuit design is to reduce the power dissipation while maintaining the high performance
of the circuit.
Low-power design techniques are far more valuable today than they were
earlier. In earlier technologies, design area, performance, cost and reliability were
the major concerns, and power consumption was only a secondary issue. Nowadays, the growth
of competitive market sectors such as wireless applications, laptops and portable medical
devices has made power dissipation a critical concern. The approach to reducing power
consumption varies from application to application. In micro-powered, battery-operated
portable applications such as cell phones, the goal is to keep the battery lifetime reasonable
and the packaging cost low. In portable computers such as laptops, the goal is to reduce the
power dissipation of the electronic components of the system to a point where it is about
half of the total power dissipation. Finally, in high-performance
systems, process technology has driven power to the forefront of all the factors in such designs.
1.2 ALU ARCHITECTURE
The Arithmetic Logic Unit (ALU) is a fundamental building
block of the Central Processing Unit (CPU) of a computer. Even the simplest
microprocessors contain an ALU for purposes such as maintaining timers. The
ALU is a core component of every central processing unit and is an integral
part of the execution unit. It is capable of calculating the results of a wide variety of basic
arithmetic and logical computations. The ALU takes as input the data to be operated on
(called operands) and a code from the control unit indicating which operation to perform. The
output is the result of the computation. The ALU implemented here performs the following
operations:
Arithmetic operations (addition, subtraction, increment, decrement, transfer).
Logic operations (AND, NOT, OR, NAND, NOR, EX-OR, EX-NOR).
A digital system can be represented at different levels of
abstraction. This keeps the description and design of complex systems manageable. The highest
level of abstraction is the behavioural level that describes a system in terms of what it does (or
how it behaves) rather than in terms of its components and interconnection between them.
Here the 8-bit ALU is implemented using the behavioural
modelling style to describe how the operations of the ALU are processed. This is accomplished
using Verilog. In Verilog, the behavioural style makes use of the always block (the counterpart
of the VHDL process statement), which is the main construct in behavioural modelling and allows
procedural statements to describe the behaviour of a system over time. An always block is
declared within a module and executes concurrently with the other blocks in that module.
1.3 DESIGN OF 8-BIT ALU
When designing the ALU, we follow the "Divide and
Conquer" principle in order to use a modular design that consists of smaller, more manageable
blocks, some of which can be re-used. Instead of designing the 8-bit ALU as one circuit, we
first design one-bit ADDER, SUBTRACTOR, OR, AND, NOT, XOR, LEFT SHIFT and RIGHT SHIFT
units. These bit-slices can then be put together to make 8-bit ADDER, SUBTRACTOR, OR,
AND, NOT, XOR, LEFT SHIFT and RIGHT SHIFT units.
An 8-bit Arithmetic Logic Unit (ALU) is a crucial component in a computer's central
processing unit (CPU). It is responsible for performing various arithmetic and logic
operations on 8-bit binary numbers. Here's a simplified explanation of what an 8-bit ALU
does and some of the common operations it can perform:
1.3.1 Arithmetic Operations:
Addition (ADD): Adds two 8-bit binary numbers, including handling carry from the previous
bit.
Subtraction (SUB): Subtracts one 8-bit binary number from another, considering borrow from
the previous bit.
Multiplication (MUL): In a simple ALU, multiplication can be implemented by repeatedly adding
one operand (the multiplicand) to itself a number of times specified by the other operand (the
multiplier).
Division: In a simple ALU, division can be implemented by repeatedly subtracting the divisor
from the dividend while keeping track of the quotient.
Increment (INC): Increases the value of an 8-bit binary number by 1.
Decrement (DEC): Decreases the value of an 8-bit binary number by 1.
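The repeated-addition and repeated-subtraction schemes mentioned above can be sketched as follows. This is a minimal Python model for unsigned operands only; the function names are illustrative assumptions and are not taken from the Verilog design.

```python
# Sketch of multiplication by repeated addition and division by repeated
# subtraction, for unsigned 8-bit operands. Function names are illustrative.

def mul_repeated_add(multiplicand, multiplier):
    """Add the multiplicand to an accumulator 'multiplier' times."""
    product = 0
    for _ in range(multiplier):
        product += multiplicand
    return product  # needs up to 16 bits when both operands are 8 bits

def div_repeated_sub(dividend, divisor):
    """Subtract the divisor until the remainder is smaller than it."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    quotient = 0
    remainder = dividend
    while remainder >= divisor:
        remainder -= divisor
        quotient += 1  # one more subtraction succeeded
    return quotient, remainder

print(mul_repeated_add(13, 11))
print(div_repeated_sub(200, 7))
```

Both loops take a number of iterations proportional to the operand value, which is why faster multiplier structures are normally preferred over this scheme in hardware.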
1.3.2 Logic Operations:
AND: Performs a bitwise AND operation on two 8-bit binary numbers.

NAND: Performs a bitwise NAND operation on two 8-bit binary numbers.
OR: Performs a bitwise OR operation on two 8-bit binary numbers.
NOR: Performs a bitwise NOR operation on two 8-bit binary numbers.
XNOR: Performs a bitwise XNOR operation on two 8-bit binary numbers.
XOR: Performs a bitwise XOR (exclusive OR) operation on two 8-bit binary numbers.
1.3.3 Shift Operations:
Left Shift (LSHIFT): Shifts the bits of an 8-bit binary number to the left by a specified number
of positions.
Right Shift (RSHIFT): Shifts the bits of an 8-bit binary number to the right by a specified
number of positions.
An 8-bit ALU typically takes two 8-bit binary inputs, performs the specified operation, and
produces an 8-bit binary output along with flags or status bits to indicate results such as carry,
overflow, zero, and others. The flags are essential for branching and decision-making in the
CPU's control flow.
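The operations and status flags listed in this section can be summarized in a short behavioural model. The sketch below is written in Python for readability and is not the Verilog source of this project: the opcode strings, the `alu8` function name and the choice of three flags (carry, overflow, zero) are assumptions. For SUB the carry flag follows the two's complement convention, where 1 means that no borrow occurred.

```python
# Behavioural sketch of an 8-bit ALU with carry, overflow and zero flags.
# Opcode names and the flag set are illustrative assumptions.

MASK = 0xFF  # 8-bit data width

def alu8(op, a, b=0):
    """Return (result, flags) for one operation on 8-bit operands."""
    a &= MASK
    b &= MASK
    if op == "INC":
        return alu8("ADD", a, 1)
    if op == "DEC":
        return alu8("SUB", a, 1)
    carry = 0
    overflow = 0
    if op == "ADD":
        full = a + b
        carry = full >> 8
        # signed overflow: operands share a sign that the result lacks
        overflow = int(((a ^ full) & (b ^ full) & 0x80) != 0)
    elif op == "SUB":
        full = a + (~b & MASK) + 1  # a - b in two's complement
        carry = full >> 8           # 1 means no borrow
        overflow = int(((a ^ b) & (a ^ full) & 0x80) != 0)
    elif op == "AND":
        full = a & b
    elif op == "NAND":
        full = ~(a & b)
    elif op == "OR":
        full = a | b
    elif op == "NOR":
        full = ~(a | b)
    elif op == "XOR":
        full = a ^ b
    elif op == "XNOR":
        full = ~(a ^ b)
    elif op == "NOT":
        full = ~a
    elif op == "LSHIFT":
        full = a << (b & 7)         # shift amount limited to 0..7
    elif op == "RSHIFT":
        full = a >> (b & 7)
    else:
        raise ValueError("unknown opcode: " + op)
    result = full & MASK
    flags = {"carry": carry, "overflow": overflow, "zero": int(result == 0)}
    return result, flags

print(alu8("ADD", 200, 100))
print(alu8("SUB", 7, 7))
```

Adding 200 and 100 exceeds the 8-bit range, so the result wraps to 44 with the carry flag set, while subtracting 7 from 7 sets the zero flag.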
CHAPTER 2
LITERATURE SURVEY
B. Geetha, B. Padmavathi, and V. Perumal, “Design methodologies and circuit
optimization techniques for low power CMOS VLSI design,” in 2017 IEEE International
Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI),
IEEE, 2017, pp. 1759–1763.
"Design methodologies and circuit optimization techniques for low power CMOS VLSI
design" is a conference paper that explores various approaches and
strategies for designing and optimizing integrated circuits using CMOS technology with a
focus on low power consumption. The objective of this work is to address the growing demand
for energy-efficient electronic devices and systems.
The paper begins by introducing the challenges associated with power consumption in
CMOS VLSI design and the importance of reducing power consumption in modern electronic
devices. It highlights the significance of low power design methodologies and optimization
techniques in achieving this goal.
The authors discuss different design methodologies that can be employed to achieve low power
consumption. These methodologies include architectural-level optimization, circuit-level
techniques, and system-level approaches. They provide insights into how to leverage these
methodologies to achieve power-efficient designs.
Furthermore, the paper delves into circuit optimization techniques that can be used
specifically for low power CMOS VLSI design. It covers a wide range of techniques, such as
voltage scaling, clock gating, power gating, leakage reduction techniques, and advanced power
management schemes. The authors explain the underlying principles behind each technique
and discuss their advantages and limitations.
The work also emphasizes the importance of trade-offs between power consumption,
performance, and area, as well as the impact of low power design techniques on other design
metrics. It provides guidelines on how to strike a balance between these factors to achieve
optimal results.
Throughout the paper, the authors present case studies and examples to illustrate the
practical application of the discussed methodologies and techniques. They also discuss the
latest advancements and emerging trends in low power CMOS VLSI design.
In conclusion, "Design methodologies and circuit optimization techniques for low power CMOS
VLSI design" is a valuable resource for researchers, engineers, and students interested in
designing energy-efficient integrated circuits. It provides a comprehensive overview of various
design methodologies and optimization techniques, equipping readers with the knowledge and
tools necessary to develop power-efficient CMOS-based electronic systems.
B. Padmavathi, B. Geetha, and K. Bhuvaneshwari, “Low power design techniques and
implementation strategies adopted in VLSI circuits,” in 2017 IEEE International
Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI),
IEEE, 2017, pp. 1764–1767.
Low power design techniques and implementation strategies in VLSI circuits aim to reduce
power consumption while maintaining the desired functionality and performance of the circuit.
These techniques are essential in modern electronic devices, where power efficiency is a critical
factor. The common low power design techniques and implementation strategies adopted in VLSI
circuits are summarized below.
Power gating: Power gating involves selectively shutting down or isolating power to specific
blocks or components of the circuit when they are not in use. This technique minimizes leakage
power by cutting off the supply to inactive portions of the circuit, thereby conserving power.
Clock gating: Clock gating is a technique that enables the selective activation or deactivation
of clock signals to circuit elements based on their operational requirements. By gating the
clock signal to inactive portions of the circuit, unnecessary switching activity and power
consumption are minimized.
Voltage scaling: Voltage scaling involves reducing the supply voltage to the circuit, thereby
reducing power consumption. This technique takes advantage of the fact that dynamic power
consumption is proportional to the square of the supply voltage, allowing for significant power
savings while maintaining acceptable performance.
Dynamic voltage and frequency scaling (DVFS): DVFS adjusts the operating voltage and clock
frequency of the circuit dynamically based on the workload or computational requirements. By
scaling the voltage and frequency up or down as needed, power consumption can be optimized
without sacrificing performance.
Multi-threshold CMOS (MTCMOS): MTCMOS employs transistors with different threshold voltages to
achieve power reduction. It allows for independent power gating of different circuit blocks or
modules, enabling finer control over power consumption.
Leakage reduction techniques: Leakage power refers to power dissipation that occurs even when
the circuit is in a standby or idle state. Various techniques such as transistor stacking,
reverse body biasing, and adaptive body biasing are used to reduce leakage power.
Energy-efficient coding and algorithms: In addition to circuit-level techniques, optimizing the
algorithms and coding strategies used in the design can also contribute to power efficiency. By
minimizing unnecessary data transfers, optimizing computation sequences, and reducing memory
access, overall power consumption can be reduced.
System-level power management: System-level power management techniques involve intelligent
power management policies and strategies that consider the overall system requirements. These
techniques may include dynamic power allocation, power-aware task scheduling, and power gating
at the system level.
Subthreshold design: Subthreshold design operates transistors in the subthreshold region where
power consumption is significantly reduced. It is particularly useful for ultra-low power
applications but may come at the cost of reduced performance.
Power-aware design methodologies: Power-aware design methodologies involve incorporating power
considerations throughout the design process, from architecture exploration and circuit design
to verification and testing. These methodologies enable a systematic and holistic approach to
power optimization.
By applying these low power design techniques and implementation strategies, VLSI circuits can
achieve significant power savings without compromising functionality or performance. The
choice and combination of techniques depend on the specific requirements of the circuit, power
constraints, and trade-offs between power, performance, and area.
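The square-law dependence behind voltage scaling can be checked with the standard first-order expression for dynamic power, P = α·C·V²·f, where α is the switching activity factor, C the switched capacitance, V the supply voltage and f the clock frequency. The short Python sketch below uses that expression; the numerical values are arbitrary assumptions chosen only to show the ratio and are not measurements from this work.

```python
# First-order dynamic power model: P = alpha * C * V^2 * f.
# All numeric values below are arbitrary illustrative assumptions.

def dynamic_power(alpha, c_load, vdd, freq):
    """Dynamic (switching) power in watts."""
    return alpha * c_load * vdd ** 2 * freq

ALPHA = 0.2      # switching activity factor (assumed)
C_LOAD = 1e-9    # switched capacitance in farads (assumed)
FREQ = 100e6     # clock frequency in hertz (assumed)

p_high = dynamic_power(ALPHA, C_LOAD, 1.8, FREQ)
p_low = dynamic_power(ALPHA, C_LOAD, 0.9, FREQ)
print(p_low / p_high)
```

Halving the supply voltage at a fixed frequency reduces the dynamic power to one quarter in this model. Clock gating acts on a different term of the same expression: it lowers the effective activity factor α of the gated blocks.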
U. Kaur and R. Mehra, “Low power cmos counter using clock gated flip-flop,” Int. J. Eng.
Adv. Tech, vol. 2, pp. 796–8, 2013.
A low-power CMOS counter using clock-gated flip-flops is a design approach that aims to
minimize power consumption in a counter circuit by employing clock gating techniques. In a
traditional counter design, flip-flops consume power even when they are not actively counting
because they are clocked continuously. By integrating clock gating into the flip-flops, power
consumption can be significantly reduced. The main points of the approach are summarized below.
Clock-gated flip-flops: Clock-gated flip-flops are modified versions of traditional flip-flops
that incorporate clock gating logic. The clock gating logic controls the clock signal to the
flip-flop, allowing the flip-flop to be selectively activated or deactivated based on the
counter's operational requirements. When the counter is not actively counting, the clock signal
to the flip-flops is gated off, resulting in power savings.
Power savings: By using clock-gated flip-flops, power consumption in the counter circuit is
reduced. When the counter is not in use, the clock-gated flip-flops are effectively turned off,
minimizing unnecessary switching activity and reducing power dissipation associated with
continuous clocking.
Improved efficiency: The low-power CMOS counter using clock-gated flip-flops improves power
efficiency without compromising the counter's functionality and performance. The counter can
still accurately count and provide the desired output while consuming less power during idle or
non-counting periods.
Design considerations: The design of a low-power CMOS counter using clock-gated flip-flops
requires careful consideration of the gating logic and its impact on the counter's timing and
functionality. Proper synchronization and control signals are necessary to ensure correct
operation and prevent glitches or timing violations.
Trade-offs: While clock gating the flip-flops reduces power consumption, it may introduce
additional complexity to the counter design. The added gating logic increases circuit area and
may introduce delays in signal propagation. Designers need to carefully evaluate the trade-offs
between power savings, area overhead, and timing constraints to achieve an optimal balance.
Overall, the low-power CMOS counter using clock-gated flip-flops provides an effective approach
to minimize power consumption in counter circuits. By integrating clock gating logic into the
flip-flops, power consumption during idle periods is significantly reduced, improving power
efficiency while maintaining the counter's functionality and performance.
N. Khanna and D. Mishra, “Clock gated 16-bits ALU design & implementation on FPGA,”
in 2018 4th International Conference for Convergence in Technology (I2CT), IEEE, 2018,
pp. 1–5.
The clock-gated 16-bit ALU (Arithmetic Logic Unit) design and implementation on an FPGA
(Field-Programmable Gate Array) is a technique used to reduce power consumption in an ALU
circuit by incorporating clock gating. This summary provides an overview of the clock-gated
16-bit ALU design and its implementation on an FPGA.Clock gating: Clock gating is a
powersaving technique that selectively enables or disables the clock signal to specific parts of
the circuit based on their operational requirements. By gating the clock signal, power
consumption can be significantly reduced by preventing unnecessary clock toggling in inactive
sections of the circuit.
16-bit ALU: The 16-bit ALU is a digital circuit that performs arithmetic and logic operations
on 16-bit binary numbers. It typically consists of various submodules, such as adders,
multipliers, shifters, and logic gates, to perform different operations like addition, subtraction,
logical AND/OR, etc. Design and implementation: The clock-gated 16-bit ALU design starts
with identifying the specific components or submodules that can benefit from clock gating.
These components are then modified by adding clock gating logic to control the clock
signals. Clock gating logic: The clock gating logic is designed to activate or deactivate the clock
signals to specific parts of the ALU based on control signals or operational conditions. When a
particular operation is not being performed, the clock to the corresponding submodule is gated
off, reducing power consumption. FPGA implementation: FPGA provides a flexible platform
for implementing digital circuits. The clock-gated 16-bit ALU design is translated into a
hardware description language (HDL), such as Verilog or VHDL, which is then synthesized
and mapped to the FPGA resources. The implementation involves configuring the FPGA to
realize the desired ALU functionality and integrating the clock gating logic into the design.
Power savings: The incorporation of clock gating in the 16-bit ALU design reduces power
consumption by minimizing unnecessary clock toggling in inactive submodules. This results
in power savings during idle or non-operational periods of the ALU. Trade-offs: When
implementing clock gating in an ALU design, trade-offs need to be considered. The addition
of clock gating logic may introduce additional circuit complexity, potentially impacting area
utilization and timing performance. Careful analysis and optimization are required to balance
power savings with area and performance considerations.
G. Shrivastava and S. Singh, “Power optimization of sequential circuit based ALU using
gated clock & pulse enable logic,” in 2014 International Conference on Computational
Intelligence and Communication Networks, IEEE, 2014, pp. 1006–1010.
"Power optimization of sequential circuit-based ALU using gated clock and pulse enable logic"
is a research paper or study that focuses on reducing power consumption in sequential circuits,
specifically in an Arithmetic Logic Unit (ALU), by incorporating gated clock and pulse enable
logic techniques. This summary provides an overview of the power optimization techniques
employed in this study:
Sequential circuit-based ALU: The ALU is a digital circuit that performs arithmetic and logical
operations on binary data. A sequential circuit-based ALU incorporates flip-flops and other
sequential elements to store and process data over multiple clock cycles. Power optimization
objectives: The primary objective of this study is to minimize power consumption in the ALU
without sacrificing its functionality or performance. Power optimization techniques aim to
reduce unnecessary power dissipation during idle or non-operational periods of the ALU. Gated
clock: Gated clocking is a technique that selectively enables or disables the clock signal to
specific sections of the circuit based on certain conditions or control signals. In the context of
the ALU, gated clocking is applied to the sequential elements, such as flip-flops, to deactivate
the clock input when the circuit is idle or not performing any computation. This prevents
unnecessary clock toggling and reduces power consumption. Pulse enable logic: Pulse enable
logic involves using control signals or conditions to enable or disable the operation of specific
parts of the ALU. By activating the required circuit components only when needed, power
consumption can be further reduced. Power optimization methodology: The study presents a
methodology for optimizing power in the sequential circuit-based ALU. It involves identifying
the sections of the ALU that can benefit from gated clock and pulse enable logic. The necessary
modifications are made to the circuit design to incorporate these techniques. Power savings:
The introduction of gated clock and pulse enable logic in the ALU leads to significant power
savings. By deactivating the clock and selectively enabling circuit components, power
consumption is reduced during idle periods or when specific operations are not being
performed. Trade-offs: When implementing power optimization techniques, trade-offs need to
be considered. The addition of gated clock and pulse enable logic may introduce additional
complexity to the circuit design, impacting area utilization, timing, and potentially introducing
design challenges. Careful analysis and optimization are required to strike a balance between
power savings and other design metrics. Overall, the study on power optimization of the
sequential circuit-based ALU using gated clock and pulse enable logic provides insights into
reducing power consumption in ALU designs. By selectively deactivating clock signals and
enabling circuit components based on control signals, significant power savings can be
achieved without compromising the functionality and performance of the ALU.
CHAPTER 3
EXISTING METHOD
3.1 INTRODUCTION
Reducing power dissipation is an essential design issue in VLSI circuits. One of the most
important blocks in any processor is the Arithmetic Logic Unit (ALU), which performs arithmetic
and logical operations. As the operations become more complex, power dissipation increases.
The clock network is a major source of power dissipation, so a significant amount of power can
be saved by gating the clock whenever it is not required. From the literature, we noticed that
several techniques have been used to reduce power within the ALU; the savings they achieve are
moderate, and there is still scope to reduce power further using a blend of techniques. A
low-power ALU is therefore designed using clock gating techniques together with PIPO registers
and Booth's algorithm. Giving a specific opcode enables the corresponding operation while the
other operations remain inactive, which lowers power dissipation in the ALU. The low-power ALU
has two 8-bit data inputs along with cin, bin, enable, and 2-bit shift data inputs, and a 4:16
decoder that selects one of the 16 operations from a 4-bit opcode together with a start enable
function. At each iteration the design is implemented with one of the following clock gating
techniques: latch-free, latch-based, flip-flop-based, or synthesis-based clock gating, in each
case with parallel in parallel out (PIPO) shift registers. All of these techniques are applied
together with the operation selection feature and PIPO shift registers, and are compared in
terms of area, power, and delay using Xilinx Vivado. This work mainly analyses those parameters
for the ALU with and without clock gating techniques, in combination with PIPO and Booth's
algorithm methods.
There are several existing techniques for power optimization in configurable Arithmetic Logic
Units (ALUs) that employ a blend of different approaches. These techniques have been
developed to address the increasing demand for low-power designs in modern computer
systems. Here are some of the commonly used techniques:
Gate-level optimization: This technique focuses on reducing power consumption by
optimizing the digital logic gates in the ALU. It involves using low-power gate structures, such
as static CMOS, transmission gates, or pass-transistor logic, which can minimize power
dissipation during switching transitions.
Circuit-level optimization: At the circuit level, techniques such as voltage scaling, current
scaling, and threshold voltage adjustment can be employed. These techniques optimize the
supply voltage and current levels of the ALU's components to achieve a balance between power
consumption and performance.
Algorithmic optimization: This technique involves designing algorithms or modifying
existing ones to minimize power consumption. By analyzing the operations performed by the
ALU and optimizing the algorithmic flow, unnecessary computations can be eliminated,
leading to reduced power consumption.
Power gating: Power gating is a technique that selectively shuts down power to specific parts
of the ALU when they are not in use. By turning off power to inactive components, power
consumption can be significantly reduced. Power gating is typically implemented using sleep
transistors, and is often combined with body biasing or dynamic voltage scaling.
Clock gating: Clock gating is a technique that involves controlling the clock signal to different parts of the
ALU based on their activity. By disabling the clock to unused or idle components, power
consumption can be reduced. Clock gating can be implemented at different levels, including
the register transfer level (RTL) or the gate level.
Data encoding: Data encoding techniques aim to reduce the switching activity and power
consumption by encoding the data before it is processed by the ALU. Techniques such as value
prediction, data compression, and data encoding schemes like Gray code can be employed to
minimize power dissipation.
Pipelining and parallelism: By breaking down complex operations into smaller stages and
executing them in parallel, pipelining and parallelism techniques can reduce power
consumption. These techniques enable a more efficient utilization of the ALU's resources and
can improve overall power efficiency.
It's important to note that the effectiveness of these techniques may vary depending on the
specific ALU design and the targeted application. Therefore, a combination of these techniques,
tailored to the specific requirements and constraints of the ALU, is often necessary to achieve
optimal power optimization.
Circuit-level optimization techniques play a crucial role in power optimization for configurable
Arithmetic Logic Units (ALUs). These techniques aim to reduce power consumption by
optimizing the electrical characteristics and configurations of the ALU's circuitry. Here are
some commonly used circuit-level optimization techniques:
Voltage scaling: Voltage scaling involves reducing the supply voltage provided to the ALU's
components. Because dynamic power is proportional to the square of the supply voltage,
operating at a lower voltage can significantly reduce power consumption. However, voltage
scaling must consider the impact on performance and ensure that the ALU operates reliably at
the reduced voltage.
Current scaling: Current scaling involves adjusting the bias currents in the ALU's circuitry.
By reducing the bias currents, the power consumed by the ALU can be reduced. However, this
technique must be carefully optimized to ensure that the ALU's functionality and performance
are not compromised.
Threshold voltage adjustment: By adjusting the threshold voltage of transistors in the ALU,
power consumption can be optimized. Raising the threshold voltage reduces leakage currents and
static power consumption at the cost of slower switching, while lowering it has the opposite
effect. Careful consideration must therefore be given to ensure that the ALU still operates
correctly and meets timing at the adjusted threshold voltage.
Power-gating: Power-gating techniques selectively turn off power to inactive parts of the ALU.
This involves incorporating switches or sleep transistors that can disconnect the power supply
to specific circuit blocks when they are not in use. Power-gating can significantly reduce
power consumption by eliminating leakage power and dynamic power dissipation in idle
components.
Clock gating: Clock gating involves controlling the clock signal to different parts of the ALU
based on their activity. By disabling the clock to unused or idle components, power
consumption can be reduced. Clock gating circuits detect idle states and gate the clock signal
to those parts, preventing unnecessary power dissipation.
Dynamic voltage scaling: Dynamic voltage scaling adjusts the supply voltage of the ALU
dynamically based on the workload or computational requirements. By scaling the supply voltage
to match the computational needs, power consumption can be optimized. This technique requires
careful monitoring and control of the workload to ensure correct operation.
Leakage power reduction: Leakage power is a significant concern in deep submicron technologies.
Various leakage reduction techniques, such as transistor stacking, reverse body biasing, and
power gating, can be employed to minimize leakage currents and associated power dissipation.
These circuit-level optimization techniques are often employed in combination to achieve the
best balance between power consumption, performance, and reliability in a configurable ALU.
The specific choice and implementation of these techniques depend on the technology process,
design constraints, and targeted power reduction goals.
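As a rough worked example of why voltage scaling is effective, the standard first-order model of CMOS dynamic power can be written as below. The symbols (activity factor \alpha, switched capacitance C_L, supply voltage V_{DD}, clock frequency f) and the 1.2 V and 1.0 V figures are illustrative assumptions, not values taken from this design.

```latex
P_{\text{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f
\qquad\Rightarrow\qquad
\frac{P_{\text{dyn}}(1.0\,\text{V})}{P_{\text{dyn}}(1.2\,\text{V})}
= \left(\frac{1.0}{1.2}\right)^{2} \approx 0.69
```

Under these assumptions, with \alpha, C_L and f unchanged, dynamic power falls by roughly 31%. Clock gating acts on the same expression through \alpha, by removing clock transitions in idle blocks.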
In integrated circuits, the clocking system consumes a large portion of chip power, which
includes the switching activity of flip-flops, latches, and clock distribution networks. Clock
gating and power gating are two of the most effective techniques applied today for reducing
dynamic and leakage power, respectively, in digital CMOS circuits. Power gating reduces
leakage power by switching off the power supply to the non-operational power domain of the
chip during certain modes of operation. Header and footer switches, isolation cells, and
State Retention Flip-Flops (SRFFs) are used for implementing power gating. Clock gating
reduces dynamic power by controlling switching activity on the clock path. Generally, gate-,
latch-, or flip-flop-based clock gating cells are used for implementing clock gating. The
combined use of the two solutions, however, poses some challenges in terms of the practical
integration of the required control logic and the associated power and timing overhead. Here
we present an analysis in the Cadence Virtuoso tool using 90nm technology with a simple PIPO
(parallel in parallel out) shift register. This project specifically targets the combined
application of clock and power gating techniques.
Fig.2 :ALU with clock gating and operation selection.
Table I lists the opcodes that enable each instruction and the respective operation performed
in the design.
OPCODE   OPERATION                          ACTIVE FUNCTIONAL UNIT
0000     Addition                           Arithmetic
0001     Subtraction                        Arithmetic
0010     Multiplication                     Arithmetic
0011     Division                           Arithmetic
0100     AND Operation                      Logical
0101     OR Operation                       Logical
0110     NAND Operation                     Logical
0111     NOR Operation                      Logical
1000     XOR Operation                      Logical
1001     XNOR Operation                     Logical
1010     Arithmetic left shift Operation    Logical
1011     Arithmetic right shift Operation   Logical
1100     Logical left shift Operation       Logical
1101     Logical right shift Operation      Logical
1110     1’s complement                     Logical
1111     2’s complement                     Logical
TABLE I: ALU OPERATIONS BASED ON SELECTION LINES
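The operation selection in Table I can be sketched behaviourally as a 4:16 decoder that raises exactly one enable line. This is only an illustrative Python model, not the HDL used in the design; the names `decode_opcode`, `OPERATIONS`, and the `start` argument are hypothetical.

```python
# Operation names in opcode order, taken from Table I.
OPERATIONS = [
    "Addition", "Subtraction", "Multiplication", "Division",
    "AND", "OR", "NAND", "NOR", "XOR", "XNOR",
    "Arithmetic left shift", "Arithmetic right shift",
    "Logical left shift", "Logical right shift",
    "1's complement", "2's complement",
]


def decode_opcode(opcode, start=1):
    """Model a 4:16 decoder: return 16 one-hot enable lines.

    `start` is a hypothetical name for the start-enable input. When it is 0
    every enable line stays low, so no functional unit would be clocked.
    """
    if not 0 <= opcode <= 0b1111:
        raise ValueError("opcode must fit in 4 bits")
    return [1 if (start and index == opcode) else 0 for index in range(16)]


enables = decode_opcode(0b0010)
active = [OPERATIONS[i] for i, en in enumerate(enables) if en]
print(active)  # ['Multiplication']
```

In a clock-gated design each enable line would drive the clock gating cell of one functional unit, so only the selected unit receives clock pulses.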
3.2 CLOCK GATING TECHNIQUES
Clock gating is a power optimization technique used in digital circuits to reduce dynamic
power consumption by disabling the clock signal to specific portions of the circuit when they
are not in use. This prevents unnecessary toggling of flip-flops and reduces power
consumption. Here are some common clock gating techniques:
3.2.1 Basic AND Gate Clock Gating:
• In this technique, an AND gate is inserted between the clock signal and the clock inputs
of the flip-flops.
• The AND gate takes an enable signal as input, along with the original clock signal.
• When the enable signal is low (indicating that the block should be inactive), the output
of the AND gate becomes low, effectively gating the clock to the flip-flops and
preventing clock pulses from reaching them.
Fig3: Latch free clock gating technique
3.2.2 Latch-Based Clock Gating Circuit:
• In latch-based clock gating, a level-sensitive latch (often a D latch) is used to gate
(control) the clock signal to a specific portion of the circuit.
• The latch has a clock input (CLK), a data input (D) driven by the enable or control signal
(EN), and an output (Q); Q is combined with CLK in an AND gate to form the gated clock.
• When the latched enable is active (i.e., at the desired logic level), the AND gate allows
the clock signal to pass through to the downstream circuitry.
• When the latched enable is inactive, the gated clock is held low, effectively blocking the
clock signal from reaching the circuitry downstream. Because the latch holds its last value
while CLK is high, a change on EN during that phase cannot cause a glitch on the gated clock.
Fig.4: Latch based clock gating technique.
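The difference between the latch-free and latch-based schemes can be illustrated with a small sampled-waveform model. This is a sketch under stated assumptions: the latch-based version assumes the usual arrangement in which the latch is transparent while the clock is low and its output is ANDed with the clock. The function names and the sample waveforms are hypothetical.

```python
def latch_free_gate(clk, en):
    """Latch-free gating: the gated clock is simply CLK AND EN."""
    return [c & e for c, e in zip(clk, en)]


def latch_based_gate(clk, en):
    """Latch-based gating: EN is captured by a latch that is transparent
    only while CLK is low, and the latched value is ANDed with CLK."""
    gated = []
    latched = 0
    for c, e in zip(clk, en):
        if c == 0:
            latched = e  # latch is transparent during the low phase
        gated.append(c & latched)
    return gated


# Each list entry is one time sample; a full clock pulse is two samples wide.
clk = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
# EN changes while CLK is high (samples 3 and 7), the worst case for glitches.
en = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]

print(latch_free_gate(clk, en))   # [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(latch_based_gate(clk, en))  # [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
```

In this example the latch-free output contains two half-width pulses, whereas the latch-based output contains a single full-width pulse, which is why the latch-based cell is generally preferred when EN is not guaranteed to be stable during the high phase of the clock.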
3.2.3 Flip flop based clock gating technique:
In many applications, latch-based designs are replaced by flip-flop-based designs. A flip-flop
can be viewed as two latches connected in a master-slave arrangement. In this technique, a D
flip-flop is used together with an AND gate.
Fig.5: Flip flop based clock gating technique.
From the above figure, the gated clock goes high when both the flip-flop output and the clock
are high; otherwise the gated clock goes low. This means that when the clock is in sleep mode,
the gated clock is also low.
3.2.4 Synthesis based clock gating technique:
In this technique, the gated clock can be generated by using either a positive or a negative
latch in combination with AND, OR, XOR, or XNOR logic gates, as shown in Fig.6. In synthesis
based clock gating using a negative latch, when the enable signal is constant the X signal is
one, so the controlled clock stays high; the negative latch therefore does not update and
gives its previous value as output.
Fig.6: Synthesis based clock gating technique using negative latch
When the enable signal changes, the X value goes to zero and the controlled clock operates the
negative latch circuit, which provides a new value at its output. The gated clock signal is
generated by ANDing the clock signal with the output signal of the negative latch.
Clock gating is a crucial power-saving technique in modern digital design, especially in
low-power and energy-efficient applications. The goal is to minimize clock power without
compromising functionality or performance. Careful consideration of the enable conditions and
proper verification are essential to ensure correct operation of clock-gated circuits.
Table for clock gating:-
TABLE II: SUMMARY OF VARIOUS CLOCK GATING TECHNIQUES.

The proposed work presents an 8-bit Arithmetic and Logic Unit that performs various arithmetic
and logical operations. The arithmetic operations are addition, subtraction, multiplication,
and division. Multiplication and division are designed using Booth’s algorithm, and the
logical operations are AND, NAND, OR, NOR, XOR, XNOR, arithmetic left and right shift,
logical left and right shift, 1’s complement, and 2’s complement. The shift operations can
shift by up to three bits. The PIPO register is designed using low-power D flip-flops, and
each flip-flop is built from low-power D latches using the master-slave concept. The data is
sent to the operations through the PIPO register. The detailed implementation results are
reported in the results section. Table I lists the opcodes that enable each instruction and
the respective operation performed in the design.
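The construction of the PIPO register from master-slave D flip-flops, each made of two D latches, can be sketched behaviourally as follows. This is a minimal Python model under the assumption of a positive-edge-triggered flip-flop (master transparent while the clock is low, slave transparent while it is high); the class names are hypothetical and the model ignores the low-power transistor-level details.

```python
class DLatch:
    """Level-sensitive D latch: follows D while enabled, otherwise holds."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


class MasterSlaveDFF:
    """D flip-flop built from two latches in a master-slave arrangement."""

    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, d, clk):
        # Master is transparent while clk is low, slave while clk is high,
        # so the output only changes after a rising edge of clk.
        m = self.master.update(d, enable=(clk == 0))
        return self.slave.update(m, enable=(clk == 1))


class PIPORegister:
    """Parallel in parallel out register: one flip-flop per data bit."""

    def __init__(self, width=8):
        self.bits = [MasterSlaveDFF() for _ in range(width)]

    def update(self, data, clk):
        out = 0
        for i, ff in enumerate(self.bits):
            out |= ff.update((data >> i) & 1, clk) << i
        return out


reg = PIPORegister(8)
reg.update(0xA5, 0)              # clock low: master samples the data
print(hex(reg.update(0xA5, 1)))  # rising edge: 0xa5 appears at the output
print(hex(reg.update(0x3C, 0)))  # clock low again: output still 0xa5
```

If the clock input is held low by a clock gating cell, repeated calls with `clk = 0` leave the output unchanged, which models the register holding its value without toggling.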
CHAPTER 4
PROPOSED METHOD
4.1 INTRODUCTION
The existing method is an 8-bit Arithmetic and Logic Unit that performs various arithmetic
and logical operations. The arithmetic operations are addition, subtraction, multiplication,
and division. Multiplication and division are designed using Booth’s algorithm, and the
logical operations are AND, NAND, OR, NOR, XOR, XNOR, arithmetic left and right shift,
logical left and right shift, 1’s complement, and 2’s complement. The shift operations can
shift by up to three bits.
The proposed work also presents an 8-bit Arithmetic and Logic Unit that performs various
arithmetic and logical operations. The arithmetic operations are addition, subtraction,
multiplication, and division. In the existing method, addition is performed using a Ripple
Carry Adder (RCA). In the proposed method, a Carry Look-Ahead Adder (CLA) is used in place of
the RCA, which is a common optimization in digital circuit design, especially for arithmetic
units in processors and digital systems. The primary advantage of a CLA over an RCA is its
ability to compute carry signals in parallel, which significantly speeds up the addition
process. The multiplication operation is changed from Booth’s algorithm to the Vedic
multiplication technique, while the division operation is still designed using Booth’s
algorithm. The logical operations are AND, NAND, OR, NOR, XOR, XNOR, arithmetic left and right
shift, logical left and right shift, 1’s complement, and 2’s complement. The shift operations
can shift by up to three bits. The PIPO register is designed using low-power D flip-flops, and
each flip-flop is built from low-power D latches using the master-slave concept. The data is
sent to the operations through the PIPO register. The ALU designs are simulated using the
Xilinx Vivado 2018.2 simulator. The detailed implementation results are reported in the
results section. Table I lists the opcodes that enable each instruction and the respective
operation performed in the design.
4.2 Booth's algorithm:
Booth's algorithm is a multiplication algorithm used to multiply two signed binary numbers in
two's complement representation efficiently, particularly in digital computers. It was
developed by Andrew Donald Booth in 1950 and is widely used in hardware implementations for
signed binary multiplication. The algorithm replaces each run of consecutive 1s in the
multiplier with one subtraction and one addition, which reduces the number of add/subtract
operations compared with basic methods like long multiplication. Here's how Booth's
algorithm works:
Input:
• Two n-bit binary numbers: the multiplier (M) and the multiplicand (Q).
• An accumulator register (A) initially set to zero.
• A single extra bit placed below the LSB of the multiplier, initially set to zero.
• The product (P), which is formed by the accumulator and the multiplier register taken
together.
Algorithm Steps:
a. Initialize a counter (usually called "count" or "n") to the number of bits in the binary
numbers (assuming two's complement representation).
b. Repeat the following steps "n" times (once for each bit):
• Check the least significant bit of the multiplier together with the extra bit.
• If they are "00" or "11," do nothing (no addition or subtraction).
• If they are "01," add the multiplicand (Q) to the accumulator (A).
• If they are "10," subtract the multiplicand (Q) from the accumulator (A).
• Arithmetic right-shift the accumulator, the multiplier, and the extra bit as one word:
the sign bit of A is preserved, the LSB of A moves into the MSB of the multiplier, and
the LSB of the multiplier moves into the extra bit.
• Decrement the counter by 1.
c. After the loop, the accumulator (A) holds the upper half of the product and the
multiplier register holds the lower half.
d. The extra bit is discarded. For signed multiplication, the MSB of A gives the sign of
the result.
Output:
• The result of the multiplication is the 2n-bit product (P).
Booth's algorithm is efficient because it skips over runs of identical bits in the multiplier,
and in its radix-4 (modified Booth) form it reduces the number of partial products to half the
number of bits in the binary numbers, compared to the standard long multiplication method,
which requires "n" partial products for "n" bits. This efficiency is especially advantageous
in hardware implementations, such as in microprocessors and digital signal processors, where
speed is crucial.
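The radix-2 form of Booth's algorithm can be sketched in Python as follows. This is a behavioural illustration only, not the thesis implementation: the function names are hypothetical, the accumulator is kept as an unbounded signed Python integer (modelling a register wide enough not to overflow when the multiplicand is the most negative value), and the "extra bit" is the bit inspected below the LSB of the multiplier.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def booth_multiply(multiplier, multiplicand, n=8):
    """Multiply two n-bit two's complement numbers with radix-2 Booth recoding."""
    q = to_signed(multiplicand, n)      # multiplicand as a signed value
    m = multiplier & ((1 << n) - 1)     # multiplier register, n bits
    a = 0                               # signed accumulator
    extra = 0                           # extra bit below the multiplier LSB
    for _ in range(n):
        pair = ((m & 1) << 1) | extra
        if pair == 0b01:
            a += q                      # end of a run of 1s: add
        elif pair == 0b10:
            a -= q                      # start of a run of 1s: subtract
        # Arithmetic right shift of {a, m, extra} as one word.
        extra = m & 1
        m = (m >> 1) | ((a & 1) << (n - 1))
        a >>= 1                         # Python's >> keeps the sign
    return (a << n) | m                 # signed 2n-bit product


print(booth_multiply(3, -4))    # -12
print(booth_multiply(-128, 1))  # -128
```

Only the transitions between runs of 0s and 1s in the multiplier trigger an addition or subtraction, which is where the saving over shift-and-add multiplication comes from.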
4.3 Carry look ahead adder:

A Carry Look-Ahead Adder (CLA) is a type of digital circuit used in
computer hardware to perform binary addition. It's designed to speed up the addition process
by reducing the dependency on the carry bit generated from previous stages, allowing for faster
and more parallelized addition operations. The traditional method of binary addition involves
a series of full-adder circuits chained together. Each full adder takes two input bits (A and B)
and a carry-in (Cin) and produces a sum (S) and a carry-out (Cout). The carry-out from one
stage is the carry-in for the next stage.
Fig.7: Carry look ahead adder.
In a Carry Look-Ahead Adder, the generation of carry bits is precomputed and made available
for each stage, which eliminates the need to wait for carry bits to propagate through the entire
chain. This results in faster addition operations. The carry look-ahead logic is based on the
observation that the carry-out of a stage can be expressed in terms of the input bits without
waiting for the carry-in from the previous stage. There are different ways to implement a Carry
Look-Ahead Adder, including using binary tree structures and specialized logic gates. One
common method is to use the following equations for carry generation (Gi) and carry
propagation (Pi):
Gi = Ai · Bi (Gi represents the carry generated by the ith stage)
Pi = Ai ⨁ Bi (Pi represents the carry propagated through the ith stage)
With these equations, you can compute the carry-out (Cout) and sum (Si) for each stage as
follows:
Si = Ai ⨁ Bi ⨁ Ci-1 (where Ci-1 is the carry-in from the previous stage)
Cout = Gi + (Pi · Ci-1)
Here · denotes logical AND and + denotes logical OR. Substituting the expression for Ci-1
repeatedly expresses every carry in terms of the Gi, the Pi, and the initial carry-in only.

The Carry Look-Ahead Adder calculates Gi and Pi for each stage and then computes the Cout
and Si for each stage without waiting for the carry bit to ripple through the entire chain. This
parallelism improves the speed of addition.
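The generate/propagate formulation can be checked with a short behavioural model. The sketch below assumes the standard definitions (generate = A AND B, propagate = A XOR B) and expands each carry directly from the generate and propagate terms and the initial carry-in; the function name `cla_add` is hypothetical, and the inner loop stands in for what hardware would evaluate as parallel AND-OR logic.

```python
def cla_add(a, b, cin=0, n=8):
    """Add two n-bit numbers using carry look-ahead equations.

    Returns (sum, carry_out). Every carry is computed only from the
    generate/propagate terms and cin, never from a rippled carry.
    """
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]  # generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]  # propagate

    carries = []  # carries[i] is the carry-out of stage i
    for i in range(n):
        # Ci = Gi + Pi.Gi-1 + Pi.Pi-1.Gi-2 + ... + Pi...P0.cin
        c = g[i]
        prop = p[i]
        for j in range(i - 1, -1, -1):
            c |= prop & g[j]
            prop &= p[j]
        c |= prop & cin
        carries.append(c)

    total = 0
    for i in range(n):
        carry_in = cin if i == 0 else carries[i - 1]
        total |= (p[i] ^ carry_in) << i  # Si = Pi XOR carry-in
    return total, carries[-1]


print(cla_add(200, 100))  # (44, 1), since 300 = 256 + 44
```

In hardware the expanded terms for all stages are evaluated at the same time, so the carry delay no longer grows linearly with the word length as it does in a ripple carry adder.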
4.4 Vedic multiplier:
The Vedic 8-bit multiplier is typically implemented using Vedic multiplication techniques such
as the Nikhilam Sutra (suited to operands close to a power of the base) and the Urdhva
Tiryakbhyam Sutra (the general "vertically and crosswise" multiplication formula). These
techniques allow for a faster and more efficient multiplication process compared to traditional
multiplication methods. Here is a high-level overview of how a Vedic 8-bit multiplier might
work:
Input: You have two 8-bit binary numbers, which we'll call A and B. These are the numbers you
want to multiply.
Initialization: Initialize the result, which will be a 16-bit number (since the product of two 8-bit
numbers can be up to 16 bits).
Partitioning: Split both input numbers (A and B) into smaller segments, usually 4 bits each.
These segments are used in the multiplication process.
Multiplication: Perform the multiplication using Vedic techniques. This typically involves
various steps like vertically and crosswise multiplication, additions, and carry propagation. The
Vedic techniques provide a systematic way to compute the partial products efficiently.
Accumulation: Sum up all the partial products to obtain the final 16-bit product. This might
involve adding or accumulating the products obtained in the previous step.
Result: The result is the 16-bit product of the two 8-bit input numbers A and B.
Fig.8: 8bit Vedic multiplier.
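The partition-multiply-accumulate steps above can be sketched as a hierarchical "vertically and crosswise" multiplier for unsigned operands. This is an illustrative Python model, assuming the common structure in which an 8-bit multiplier is built from 4-bit blocks, which are in turn built from 2-bit blocks; the function names are hypothetical and the model does not reflect the adder structure of the actual hardware.

```python
def vedic_2x2(a, b):
    """2-bit x 2-bit vertically-and-crosswise multiplication."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    vertical_low = a0 & b0              # vertical product of the LSBs
    cross = (a1 & b0) + (a0 & b1)       # crosswise products
    vertical_high = a1 & b1             # vertical product of the MSBs
    return vertical_low + (cross << 1) + (vertical_high << 2)


def vedic_multiply(a, b, bits=8):
    """Unsigned multiplication built recursively from smaller Vedic blocks.

    `bits` must be a power of two and at least 2.
    """
    if bits == 2:
        return vedic_2x2(a, b)
    half = bits // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    low = vedic_multiply(a_lo, b_lo, half)            # vertical (low halves)
    cross = (vedic_multiply(a_hi, b_lo, half)
             + vedic_multiply(a_lo, b_hi, half))      # crosswise
    high = vedic_multiply(a_hi, b_hi, half)           # vertical (high halves)
    return low + (cross << half) + (high << bits)


print(vedic_multiply(13, 11))    # 143
print(vedic_multiply(255, 255))  # 65025
```

The four half-width products are independent of each other, so in hardware they can be generated in parallel and then combined by adders.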
4.4.1 Advantages of Vedic multiplier:
Speed: Vedic multiplication techniques are known for their speed. They enable faster
multiplication operations compared to traditional long multiplication methods,
making them particularly suitable for high-speed digital circuits and processors.
Parallelism: Vedic multiplication naturally lends itself to parallelism. It allows for
multiple partial products to be generated and added simultaneously, reducing the
overall multiplication time and improving throughput.
Low Power Consumption: Due to their parallel nature and reduced carry propagation,
Vedic multipliers often consume less power than traditional multipliers.
This can be important in battery-powered or energy-efficient devices.
CHAPTER 5
ADVANTAGES & APPLICATIONS
5.1 ADVANTAGES:
• Reduced Power Consumption: The primary advantage of power optimization
techniques is the significant reduction in power consumption. By implementing various
optimization strategies, the configurable ALU can operate efficiently with lower power
requirements, leading to energy savings.
• Longer Battery Life: Power optimization is crucial in battery-powered devices such as
mobile phones, laptops, or IoT devices. By minimizing power consumption in
the configurable ALU, the overall system can achieve longer battery life, enhancing the
usability and portability of the devices.
• Enhanced Performance: Power optimization techniques often go hand in hand with
performance optimization. By carefully designing the configurable ALU and
optimizing its operations, you can achieve faster execution times and improved
throughput. This means that the ALU can perform computations efficiently without
sacrificing performance.
• Flexibility and Adaptability: Configurable ALUs provide flexibility in performing
a wide range of operations. By optimizing power consumption, you can
ensure that the ALU remains adaptable and efficient across different computational
tasks. It allows for dynamic reconfiguration and efficient utilization of hardware
resources based on the specific requirements of the workload.
• Scalability: Power optimization techniques enable scalability in the configurable ALU
design. As the complexity and size of the ALU increase, power consumption
can become a significant concern. By employing a blend of techniques, the ALU can
scale effectively while keeping power consumption under control.
• Reliability and Thermal Management: Excessive power consumption in an ALU
can lead to thermal issues and reduce overall system reliability. By optimizing
power consumption, you can mitigate these challenges and ensure that the ALU
operates within acceptable temperature ranges, improving the reliability and longevity
of the system.
• Cost Savings: Power optimization techniques can also contribute to cost savings. By
reducing power consumption, you may require a smaller power supply, resulting in cost
savings in terms of power management components, cooling systems, and overall
system design.
5.2 APPLICATIONS:
• Mobile Devices: Power optimization is essential in mobile devices such as smartphones,
tablets, and wearables. These devices often operate on battery power and
have limited energy resources. By optimizing the power consumption of the
configurable ALU, mobile devices can achieve longer battery life, enabling extended
usage and better user experience.
• Internet of Things (IoT): IoT devices are typically deployed in large numbers and
often operate in remote or inaccessible locations. Power optimization is crucial for IoT
devices to ensure energy-efficient operation and maximize their lifespan. By optimizing
the power consumption of the configurable ALU, IoT devices can operate on limited
power sources, reducing the need for frequent battery replacements or recharging.
• Embedded Systems: Power optimization is critical in embedded systems, where power
constraints and energy efficiency are key considerations. These systems are commonly
found in automotive, aerospace, industrial automation, and healthcare applications. By
employing power optimization techniques in the configurable ALU, embedded systems
can achieve efficient operation while meeting power constraints and ensuring reliable
performance.
• Data Centers: Data centers house a vast number of servers and computing resources,
resulting in substantial power consumption. Power optimization in configurable ALUs
used in data center servers can help reduce the overall energy consumption of the
facility. By employing power optimization techniques, data centers can achieve higher
computational efficiency, reduce electricity costs, and minimize the environmental
impact associated with data center operations.
• High-Performance Computing (HPC): HPC systems require massive computational
power but are often power-limited due to the high energy consumption of these systems.
Power optimization in configurable ALUs used in HPC architectures can help balance
performance and power consumption. By optimizing power usage, HPC systems can
achieve higher energy efficiency and reduce operating costs.
• Automotive Electronics: Modern vehicles incorporate numerous electronic systems,
ranging from infotainment to advanced driver assistance systems (ADAS). Power
optimization in configurable ALUs used in automotive electronics is crucial to
maximize the vehicle's battery life and ensure reliable operation. By reducing power
consumption, automotive electronics can operate efficiently while minimizing the
impact on the vehicle's electrical system.
• Portable and Wearable Devices: Portable devices such as laptops, tablets, and
wearable devices demand power-efficient operation for extended usage and improved
user experience. Power optimization in configurable ALUs enables these devices to
operate on battery power without sacrificing performance. By optimizing power
consumption, portable and wearable devices can offer longer battery life and better
mobility.
CHAPTER 6
XILINX VIVADO AND VERILOG HDL
6.1 HISTORY OF VERILOG

Verilog began as a proprietary hardware modelling language developed by Gateway Design
Automation Inc. around 1984. It is rumoured that the original language was designed by taking
features from the most popular HDL of the time, called HiLo, as well as from
traditional computer languages such as C. At that time, Verilog was not standardized and the
language changed in almost every revision released between 1984 and 1990.
The Verilog simulator was first used in 1985 and was extended substantially through
1987. The implementation was the Verilog simulator sold by Gateway. The first major
extension was Verilog-XL, which added a few features and implemented the infamous "XL
algorithm", a very efficient method for gate-level simulation.
In late 1990, Cadence Design Systems, whose primary products at that time included a
thin-film process simulator, decided to acquire Gateway Design Automation. Along with other
Gateway products, Cadence became the owner of the Verilog language and continued to
market Verilog as both a language and a simulator.
At the same time, Synopsys was marketing the top-down design methodology, using Verilog.
This was a powerful combination. In 1990, Cadence recognized that if Verilog remained a
closed language, the pressures of standardization would eventually cause the industry to shift
to VHDL. Consequently, Cadence organized the Open Verilog International (OVI), and in 1991
gave it the documentation for the Verilog Hardware Description Language. This was the event
which "opened" the language.
6.2 INTRODUCTION
• HDL is an abbreviation of Hardware Description Language. Any digital system can be
represented in a REGISTER TRANSFER LEVEL (RTL) and HDLs are used to describe
this RTL.
• Verilog is one such HDL. It is a general-purpose language that is easy to learn and use, and its
syntax is similar to C.
• The idea is to specify how the data flows between registers and how the design processes
the data.
• To define RTL, hierarchical design concepts play a very significant role. Hierarchical
design methodology facilitates the digital design flow with several levels of abstraction.
• Verilog HDL can utilize these levels of abstraction to produce a simplified and efficient
representation of the RTL description of any digital design.
• For example, an HDL might describe the layout of the wires, resistors and transistors on
an Integrated Circuit (IC) chip, i.e., the switch level, or it may describe the design at a
higher level in terms of logic gates and flip-flops in a digital system, i.e., the gate
level. Verilog supports all of these levels.
6.3 DESIGN STYLES:
A design in any hardware description language such as Verilog can be developed in two ways:
bottom-up design and top-down design.
Bottom-Up Design:
The traditional method of electronic design is bottom-up (designing from transistors and
moving to a higher level of gates and, finally, the system). But with the increase in design
complexity traditional bottom-up designs have to give way to new structural, hierarchical
design methods.
Top-Down Design:
For HDL representation it is convenient and efficient to adopt this design style. A true top-down
design allows early testing, fabrication technology independence, a structured system design
and offers many other advantages. However, it is very difficult to follow a pure top-down design.
For this reason, most designs are a mix of both methods, implementing some key elements
of both design styles.
6.4 Features of Verilog HDL
• Verilog is case sensitive.
• Ability to mix different levels of abstraction freely.
• One language for all aspects of design, testing, and verification.
• In Verilog, keywords are defined in lower case.
• In Verilog, most of the syntax is adopted from the "C" language.
• Verilog can be used to model a digital circuit at Algorithm, RTL, Gate and Switch level.
• There is no concept of package in Verilog.
• It also supports advanced simulation features such as file I/O system tasks, PLI, and UDPs.
6.5 VLSI DESIGN FLOW

The VLSI design cycle starts with a formal specification of a VLSI chip, follows a series of
steps, and eventually produces a packaged chip.
System Specification:
The first step of any design process is to lay down the specifications of the system. System
specification is a high level representation of the system. The factors to be considered in this
process include: performance, functionality, and physical dimensions like size of the chip.
The specification of a system is a compromise between market requirements, technology and
economic viability. The end results are specifications for the size, speed, power, and
functionality of the VLSI system.
Architectural Design
The basic architecture of the system is designed in this step. This includes decisions such as
RISC (Reduced Instruction Set Computer) versus CISC (Complex Instruction Set Computer),
number of ALUs, floating-point units, number and structure of pipelines, and size of caches,
among others. The outcome of architectural design is a Micro-Architectural Specification
(MAS).
Behavioral or Functional Design:
In this step, the main functional units of the system are identified, along with the
interconnect requirements between the units. The area, power, and other parameters of each
unit are estimated.
The key idea is to specify the behavior of each unit, in terms of its inputs, outputs and timing,
without specifying its internal structure.
The outcome of functional design is usually a timing diagram or other relationships between
units.
Logic Design:
In this step the control flow, word widths, register allocation, arithmetic operations, and logic
operations of the design that represent the functional design are derived and tested.
This description is called the Register Transfer Level (RTL) description. RTL is expressed in a
Hardware Description Language (HDL), such as VHDL or Verilog. This description can be
used in simulation and verification.
Circuit Design:
The purpose of circuit design is to develop a circuit representation based on the logic design.
The Boolean expressions are converted into a circuit representation by taking into
consideration the speed and power requirements of the original design. Circuit simulation is
used to verify the correctness and timing of each component.
The circuit design is usually expressed in a detailed circuit diagram. This diagram shows the
circuit elements (cells, macros, gates, transistors) and the interconnections between these
elements. This representation is also called a netlist. The logic is verified at each stage.
Physical design:
In this step the circuit representation (or netlist) is converted into a geometric representation.
As stated earlier, this geometric representation of a circuit is called a layout.
Layout is created by converting each logic component (cells, macros, gates, transistors) into a
geometric representation (specific shapes in multiple layers), which performs the intended logic
function of the corresponding component. Connections between different components are also
expressed as geometric patterns, typically lines in multiple layers.
Layout verification:
Physical design can be completely or partially automated, and a layout can be generated directly
from the netlist by layout synthesis tools. Layout synthesis tools, while fast, incur an area and
performance penalty, which limits their use to some designs. The generated layouts are then verified.
Fabrication and Testing:

Silicon crystals are grown and sliced to produce wafers. The wafer is fabricated and diced into
individual chips in a fabrication facility. Each chip is then packaged and tested to ensure that it
meets all the design specifications and that it functions properly.
6.6 MODULE:
A module is the basic building block in Verilog. It can be a single element or a collection of
lower-level design blocks. Typically, elements are grouped into modules to provide common
functionality that is used in many places in the design. A module exposes this functionality
through its port interface but hides the internal implementation.
Syntax:
module <module_name> (<module_port_list>);
…
<module internals> // contents of the module
…
endmodule
Instances
A module provides a template from which one can create objects. When a module is invoked,
Verilog creates a unique object from the template, and each object has its own name, variables,
parameters and I/O interfaces. These objects are known as instances.
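As a minimal sketch of a module used as a template, the example below builds a full adder from two instances of one half-adder module. The names half_adder, full_adder, ha0 and ha1 are illustrative assumptions and are not taken from the project code.

```verilog
// Hypothetical example: one module definition used as a template.
module half_adder (
    input  wire a,
    input  wire b,
    output wire sum,
    output wire carry
);
    assign sum   = a ^ b;   // sum bit
    assign carry = a & b;   // carry bit
endmodule

// Two instances (ha0, ha1) of half_adder, each with its own nets.
module full_adder (
    input  wire a,
    input  wire b,
    input  wire cin,
    output wire sum,
    output wire cout
);
    wire s0, c0, c1;        // internal nets connecting the instances

    half_adder ha0 (.a(a),  .b(b),   .sum(s0),  .carry(c0));
    half_adder ha1 (.a(s0), .b(cin), .sum(sum), .carry(c1));

    assign cout = c0 | c1;
endmodule
```

Each instance has its own instance name and its own copy of the internal logic, which is what distinguishes an instance from the module definition.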
Ports:
• Ports allow communication between a module and its environment.
• All but the top-level modules in a hierarchy have ports.
• Ports can be associated by order or by name.
You declare ports to be input, output or inout. The port declaration syntax is:
input [range_val:range_var] list_of_identifiers;
output [range_val:range_var] list_of_identifiers;
inout [range_val:range_var] list_of_identifiers;
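The following sketch illustrates vector port declarations and the two ways of associating ports, by order and by name. The module names and3 style identifiers used here (and4, and4_top, u_order, u_name) are hypothetical and chosen only for illustration.

```verilog
// Hypothetical 4-bit AND block with vector ports declared as [msb:lsb].
module and4 (a, b, y);
    input  [3:0] a, b;      // two 4-bit inputs
    output [3:0] y;         // one 4-bit output
    assign y = a & b;
endmodule

module and4_top (p, q, r_by_order, r_by_name);
    input  [3:0] p, q;
    output [3:0] r_by_order, r_by_name;

    // Association by order: connections follow the port list (a, b, y).
    and4 u_order (p, q, r_by_order);

    // Association by name: the order is irrelevant, each port is named.
    and4 u_name (.y(r_by_name), .b(q), .a(p));
endmodule
```

Association by name is usually the safer choice for modules with many ports, because a change in the port order of the lower-level module does not silently miswire the instance.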
Identifiers
• Identifiers are user-defined words for variables, function names, module names, and
instance names. Identifiers can be composed of letters, digits, and the underscore
character.
• The first character of an identifier cannot be a digit. Identifiers can be any length.
• Identifiers are case-sensitive, and all characters are significant.
An identifier that contains special characters, begins with numbers, or has the same name as a
keyword can be specified as an escaped identifier. An escaped identifier starts with the
backslash character(\) followed by a sequence of characters, followed by white space.
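A short sketch of escaped identifiers is given below. The signal names are made up for illustration; note that the white space after each escaped name is required because it terminates the identifier.

```verilog
// Hypothetical names showing the three cases that need an escaped identifier.
module escaped_id_demo;
    wire \bus+index ;   // contains the special character '+'
    wire \1st_stage ;   // begins with a digit
    wire \begin ;       // same spelling as the keyword begin

    assign \bus+index = 1'b0;   // the space before '=' ends the identifier
endmodule
```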
Keywords:
• Verilog uses keywords to interpret an input file.
• You cannot use these words as user variable names unless you use an escaped identifier.
• Keywords are reserved identifiers, which are used to define language constructs.
• Some of the keywords are always, assign, begin, end, case, and endcase.
Data Types:
Verilog Language has two primary data types:

• Nets - represent structural connections between components.
• Registers - represent variables used to store data.
Every signal has a data type associated with it. Data types are:
• Explicitly declared with a declaration in the Verilog code.
• Implicitly declared with no declaration but used to connect structural building blocks
in the code. Implicit declarations are always net type "wire" and only one bit wide.
Register Data Types

• Registers store the last value assigned to them until another assignment statement
changes their value.
• Registers represent data storage constructs.
• Register arrays are called memories.
• Register data types are used as variables in procedural blocks.
• A register data type is required if a signal is assigned a value within a procedural block.
• Procedural blocks begin with the keyword initial or always.
The register data types are reg, integer, time and real.
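The difference between nets and registers can be sketched as follows. The module and signal names are assumptions made for this illustration only.

```verilog
// Hypothetical module contrasting net and register data types.
module data_type_demo (
    input  wire       clk,
    input  wire [7:0] d,
    output wire [7:0] q_inv
);
    reg  [7:0] q;              // register: holds its value between assignments
    reg  [7:0] mem [0:15];     // register array, i.e. a 16 x 8-bit memory
    integer    i;              // integer register, used here as a loop index

    // Nets are driven continuously; q_inv is an explicitly declared wire.
    assign q_inv = ~q;

    // Signals assigned inside a procedural block must be register types.
    always @(posedge clk) begin
        q           <= d;
        mem[d[3:0]] <= d;
    end

    // initial is the other procedural block; here it clears the memory
    // once at the start of simulation.
    initial begin
        for (i = 0; i < 16; i = i + 1)
            mem[i] = 8'h00;
    end
endmodule
```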
6.7 MODELING CONCEPTS:
Abstraction Levels:
• Behavioral level
• Register-Transfer Level
• Gate Level
• Switch level
Behavioral or algorithmic Level
• This level describes a system by concurrent algorithms (Behavioral).
• Each algorithm itself is sequential, meaning that it consists of a set of instructions that
are executed one after the other.
• The constructs used at this level are initial and always blocks, functions and tasks.
• The intricacies of the system are not elaborated at this stage and only the functional
description of the individual blocks is prescribed.
• In this way the whole logic synthesis gets highly simplified and at the same time more
efficient.
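A behavioral description can be sketched as below. The 2-bit operation select and the operations chosen are hypothetical and do not correspond to the selection lines of the ALU in this project; the point is only that the statements inside the function and the always block are written as a sequential algorithm.

```verilog
// Hypothetical behavioral block: function plus always block.
module behavioral_demo (
    input  wire [7:0] a,
    input  wire [7:0] b,
    input  wire [1:0] op,     // assumed 2-bit operation select
    output reg  [7:0] y
);
    // Function: sequential statements that return one value.
    function [7:0] max8;
        input [7:0] x, z;
        begin
            if (x > z) max8 = x;
            else       max8 = z;
        end
    endfunction

    // The statements inside the always block execute one after another
    // whenever any input changes.
    always @(*) begin
        case (op)
            2'b00:   y = a + b;
            2'b01:   y = a - b;
            2'b10:   y = max8(a, b);
            default: y = 8'h00;
        endcase
    end
endmodule
```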
Register-Transfer Level:
• Designs using the Register-Transfer Level specify the characteristics of a circuit by
operations and the transfer of data between the registers.
• An explicit clock is used. An RTL design contains exact timing bounds: operations are
scheduled to occur at certain times.
• The modern definition of RTL code is "Any code that is synthesizable is called RTL code".
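As an illustration of the register-transfer style, the sketch below describes an accumulator whose register is updated on an explicit clock edge. The module name and the synchronous active-high reset are assumptions made for the example.

```verilog
// Hypothetical RTL example: data moves into a register on each clock edge.
module rtl_accumulator (
    input  wire       clk,
    input  wire       rst,     // synchronous, active-high reset (assumed)
    input  wire       en,
    input  wire [7:0] din,
    output reg  [7:0] acc
);
    always @(posedge clk) begin
        if (rst)
            acc <= 8'd0;
        else if (en)
            acc <= acc + din;   // register-to-register transfer
    end
endmodule
```

When en is low the register simply keeps its value, which is also the behaviour that clock gating exploits to save power.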
Gate Level:
• Within the logic level the characteristics of a system are described by logical links and
their timing properties.
• All signals are discrete signals. They can only have definite logical values ('0', '1', 'X',
'Z'). The usable operations are predefined logic primitives (AND, OR, NOT, etc. gates).
• It should be noted that writing a logic design by hand at the gate level may not be a good idea.
• Gate-level code is generated by tools such as synthesis tools in the form of netlists, which
are used for gate-level simulation and for the backend.
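For comparison, a gate-level description uses only the predefined logic primitives. The half adder below is a minimal hand-written sketch (the module and instance names are illustrative); a synthesized netlist would have the same form but would normally be generated by the tool.

```verilog
// Hypothetical gate-level half adder built from Verilog primitives.
// For each primitive the first terminal is the output.
module half_adder_gates (a, b, sum, carry);
    input  a, b;
    output sum, carry;

    xor g1 (sum,   a, b);
    and g2 (carry, a, b);
endmodule
```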
6.8 OPERATORS
Verilog provides many different operator types. Operators can be:
• Arithmetic Operators
• Relational Operators
• Bit-wise Operators
• Logical Operators
• Reduction Operators
• Shift Operators
• Concatenation Operator
• Conditional Operator
Arithmetic Operators
• These perform arithmetic operations. The + and - can be used as either unary (-z) or
binary (x-y) operators.
• Binary: +, -, *, /, % (the modulus operator)
• Unary: +, - (This is used to specify the sign)
• Integer division truncates any fractional part
• The result of a modulus operation takes the sign of the first operand
• If any operand bit value is the unknown value x, then the entire result value is x
• Register data types are used as unsigned values (Negative numbers are stored in two's
complement form).
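The rules above can be checked with a small simulation-only sketch. The module name and variable names are assumptions; the comments give the values expected from the rules stated in this section.

```verilog
// Hypothetical simulation-only module illustrating arithmetic operators.
module arith_demo;
    integer   q, r;
    reg [3:0] u;

    initial begin
        q = 7 / 2;            // integer division truncates: 3
        r = -7 % 2;           // modulus takes the sign of the first operand: -1
        u = -4'd3;            // stored in two's complement: 4'b1101 (13 unsigned)
        $display("q=%0d r=%0d u=%b (%0d)", q, r, u, u);

        u = 4'b10x1 + 4'd1;   // an x bit in an operand makes the whole result x
        $display("u=%b", u);
    end
endmodule
```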
Relational Operators
Relational operators compare two operands and return a single bit, 1 or 0. These operators
synthesize into comparators. Wire and reg values are treated as unsigned. Thus (-3'b001) == 3'b111 and
(-3'd1) > 3'b110. Integers, however, are signed, so an integer -1 is less than any positive integer.
• The result is a scalar value:
• 0 if the relation is false (for a < b, when a is not smaller than b)
• 1 if the relation is true (for a < b, when a is smaller than b)
• x if any of the operands has unknown x bits (if a or b contains X)
Note: When such a result is used as a condition, for example in an if statement, an x result is treated as false (0).
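A simulation-only sketch of these cases is shown below. The names are illustrative; the comments state the results that follow from the unsigned treatment of reg values, the signed treatment of integers, and the handling of unknown bits.

```verilog
// Hypothetical simulation-only module illustrating relational operators.
module relational_demo;
    reg [2:0] a, b;
    integer   i, j;

    initial begin
        a = 3'b111; b = 3'b001;
        $display("%b", a > b);                 // 1: 7 > 1, both unsigned
        $display("%b", (-3'b001) == 3'b111);   // 1: the negation wraps to 3'b111

        i = -1; j = 6;
        $display("%b", i < j);                 // 1: integers are signed

        b = 3'b0x1;
        $display("%b", a > b);                 // x: an operand contains x

        // In a condition, an x result selects the else branch.
        if (a > b) $display("true branch");
        else       $display("false branch");
    end
endmodule
```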
Bit-wise Operators
Bitwise operators perform a bit-wise operation on two operands. They take each bit in one
operand and perform the operation with the corresponding bit in the other operand. If one
operand is shorter than the other, it will be extended on the left side with zeroes to match the
length of the longer operand.
Computations include unknown bits, in the following way:
-> ~x = x
-> 0&x = 0
-> 1&x = x&x = x
-> 1|x = 1
-> 0|x = x|x = x
-> 0^x = 1^x = x^x = x
-> 0^~x = 1^~x = x^~x = x
When operands are of unequal bit length, the shorter operand is zero-filled in the most significant
bit positions.
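The zero-extension rule and the handling of unknown bits can be illustrated with the following simulation-only sketch (module and signal names are assumptions):

```verilog
// Hypothetical simulation-only module illustrating bit-wise operators.
module bitwise_demo;
    reg [3:0] a;
    reg [1:0] b;

    initial begin
        a = 4'b1100; b = 2'b11;
        $display("%b", a & b);        // b is zero-extended to 0011 -> 0000
        $display("%b", a | b);        // 1111
        $display("%b", a ^ b);        // 1111

        a = 4'b01xx;
        $display("%b", a & 4'b0101);  // 0&x = 0, 1&x = x -> 010x
        $display("%b", a | 4'b0101);  // 1|x = 1, 0|x = x -> 01x1
        $display("%b", ~a);           // ~x = x           -> 10xx
    end
endmodule
```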
Logical Operators
Logical operators return a single bit 1 or 0. They are the same as bit-wise operators only for
single bit operands. They can work on expressions, integers or groups of bits, and treat all
values that are nonzero as “1”. Logical operators are typically used in conditional (if ... else)
statements since they work with expressions.
Expressions connected by && and || are evaluated from left to right.
Evaluation stops as soon as the result is known.
The result is a scalar value:
• 0 if the relation is false
• 1 if the relation is true
• x if any of the operands has x (unknown) bits
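The difference between logical and bit-wise operators on multi-bit operands can be sketched as follows (names are illustrative):

```verilog
// Hypothetical simulation-only module contrasting logical and bit-wise operators.
module logical_demo;
    reg [3:0] a, b;

    initial begin
        a = 4'b1010; b = 4'b0000;
        $display("%b", a && b);   // 0: a is nonzero (true), b is zero (false)
        $display("%b", a || b);   // 1
        $display("%b", !a);       // 0
        $display("%b", a & b);    // 0000: the bit-wise result keeps the vector width

        b = 4'b00x0;
        $display("%b", a && b);   // x: b is neither known zero nor known nonzero
    end
endmodule
```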
Reduction Operators
Reduction operators operate on all the bits of an operand vector and return a single-bit value.
These are the unary (one argument) form of the bit-wise operators.
• Reduction operators are unary.
• They perform a bit-wise operation on a single operand to produce a single bit result.
• Reduction unary NAND and NOR operators operate as AND and OR respectively, but
with their outputs negated.
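A short sketch of the reduction operators applied to a 4-bit vector is given below; the vector value is chosen arbitrarily for illustration.

```verilog
// Hypothetical simulation-only module illustrating reduction operators.
module reduction_demo;
    reg [3:0] v;

    initial begin
        v = 4'b1011;
        // AND of all bits, OR of all bits, XOR (parity) of all bits
        $display("%b %b %b", &v, |v, ^v);      // 0 1 1
        // NAND, NOR and XNOR are the negated forms
        $display("%b %b %b", ~&v, ~|v, ~^v);   // 1 0 0
    end
endmodule
```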
Shift Operators
Shift operators shift the first operand by the number of bits specified by the second operand.
Vacated positions are filled with zeros for both left and right shifts (there is no sign extension).
• The left operand is shifted by the number of bit positions given by the right operand.
• The vacated bit positions are filled with zeroes.
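The zero-fill behaviour of the << and >> operators can be seen in the following sketch (the operand value is arbitrary):

```verilog
// Hypothetical simulation-only module illustrating the shift operators.
module shift_demo;
    reg [7:0] a;

    initial begin
        a = 8'b1001_0110;
        $display("%b", a << 2);   // 01011000: two zeros enter from the right
        $display("%b", a >> 3);   // 00010010: three zeros enter from the left
    end
endmodule
```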
Concatenation Operator
• The concatenation operator combines two or more operands to form a larger vector.
• Concatenations are expressed using the brace characters { and }, with commas separating
the expressions within.
• Example: {a, b[3:0], c, 4'b1001} // if a and c are 8-bit numbers, the result has 24 bits
• Unsized constant numbers are not allowed in concatenations
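The concatenation example above can be tried in simulation with the sketch below. The values assigned to a, b and c are arbitrary assumptions used only to make the bit positions visible.

```verilog
// Hypothetical simulation-only module illustrating concatenation.
module concat_demo;
    reg [7:0]  a, c;
    reg [3:0]  b;
    reg [23:0] y;

    initial begin
        a = 8'hAB; b = 4'hC; c = 8'hDE;
        y = {a, b[3:0], c, 4'b1001};   // 8 + 4 + 8 + 4 = 24 bits
        $display("%h", y);             // abcde9
    end
endmodule
```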
Operator Precedence
Switch Level:
This is the lowest level of abstraction. A module can be implemented in terms of switches,
storage nodes and interconnection between them. However, as has been mentioned earlier, one
can mix and match all the levels of abstraction in a design. RTL is frequently used for Verilog
description that is a combination of behavioral and dataflow while being acceptable for
synthesis.
6.9 Xilinx Verilog HDL Tutorial

Getting started
First we need to download and install Xilinx and ModelSim. Both tools have free student
versions. Please complete Appendices B, C, and D, in that order, before continuing with this
tutorial. Additionally, if you wish to purchase your own Spartan3 board, you can do so at
Digilent’s website. Digilent offers academic pricing. Please note that you must download and
install the Digilent Adept software. The software contains the drivers for the board that you need
and also provides the interface to program the board.
Introduction
Xilinx Tools is a suite of software tools used for the design of digital circuits implemented using
Xilinx Field Programmable Gate Array (FPGA) or Complex Programmable Logic Device (CPLD).
The design procedure consists of (a) design entry, (b) synthesis and implementation of the design,
(c) functional simulation and (d) testing and verification. Digital designs can be entered in various
ways using the above CAD tools: using a schematic entry tool, using a hardware description
language (HDL) – Verilog or VHDL or a combination of both. In this lab we will only use the
design flow that involves the use of Verilog HDL.
The CAD tools enable you to design combinational and sequential circuits starting with Verilog
HDL design specifications. The steps of this design procedure are listed below:
1. Create Verilog design input file(s) using template driven editor.
2. Compile and implement the Verilog design file(s).
3. Create the test-vectors and simulate the design (functional simulation) without using a PLD
(FPGA or CPLD).
4. Assign input/output pins to implement the design on a target device.
5. Download bitstream to an FPGA or CPLD device.
6. Test design on FPGA/CPLD device
A Verilog input file in the Xilinx software environment consists of the following segments:
Header: module name, list of input and output ports.
Declarations: input and output ports, registers and wires.
Logic Descriptions: equations, state machines and logic functions.
End: endmodule
All your designs for this lab must be specified in the above Verilog input format. Note that the
state diagram segment does not exist for combinational logic designs.
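The four segments can be seen in the following sketch of a combinational design. The 2-to-1 multiplexer and its names are chosen purely for illustration and are not part of the project code.

```verilog
// Header: module name and port list
module mux2to1 (a, b, sel, y);

    // Declarations: input and output ports, registers and wires
    input  a, b, sel;
    output y;
    wire   sel_n, t0, t1;

    // Logic description: equations for a 2-to-1 multiplexer
    assign sel_n = ~sel;
    assign t0    = a & sel_n;
    assign t1    = b & sel;
    assign y     = t0 | t1;

// End
endmodule
```

Because the design is purely combinational, it contains no state machine segment.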
Programmable Logic Device: FPGA
In this lab digital designs will be implemented on the Basys2 board, which has a Xilinx
Spartan3E XC3S250E FPGA in a CP132 package. This FPGA part belongs to the Spartan
family of FPGAs. These devices come in a variety of packages. We will be using devices that
are packaged in a 132-pin package with the following part number: XC3S250E-CP132.
Creating a New Project
You can use the New Project wizard to easily create different types of projects in
the Vivado IDE. To open the New Project wizard, select File > New Project. This wizard enables
you to specify a project location and name and create the types of projects shown in the figure below.
New Project Wizard—Project Type Page
Project Name: Enter a user-defined name for your new project.
Project Location: The directory on one of your drives in which the new project will be
stored. In the window above, the project is stored on the C drive, which is
not recommended: the software and the project code should not be kept in the same location.
Then click NEXT.
For each of the properties given below, click on the ‘value’ area and select from the list of values
that appear.
• Device Family: Family of the FPGA/CPLD used. In this laboratory we will be using the
Spartan3E FPGAs.
• Device: The number of the actual device. For this lab you may enter XC3S250E (this can
be found on the attached prototyping board)
• Package: The type of package with the number of pins. The Spartan FPGA used in this lab
is packaged in CP132 package.
• Speed Grade: The Speed grade is “-4”.
• Synthesis Tool: XST [VHDL/Verilog]
• Simulator: The tool used to simulate and verify the functionality of the design. Then click
on NEXT to save the entries.
Opening Designs:
Use the Flow Navigator or Flow menu to select the following commands:
• Open Elaborated Design
• Open Synthesized Design
• Open Implemented Design
The Flow > Open Implemented Design command populates the Vivado IDE as shown in the figure
below.
Implemented design
All project files such as schematics, netlists, Verilog files, VHDL files, etc., will be stored in a
subdirectory with the project name.
In order to open an existing project in Xilinx Tools, select File->Open Project to show the list of
projects on the machine. Choose the project you want and click OK.
To create a new source file, click on NEW SOURCE.
Creating a Verilog HDL input file for a combinational logic design:
In this lab we will enter a design using a structural or RTL description using the Verilog HDL.
You can create a Verilog HDL input file (.v file) using the HDL Editor available in the Xilinx
Vivado Tools (or any text editor).
In the previous window, click on NEW SOURCE.
(Note: “Add to project” option is selected by default. If you do not select it then you will have
to add the new source file to the project manually.)
Select Verilog Module and in the “File Name:” area, enter the name of the Verilog source file
you are going to create. Also make sure that the option Add to project is selected so that the
source need not be added to the project again. Then click on Next to accept the entries.
In the Port Name column, enter the names of all input and output pins and specify the
Direction accordingly. A Vector/Bus can be defined by entering appropriate bit numbers in the
MSB/LSB columns. Then click on Next> to get a window showing all the new source
information. If any changes are to be made, just click on <Back to go back and
make changes. If everything is acceptable, click on Finish > Next > Next > Finish to continue.
Once you click on Finish, the source file will be displayed in the sources window in the Project
Navigator. If a source has to be removed, just right click on the source file in the Sources in
Project window in the Project Navigator and select remove in that. Then select Project ->
Delete Implementation Data from the Project Navigator menu bar to remove any related files.
Editing the Verilog source file
The source file will now be displayed in the Project Navigator window (Figure 8). The source
file window can be used as a text editor to make any necessary changes to the source file. All
the input/output pins will be displayed. Save your Verilog program periodically by selecting
the File->Save from the menu. You can also edit Verilog programs in any text editor and add
them to the project directory using “Add Copy Source”.
In the above window we write the Verilog code for the specified design and
algorithm.
After writing the code we proceed to the synthesis report.
Configuring Project Settings
You can configure the Project Settings in the Settings dialog box to meet your design needs. These
settings include general settings, related to the top module definition and language options, as well
as simulation, elaboration, synthesis, implementation, bitstream, and IP settings.
Settings Dialog Box—Project Settings General Category
To open the Settings dialog box, use any of the following methods:
• In the Flow Navigator Project Manager section, click Settings.
• Select Tools > Settings.
• In the main toolbar, click the Settings toolbar button.
• In the Project Summary, click the Edit link next to Settings.
Synthesis and Implementation of the Design:

The design has to be synthesized and implemented before it can be checked for correctness by
running functional simulation or downloaded onto the prototyping board. With the top-level
Verilog file opened (by double-clicking that file) in the HDL editor window in the right
half of the Project Navigator, and the project shown in the Module view, the Implement
Design option can be seen in the process view. Design entry utilities and Generate Programming
File options can also be seen in the process view.
To synthesize the design, double click on the Synthesize Design option in the Processes window.
To implement the design, double click the Implement design option in the Processes window.
Implementation goes through steps such as Translate, Map, and Place & Route. If any of these steps could
not be completed, or completed with errors, an X mark is placed in front of it; otherwise a tick mark
is placed after each of them to indicate successful completion.
After synthesis, right-click on Synthesis and click View Text Report to generate the report
of the design.
6.10 XILINX VIVADO SIMULATION PROCEDURE
After synthesis completes, we proceed to simulation in order to verify the functionality of the
implemented design.
Click on Run Simulation and set the module that needs to be run.
Next, double-click on Run Behavioral Simulation to check for errors. If no errors are found, then
double-click on Simulate Behavioral Model to get the output waveforms.
After clicking on Simulate Behavioral Model, the simulation window will appear. Pass the input
values using Force Constant, or Force Clock if the input is a clock. Specify the
simulation period and run for a certain time; the results will appear as shown in the following
window. Verify the results against the given input values.
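As an alternative to forcing values by hand in the waveform window, the same stimulus can be written as a Verilog testbench. The sketch below is not taken from the project; the 4-bit counter used as the design under test and all names in it are assumptions made so that the example is self-contained.

```verilog
`timescale 1ns / 1ps

// Hypothetical design under test: 4-bit counter with synchronous reset.
module counter4 (input wire clk, input wire rst, output reg [3:0] count);
    always @(posedge clk)
        if (rst) count <= 4'd0;
        else     count <= count + 4'd1;
endmodule

// Testbench: generates the clock and reset instead of using force commands.
module counter4_tb;
    reg        clk = 1'b0;
    reg        rst = 1'b1;
    wire [3:0] count;

    counter4 dut (.clk(clk), .rst(rst), .count(count));

    always #5 clk = ~clk;          // 10 ns clock period, in place of Force Clock

    initial begin
        #20 rst = 1'b0;            // release reset, in place of Force Constant
        #100;                      // let the counter run for 100 ns
        $display("count = %0d", count);
        $finish;
    end
endmodule
```

A testbench of this kind can be added to the project as a simulation source, so that the same stimulus is applied every time the simulation is rerun.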
Using the Schematic Window:
You can generate a Schematic window for any level of the logical or physical hierarchy. You
can select a logic element in an open window, such as a primitive or net in the Netlist window,
and use the Schematic command in the popup menu to create a Schematic window for the
selected object.
An elaborated design always opens with a Schematic window of the top level of the design, as
shown in the figure below.
Schematic window
Using the Project Summary
The Vivado IDE includes an interactive Project Summary (Figure 3-11) that updates
dynamically as design commands are run and the design progresses through the design flow. It
provides project and design information, such as the project part, board, and state of synthesis
and implementation.
It also provides links to detailed information, such as links to the Messages and Reports windows
as well as the Settings dialog box.
As synthesis and implementation complete, DRC violations, timing values, utilization
percentages, and power estimates are also populated. To open the Project Summary, do either
of the following:
• Select Window > Project Summary.

• Click the Project Summary toolbar button.
Project Summary
CHAPTER 7
RESULTS
Latch based clock gating

RTL schematic:
Technology schematic:
LATCH FREE CLOCK GATING
RTL schematic:
FLIPFLOP BASED CLOCK GATING
RTL schematic:
SYNTHESIS BASED CLOCK GATING
RTL schematic:
Simulation:-
COMPARISON TABLE:
CLOCK GATING            POWER (watts)         DELAY (ns)            AREA (out of 404235)
TECHNIQUES              proposed  extension   proposed  extension   proposed  extension
LATCH FREE CLOCK        10.932    6.388       29.166    24.174      951       375
LATCH BASED CLOCK       10.932    0.24        24.913    24.174      932       582
FLIPFLOP BASED CLOCK    10.915    0.122       24.913    24.174      932       580
SYNTHESIS BASED CLOCK   10.915    0.122       24.913    24.174      932       303
TABLE III: OUTPUTS OF GATING TECHNIQUES
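For reference, the general structure of two of the compared techniques can be sketched in Verilog as follows. This is a generic textbook-style illustration and not the code that produced the results above; the module and signal names (clk_gate_latch, clk_gate_latch_free, en, gclk) are assumptions.

```verilog
// Latch-based gate: the enable is captured while clk is low, so gclk
// cannot glitch when en changes during the high phase of clk.
module clk_gate_latch (
    input  wire clk,
    input  wire en,
    output wire gclk
);
    reg en_latched;

    // Level-sensitive latch, transparent while clk is low.
    always @(clk or en)
        if (!clk)
            en_latched <= en;

    assign gclk = clk & en_latched;
endmodule

// Latch-free gate: a single AND gate. It is smaller, but en must
// remain stable while clk is high to avoid glitches on gclk.
module clk_gate_latch_free (
    input  wire clk,
    input  wire en,
    output wire gclk
);
    assign gclk = clk & en;
endmodule
```

In both cases the registers clocked by gclk stop toggling while en is low, which is the source of the dynamic power saving.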
CHAPTER 8
8.1 CONCLUSION
This work on power optimization in a configurable ALU using a blend of techniques presents a
comprehensive investigation into reducing power consumption in a configurable Arithmetic Logic Unit (ALU)
through the application of various techniques. The goal of the research is to address the
increasing demand for low-power designs in modern computer systems. The work explores a
range of power optimization techniques, including gate-level optimization, circuit-level
optimization, and algorithmic optimization. These techniques are combined in a blended
approach to achieve significant power savings while maintaining the desired functionality of
the ALU.
The results of the study demonstrate the effectiveness of the proposed power optimization
techniques. By carefully analyzing and optimizing the ALU design at different levels, the
researchers were able to achieve substantial reductions in power consumption without
compromising the performance of the ALU.
8.2 FUTURE SCOPE
The application of machine learning and AI techniques can enhance power optimization in
configurable ALUs. By training models on large datasets of workload patterns and power
consumption data, ALUs can learn to dynamically adjust their configurations and operations to
minimize power consumption while maintaining performance levels.
REFERENCES:
[1] B. Geetha, B. Padmavathi, and V. Perumal, “Design methodologies and circuit optimization
techniques for low power cmos VLSI design,” in 2017 IEEE International
Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI), IEEE,
2017, pp. 1759–176.
[2] B. Padmavathi, B. Geetha, and K. Bhuvaneshwari, “Low power design techniques and
implementation strategies adopted in vlsi circuits,” in 2017 IEEE International Conference
on Power, Control, Signals and Instrumentation Engineering (ICPCSI), IEEE, 2017, pp.
1764–1767
[3] U. Kaur and R. Mehra, “Low power cmos counter using clock gated flip-flop,” Int. J. Eng.
Adv. Tech, vol. 2, pp. 796–8, 2013. Pratibhadevi Tapashetti, Dr. Rajkumar B Kulkarni and
Dr. S S Patil, “MAC Architectures Based on Modified Booth Algorithm”, International
Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering,
Vol. 5, Issue 12, pp. 2320 – 3765, December 2016.
[4] M. P. Dev, D. Baghel, B. Pandey, M. Pattanaik, and A. Shukla, “Clock gated low power
sequential circuit design,” in 2013 IEEE Conference on Information & Communication
Technologies, IEEE, 2013, pp. 440–444.
[5] R.UMA,Vidya Vijayan and M. Mohanapriya, “Area, Delay and Power Comparison of
Adder Topologies” International Journal of VLSI design & Communication Systems (VLSICS)
Vol.3, No.1, February 2012, in-press.
[6] G. Shrivastava and S. Singh, “Power optimization of sequential circuit based alu using
gated clock & pulse enable logic,” in 2014 International Conference on Computational
Intelligence and Communication Networks, IEEE, 2014, pp. 1006–1010.
[7] R. N. A. Shiny, B. Fahimunnisha, S. Akilandeswari, and S. J. Venula, “Integration of clock
gating and power gating in digital circuits,” in 2019 5th International Conference on
Advanced Computing & Communication Systems (ICACCS), IEEE, 2019, pp. 704–70.