CHAPTER-1
INTRODUCTION
With the latest advances in VLSI technology, digital signal processing plays a
pivotal role in many areas of electrical and electronics engineering. High-speed
convolution and deconvolution are central to many applications of digital signal
processing and image processing. Convolution has a wide range of applications,
such as digital filter design and correlation. However, newcomers often find
convolution and deconvolution difficult, because the computation involves many
steps that are tedious and slow to perform. Many methods have been proposed for
performing convolution; one of them is the graphical method, which is systematic
but very lengthy and time consuming.
Nowadays, the time required for multiplication is still the dominant factor in
determining the instruction cycle time of a DSP chip. Traditionally, the
shift-and-add algorithm has been used for this purpose; however, it is not well
suited to VLSI implementation and suffers from long delay. Some of the important
algorithms proposed in the literature for fast, VLSI-implementable multiplication
are the Booth multiplier, the array multiplier and the Wallace tree multiplier.
Although these techniques improve on the conventional shift-and-add approach,
their time consumption has not been completely eliminated.
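For comparison with the faster schemes discussed later, the conventional shift-and-add algorithm mentioned above can be sketched behaviourally. Python is used here purely as an illustrative model of the algorithm, not of the hardware; the function name is our own.

```python
def shift_and_add(a: int, b: int, width: int = 4) -> int:
    """Conventional shift-and-add multiplication: for each set bit of b,
    add a correspondingly shifted copy of a. One addition per multiplier
    bit, so the delay grows with the operand width."""
    product = 0
    for i in range(width):
        if (b >> i) & 1:          # examine multiplier bits LSB-first
            product += a << i     # add the shifted multiplicand
    return product

assert shift_and_add(11, 13) == 143   # 1011 x 1101 = 10001111
```

Because the additions happen one after another, the iteration count (and hence the delay) scales with the number of multiplier bits, which is exactly the drawback noted above.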
CHAPTER-2
VEDIC MATHEMATICS
“Vedic Mathematics” is so called because of its origin in the Vedas. To be
more specific, it originated from the “Atharva Veda”, the fourth Veda. The “Atharva Veda”
deals with branches such as engineering, mathematics, sculpture, medicine, and the other
sciences of which we are aware today. The Sanskrit word Veda is derived from the
root Vid, meaning to know without limit. The word Veda covers all Veda-Sakhas known
to humanity. The Veda is a repository of all knowledge, fathomless, ever revealing as it is
delved into more deeply. Vedic mathematics, which simplifies arithmetic and algebraic operations,
has increasingly found acceptance the world over. Experts suggest that it could be a
handy tool for those who need to solve mathematical problems faster by the day. It is an
ancient technique, which simplifies multiplication, divisibility, complex numbers,
squaring, cubing, square roots and cube roots. Even recurring decimals and auxiliary
fractions can be handled by Vedic mathematics. Vedic Mathematics forms part of Jyotish
Shastra which is one of the six parts of Vedangas. The Jyotish Shastra or Astronomy is
made up of three parts called Skandas. A Skanda means the big branch of a tree shooting
out of the trunk. This subject was revived largely due to the efforts of Jagadguru Swami
Bharathi Krishna Tirtha Ji of Govardhan Peeth, Puri Jaganath (1884-1960). Though he
researched the subject for years, his efforts would have gone in vain but for the
enterprise of some disciples who took down notes during his last days. The basis of Vedic
mathematics is the 16 sutras, which attribute a set of qualities to a number or a group of
numbers. In these 16 sutras (phrases) of about 120 words, the ancient Hindu scientists
(Rishis) of Bharat laid down simple steps for solving all mathematical problems in two or
three easy-to-follow steps. Vedic mental, one- or two-line methods can be used effectively for solving
divisions, reciprocals, factorisation, HCF, squares and square roots, cubes and cube roots,
algebraic equations, multiple simultaneous equations, quadratic equations, cubic
equations, biquadratic equations, higher-degree equations, differential calculus, partial
fractions, integration, the Pythagoras theorem, the Apollonius theorem, analytical conics and
so on. Vedic scholars did not use figures for big numbers in their numerical notation.
Instead, they preferred the letters of the Sanskrit alphabet, with each letter denoting a
number.
Several mantras, in fact, denote numbers; that includes the famed Gayatri Mantra,
which adds to 108 when decoded. How fast you can solve a problem is very important.
There is a race against time in all the competitions. Only those people having fast
calculation ability will be able to win the race. Time saved can be used to solve more
problems or used for difficult problems. Given the initial training in modern maths in
today’s schools, students will be able to comprehend the logic of Vedic mathematics after
they have reached the 8th standard. It will be of interest to everyone but more so to
younger students keen to make their mark in competitive entrance exams. India’s past
could well help them make it in today’s world. It is amazing how, with the help of 16
sutras and 13 sub-sutras, the Vedic seers were able to solve complex mathematical
problems mentally.
Vedic mathematics provides a unique solution to this problem. The Urdhva-
Tiryagbhyam Sutra, or vertically-and-crosswise algorithm for multiplication, is discussed
and then used to develop a digital multiplier architecture. For division, different division
algorithms are studied; after comparing the drawbacks and advantages of each, the
Paravartya algorithm based on Vedic mathematics is modified as needed and
then used. Many engineering application areas use Vedic mathematics, especially
signal processing. It describes 16 sutras and sub-sutras which cover all the branches of
mathematics, such as arithmetic, algebra, geometry, trigonometry and statistics.
Implementing these algorithms in processors has been found advantageous in
terms of reduced power and area along with a considerable increase in speed.
These sutras were given in the Vedas centuries ago; to be specific, they
are described in the Atharva Veda. The sutras and sub-sutras were reintroduced to the
world by Swami Bharati Krishna Tirthaji Maharaja in the form of the book Vedic
Mathematics.
The purpose of this analysis is to prove the feasibility of an FPGA that performs a
convolution on an acquired image in real time. The proposed implementation uses a
modified hierarchical design approach, which quickens computation efficiently and
accurately. The efficiency of the proposed convolution circuit is tested by embedding it
in a top-level FPGA design. It additionally provides the modularity,
expandability and regularity required to form different convolutions. This particular model has the
advantage of being fine-tuned for signal processing; in this case it uses the mean-squared-
error measurement and objective measures of enhancement to achieve a more effective
signal processing model. The authors coded their design in the Verilog hardware
description language and synthesized it for FPGA products using ISE, ModelSim
and DC Compiler for other processor usage.
Mohammed Hasmat Ali and Anil Kumar Sahani [2] presented a detailed study of
different multipliers based on the array multiplier, constant coefficient multiplication
(KCM) and multiplication based on Vedic mathematics. Multiplication-based operations
such as multiply-and-accumulate (MAC) and inner product are among the
frequently used computation-intensive arithmetic functions (CIAF) currently
implemented in many digital signal processing (DSP) applications such as convolution,
the fast Fourier transform (FFT), filtering, and the ALU of microprocessors. Since
multiplication dominates the execution time of most DSP algorithms, there is a need for a
high-speed multiplier. All these multipliers are coded in Verilog HDL (Hardware
Description Language), simulated in ModelSim XE III 6.4b and synthesized in the EDA
tool Xilinx ISE 12. The demand for
high-speed processing has been increasing as a result of expanding computer and signal
processing applications. Higher-throughput arithmetic operations are important to achieve
the desired performance in many real-time signal and image processing applications. One
of the key arithmetic operations in such applications is multiplication, and the
development of fast multiplier circuits has been a subject of interest for decades. This
paper presented a study of different multipliers.
2.3 CONVENTIONAL VS. VEDIC MULTIPLICATION SCHEME
Vedic mathematics is the ancient system of mathematics with a unique
technique for fast mental calculations, based on 16 sutras [6]. This approach is completely
different from other multiplication algorithms and is considered very close to the way a
human mind works. An ordinary human can perform mental operations only on numbers of
very small magnitude, and Vedic mathematics provides techniques to solve
operations on numbers of large magnitude easily. It covers explanations of several
modern mathematical terms, including arithmetic, trigonometry, plane and spherical geometry,
calculus, quadratic equations and factorization. In [5], the author presented a hierarchical
implementation of multiplication based on an array-of-arrays technique. This multiplier
architecture is based on generating all partial products and their sums; we refer to this
architecture as HAOM. The author claims that HAOM is faster than
array multipliers and Booth multipliers.
As an example, consider 65 × 65. Multiply the leading digit 6 by one more than
itself (6 × 7 = 42), then write the square of the last digit (5 × 5 = 25):

6 5 × 6 5 = 42 | 25 = 4 2 2 5
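The 65 × 65 example above uses the Vedic shortcut for squaring a number ending in 5: multiply the leading part by one more than itself and append 25. A minimal behavioural sketch (Python, for illustration only; the function name is our own):

```python
def square_ending_in_5(n: int) -> int:
    """Vedic shortcut for squaring a number ending in 5: multiply the
    leading digits by one more than themselves, then append 25
    (e.g. 65^2: 6 * 7 = 42, append 25 -> 4225)."""
    assert n % 10 == 5, "shortcut applies only to numbers ending in 5"
    head = n // 10
    return head * (head + 1) * 100 + 25

assert square_ending_in_5(65) == 4225
```

The same one-line rule covers any operand ending in 5, which is the kind of reduction of a general multiplication to a trivial step that motivates the Vedic multiplier architectures discussed later.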
Approximately 39% of the processing time is saved. Similar results can be
obtained on other processors as well. The above results are extremely encouraging as far
as digital signal processing (DSP) is concerned. Most of the important DSP
algorithms, such as convolution, discrete Fourier transforms, fast Fourier transforms and
digital filters, incorporate multiply-accumulate computations [7]. Since the
multiplication time is generally far greater than the addition time, the total processing
time of any DSP algorithm primarily depends upon the number of multiplications.
CHAPTER-3
VLSI
Very-large-scale integration (VLSI) is the process of creating integrated circuits
by combining thousands of transistor-based circuits into a single chip. VLSI began in the
1970s when complex semiconductor and communication technologies were being
developed. The microprocessor is a VLSI device. The term is no longer as common as it
once was, as chips have increased in complexity into the hundreds of millions of
transistors.
3.1 OVERVIEW
The first semiconductor chips held one transistor each. Subsequent advances added
more and more transistors, and, as a consequence, more individual functions or systems
were integrated over time. The first integrated circuits held only a few devices, perhaps as
many as ten diodes, transistors, resistors and capacitors, making it possible to fabricate
one or more logic gates on a single device. Now known retrospectively as "small-scale
integration" (SSI), these early devices gave way, as techniques improved, to devices with
hundreds of logic gates, known as large-scale integration (LSI), i.e. systems with at least a thousand logic gates.
Current technology has moved far past this mark and today's microprocessors have many
millions of gates and hundreds of millions of individual transistors.
At one time, there was an effort to name and calibrate various levels of large-scale
integration above VLSI. Terms like Ultra-large-scale Integration (ULSI) were used. But
the huge number of gates and transistors available on common devices has rendered such
fine distinctions moot. Terms suggesting greater than VLSI levels of integration are no
longer in widespread use. Even VLSI is now somewhat quaint, given the common
assumption that all microprocessors are VLSI or better.
This microprocessor is unique in that its 1.4 billion transistors,
capable of a teraflop of performance, are almost entirely dedicated to logic (Itanium's
transistor count is largely due to its 24 MB L3 cache). Current designs, as opposed to the
earliest devices, use extensive design automation and automated logic synthesis to lay out
the transistors, enabling higher levels of complexity in the resulting logic functionality.
Certain high-performance logic blocks like the SRAM cell, however, are still designed by
hand to ensure the highest efficiency (sometimes by bending or breaking established
design rules to obtain the last bit of performance by trading stability).
An integrated circuit (IC) may contain millions of transistors, each a few µm in size.
By the early 1980s, VLSI chips held tens of thousands of transistors (later hundreds of
thousands, and now millions).
3.4 ADVANTAGES OF ICS OVER DISCRETE COMPONENTS
Size: Integrated circuits are much smaller; both transistors and wires are
shrunk to micrometer sizes, compared with the millimeter or centimeter
scales of discrete components. Small size leads to advantages in speed and
power consumption, since smaller components have smaller parasitic
resistances, capacitances, and inductances.
Speed: Signals can be switched between logic 0 and logic 1 much more quickly
within a chip than they can between chips. Communication within a chip
can occur hundreds of times faster than communication between chips on a
printed circuit board. The high speed of circuits on-chip is due to their
small size; smaller components and wires have smaller parasitic
capacitances to slow down the signal.
Power Consumption: Logic operations within a chip also take much less
power. Once again, lower power consumption is largely due to the small
size of circuits on the chip; smaller parasitic capacitances and resistances
require less power to drive them.
These advantages of integrated circuits translate into advantages at the system level:
a simpler cabinet with less electromagnetic shielding may be
feasible, too.
Reduced cost: Reducing the number of components, the power supply
requirements, cabinet costs, and so on, will inevitably reduce system cost. The
ripple effect of integration is such that the cost of a system built from custom ICs
can be less, even though the individual ICs cost more than the standard parts they
replace.
Electronic systems now perform a wide variety of tasks in daily life. Electronic
systems in some cases have replaced mechanisms that operated mechanically,
hydraulically, or by other means; electronics are usually smaller, more flexible, and easier
to service. In other cases electronic systems have created totally new applications.
Electronic systems perform a variety of tasks, some of them visible, some more hidden:
Personal entertainment systems such as portable MP3 players and DVD players
perform sophisticated algorithms with remarkably little energy.
Electronic systems in cars operate stereo systems and displays; they also control
fuel injection systems, adjust suspensions to varying terrain, and perform the
control functions required for anti-lock braking (ABS) systems.
Digital electronics compress and decompress video, even at high-definition data
rates, on-the-fly in consumer electronics.
Low-cost terminals for Web browsing still require sophisticated electronics,
despite their dedicated function.
Personal computers and workstations provide word-processing, financial analysis,
and games. Computers include both central processing units (CPUs) and special-
purpose hardware for disk access, faster screen display, etc.
Medical electronic systems measure bodily functions and perform complex
processing algorithms to warn about unusual conditions. The availability of these
complex systems, far from overwhelming consumers, only creates demand for
even more complex systems.
3.7 ASIC
As feature sizes have shrunk and design tools improved over the years, the
maximum complexity (and hence functionality) possible in an ASIC has grown from
5,000 gates to over 100 million. Modern ASICs often include entire 32-bit processors,
memory blocks including ROM, RAM, EEPROM, Flash and other large building blocks.
Such an ASIC is often termed a SoC (system-on-a-chip). Designers of digital ASICs use a
hardware description language (HDL), such as Verilog or VHDL, to describe the
functionality of ASICs.
CHAPTER-4
INTRODUCTION TO VERILOG
Verilog is a Hardware Description Language; a textual format for describing
electronic circuits and systems. Applied to electronic design, Verilog is intended to be
used for verification through simulation, for timing analysis, for test analysis (testability
analysis and fault grading) and for logic synthesis.
The Verilog HDL is an IEEE standard - number 1364. The first version of the
IEEE standard for Verilog was published in 1995. A revised version was published in
2001; this is the version used by most Verilog users. The IEEE Verilog standard
document is known as the Language Reference Manual, or LRM. This is the complete
authoritative definition of the Verilog HDL.
A further revision of the Verilog standard was published in 2005, though it has
little extra compared to the 2001 standard. SystemVerilog is a huge set of extensions to
Verilog, and was first published as an IEEE standard in 2005. See the appropriate
Knowhow section for more details about SystemVerilog.
IEEE Std 1364 also defines the Programming Language Interface, or PLI. This is
a collection of software routines which permit a bidirectional interface between Verilog
and other languages (usually C).
Note that VHDL is not an abbreviation for Verilog HDL - Verilog and VHDL are two
different HDLs. They have more similarities than differences, however.
Cadence Design Systems acquired Gateway in 1989, and with it the rights to the
language and the simulator. In 1990, Cadence put the language (but not the simulator)
into the public domain, with the intention that it should become a standard, non-
proprietary language.
The Verilog HDL is now maintained by a non-profit organisation,
Accellera, which was formed from the merger of Open Verilog International (OVI) and
VHDL International. OVI had the task of taking the language through the IEEE
standardisation procedure.
There is also a draft standard for analog and mixed-signal extensions to Verilog,
Verilog-AMS.
The diagram below summarises the high-level design flow for an ASIC (i.e. gate
array, standard cell) or FPGA. In a practical design situation, each step described in the
following sections may be split into several smaller steps, and parts of the design flow
will be iterated as errors are uncovered.
Design flow: System Analysis and Partitioning, Code, Synthesize to Gates.
As a first step, Verilog may be used to model and simulate aspects of the complete
system containing one or more ASICs or FPGAs. This may be a fully functional
description of the system allowing the specification to be validated prior to commencing
detailed design. Alternatively, this may be a partial description that abstracts certain
properties of the system, such as a performance model to detect system performance
bottle-necks.
Verilog is not ideally suited to system-level modelling. This is one motivation for
SystemVerilog, which enhances Verilog in this area.
Once the overall system architecture and partitioning is stable, the detailed design
of each ASIC or FPGA can commence. This starts by capturing the design in Verilog at
the register transfer level, and capturing a set of test cases in Verilog. These two tasks are
complementary, and are sometimes performed by different design teams in isolation to
ensure that the specification is correctly interpreted. The RTL Verilog should be
synthesizable if automatic logic synthesis is to be used. Test case generation is a major
task that requires a disciplined approach and much engineering ingenuity: the quality of
the final ASIC or FPGA depends on the coverage of these test cases.
For today's large, complex designs, verification can be a real bottleneck. This
provides another motivation for SystemVerilog - it has features for expediting testbench
development. See the SystemVerilog section of Knowhow for more details.
The RTL Verilog is then simulated to validate the functionality against the
specification. RTL simulation is usually one or two orders of magnitude faster than gate
level simulation, and experience has shown that this speed-up is best exploited by doing
more simulation, not spending less time on simulation.
Although some exploratory synthesis will be done early on in the design process,
to provide accurate speed and area data to aid in the evaluation of architectural decisions
and to check the engineer's understanding of how the Verilog will be synthesized, the
main synthesis production run is deferred until functional simulation is complete. It is
pointless to invest a lot of time and effort in synthesis until the functionality of the design
is validated.
4.7 LEVELS OF ABSTRACTION
Verilog descriptions can span multiple levels of abstraction, i.e. levels of detail,
and can be used for different purposes at various stages in the design process.
Levels of abstraction: algorithmic (system analysis and partitioning), RTL (synthesis),
gate level (gate-level verification), and switch level.
At the highest level, Verilog contains stochastic functions (queues and random
probability distributions) to support performance modelling.
Verilog supports Register Transfer Level descriptions, which are used for the
detailed design of digital circuits. Synthesis tools transform RTL descriptions to gate
level.
Verilog supports gate and switch level descriptions, used for the verification of
digital designs, including gate and switch level logic simulation, static and dynamic
timing analysis, testability analysis and fault grading.
CHAPTER-5
CONVOLUTION AND DECONVOLUTION
y(n) = x(n) * h(n)

y[n] = ∑ x(k) h(n−k), where the sum runs over k from −∞ to ∞
The impulse response goes by a different name in some applications. If the system
being considered is a filter, the impulse response is called the filter kernel, the
convolution kernel, or simply, the kernel. In image processing, the impulse response is
called the point spread function. While these terms are used in slightly different ways,
they all mean the same thing, the signal produced by a system when the input is a delta
function.
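The convolution sum defined above can be evaluated directly. A minimal behavioural sketch (Python, assuming finite causal sequences; the function name is our own):

```python
def convolve(x, h):
    """Direct evaluation of y[n] = sum_k x(k) h(n-k) for finite causal
    sequences; the output has len(x) + len(h) - 1 samples."""
    y = [0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(h):   # keep h's index in range
                y[n] += x[k] * h[n - k]
    return y

assert convolve([1, 2, 3], [1, 1]) == [1, 3, 5, 3]
```

Every output sample is a sum of products, which is why the multiplier dominates the cost of a convolution unit, as discussed in the earlier chapters.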
Fig 5.1 The sequences f(n) and g(n), shown in (a), are graphically convolved
5.2 DECONVOLUTION
If the impulse response and the output of a system are known, then the procedure
to obtain the unknown input is referred to as deconvolution. The concept of
deconvolution is also widely used in the techniques of signal processing and image
processing. In general, the object of deconvolution is to find the solution of a convolution
equation of the form:
x*h = y
Usually, y is some recorded signal, and x is a signal that we wish to recover, but
which was convolved with some other signal h before it was recorded. The function h might
represent the transfer function of an instrument or a driving force that was applied to a
physical system. If one knows h, or at least the form of h, then one can perform deterministic
deconvolution. If the two sequences x(n) and h(n) are causal, then the convolution sum is
y[n] = ∑ x(k) h(n−k), where the sum runs over k from 0 to n

Where,

x(0) = y(0) / h(0)
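Extending the relation x(0) = y(0)/h(0) to later samples gives a back-substitution scheme equivalent to polynomial long division. A behavioural sketch, assuming integer samples and h(0) ≠ 0 (the function name is our own, not from the design):

```python
def deconvolve(y, h, n_x):
    """Recover x from y = x * h by back-substitution: x[0] = y[0]/h[0],
    and each later sample is x[n] = (y[n] - sum_{k<n} x[k] h[n-k]) / h[0]."""
    x = []
    for n in range(n_x):
        acc = y[n]
        for k in range(n):
            if 0 <= n - k < len(h):       # subtract already-known terms
                acc -= x[k] * h[n - k]
        x.append(acc // h[0])             # integer samples assumed
    return x

assert deconvolve([1, 3, 5, 3], [1, 1], 3) == [1, 2, 3]
```

Each step needs one division by h(0), which is why a fast divider matters as much to deconvolution as a fast multiplier does to convolution.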
(1 0 1 1 × 1 1 0 1) = 1 0 0 0 1 1 1 1

      1 0 1 1
    × 1 1 0 1
  ---------------
  1 0 0 0 1 1 1 1
For the above multiplication, an array of sixteen AND gates is required to form
the various product terms, and an adder array is required to calculate the sums involving
the various product terms and carry combinations in order to get the final product bits.
The hardware requirement for an m x n bit array multiplier includes:-
(m x n) AND gates,
together with the corresponding adder array. From the above example it can be inferred
that the partial products are generated sequentially, which reduces the speed of the
multiplier; however, the structure of the multiplier is regular. The worst-case delay of
this multiplier is (2n+1) td. The array multiplier also consumes more power, and although
the number of components required is reasonable, its delay is large. It also requires a
larger number of gates, so its area is increased; because of this, the multiplier is less
economical.
The primary requirement for any application to work fast is to increase the speed of
its basic building blocks. The multiplier and divider are at the heart of convolution and
deconvolution respectively, as shown in the figure above. They are the most important, but
slowest, units of the system and consume much of its time. Many methods have been
invented to improve the speed of multipliers and dividers; among them, the Vedic
multiplier and divider are in focus because of their faster operation and low power
consumption. In this project, the speed of the convolution and deconvolution modules is
improved using a Vedic multiplier and divider. It consists of a multiplier based on the
Vedic Urdhva Tiryagbhyam sutra, embedded in the convolution of two finite sequences,
and a divider based on the Vedic Paravartya sutra, embedded in the deconvolution
process to recover the original data.
System block diagram: the input sequence x[n] and the impulse response h[n] are
convolved to give the final sequence y[n] = x[n] * h[n]; y(n) is then applied, as an
N-bit dividend, to an N-bit divider architecture whose deconvolution (division) of y(n)
by h(n) gives the final output x(n).
Fig 5.3 System Block Diagram
As the number of bits increases, gate delay and area increase very slowly compared to
other multipliers. Therefore it is time, space and power efficient. The main advantage of
the Vedic multiplication algorithm (Urdhva Tiryagbhyam Sutra) stems from the fact that
it can be easily implemented on an FPGA due to its simplicity and regularity.
Step-by-step crosswise multiplication of the operands 1 1 0 1 and 1 0 1 0:
Fig.5.5 Multiplication of two 4 bit numbers using Urdhva Tiryagbhyam method
A1=a0*b0
A2=a0*b1+a1*b0+prevcarry
A3=a0*b2+a1*b1+a2*b0+prevcarry
A4=a0*b3+a1*b2+a2*b1+a3*b0+prevcarry
A5=a1*b3+a2*b2+a3*b1+prevcarry
A6=a2*b3+a3*b2 + prevcarry
A7=a3*b3+ prevcarry
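The column terms A1 to A7 above, with the carry rippling from each column into the next, can be modelled directly. A behavioural Python sketch using LSB-first bit lists (illustrative only; in hardware these column sums are formed concurrently):

```python
def urdhva_4x4(a, b):
    """Urdhva Tiryagbhyam for two 4-bit numbers given as LSB-first bit
    lists [a0, a1, a2, a3] and [b0, b1, b2, b3]. Each column sum below
    matches the crosswise terms A1..A7; the carry ripples onward."""
    columns = [
        a[0]*b[0],                                       # A1
        a[0]*b[1] + a[1]*b[0],                           # A2
        a[0]*b[2] + a[1]*b[1] + a[2]*b[0],               # A3
        a[0]*b[3] + a[1]*b[2] + a[2]*b[1] + a[3]*b[0],   # A4
        a[1]*b[3] + a[2]*b[2] + a[3]*b[1],               # A5
        a[2]*b[3] + a[3]*b[2],                           # A6
        a[3]*b[3],                                       # A7
    ]
    bits, carry = [], 0
    for col in columns:
        total = col + carry
        bits.append(total & 1)   # this column's product bit
        carry = total >> 1       # carry into the next column
    while carry:                 # remaining carry forms the top bits
        bits.append(carry & 1)
        carry >>= 1
    return bits                  # LSB-first product bits

# 1011 (11) x 1101 (13) = 143; bit lists are LSB-first
bits = urdhva_4x4([1, 1, 0, 1], [1, 0, 1, 1])
assert sum(bit << i for i, bit in enumerate(bits)) == 143
```

The seven column sums are independent of one another, which is the parallelism the following paragraph refers to.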
The value of Vedic mathematics lies in the fact that it reduces the typical
calculations of conventional mathematics to very simple ones. The Urdhva Tiryagbhyam Sutra
is a general multiplication formula applicable to all cases of multiplication. Because of
the parallelism obtained in the generation of partial products and their summation, speed is
improved. In this algorithm, small blocks can be wisely reused for designing a bigger
NxN multiplier; for a higher number of input bits, only a little modification is required.
The number of bits in each input is divided equally into two parts. Consider a 4x4
multiplication, say A3A2A1A0 and B3B2B1B0, whose result appears on the output lines
S7S6S5S4S3S2S1S0. Divide A and B into two parts, say A3A2 & A1A0 for A and
B3B2 & B1B0 for B. Using the fundamentals of Vedic multiplication, taking two bits at a
time and using 2-bit multiplier blocks, we can obtain the following structure for
multiplication.
Fig 5.6 Block diagram presentation for 4x4 multiplications
Each block shown above is a 2x2 multiplier. The first 2x2 multiplier has inputs A1A0
and B1B0, and the last has inputs A3A2 and B3B2. The middle two are 2x2 multipliers
with inputs A3A2, B1B0 and A1A0, B3B2 respectively. The final result of the
multiplication, which is 8 bits wide, S7S6S5S4S3S2S1S0, can then be interpreted as
given below.
Assuming the output of each multiplication is as given above, the final result is obtained
by adding the middle product terms along with the terms shown below.
The middle terms are added using two 4-bit full adders, whose result forms the
output lines S5S4S3S2. One full adder adds (S23S22S21S20) and
(S13S12S11S10); the second full adder then adds the result of the first
to (S31S30S03S02). The sum bits of the second full adder form
S5S4S3S2. The carries generated by the first and second full adders are added using a
half adder, and the resulting carry and sum are added to the next stage, i.e. to
S33S32, to give S7S6. The same approach can be extended to 8-, 16- and 32-bit
inputs.
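The composition described above, four 2x2 blocks whose middle products are summed and shifted into place, can be modelled behaviourally. In this Python sketch the gate-level 2x2 block is stood in for by plain multiplication, and the adders by ordinary integer addition (names are our own):

```python
def mul_2x2(a, b):
    """2x2 building block (operands 0..3); in hardware this is four AND
    gates and two half adders, modelled here behaviourally."""
    return a * b

def vedic_4x4(a, b):
    """4x4 multiply built from four 2x2 blocks, as in the block diagram:
    split each operand into high/low 2-bit halves, form the four partial
    products, and add the two middle ones shifted left by 2 bits."""
    al, ah = a & 0b11, (a >> 2) & 0b11
    bl, bh = b & 0b11, (b >> 2) & 0b11
    low    = mul_2x2(al, bl)                    # lowest partial product
    middle = mul_2x2(ah, bl) + mul_2x2(al, bh)  # summed by the two adders
    high   = mul_2x2(ah, bh)                    # highest partial product
    return low + (middle << 2) + (high << 4)

assert vedic_4x4(11, 13) == 143   # 1011 x 1101
```

Because the four 2x2 blocks operate on independent operand halves, they can run concurrently, and only the final additions are sequential.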
The hardware architectures of the 2x2, 4x4 and 8x8-bit Vedic multiplier modules are
displayed in the sections below. Here, the “Urdhva-Tiryagbhyam” (Vertically and Crosswise)
sutra is used to propose such an architecture for the multiplication of two binary numbers.
The beauty of the Vedic multiplier is that partial product generation and additions are
done concurrently; hence, it is well adapted to parallel processing. This feature makes it
more attractive for binary multiplication and in turn reduces delay, which is the
primary motivation behind this work.
The 2x2 Vedic multiplier module is implemented using four AND gates and
two half adders, as displayed in its block diagram in Fig. 3. It is found that the
hardware architecture of the 2x2-bit Vedic multiplier is the same as that of the
2x2-bit conventional array multiplier [2]. Hence it is concluded that multiplication of 2-bit
binary numbers by the Vedic method does not significantly improve the multiplier's
efficiency. More precisely, the total delay is only two half-adder delays after the final
bit products are generated, which is very similar to the array multiplier. So we switch
over to the implementation of the 4x4-bit Vedic multiplier, which uses the 2x2-bit
multiplier as a basic building block. The same method can be extended to 4- and 8-bit
inputs, but for a higher number of input bits, a little modification is required.
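The 2x2 module just described, four AND gates feeding two half adders, can be modelled at gate level as follows. This is a behavioural Python sketch of that structure, not the Verilog itself:

```python
def half_adder(x, y):
    """Half adder: sum = x XOR y, carry = x AND y."""
    return x ^ y, x & y

def vedic_2x2(a1, a0, b1, b0):
    """Gate-level 2x2 Vedic multiplier: four AND gates form the bit
    products and two half adders combine them, matching the structure
    described above (identical to a 2x2 array multiplier)."""
    s0 = a0 & b0                              # LSB is a direct AND
    s1, c1 = half_adder(a1 & b0, a0 & b1)     # crosswise terms
    s2, s3 = half_adder(a1 & b1, c1)          # top term plus carry
    return (s3 << 3) | (s2 << 2) | (s1 << 1) | s0

assert vedic_2x2(1, 1, 1, 1) == 9   # 3 x 3 = 1001
```

After the four AND gates fire in parallel, only the two half-adder delays remain, which is the delay figure quoted above.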
Fig 5.7 Block Diagram of 2x2 bit Vedic Multiplier
CHAPTER- 6
EXISTING METHOD
6.1 WALLACE TREE MULTIPLIER
Multipliers have gained significant importance with the introduction of
digital computers. They are most often used in digital signal processing applications
and microprocessor designs. In contrast to addition and subtraction,
multipliers consume more time and more hardware resources. With recent advances in
technology, a number of multiplication techniques have been implemented to fulfil
the requirements of high speed, low power consumption, small area, or a
combination of these in one multiplier. Speed and area are two major constraints
which conflict with each other; it is therefore the designer's task to strike the proper
balance in selecting an appropriate multiplication technique as per the requirements.
Parallel multipliers are high-speed multipliers, and the enhanced speed of the
multiplication operation is achieved using various schemes, of which the Wallace tree is one [1].
A fast process for multiplying two numbers was developed by Wallace [7].
Using this method, multiplication is a three-step process: the bit products
are formed; the bit-product matrix is reduced to a two-row matrix whose row sum
equals the sum of the bit products; and the two resulting rows are summed with a fast adder
to produce the final product. In the Wallace tree method, three bit signals are passed to a
one-bit full adder (“3W”), which is called a three-input Wallace tree circuit; its
output (sum) signal is supplied to the next-stage full adder of the same bit position, and its
carry output signal is passed to the next-stage full adder located at a one-bit-higher
position [5].
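Wallace's three steps can be modelled behaviourally: form the bit-product columns, reduce each column with 3:2 (full-adder) compressions, then perform one final addition. A Python sketch for small widths (illustrative only; names are our own):

```python
def wallace_multiply(a, b, width=4):
    """Wallace's three steps: form the bit products, reduce the column
    heights with 3:2 full adders until at most two rows remain, then
    do one final fast addition."""
    # Step 1: bit-product matrix, kept as per-column lists of bits.
    cols = [[] for _ in range(2 * width + 1)]
    for i in range(width):
        for j in range(width):
            cols[i + j].append((a >> i) & (b >> j) & 1)
    # Step 2: each full adder turns three bits of one column into a sum
    # bit in that column and a carry bit in the next column.
    while any(len(c) > 2 for c in cols):
        nxt = [[] for _ in range(2 * width + 1)]
        for i, c in enumerate(cols):
            trios = len(c) // 3 * 3
            for k in range(0, trios, 3):
                x, y, z = c[k:k + 3]
                nxt[i].append(x ^ y ^ z)              # full-adder sum
                carry = (x & y) | (y & z) | (x & z)   # full-adder carry
                if carry:
                    nxt[i + 1].append(carry)
            nxt[i].extend(c[trios:])                  # leftover bits
        cols = nxt
    # Step 3: sum the remaining (at most two) rows.
    return sum(bit << i for i, c in enumerate(cols) for bit in c)
```

Each reduction round runs in one full-adder delay regardless of width, which is where the speed of the tree comes from; the irregular wiring it implies is the layout difficulty noted below.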
Fig 6.1 Logic used in 4 bit Wallace Tree Multiplier
Fig 6.3 RTL Schematic of 4 bit Wallace Tree Multiplier
In the Wallace tree method, the circuit layout is not easy, although the speed of
operation is high, since the circuit is quite irregular [3]. The delay of the Wallace
tree multiplier can be further reduced by using modified tree structures called
compressors.
CHAPTER- 7
PROPOSED METHOD
7.1 PROPOSED CONVOLUTION
The linear convolution of x(n) and h(n) is y(n) = x(n) * h(n). This can be solved by
several methods, all resulting in the same sequence y(n). In this approach, the calculation of
the convolution sum is set up like a multiplication, except that carries are not propagated
out of a column; the convolution of x(n) and h(n) is performed as shown in Fig. 5.5. To obtain
the convolution of two sequences of 4 samples each, sixteen partial products are calculated
and then added to give the convolution sequence y[n]. In this project, the partial products
are calculated using a Vedic multiplier based on the Urdhva Tiryagbhyam algorithm. To
minimize hardware, the width of each input sample is restricted to 4 bits; hence the maximum
possible input sample value is (1111)2, i.e. (15)10 or (F)h. The multipliers required are
4x4 bit, and each multiplier gives an 8-bit partial product. The convolution outputs y[6] and
y[0] are direct partial products, while the rest are obtained by adding intermediate partial
products. Let the two discrete sequences be x[n] = {x3 x2 x1 x0} and h[n] = {h3 h2 h1 h0},
which are convolved as y[n] = x[n] * h[n]. As each sample is four bits long, each partial
product is eight bits long, e.g. x0h0, x3h0 and x3h3 are all eight bits long. The procedure is
rearranged as shown.
In the proposed system, sixteen Vedic multipliers are used to generate the sixteen partial products, and all of their outputs are latched before the further addition operations are performed:

    x0h0  x1h0  x2h0  . . .  x3h3
    -----------------------------
         Combinational Logic
    -----------------------------
    Y6  Y5  Y4  Y3  Y2  Y1  Y0
        (Convolution output)
The 4-bit samples are applied to 4×4-bit Vedic multipliers (V.M.), and the output of each Vedic multiplier is an 8-bit partial product. The Vedic multiplier uses the Urdhva Tiryagbhyam algorithm for multiplication. To boost speed through parallel processing, sixteen Vedic multipliers are used to generate the sixteen partial products. To perform the further addition operations, all outputs are latched, and the corresponding outputs Y0, Y1, Y2, Y3, Y4, Y5 and Y6 are produced. The maximum possible length of Y0 and Y6 is 8 bits, while that of Y1 to Y5 is 9 bits. The design is written in VHDL and implemented on an FPGA.
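To make the data path concrete, here is a minimal Python sketch (an illustration, not the VHDL design): `urdhva_mul4` forms one 8-bit partial product by the Urdhva Tiryagbhyam column scheme, and `convolve4` sums the sixteen partial products into y[0]..y[6]. The function names are hypothetical.

```python
def urdhva_mul4(a, b):
    """4x4-bit Urdhva Tiryagbhyam ("vertically and crosswise") multiply:
    each column sums the crosswise bit products; carries ripple upward."""
    abits = [(a >> i) & 1 for i in range(4)]
    bbits = [(b >> i) & 1 for i in range(4)]
    result, carry = 0, 0
    for col in range(7):                       # the 7 crosswise columns
        s = carry + sum(abits[i] * bbits[col - i]
                        for i in range(4) if 0 <= col - i < 4)
        result |= (s & 1) << col
        carry = s >> 1
    return result | (carry << 7)               # 8-bit partial product

def convolve4(x, h):
    """Linear convolution of two 4-sample sequences (each sample < 16)."""
    y = [0] * 7
    for i in range(4):
        for j in range(4):                     # the sixteen partial products
            y[i + j] += urdhva_mul4(x[i], h[j])
    return y

print(convolve4([1, 2, 3, 4], [1, 1, 1, 1]))   # [1, 3, 6, 10, 9, 7, 4]
```

Note that x0h0 and x3h3 map directly to y[0] and y[6], matching the statement above that these two outputs are direct partial products.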
Division is the most complex and very time-consuming operation if it is done straightforwardly, because the remainder must be compared with the divisor after every subtraction. Division algorithms are broadly classified into multiplicative and subtractive approaches. Multiplicative division algorithms do not compute the quotient directly but use successive approximations that converge to the quotient. Normally such algorithms yield only a quotient, but with an additional step the final remainder can be computed if needed. Consider the following example, assuming A = (11100110)2 and B = (110)2.
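The per-step comparison that makes the straightforward subtractive approach slow can be seen in the following restoring-division sketch in Python, using the values above (an illustration only; the function name is hypothetical and this is not the divider used in this project):

```python
def restoring_divide(a, b, n=8):
    """Restoring (subtractive) division sketch: the partial remainder is
    compared with the divisor after every shift-and-subtract step."""
    r, q = 0, 0
    for i in range(n - 1, -1, -1):
        r = (r << 1) | ((a >> i) & 1)   # bring down the next dividend bit
        if r >= b:                       # the costly per-step comparison
            r -= b
            q = (q << 1) | 1
        else:
            q = q << 1
    return q, r

print(restoring_divide(0b11100110, 0b110))  # (38, 2)
```

Every one of the n iterations performs a full-width comparison/subtraction, which is why subtractive division has a long critical path in hardware.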
Division is generally considered bulky and one of the most difficult operations in arithmetic, and hence VLSI implementations of division algorithms have higher orders of time and space complexity. Vedic mathematics, on the other hand, offers a new holistic approach.
7.3 VEDIC DIVIDER
The word "Vedic" is derived from the word "Veda," which means the store-house of all knowledge. Vedic mathematics is mainly based on sixteen sutras, of which the Paravartya sutra is used for division. The Sanskrit term PARAVARTYA means "transpose and apply."
In this project a systematic division method based on the Paravartya Sutra is used. The Paravartya Sutra helps minimize computation and maintain accuracy even as the number of iterations is reduced. The divisor digits other than the most significant one are first complemented (transposed). These complemented digits are initially multiplied by the most significant digit of the dividend, and the result of this multiplication is added to the corresponding columns of the dividend. The result of the addition is again multiplied by the complemented divisor digits and added to the remaining columns of the dividend, followed by successive multiplication and addition over the consecutive columns. The column sums form the quotient and the remainder. The algorithm is illustrated with an example: assume the dividend is 1111 and the divisor is 101. The division of these two numbers using the Paravartya sutra is
Divisor: 1 0 1   (transposed digits: 0 -1)

    Dividend:  1  1 | 1  1
                  0 |-1
                    | 0 -1
               -------------
               1  1 | 0  0

Quotient = 11, Remainder = 00
In binary hardware the same computation can be carried out with the digit magnitudes alone, since subtraction modulo 2 reduces to XOR:

    Divisor: 1 0 1
    Dividend:  1  1 | 1  1
                  0 | 1
                    | 0  1
               -------------
               1  1 | 0  0

Quotient = 11, Remainder = 00
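The column procedure above can be sketched in Python as follows. This is a minimal illustration that assumes, as in the example, that the most significant digit of the divisor is 1; the function name and digit-list representation are assumptions of the sketch.

```python
def paravartya_divide(dividend, divisor):
    """Paravartya Sutra division sketch on digit lists (MSD first).
    Assumes the divisor's most significant digit is 1."""
    n = len(divisor) - 1                 # digits after the divisor's MSD
    # Transpose step: complement (negate) the trailing divisor digits
    trans = [-d for d in divisor[1:]]
    cols = list(dividend)
    q_len = len(dividend) - n            # number of quotient digits
    for i in range(q_len):
        # multiply the current column sum by the transposed digits and
        # add the products into the following columns
        for j, t in enumerate(trans):
            cols[i + 1 + j] += cols[i] * t
    return cols[:q_len], cols[q_len:]    # quotient digits, remainder digits

print(paravartya_divide([1, 1, 1, 1], [1, 0, 1]))  # ([1, 1], [0, 0])
```

The loop performs only multiplications and additions, with no per-step comparison against the divisor, which is the source of the speed advantage claimed above.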
The linear deconvolution of two finite-length sequences can be solved by several methods. In this approach the deconvolution operation is set up like long-hand division of polynomials, just as the proposed convolution method is set up like multiplication. The division operation is implemented using the Paravartya algorithm based on Vedic mathematics, while the partial products are obtained with the Vedic multiplier.
To illustrate the method, consider example 2: let y(n) be the convolved sequence (8, 38, 77, 80, 49, 18, 3) and h(n) the finite-length sequence (2, 7, 9, 3).
              4   5   3   1
    2 7 9 3 | 8  38  77  80  49  18   3
              8  28  36  12
              -------------
              0  10  41  68  49  18   3
                 10  35  45  15
                 --------------
                  0   6  23  34  18   3
                      6  21  27   9
                      -------------
                      0   2   7   9   3
                          2   7   9   3
                          -------------
                          0   0   0   0

Hence x(n) = (4, 5, 3, 1) with zero remainder.
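The long-division arrangement above can be sketched in Python as plain polynomial division (an illustration only; the hardware implementation uses the Paravartya-based divider and Vedic multipliers, and the function name is hypothetical):

```python
def deconvolve(y, h):
    """Recover x from y = x * h by polynomial long division (sketch).
    y, h: coefficient lists with the first sample first."""
    y = list(y)                      # working copy of the columns
    n = len(y) - len(h) + 1          # length of the recovered sequence
    x = []
    for i in range(n):
        q = y[i] // h[0]             # leading coefficient of this step
        x.append(q)
        for j, c in enumerate(h):
            y[i + j] -= q * c        # subtract q*h(n) from the columns
    return x, y[n:]                  # recovered sequence and remainder

print(deconvolve([8, 38, 77, 80, 49, 18, 3], [2, 7, 9, 3]))
# ([4, 5, 3, 1], [0, 0, 0])
```

Each loop iteration reproduces one row of the worked example: 8/2 = 4, then 10/2 = 5, and so on, until the columns cancel to zero.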
CHAPTER-8
APPLICATIONS
8.1 ADVANTAGES
8.2 DISADVANTAGES
CHAPTER-9
SIMULATION RESULTS
The simulation results of linear convolution and circular convolution are shown below:
Fig. 9.2 Simulation results of Circular Convolution using Vedic Mathematics
The Vedic divider is simulated in the ModelSim simulator. Figure 9.3 shows the simulation result of the Vedic divider: x and y are 4-bit inputs, and quot gives the division result based on the Paravartya sutra.
The simulation result of the Vedic divider is shown below.
TABLE 2. Execution times for the conventional method, the OLA method using Vedic maths, and the proposed method.
CONCLUSION
The proposed system provides a method for calculating linear convolution and deconvolution with the help of Vedic algorithms that is easy to learn and perform, and it yields a faster implementation of both operations. The simulation results show that the execution time and area required by the proposed convolution and deconvolution, using the Vedic multiplication and division algorithms respectively, are less than those of conventional convolution and deconvolution based on simple multiplication and division. This model also has the advantage that it can be fine-tuned for any signal-processing application.
REFERENCES
[1] J. G. Proakis and D. G. Manolakis, "Digital Signal Processing: Principles, Algorithms, and Applications," 2nd ed. New York: Macmillan, 1992.
[2] J. W. Pierre, "A novel method for calculating the convolution sum of two finite length sequences," IEEE Transactions on Education, vol. 39, no. 1, 1996.
[4] H. Thapliyal and M. B. Srinivas, "High Speed Efficient N×N Bit Parallel Hierarchical Overlay Multiplier Architecture Based on Ancient Indian Vedic Mathematics," Enformatika Transactions, vol. 2, pp. 225-228, 2004.
[5] R. Senapati, B. K. Bhoi, and M. Pradhan, "Novel binary divider architecture for high speed VLSI applications," in Proc. IEEE Conference on Information & Communication Technologies (ICT), 2013.