
R. V. College of Engineering
An autonomous institution affiliated to VTU

Department of Electronics & Communication

VII Semester
ARM Processor Assignments

Report on Fast Multiplication Algorithms


Submitted by,
Name of the student: Mahesh Kumar
USN: 1rv07ec047

Date of Submission:

Marks Awarded:

Staff Incharge: MGR


Different types of Multiplication Algorithms

Gauss's complex multiplication algorithm

Complex multiplication normally involves four multiplications. By 1805 Gauss
had discovered a way of reducing the number of multiplications to three.

The product (a + bi) · (c + di) can be calculated in the following way.

k1 = c · (a + b)
k2 = a · (d − c)
k3 = b · (c + d)
Real part = k1 − k3
Imaginary part = k1 + k2.

This algorithm uses only three multiplications, rather than four, and five
additions or subtractions rather than two. If a multiply is more expensive than
three adds or subtracts, as when calculating by hand, then there is a gain in
speed. On modern computers a multiply and an add can take about the same
time so there may be no speed gain. There is a trade-off in that there may be
some loss of precision when using floating point.
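
As a small illustration (not part of the original text; the function name and the
use of plain Python numbers are assumptions of this sketch), the three-multiplication
scheme can be written directly:

    def gauss_complex_multiply(a, b, c, d):
        # multiply (a + bi) * (c + di) using three real multiplications
        k1 = c * (a + b)
        k2 = a * (d - c)
        k3 = b * (c + d)
        return (k1 - k3, k1 + k2)   # (real part, imaginary part)

    # example: gauss_complex_multiply(1, 2, 3, 4) gives (-5, 10),
    # matching (1 + 2i)(3 + 4i) = -5 + 10i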

For fast Fourier transforms the complex multiplies involve constant 'twiddle'
factors and two of the adds can be precomputed. Only three multiplies and three
adds are required, and modern hardware can often overlap multiplies and adds.

Karatsuba multiplication

For systems that need to multiply numbers in the range of several thousand
digits, such as computer algebra systems and bignum libraries, long
multiplication is too slow. These systems may employ Karatsuba
multiplication, which was discovered in 1960 (published in 1962). The heart of
Karatsuba's method lies in the observation that two-digit multiplication can be
done with only three rather than the four multiplications classically required.
Suppose we want to multiply two 2-digit numbers, x1x2 and y1y2:

1. compute x1 · y1, call the result A


2. compute x2 · y2, call the result B
3. compute (x1 + x2) · (y1 + y2), call the result C
4. compute C − A − B, call the result K; this number is equal to x1 · y2 + x2 · y1.
5. compute A · 100 + K · 10 + B
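
For example (an illustrative instance, not from the original text), to multiply
47 by 78 we have x1 = 4, x2 = 7, y1 = 7, y2 = 8, so A = 4 · 7 = 28, B = 7 · 8 = 56,
C = (4 + 7) · (7 + 8) = 165, K = 165 − 28 − 56 = 81, and the product is
28 · 100 + 81 · 10 + 56 = 3666.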

Bigger numbers x1x2 can be split into two parts x1 and x2, and then the method
works analogously. To compute the three products of m-digit numbers, we can
employ the same trick again, effectively using recursion. Once the three products
are computed, we need to add them together (step 5), which takes about n
operations.
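
A minimal recursive sketch in Python follows (the function name, the decimal
split and the small-number cutoff are illustrative assumptions; it handles
non-negative integers only):

    def karatsuba(x, y):
        # fall back to ordinary multiplication for small operands
        if x < 10 or y < 10:
            return x * y
        m = max(len(str(x)), len(str(y))) // 2
        high_x, low_x = divmod(x, 10 ** m)
        high_y, low_y = divmod(y, 10 ** m)
        a = karatsuba(high_x, high_y)                  # A = x1 * y1
        b = karatsuba(low_x, low_y)                    # B = x2 * y2
        c = karatsuba(high_x + low_x, high_y + low_y)  # C = (x1 + x2)(y1 + y2)
        k = c - a - b                                  # K = x1*y2 + x2*y1
        return a * 10 ** (2 * m) + k * 10 ** m + b

A real bignum library would split on binary machine words rather than decimal
digits and would tune the cutoff below which it falls back to long multiplication.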

Karatsuba multiplication has a time complexity of O(n^(log2 3)). The number
log2 3 is approximately 1.585, so this method is significantly faster than long
multiplication. Because of the overhead of recursion, Karatsuba's multiplication
is slower than long multiplication for small values of n; typical implementations
therefore switch to long multiplication if n is below some threshold.

The Karatsuba method later came to be described as 'divide and conquer'; other
names still used for the same idea are 'binary splitting' and the 'dichotomy
principle'.

The appearance of the divide-and-conquer method was the starting point of the
theory of fast multiplication. A number of authors (among them Toom, Cook
and Schönhage) continued to look for a multiplication algorithm with complexity
close to optimal, and 1971 saw the construction of the Schönhage–Strassen
algorithm, which held the best known upper bound for M(n) until 2007.

Karatsuba's divide-and-conquer is the most fundamental and general fast
method. Hundreds of different algorithms have been constructed on its basis;
among the best known are those based on the Fast Fourier Transform (FFT) and
fast matrix multiplication.

Toom–Cook

Another method of multiplication is called Toom–Cook or Toom-3. The Toom–
Cook method splits each number to be multiplied into multiple parts, and is one
of the generalizations of the Karatsuba method. Three-way Toom–Cook can do a
size-3N multiplication for the cost of five size-N multiplications, an improvement
by a factor of 9/5 compared to the Karatsuba method's improvement by a factor
of 4/3.
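
As a rough sketch of how Toom-3 obtains five size-N multiplications (not from
the original text; the decimal split, the cutoff, the function name and the use of
Bodrato's interpolation sequence are assumptions of this illustration):

    def toom3(x, y):
        # reduce to non-negative operands so the recursive calls below are safe
        if x < 0 or y < 0:
            sign = -1 if (x < 0) != (y < 0) else 1
            return sign * toom3(abs(x), abs(y))
        if x < 10 ** 6 or y < 10 ** 6:
            return x * y                       # small operands: ordinary multiply
        k = (max(len(str(x)), len(str(y))) + 2) // 3
        B = 10 ** k                            # split each operand into three parts
        x0, x1, x2 = x % B, (x // B) % B, x // B ** 2
        y0, y1, y2 = y % B, (y // B) % B, y // B ** 2
        # evaluate both operand polynomials at 0, 1, -1, -2 and infinity:
        # exactly five multiplications of roughly one-third-size numbers
        r0  = toom3(x0, y0)
        r1  = toom3(x0 + x1 + x2, y0 + y1 + y2)
        rm1 = toom3(x0 - x1 + x2, y0 - y1 + y2)
        rm2 = toom3(x0 - 2 * x1 + 4 * x2, y0 - 2 * y1 + 4 * y2)
        ri  = toom3(x2, y2)
        # interpolate to recover the five coefficients of the product polynomial
        t3 = (rm2 - r1) // 3
        t1 = (r1 - rm1) // 2
        t2 = rm1 - r0
        t3 = (t2 - t3) // 2 + 2 * ri
        t2 = t2 + t1 - ri
        t1 = t1 - t3
        return r0 + t1 * B + t2 * B ** 2 + t3 * B ** 3 + ri * B ** 4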

Although using more and more parts can reduce the time spent on recursive
multiplications further, the overhead from additions and digit management also
grows. For this reason, the method of Fourier transforms is typically faster for
numbers with several thousand digits, and asymptotically faster for even larger
numbers.

Fourier transform methods

The idea, due to Strassen (1968), is the following: We choose the largest integer
w that will not cause overflow during the process outlined below. Then we split
the two numbers into m groups of w bits, writing

a = Σi ai · 2^(w·i),   b = Σj bj · 2^(w·j).

We can then say that

a · b = Σk ck · 2^(w·k)

by setting bj = 0 and ai = 0 for j, i > m, k = i + j and {ck} as the convolution of
{ai} and {bj}. Using the convolution theorem a · b can be computed by
1. Computing the fast Fourier transforms of {ai} and {bj},
2. Multiplying the two results entry by entry,
3. Computing the inverse Fourier transform, and
4. Adding the part of each ck that is greater than 2^w to ck+1.
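
A small Python sketch of these four steps using numpy's FFT is shown below
(the digit width w, the function name and the carry loop are illustrative
assumptions; floating-point rounding limits this toy version to moderate sizes,
which is why production code uses exact schemes such as Schönhage–Strassen):

    import numpy as np

    def fft_multiply(x, y, w=8):
        # split the non-negative integers into groups of w bits (base-2^w digits)
        base = 1 << w
        def split(n):
            digits = []
            while n:
                digits.append(n & (base - 1))
                n >>= w
            return digits or [0]
        a, b = split(x), split(y)
        size = 1
        while size < len(a) + len(b):   # pad to a power of two
            size *= 2
        fa = np.fft.fft(a, size)        # step 1: forward transforms
        fb = np.fft.fft(b, size)
        c = np.fft.ifft(fa * fb).real   # steps 2-3: pointwise product, inverse FFT
        coeffs = [int(round(v)) for v in c]
        # step 4: push the part of each ck that exceeds w bits into c_{k+1}
        result, carry = 0, 0
        for k, ck in enumerate(coeffs):
            total = ck + carry
            result += (total & (base - 1)) << (w * k)
            carry = total >> w
        return result + (carry << (w * len(coeffs)))

    # example: fft_multiply(123456789, 987654321) == 123456789 * 987654321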

For many years, the fastest known method for truly massive numbers based on
this idea was described in 1971 by Schönhage and Strassen (Schönhage–
Strassen algorithm) and has a time complexity of Θ(n log(n) log(log(n))). In
2007 this was improved by Martin Fürer (Fürer's algorithm) to give a time
complexity of n · log(n) · 2^(Θ(log* n)) using Fourier transforms over complex numbers.
Anindya De, Chandan Saha, Piyush Kurur and Ramprasad Saptharishi [6] gave a
similar algorithm using modular arithmetic in 2008 achieving the same running
time. It is important to note that these are purely theoretical results as the time
complexities are in the multitape Turing machine model which does not, for
example, allow random access to arbitrary memory locations in constant time.
As a result, they are not necessarily of great practical import.

Applications of the Schönhage–Strassen algorithm include GIMPS.

Using number-theoretic transforms instead of discrete Fourier transforms avoids
rounding error problems by using modular arithmetic instead of complex
numbers.
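
For illustration, a number-theoretic transform over the prime 998244353 (a prime
with the 2^23-rd roots of unity the transform needs) can replace the complex FFT
in the sketch above; the code below is an assumed, standard textbook formulation,
not something given in the original text:

    MOD = 998244353              # prime, equal to 119 * 2^23 + 1
    ROOT = 3                     # a primitive root modulo MOD

    def ntt(a, invert=False):
        # in-place transform; len(a) must be a power of two (at most 2^23)
        n = len(a)
        j = 0
        for i in range(1, n):    # bit-reversal permutation
            bit = n >> 1
            while j & bit:
                j ^= bit
                bit >>= 1
            j ^= bit
            if i < j:
                a[i], a[j] = a[j], a[i]
        length = 2
        while length <= n:
            wlen = pow(ROOT, (MOD - 1) // length, MOD)
            if invert:
                wlen = pow(wlen, MOD - 2, MOD)     # use the inverse root
            for start in range(0, n, length):
                w = 1
                for k in range(start, start + length // 2):
                    u = a[k]
                    v = a[k + length // 2] * w % MOD
                    a[k] = (u + v) % MOD
                    a[k + length // 2] = (u - v) % MOD
                    w = w * wlen % MOD
            length <<= 1
        if invert:
            inv_n = pow(n, MOD - 2, MOD)
            for i in range(n):
                a[i] = a[i] * inv_n % MOD
        return a

    def ntt_convolve(a, b):
        # exact convolution of two digit sequences, with no rounding error
        size = 1
        while size < len(a) + len(b):
            size *= 2
        fa = ntt(list(a) + [0] * (size - len(a)))
        fb = ntt(list(b) + [0] * (size - len(b)))
        return ntt([x * y % MOD for x, y in zip(fa, fb)], invert=True)

    # example: ntt_convolve([1, 2], [3, 4]) starts with [3, 10, 8, ...], the
    # coefficients of (1 + 2t)(3 + 4t); results are exact as long as every
    # convolution coefficient stays below MOD

To multiply two integers exactly one would split them into small digits as in the
previous sketch, convolve them with ntt_convolve, and carry-propagate as in step 4.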

Linear time multiplication

Knuth[7] describes computational models in which two n-bit numbers can be
multiplied in linear time. The most realistic of these requires that any memory
location can be accessed in constant time (the so-called RAM model). The
approach is to use the FFT-based method described above, packing log n bits
into each coefficient of the polynomials and doing all computations with 6 log n
bits of accuracy. The time complexity is now O(nM), where M is the time
needed to multiply two (log n)-bit numbers. By precomputing a linear-size
multiplication lookup table of all pairs of numbers of (log n)/2 bits, M is simply
the time needed to perform a constant number of table lookups. If one assumes
this takes constant time per table lookup, as is true in the unit-cost word RAM
model, then the overall algorithm is linear time.

Quarter square multiplier


This is any device that multiplies two quantities employing the identity

a · b = ((a + b)² − (a − b)²) / 4.

Quarter square multipliers were first used to form an analog signal that was the
product of two analog input signals in analog computers. In this application, the
sum and difference of two input voltages are formed using operational
amplifiers. The square of each of these is approximated using piecewise linear
circuits. Finally the difference of the two squares is formed and scaled by a
factor of one fourth using yet another operational amplifier.

In 1980, Everett L. Johnson proposed a method of using the quarter square
method in a digital multiplier.[8] To form the product of two 8-bit integers, for
example, the digital device forms the sum and difference, looks both quantities
up in a table of squares, takes the difference of the results, and divides by four
by shifting two bits to the right. The difficulty with this, though, is that the sum
of two 8-bit integers can span as many as 9 bits, so the entries in the table of
squares would have to be 18 bits (twice nine) wide. Computer memories are
typically available in widths of 8 or 16 bits, and an 18-bit-wide table of squares
does not fit conveniently into such memories. Johnson proposed that, rather
than providing squares, the table should provide for the lookup of n²/4 given n,
discarding the remainder when n is odd. In this way, entries in such a table for n
from 0 to 510 (the possible range of the sum of two 8-bit integers) would never
be wider than 16 bits. Using a table in this form also removes the need for
dividing by 4 at the end. A simple algebraic proof shows that the discarded
remainder would have canceled when the final difference is taken, so no
accuracy is lost by discarding the remainders.

Below is a lookup table for applying Johnson's method to the digits 0 through 9
(n runs from 0 to 18, covering every possible sum or difference of two digits).

n      0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18
n²/4   0   0   1   2   4   6   9   12  16  20  25  30  36  42  49  56  64  72  81

If, for example, you wanted to multiply 9 by 3, you observe that the sum and
difference are 12 and 6 respectively. Looking both those values up on the table
yields 36 and 9, the difference of which is 27, which is the product of 9 and 3.
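
A digital version of this table lookup is easy to sketch (an illustration only; the
table name and function name are assumptions, not part of the original text):

    # quarter-square table: n*n // 4 for n = 0 .. 18, matching the table above
    QUARTER_SQUARES = [n * n // 4 for n in range(19)]

    def quarter_square_multiply(a, b):
        # multiply two digits 0-9 with one add, one subtract and two lookups
        return QUARTER_SQUARES[a + b] - QUARTER_SQUARES[abs(a - b)]

    # example: quarter_square_multiply(9, 3) looks up entries 12 and 6,
    # giving 36 - 9 = 27
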
Booth Multiplication Algorithm

The Booth algorithm gives a procedure for multiplying binary integers in signed
2's-complement representation.

The Booth algorithm will be illustrated with the following example:

Example: 2 (decimal) × (−4) (decimal), i.e. 0010 (binary) × 1100 (binary)

Step 1: Making the Booth table

I. From the two numbers, pick the number with the smallest difference
between a series of consecutive numbers, and make it a multiplier.
i.e., 0010 -- From 0 to 0 no change, 0 to 1 one change, 1 to 0 another
change ,so there are two changes on this one
II. Let X = 1100 (multiplier)
Let Y = 0010 (multiplicand)
Take the 2’s complement of Y and call it –Y
-Y=1110

III. Load the X value in the table.

IV. Load 0 for the X−1 value; it holds the previous least significant bit of X.

V. Load 0 into U and V, which will hold the product of X and Y at the end
of the operation.

VI. Make a row for each cycle; there are four cycles because we are
multiplying four-bit numbers.

Load the values:

            U    V    X    X-1
1st cycle   0000 0000 1100 0
2nd cycle
3rd cycle
4th cycle

Step 2: Booth Algorithm

The Booth algorithm requires examination of the multiplier bits and shifting of
the partial product. Prior to the shifting, the multiplicand may be added to the
partial product, subtracted from the partial product, or left unchanged, according
to the following rules. Look at the least significant bit of the multiplier X and
the previous least significant bit held in X−1:

I. 00: shift only
11: shift only
01: add Y to U, and shift
10: subtract Y from U (i.e. add −Y to U), and shift
II. Take U and V together and perform an arithmetic right shift, which
preserves the sign bit of the 2's-complement number. Thus a positive
number remains positive, and a negative number remains negative.
III. Shift X with a circular right shift (rotate), because this prevents us from
needing two registers for the X value.

U V X X-1
0000 0000 1100 0
0000 0000 0110 0

Repeat the same steps until the four cycles are completed.

U    V    X    X-1
0000 0000 1100 0     (initial values)
0000 0000 0110 0     (after 1st cycle: 00, shift only)
0000 0000 0011 0     (after 2nd cycle: 00, shift only)
1110 0000 0011 0     (3rd cycle: 10, add −Y to U)
1111 0000 1001 1     (after 3rd cycle: shift)
1111 1000 1100 1     (after 4th cycle: 11, shift only)

We have finished four cycles, so the answer is shown in the last row of U and V,
which is 1111 1000 in binary, i.e. −8, as expected for 2 × (−4).
Note: by the fourth cycle, Booth's algorithm and the standard shift-and-add
multiplication have the same value in the product register.
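
The register-level procedure above can be sketched in Python as follows (an
illustration, not part of the original assignment; the function name and the bit
manipulation details are assumptions, while U, V, X and X−1 follow the tables):

    def booth_multiply(multiplicand, multiplier, n=4):
        mask = (1 << n) - 1
        Y = multiplicand & mask          # Y: multiplicand in n-bit 2's complement
        X = multiplier & mask            # X: multiplier in n-bit 2's complement
        U, V, x_prev = 0, 0, 0           # product halves U:V and the extra bit X-1
        for _ in range(n):
            pair = ((X & 1) << 1) | x_prev
            if pair == 0b01:             # 01: add Y to U
                U = (U + Y) & mask
            elif pair == 0b10:           # 10: subtract Y from U (add -Y)
                U = (U + ((-Y) & mask)) & mask
            # arithmetic right shift of U:V taken together as a 2n-bit register
            combined = (U << n) | V
            sign_bit = combined >> (2 * n - 1)
            combined = (combined >> 1) | (sign_bit << (2 * n - 1))
            U, V = combined >> n, combined & mask
            # rotate X right; the bit just examined becomes the new X-1
            x_prev = X & 1
            X = (X >> 1) | (x_prev << (n - 1))
        product = (U << n) | V
        # interpret the 2n-bit result U:V as a signed value
        return product - (1 << (2 * n)) if product >> (2 * n - 1) else product

    # example: booth_multiply(2, -4) returns -8 (U:V = 1111 1000),
    # matching the worked example above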
