Fast Multiplication Algorithms
College of Engineering
An autonomous institution affiliated to VTU
Department of Electronics & Communication
VII Semester
ARM Processor Assignments
Date of Submission:
Marks Awarded:
To multiply two complex numbers (a + bi) and (c + di), compute
k1 = c · (a + b)
k2 = a · (d − c)
k3 = b · (c + d)
Real part = k1 − k3
Imaginary part = k1 + k2.
This algorithm uses only three multiplications, rather than four, at the cost of
five additions or subtractions rather than two. If a multiply is more expensive
than three adds or subtracts, as when calculating by hand, then there is a gain
in speed. On modern computers a multiply and an add can take about the same
time, so there may be no speed gain. There is also a trade-off: some precision
may be lost when using floating point.
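The scheme above can be sketched in a few lines (an illustration added here, not part of the original text; the function name is arbitrary):

```python
def complex_multiply(a, b, c, d):
    # Gauss's three-multiplication scheme for (a + bi) * (c + di).
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    # Real part = k1 - k3 = ac - bd; imaginary part = k1 + k2 = ad + bc.
    return (k1 - k3, k1 + k2)
```

For example, complex_multiply(3, 4, 5, 6) gives (-9, 38), matching (3 + 4i)(5 + 6i).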
For fast Fourier transforms the complex multiplies involve constant 'twiddle'
factors, so two of the additions can be precomputed. Only three multiplies and
three adds are then required, and modern hardware can often overlap multiplies
and adds.
Karatsuba multiplication
For systems that need to multiply numbers in the range of several thousand
digits, such as computer algebra systems and bignum libraries, long
multiplication is too slow. These systems may employ Karatsuba
multiplication, which was discovered in 1960 (published in 1962). The heart of
Karatsuba's method lies in the observation that two-digit multiplication can be
done with only three rather than the four multiplications classically required.
Suppose we want to multiply two 2-digit numbers, x1x2 and y1y2, i.e. x = x1·10 + x2
and y = y1·10 + y2. The classical method uses the four products x1y1, x1y2,
x2y1 and x2y2, but since
x1y2 + x2y1 = (x1 + x2)(y1 + y2) − x1y1 − x2y2,
the three products x1y1, x2y2 and (x1 + x2)(y1 + y2) suffice. Bigger numbers
can likewise be split into two halves x1 and x2, and the method works
analogously. To compute these three products of m-digit numbers, we can employ
the same trick again, effectively using recursion. Once the three products are
computed, the final recombination step takes about n additions.
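A minimal recursive sketch of the method, assuming decimal splitting and Python's arbitrary-precision integers (the helper name is our own):

```python
def karatsuba(x, y):
    # Multiply non-negative integers with three recursive products
    # instead of four: split x = a*B + b and y = c*B + d with B = 10^m,
    # then x*y = ac*B^2 + ((a+b)(c+d) - ac - bd)*B + bd.
    if x < 10 or y < 10:
        return x * y
    m = max(len(str(x)), len(str(y))) // 2
    B = 10 ** m
    a, b = divmod(x, B)
    c, d = divmod(y, B)
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    mid = karatsuba(a + b, c + d) - ac - bd
    return ac * B * B + mid * B + bd
```

For instance, karatsuba(1234, 5678) returns 7006652, the same as 1234 * 5678.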
Later the Karatsuba method came to be called 'divide and conquer'; other names
used for this approach at present are 'binary splitting' and 'dichotomy
principle'.
The appearance of the 'divide and conquer' method was the starting point of the
theory of fast multiplication. A number of authors (among them Toom, Cook and
Schönhage) continued to look for a multiplication algorithm with complexity
close to optimal, and 1971 saw the construction of the Schönhage–Strassen
algorithm, which held the best known (until 2007) upper bound for M(n).
The Karatsuba ‘divide and conquer’ is the most fundamental and general fast
method. Hundreds of different algorithms are constructed on its basis. Among
these algorithms the most well known are the algorithms based on Fast Fourier
Transform (FFT) and Fast Matrix Multiplication.
Toom–Cook
Toom–Cook generalizes Karatsuba by splitting each number into k parts; Toom-3,
for example, replaces nine multiplications with five. Although using more and
more parts can reduce the time spent on recursive multiplications further, the
overhead from additions and digit management also grows. For this reason, the
method of Fourier transforms is typically faster for numbers with several
thousand digits, and asymptotically faster for even larger numbers.
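As an illustrative sketch (not from the original text), here is Toom-3, which splits each operand into three parts and uses five pointwise products where the schoolbook method needs nine. The interpolation steps follow Bodrato's well-known sequence; for brevity the five small products use plain `*` rather than recursing:

```python
def toom3(x, y):
    # Toom-3: view each operand as a degree-2 polynomial in B = 10^m,
    # evaluate at the points 0, 1, -1, -2 and infinity, multiply
    # pointwise, then interpolate to recover the product polynomial.
    if x < 1000 or y < 1000:
        return x * y
    m = (max(len(str(x)), len(str(y))) + 2) // 3
    B = 10 ** m
    x0, x1, x2 = x % B, (x // B) % B, x // B**2
    y0, y1, y2 = y % B, (y // B) % B, y // B**2
    # Evaluate both polynomials at the five points.
    px = [x0, x0 + x1 + x2, x0 - x1 + x2, x0 - 2*x1 + 4*x2, x2]
    py = [y0, y0 + y1 + y2, y0 - y1 + y2, y0 - 2*y1 + 4*y2, y2]
    # Five products of third-size numbers (recursing here gives the
    # full algorithm; plain * keeps the sketch short).
    r = [a * b for a, b in zip(px, py)]
    # Bodrato's interpolation sequence; every division is exact.
    r0, r4 = r[0], r[4]
    r3 = (r[3] - r[1]) // 3
    r1 = (r[1] - r[2]) // 2
    r2 = r[2] - r[0]
    r3 = (r2 - r3) // 2 + 2 * r4
    r2 = r2 + r1 - r4
    r1 = r1 - r3
    return r0 + r1*B + r2*B**2 + r3*B**3 + r4*B**4
```

The divisions by 2 and 3 are exact by construction, which is why integer division is safe here.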
The idea, due to Strassen (1968), is the following: we choose the largest
integer w that will not cause overflow during the process outlined below, then
split the two numbers into m groups of w bits each. These groups can be
regarded as the coefficients of two polynomials; the polynomials are multiplied
by evaluating them with a fast Fourier transform, multiplying pointwise, and
interpolating back, after which carries are propagated to give the product.
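A toy version of this scheme (an added sketch, with all names our own), assuming a simple recursive floating-point FFT and a small group width w = 8 so that rounding error stays negligible:

```python
import cmath

def fft(a, invert=False):
    # Recursive Cooley-Tukey FFT; len(a) must be a power of two.
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def fft_multiply(x, y, w=8):
    # Split each number into groups of w bits (coefficients in base
    # 2**w), convolve the coefficient lists via FFT, then propagate
    # carries to recover the integer product.
    base = 1 << w
    xs, ys = [], []
    while x:
        xs.append(x % base); x //= base
    while y:
        ys.append(y % base); y //= base
    n = 1
    while n < len(xs) + len(ys):
        n <<= 1
    fa = fft([complex(c) for c in xs] + [0j] * (n - len(xs)))
    fb = fft([complex(c) for c in ys] + [0j] * (n - len(ys)))
    prod = fft([a * b for a, b in zip(fa, fb)], invert=True)
    result, carry = 0, 0
    for i, c in enumerate(prod):
        v = int(round(c.real / n)) + carry      # /n normalizes the inverse FFT
        carry, digit = divmod(v, base)
        result += digit << (w * i)
    return result
```

In a production implementation the transform is taken over a ring chosen to make all arithmetic exact, rather than over floating-point complex numbers.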
For many years, the fastest known method for truly massive numbers based on
this idea was described in 1971 by Schönhage and Strassen (Schönhage–
Strassen algorithm) and has a time complexity of Θ(n log(n) log(log(n))). In
2007 this was improved by Martin Fürer (Fürer's algorithm) to give a time
complexity of n·log(n)·2^Θ(log*(n)) using Fourier transforms over complex numbers.
Anindya De, Chandan Saha, Piyush Kurur and Ramprasad Saptharishi [6] gave a
similar algorithm using modular arithmetic in 2008 achieving the same running
time. It is important to note that these are purely theoretical results as the time
complexities are in the multitape Turing machine model which does not, for
example, allow random access to arbitrary memory locations in constant time.
As a result, they are not necessarily of great practical import.
Quarter square multiplication
Quarter square multipliers were first used to form an analog signal that was the
product of two analog input signals in analog computers. In this application, the
sum and difference of the two input voltages are formed using operational
amplifiers. The square of each of these is approximated using piecewise-linear
circuits. Finally, the difference of the two squares is formed and scaled by a
factor of one fourth using yet another operational amplifier.
Digitally, the method rests on the identity xy = ⌊(x + y)²/4⌋ − ⌊(x − y)²/4⌋,
which holds exactly because x + y and x − y always have the same parity. Below
is a lookup table of ⌊n²/4⌋ for applying Johnson's method to the digits 0
through 9.
n       0  1  2  3  4  5  6  7   8   9  10  11  12  13  14  15  16  17  18
⌊n²/4⌋  0  0  1  2  4  6  9  12  16  20  25  30  36  42  49  56  64  72  81
If, for example, you wanted to multiply 9 by 3, you observe that the sum and
difference are 12 and 6 respectively. Looking both those values up on the table
yields 36 and 9, the difference of which is 27, which is the product of 9 and 3.
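The digit-by-digit lookup just described can be sketched as follows (an added illustration; the names are arbitrary):

```python
# Lookup table of floor(n^2 / 4) for n = 0..18, covering sums and
# differences of two decimal digits, as in the table above.
QUARTER_SQUARES = [n * n // 4 for n in range(19)]

def quarter_square_multiply(x, y):
    # xy = floor((x+y)^2/4) - floor((x-y)^2/4); the floors cancel
    # exactly because x+y and x-y always have the same parity.
    return QUARTER_SQUARES[x + y] - QUARTER_SQUARES[abs(x - y)]
```

With x = 9 and y = 3 this looks up 36 and 9 and returns 27, reproducing the worked example.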
Booth Multiplication Algorithm
I. Of the two numbers, pick as the multiplier the one with the fewest
changes between consecutive bits, since each change costs an add or a
subtract.
i.e., 0010: from 0 to 0 no change, 0 to 1 one change, 1 to 0 another
change, so there are two changes in this one.
II. Let X = 1100 (multiplier) and Y = 0010 (multiplicand).
Take the 2's complement of Y and call it -Y:
-Y = 1110
III. Load 0 into X-1; it holds the previous least significant bit of X.
IV. Make four rows, one for each cycle; this is because we are multiplying
four-bit numbers.
Load the values:
           U    V    X    X-1
1st cycle  0000 0000 1100 0
2nd cycle
3rd cycle
Step 2: Booth Algorithm
I. Examine the pair formed by the least significant bit of X and X-1:
00: shift only
11: shift only
01: add Y to U, then shift
10: subtract Y from U (i.e., add -Y to U), then shift
II. Shift U and V together using an arithmetic right shift, which
preserves the sign bit of a 2's-complement number. Thus a positive number
remains positive, and a negative number remains negative.
III. Shift X with a circular right shift, because this prevents us from
using two registers for the X value; X-1 receives the bit shifted out of X.
U V X X-1
0000 0000 1100 0
0000 0000 0110 0
Repeat the same steps until the four cycles are completed.
U V X X-1
0000 0000 1100 0
0000 0000 0110 0
0000 0000 0011 0
U    V    X    X-1
0000 0000 1100 0
0000 0000 0110 0
0000 0000 0011 0
1110 0000 0011 0
1111 0000 1001 1
1111 1000 1100 1
We have finished four cycles (the last bit pair, 11, calls for a shift only),
so the answer is shown in the last row of U and V,
which is: 11111000 two, i.e. -8, as expected for X × Y = (-4) × 2.
Note: By the fourth cycle, the two algorithms have the same values in the
Product register.
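The whole procedure can be simulated in software as a sketch (added here for illustration; the register names follow the worked example, and the code is not from the assignment):

```python
def booth_multiply(x, y, bits=4):
    # Booth's algorithm on `bits`-bit two's-complement operands.
    # U:V is the double-length accumulator, X the multiplier register,
    # and X_prev plays the role of the X-1 bit in the worked example.
    mask = (1 << bits) - 1
    U = V = 0
    X = x & mask           # multiplier, e.g. 1100 (= -4)
    Y = y & mask           # multiplicand, e.g. 0010 (= 2)
    neg_Y = (-y) & mask    # two's complement of Y
    X_prev = 0
    for _ in range(bits):
        pair = (X & 1, X_prev)
        if pair == (0, 1):       # 01: add Y to U
            U = (U + Y) & mask
        elif pair == (1, 0):     # 10: add -Y to U
            U = (U + neg_Y) & mask
        # Arithmetic right shift of the combined U:V pair (sign-preserving).
        UV = (U << bits) | V
        sign = UV >> (2 * bits - 1)
        UV = (UV >> 1) | (sign << (2 * bits - 1))
        U, V = UV >> bits, UV & mask
        # Circular right shift of X; X-1 receives the bit shifted out.
        X_prev = X & 1
        X = (X >> 1) | (X_prev << (bits - 1))
    product = (U << bits) | V
    # Interpret the 2*bits-wide result as signed two's complement.
    if product >= 1 << (2 * bits - 1):
        product -= 1 << (2 * bits)
    return product
```

Tracing booth_multiply(-4, 2) reproduces the table above cycle by cycle and ends with U:V = 1111 1000, i.e. -8.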