Algorithms
(Version 0.0.1)
Chapter 1 – Sorting Based Problems
1.1 Inversions
Let’s start by defining the problem of inversions. In an array, two elements a[i] and a[j] form an
inversion if a[i] > a[j] and i < j. In simple words, a bigger element appears before a
smaller one. In the following list: [ 1, 3, 5, 2, 4, 6 ], there are 3 inversions: (3, 2), (5, 2),
and (5, 4).
The inversion count of an array indicates how far (or close) the array is from being
sorted. If the array is already sorted, the inversion count is 0. If the array is sorted in
descending order, the inversion count is the maximum possible, n(n − 1)/2.
Inversion count can be used by social media networks to find people sharing
common preferences. For example, if I wanted to find who in a group I share the most
hobbies with, I could rank everyone’s hobbies, map them to values based
on my own ranking to establish an order, and run them through an inversion counting
algorithm to see whose hobbies most match mine. We would probably have to do a little
extra clean-up work to get matching sets of data, but after that it could look like this:
I have 4 inversions with person A and 2 inversions with person B. So, it looks like I
should hang out with person B.
Similarly, music sites try to match your song preferences with others’. The more you
listen to a song, the higher that song ranks in your list. The music site consults its
database to find people with similar tastes, and on the basis of what they listen to,
you get music recommendations.
The naïve algorithm is – traverse the array from start to end; for every
element, count the smaller elements that appear after it; and finally
sum up these counts over every index. This is an O(n²) time approach.
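A minimal sketch of this naïve count (the function name is ours, not from the text):

// A direct O(n^2) inversion count: for every element, count the
// smaller elements that appear after it.
int countInversionsNaive(const int A[], int n) {
    int count = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (A[i] > A[j])
                count++;
    return count;
}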
The efficient algorithm is – to use a modified merge sort. Let’s see what modification
we need to make. Like merge sort, we first divide the array into two (almost equal) halves.
This is the divide step.
We pass these halved arrays to the recursive function. Now assume that the recursive
function gives you back the inversion count of each half. This is
the conquer step.
In the final step, we need to merge these halved arrays. How do we combine these
arrays?
Assume each half is sorted. Now count the inversions (aᵢ, aⱼ) where aᵢ
and aⱼ belong to different halves. The merge step is similar to the traditional merge of
merge sort. The only additional thing you have to do is keep a count: whenever an
element of the right half is placed before the remaining elements of the left half, all
those remaining left-half elements are bigger than it, and each of them forms an
inversion with it. Since the halves are sorted, this can be done in O(n) time.
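A sketch of the whole routine, assuming the array may be sorted in place (function names ours):

#include <vector>
using namespace std;

// Merge sorted halves A[left..mid-1] and A[mid..right], counting cross
// inversions: when A[j] from the right half moves before the remaining
// left-half elements, each of those (mid - i) elements is bigger than it.
long mergeCount(vector<int>& A, int left, int mid, int right) {
    vector<int> temp;
    long inv = 0;
    int i = left, j = mid;
    while (i < mid && j <= right) {
        if (A[i] <= A[j]) temp.push_back(A[i++]);
        else { inv += mid - i; temp.push_back(A[j++]); }
    }
    while (i < mid)    temp.push_back(A[i++]);
    while (j <= right) temp.push_back(A[j++]);
    for (int k = 0; k <= right - left; k++) A[left + k] = temp[k];
    return inv;
}

// Divide, count inside each half recursively, then count cross inversions.
long countInversions(vector<int>& A, int left, int right) {
    if (left >= right) return 0;
    int mid = left + (right - left) / 2;
    long inv = countInversions(A, left, mid) +
               countInversions(A, mid + 1, right);
    return inv + mergeCount(A, left, mid + 1, right);
}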
So, total number of inversions = 5 + 8 + 9 = 22.
This is an O(n log n) time approach. However, it takes O(n) extra space.
Problem
Q. 1 Given an array A, for every element, find the sum of all the previous elements which are
smaller than the current element. Print the total sum of all such individual sums.
For example, A = { 1, 5, 3, 6, 4 }.
For i = 0: sum = 0
For i = 1: sum = 1 (1 is smaller than 5)
For i = 2: sum = 1 (only 1 is smaller than 3)
For i = 3: sum = 9 (1, 5, 3 all are smaller than 6)
For i = 4: sum = 4 (1, 3 are smaller than 4)
Total sum = 0 + 1 + 1 + 9 + 4 = 15
Output = 15.
Solution: Here we are not interested in inversions; instead, we are interested in the pairs which
are in order. So, in the merge() of the inversion count code, instead of writing a ‘counting statement’
in the else block, we write a ‘summing statement’ in the if block.
long merge (int A[], int left, int mid, int right) {
int i = left, j = mid, k = 0;
int temp[right-left+1]; long sum = 0;
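    // (The rest of this function was cut off in the source. What follows is
    // a hedged reconstruction of the merge with the 'summing statement' in
    // the if block, assuming distinct elements: when A[i] from the left half
    // moves first, it is smaller than each of the (right - j + 1) right-half
    // elements still unmerged, so it contributes A[i] to each of their sums.)
    while (i < mid && j <= right) {
        if (A[i] <= A[j]) {
            sum += (long)A[i] * (right - j + 1);   // summing statement
            temp[k++] = A[i++];
        } else {
            temp[k++] = A[j++];
        }
    }
    while (i < mid)    temp[k++] = A[i++];
    while (j <= right) temp[k++] = A[j++];
    for (i = left, k = 0; i <= right; i++, k++)
        A[i] = temp[k];
    return sum;
}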
1.2 Partition method of Quicksort
The partition method of Quicksort can solve many re-arrangement type problems which
may not be obvious at first look. Let’s see a few of them.
Consider this one: given an array containing both positive and negative numbers,
rearrange the elements so that positive and negative numbers appear alternately.
The naïve way would be to sort the elements in O(n log n) quicksort fashion and then
swap alternate negative numbers with positive numbers. Here, by sorting the elements,
we are overdoing it. We just need to arrange positive and negative elements alternately.
We can achieve this in O(n) complexity.
The efficient way is to use Quicksort’s partition algorithm with 0 as the pivot value.
After the partition step, all the negative numbers come at the beginning of
the array and all the positive elements are pushed to the end. The
negative and positive elements are not in sorted order; they are just
separated. We then swap alternate positive and negative elements to get the desired result.
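A minimal sketch of this approach (function name ours; positives land at even indices, matching the convention used later in this section):

#include <algorithm>
using namespace std;

// Partition around pivot value 0, then interleave. Order is NOT preserved.
void rearrangeAlternate(int arr[], int n) {
    // partition step: move all negatives to the front
    int i = -1;
    for (int j = 0; j < n; j++)
        if (arr[j] < 0)
            swap(arr[++i], arr[j]);
    // negatives are arr[0..i], positives arr[i+1..n-1]; swap every
    // second negative with the next positive so positives occupy
    // even indices and negatives odd ones.
    int pos = i + 1, neg = 0;
    while (pos < n && neg < pos && arr[neg] < 0) {
        swap(arr[neg], arr[pos]);
        pos++;
        neg += 2;
    }
}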
One famous follow-up question is – look at the output carefully. You will see the order
of the negative (as well as positive) numbers has changed in the output. The negative
numbers in the original array were in the order: -5, -2, -8. But in the output, the order is
-2, -5, -8. The order has changed because quicksort’s partitioning is not stable.
So, the follow-up question is – can you maintain the order of appearance of the
elements and still get the alternate positive-negative pattern?
The above problem becomes very easy if extra O(n) space is allowed. You can just
keep the negative elements, in order, in a separate array and then replace the negative
elements in the answer using this array. However, the problem becomes interesting
(and it is not possible to solve it using Quicksort’s partition method) if an O(1) space
constraint is imposed.
This is the simple idea which solves this problem. Find an element which is out of
place. In our example, we want alternate positive and negative elements. So, you can
imagine, positive elements occupy the even indices of the array and negative elements occupy
the odd indices. So, we call an element out of place if it is negative and at an even
index, or if it is positive and at an odd index. Once we find an out-of-place element, we then
find the next out-of-place element which has the opposite sign, i.e. if the current out-of-place
element is positive, we find the next negative out-of-place element. Once we have found such
an element, we rotate the subarray between them to the right by 1.
1. -5 5 { 5, -5, -2, 2, 4, 7, 1, 8, 0, -8 }
2. -2 2 { 5, -5, 2, -2, 4, 7, 1, 8, 0, -8 }
3. 7 -8 { 5, -5, 2, -2, 4, -8, 7, 1, 8, 0 }
Since we are using right rotations to bring positive and negative elements into alternate order,
the order of appearance of the positive as well as the negative elements remains unchanged. Although
this follow-up discussion was not related to quicksort’s partition method, you should
know how to achieve rearrangements while maintaining the order of elements.
// right rotate arr[outofplace..cur] by one position
void rightrotate(int arr[], int n, int outofplace, int cur) {
    int tmp = arr[cur];
    for (int i = cur; i > outofplace; i--)
        arr[i] = arr[i - 1];
    arr[outofplace] = tmp;
}

// (the opening of this function was cut off in the source; the loop
// header below is a reconstruction consistent with the surviving body)
void rearrange(int arr[], int n) {
    int out_of_place_index = -1;
    for (int i = 0; i < n; i++) {
        if (out_of_place_index == -1) {
            // condition for element to be out of place
            if (((arr[i] >= 0) && (i&1)) || ((arr[i] < 0) && !(i&1)))
                out_of_place_index = i;
        }
        if (out_of_place_index >= 0) {
            if (((arr[i] >= 0) && (arr[out_of_place_index] < 0)) ||
                ((arr[i] < 0) && (arr[out_of_place_index] >= 0))) {
                rightrotate(arr, n, out_of_place_index, i);
                // if the rotated elements are adjacent
                if (i - out_of_place_index < 2)
                    out_of_place_index = -1;
                // else the new out of place entry is two steps ahead
                else
                    out_of_place_index += 2;
            }
        }
    }
}
For A = { -5, -2, 5, 2, 4, 7, 1, 8, 0, -8 }, above logic will produce the output as:
5 -5 2 -2 4 -8 7 1 8 0
Task for you: Above implementations are for rearranging elements in alternate
positive and negative manner. Edit above code (or write on your own) so that elements
are rearranged in alternate negative and positive manner, i.e. this time, your first
element should be negative.
Let’s now continue our discussion of quicksort’s partition method. We saw how to rearrange
alternate positive and negative elements.
How about rearranging alternate odd and even elements? If order doesn’t matter, we
can use quicksort’s partition method in exactly the same way: partition on parity
instead of sign, then swap alternates.
By now, you must have got the idea of how to rearrange alternately – with and without
maintaining the order of elements. Now, let’s see one slightly different and important
problem – pushing all zeros to the end.
The idea is – maintain a ‘count’ of non-zero elements seen so far. At any moment,
every element before index ‘count’ is non-zero, and ‘count’ is the position where the
next non-zero element should go. Traverse the array; when you see a non-zero element,
swap it with the element at index ‘count’ and increment ‘count’.
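A minimal sketch of this (function name ours):

#include <algorithm>
using namespace std;

// Push all zeros to the end, keeping the non-zero elements in order.
void pushZerosToEnd(int arr[], int n) {
    int count = 0;                 // next slot for a non-zero element
    for (int i = 0; i < n; i++)
        if (arr[i] != 0)
            swap(arr[i], arr[count++]);
}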
1.3 Sort an array of 0s, 1s and 2s
Given an array whose only elements are ‘0’, ‘1’ and ‘2’, sort this array.
The problem is very famous because it aims to sort this array in O(n) time.
This is possible because there are only three kinds of elements, which we can
separate using a special partitioning technique known as ‘3-way partitioning’:
a partition scheme which segregates items into three different sets.
If the element at arr[mid] is a 0, then swap arr[mid] and arr[low] and increase the low and
mid pointers by 1.
If the element at arr[mid] is a 1, increase the mid pointer by 1.
If the element at arr[mid] is a 2, then swap arr[mid] and arr[high] and decrease the high
pointer by 1.
void dnf(int arr[], int n) {
int low = 0, mid = 0, high = n-1;
while (mid <= high) {
if (arr[mid] == 0)
swap(arr[low++], arr[mid++]);
else if (arr[mid] == 1)
mid++;
else
swap(arr[mid], arr[high--]);
}
}
1.4 Form the biggest number
Given an array of numbers, arrange them in a way that yields the largest value. For
example, if the given numbers are {54, 546, 548, 60}, the arrangement 6054854654
gives the largest value. And if the given numbers are {1, 34, 3, 98, 9, 76, 45, 4}, then
the arrangement 998764543431 gives the largest value.
Here, if you notice, we need to sort the numbers in reverse dictionary order. The
numbers are not sorted by their value; they are sorted as if they were
strings. C++’s inbuilt string methods allow us easy string-to-number and number-to-string
conversion.
So the idea is – to write our own compare function for the inbuilt sort() method, which
compares two numbers, while sorting, as if they were concatenated strings. For example,
let X and Y be 542 and 60. To compare X and Y, we compare 54260 and 60542. Since
60542 is greater than 54260, we put Y first.
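A minimal sketch of this comparator using std::to_string (names ours):

#include <algorithm>
#include <string>
#include <vector>
#include <iostream>
using namespace std;

// Put X before Y if the concatenation XY yields a bigger string than YX.
bool biggerFirst(int x, int y) {
    string xs = to_string(x), ys = to_string(y);
    return xs + ys > ys + xs;
}

int main() {
    vector<int> nums = {54, 546, 548, 60};
    sort(nums.begin(), nums.end(), biggerFirst);
    for (int v : nums) cout << v;      // prints 6054854654
    cout << endl;
    return 0;
}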
1.5 Kth smallest element and median
Before discussing medians, let’s discuss a more general problem first – finding the
k-th smallest element in an array. For example, if the input array is A = [3, 5, 1, 2, 6, 9, 7],
the 4th smallest element in A is 5, because if you sort the array it becomes
A = [1, 2, 3, 5, 6, 7, 9], and now you can easily see that the 4th element is 5.
The naïve way is to sort the array and then pick the element at index k − 1. Since
quicksort doesn’t take extra space, it is usually preferred over merge sort. The
worst-case complexity of quicksort is O(n²), but we can usually avoid it by selecting
the pivot element randomly.
The efficient way is to optimize quicksort a little more. One thing to notice is – if, after
partitioning, the pivot is at index j, then the pivot is the (j + 1)-th smallest element of
the array (zero-based indexing). Our task is to make this j equal to k − 1. If j < k − 1, the
k-th smallest element lies on the right side of j, and if j > k − 1, then on the left side. So, at
any time, we need to partition only one side of the array. In the worst case this algorithm
still has complexity O(n²), but practically you do not need to sort the entire array
before you find the k-th smallest element. And if you choose the pivot element wisely, the
expected time complexity reduces to O(n). Let’s take an example to see how this
method works.
For demonstration purposes, we will consider the pivot element to be the first element of the
(sub)array. So here the pivot index = 0.
Since 3 is less than k − 1 (because of zero-based indexing), the k-th smallest element is on the
right side.
Now the pivot is the 1st element of the right subarray. So the pivot index = 4.
This algorithm of using quicksort’s partition method for selecting the k-th
smallest/biggest element from an array is called the ‘quickselect’ algorithm.
There are many efficient methods devised for choosing the pivot wisely. For now, we
won’t adopt the most sophisticated technique; a sufficiently good technique is to
choose the pivot element using a random function.
int randomPartition (int arr[], int l, int r) {
    int n = r - l + 1;
    int pivot = rand() % n;
    swap(arr[l + pivot], arr[r]);   // move the random pivot to the end
    return partition(arr, l, r);
}
Now coming back to our median topic. We know the median of an array is the middle
element of the sorted form of that array. So, if we know the size n of the array, which we
do most of the time, we can find the median in O(n) expected time by setting k = (n + 1)/2.
Then for the odd case, the median is the k-th smallest element, and for the even case, the
median is the average of the (n/2)-th and (n/2 + 1)-th smallest elements.
One more thing which you should know is – how to find the k-th smallest element using
heaps (priority queues). This is the method you should describe to the interviewer
first, if asked – how will you find the k-th smallest element?
Min-heap logic:
When working with a min-heap, the logic is simple. Build a heap of all elements of the array.
Pop the root of the heap k − 1 times. Then the root of the heap is the k-th smallest
element. The time complexity of this method is O(n + k log n).
int KthSmallest(vector<int> &vec, int k) {
// use std::greater as the comparison function for min-heap
priority_queue<int,vector<int>,greater<>> pq(vec.begin(), vec.end());
// pop from min-heap exactly (k-1) times
while (--k) {
pq.pop();
}
// return the root of min-heap
return pq.top();
}
Max-heap logic:
With a max-heap, we keep a heap of only the k smallest elements seen so far: build a
max-heap of the first k elements of the array, then scan the rest (call it set B); whenever
the next element is smaller than the root, replace the root. At the end of the scan, the
root of the max-heap is the k-th smallest element. Let’s take an example to see how this
logic works. Let’s say we want to find the 6th smallest element in the following array.
1. Build a max-heap of the first k = 6 elements of the array.
2. Take the next element from set B and check if it is less than the root of the max-heap.
In this case, yes, it is. Remove the root and insert the new element into the max-heap.
3. The scan continues to 10; nothing happens, as the new element is greater than the root of
the max-heap. Same for 9. At 6, again the root of the max-heap is greater than 6. Remove
the root and add 6 to the max-heap.
Again, the next element from set B is less than the root of the max-heap. The root is removed and
the new element is added. The array scan is finished, so we just return the root of the max-heap,
6, which is the sixth smallest element in the given array.
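A sketch of this max-heap method (function name ours):

#include <queue>
#include <vector>
using namespace std;

// Keep a max-heap of the k smallest elements seen so far; its root is
// the k-th smallest. Costs O(k + (n - k) * log k).
int kthSmallestMaxHeap(const vector<int>& vec, int k) {
    priority_queue<int> pq(vec.begin(), vec.begin() + k); // first k elements
    for (size_t i = k; i < vec.size(); i++) {
        if (vec[i] < pq.top()) {     // smaller than current k-th smallest
            pq.pop();
            pq.push(vec[i]);
        }
    }
    return pq.top();
}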
Chapter 2 – Searching Based Problems
2.1 Binary search
Binary search finds a key in a sorted array by repeatedly dividing the search interval in half.
Its time complexity is O(log n). Binary search is not limited to arrays: it is an idea which
can work on numbers, functions, etc., and not just on sequences. We will soon work on
problems where this fact becomes clearer.
Recursive version:
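The original listing did not survive extraction; here is a standard sketch (function name ours):

// Search for key in arr[low..high]; returns its index or -1.
int binarySearchRec(int arr[], int low, int high, int key) {
    if (low > high) return -1;
    int mid = low + (high - low) / 2;   // avoids overflow of (low + high)
    if (arr[mid] == key) return mid;
    if (arr[mid] > key)
        return binarySearchRec(arr, low, mid - 1, key);
    return binarySearchRec(arr, mid + 1, high, key);
}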
Iterative version:
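And an iterative sketch (same invariant, constant extra space):

int binarySearchIter(int arr[], int n, int key) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        if (arr[mid] == key) return mid;
        else if (arr[mid] < key) low = mid + 1;
        else high = mid - 1;
    }
    return -1;
}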
In C++’s algorithm library, we have a binary search function. Its declaration is:
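The declaration itself was lost in extraction; the basic overload of std::binary_search is:

template <class ForwardIt, class T>
bool binary_search(ForwardIt first, ForwardIt last, const T& value);

Note that std::binary_search only reports whether the value is present; to locate a position, the library provides lower_bound and upper_bound.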
Lower bound implementation:
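The custom bs_lower_bound used in main() below is assumed to return the index of the first element not less than the key; a sketch:

// Index of the first element >= key (returns n if there is none).
int bs_lower_bound(int arr[], int n, int key) {
    int low = 0, high = n;           // search window [low, high)
    while (low < high) {
        int mid = low + (high - low) / 2;
        if (arr[mid] < key) low = mid + 1;
        else high = mid;
    }
    return low;
}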
int main() {
    int arr[] = {1, 7, 8, 10, 13, 20, 25, 30, 33};
    int size = sizeof(arr)/sizeof(arr[0]);
    // Case when key doesn't exist in array
    int* lb1 = lower_bound(arr, arr+size, 12);
    int lb2 = bs_lower_bound(arr, size, 12);
    cout << *lb1 << " " << arr[lb2] << endl;
    // ... (the remaining three queries were cut off in the source)
    return 0;
}
Output:
13 13
20 20
13 13
25 25
Problems
Q. 1 Given an array, find a peak element in it. A peak element is an element that is greater
than its neighbours.
Solution: The idea is based on the technique of Binary Search to check if the middle element
is the peak element or not. If the middle element is not the peak element, then check if the
element on the right side is greater than the middle element. If yes, then there is always a
peak element on the right side. If the element on the left side is greater than the middle
element, then there is always a peak element on the left side. Form a recursion and the peak
element can be found in 𝑂(log 𝑛) time.
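A sketch of this recursion (names ours; missing neighbours are treated as smaller):

// Return the index of some peak element in arr[low..high] of size n.
int findPeak(int arr[], int low, int high, int n) {
    int mid = low + (high - low) / 2;
    bool leftOk  = (mid == 0)     || arr[mid - 1] <= arr[mid];
    bool rightOk = (mid == n - 1) || arr[mid + 1] <= arr[mid];
    if (leftOk && rightOk) return mid;        // mid is a peak
    if (!rightOk)                             // bigger value on the right
        return findPeak(arr, mid + 1, high, n);
    return findPeak(arr, low, mid - 1, n);    // bigger value on the left
}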
Q. 2 Given a 2D array, find a peak element in it. Neighbours of a cell in 2D array are 4
cells situated at left, right, top and bottom. For corner elements, missing neighbours are
considered of negative infinite value.
Solution: Consider the mid column and find the maximum value in it. Let the index of the mid column be
‘mid’, the value of the maximum element be ‘max’, and the maximum element be at
‘mat[max_index][mid]’. If max >= mat[max_index][mid-1] and max >= mat[max_index][mid+1], max is a peak;
return max. If max < mat[max_index][mid-1], recur for the left half of the matrix. If max <
mat[max_index][mid+1], recur for the right half of the matrix.
const int MAX = 100;
int findMax(int arr[][MAX], int rows, int mid, int& max) {
int max_index = 0;
for (int i = 0; i < rows; i++) {
if (max < arr[i][mid]) {
max = arr[i][mid]; max_index = i;
}
}
return max_index;
}
Q. 3 There are 2 sorted arrays A and B of size n each. Write an algorithm to find the median
of the array obtained after merging the above 2 arrays (i.e. array of length 2n).
Solution:
The naïve approach is to merge both arrays in O(n) time and then pick the middle element as the
median. However, we can do better, in O(log n) time, by comparing the medians of both
arrays. Here is the idea:
1) Calculate the medians m1 and m2 of the input arrays arr1[] and arr2[] respectively.
2) If m1 and m2 are equal, then we are done. Return m1 (or m2).
3) If m1 is greater than m2, then the median is present in one of the below two subarrays:
a) from the first element of arr1 to m1: arr1[0 ... n/2];
b) from m2 to the last element of arr2: arr2[n/2 ... n-1].
4) Symmetrically, if m2 is greater than m1, recur on arr1[n/2 ... n-1] and arr2[0 ... n/2].
5) Repeat until the size of both subarrays becomes 2; then the median is
(max(arr1[0], arr2[0]) + min(arr1[1], arr2[1])) / 2.
int median(int arr[], int n) {
if (n % 2 == 0)
return (arr[n/2] + arr[n/2 - 1]) / 2;
else
return arr[n/2];
}
The above method doesn’t work for arrays of unequal length. So, this could be a nice follow-up
task: write an algorithm which works for unequal-sized arrays as well. Achieving this in
logarithmic time is a bit tricky. Here is our approach:
Start partitioning the two arrays into two groups of halves (not two parts of each array, but both
partitioned in such a way that the two groups have the same number of elements). The first half
contains some first elements from the first and second arrays, and the second half contains the
rest (the last) elements from the first and second arrays. Because the arrays can be of different
sizes, this does not mean taking half from each array. The example below clarifies the
explanation.
Reach a condition such that, every element in the first half is less than or equal to every
element in the second half. How to reach this condition?
Suppose, the total number of elements is even. Now let’s say that we have found the partition
(we will discuss partition procedure soon) such that a1 is less than or equal to a2, and b2 is
less than or equal to b3.
Now check if a1 is less than or equal to b3, and if b2 is less than or equal to a2. If that’s the
case, it means that every element in the first half is less than or equal to every element in the
second half. We can then directly calculate the median using the formula below:

median = (max(a1, b2) + min(a2, b3)) / 2

But if that’s not the case, then there are two possibilities: either a1 > b3, in which case we must
move the partition of the first array to the left, or b2 > a2, in which case we must move it to
the right.
One should ask the question – Why does the above condition lead to the median?
The median is the (n + 1)/2-th smallest element of an array, and here, the median is the
(n + m + 1)/2-th smallest element over the two arrays. If all the elements in the first half are less
than (or equal to) all elements in the second half, then in the case of an odd total, just take
the maximum of the two last elements in the first half (a2 and b2 in our example), and
this leads us to the (n + m + 1)/2-th smallest element among the two arrays, which is the
median ((7 + 4 + 1)/2 = 6). But in the case of an even total, take the average of the
maximum of the two last elements in the first half (a1 and b2 in our example)
and its successor among the arrays, which is the minimum of the two first elements in
the second half (a2 and b3 in our example). Now, here is how we make the partitions:
To make the two halves, choose the partition such that (the index at the partition of array A[]) +
(the index at the partition of array B[]) equals (n + m + 1)/2, i.e. the total number of elements,
plus one if that total is odd, divided by 2.
First, define two variables: min_index and max_index, and initialize min_index to 0, and
max_index to the length of the smaller array. In the examples given below, A[] is the smaller
array.
The variable i means the number of elements to be inserted from A[ ] into the first half, and j
means the number of elements to be inserted from B[ ] into the first half, the rest of the
elements will be inserted into the second half.
Below is another example which leads to the condition that returns a median that exists in
the merged array.
Here is the core of the code for the above approach (i is the partition index in a[], j the one in b[]):
    /* b[j-1] > a[i]: i is too small. Searching on right */
    if (i < n && j > 0 && b[j - 1] > a[i])
        min_index = i + 1;
    /* a[i-1] > b[j]: i is too big. Searching on left */
    else if (i > 0 && j < m && b[j] < a[i - 1])
        max_index = i - 1;
    // we have found the desired halves.
    else {
        /* This condition happens when we don't have any elements in the
           first half from a[], so we return the last element in b[]
           from the first half. */
        if (i == 0)
            median = b[j - 1];
Q. 4 The Painter’s Partition Problem. We have to paint n boards of lengths {A1, A2, ... An}. There are
k painters available and each takes 1 unit of time to paint 1 unit of board. The problem is to find the
minimum time to get this job done under the constraint that any painter only paints contiguous
sections of boards, say boards {2, 3, 4} or only board {1} or nothing, but not boards {2, 4, 5}.
Examples:
Output: 20.
Here we can divide the boards into 2 equal-sized partitions, so each painter gets 20 units of
board and the total time taken is 20.
Input: A = {10, 20, 30, 40}, k = 2
Output: 60.
Here we can divide the first 3 boards for one painter and the last board for the second painter.
Solution:
Let’s forget about binary search for a moment. And let’s try to solve this problem from scratch.
We can observe that the problem can be broken down into: Given an array A of non-negative
integers and a positive integer k, we have to divide A into k or fewer partitions such that the
maximum sum of the elements in a partition, among overall partitions is minimized. So, for
the second example above, possible divisions are:
- One partition: (10, 20, 30, 40), so the time is 100.
- Two partitions: (10) & (20, 30, 40), so the time is 90. Similarly, we can put the divider after
20 (=> time 70) or 30 (=> time 60); so the minimum over (100, 90, 70, 60) is 60.
A brute force solution is to consider all possible set of contiguous partitions and calculate
the maximum sum partition in each case and return the minimum of all these cases.
1) Optimal Substructure:
We can implement the naive solution using recursion with the following optimal substructure
property:
Assuming that we already have k − 1 partitions in place (using k − 2 dividers), we now have to
place the (k − 1)-th divider to get k partitions. How can we do this? We can put the (k − 1)-th divider
between the i-th and (i + 1)-th elements, where i = 1 to n. Please note that putting it before the first
element is the same as putting it after the last element: both leave one partition empty.
The total cost of this arrangement can be calculated as the maximum of the following:
a) The cost of the last partition: sum(Ai ... An), where the (k − 1)-th divider is before element i.
b) The maximum cost of any partition already formed to the left of the (k − 1)-th divider.
Here a) can be found using a simple helper function that calculates the sum of the elements between
two indices of the array. How do we find b)?
We can observe that b) is actually the problem of placing the remaining k − 2 dividers as fairly as
possible over the first boards, so it is a subproblem of the given problem. Thus, we can write the
optimal substructure property as the following recurrence relation:

T(n, k) = min over i in {1..n} of max( T(i, k − 1), sum(A_{i+1} ... A_n) )

with base cases T(n, 1) = sum(A_1 ... A_n) and T(1, k) = A_1.
Following is the implementation of this recursive approach:
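The original listing did not survive extraction; here is a faithful sketch, where sum(arr, i, j) returns arr[i] + ... + arr[j]:

#include <climits>
#include <algorithm>
using namespace std;

// sum of arr[from..to], a simple helper
int sum(int arr[], int from, int to) {
    int total = 0;
    for (int i = from; i <= to; i++) total += arr[i];
    return total;
}

// T(n, k): minimum possible maximum partition sum for the
// first n boards and k painters (exponential-time recursion).
int partitionRec(int arr[], int n, int k) {
    if (k == 1) return sum(arr, 0, n - 1);   // one painter paints all
    if (n == 1) return arr[0];               // one board
    int best = INT_MAX;
    for (int i = 1; i <= n; i++)             // last divider after board i
        best = min(best, max(partitionRec(arr, i, k - 1),
                             sum(arr, i, n - 1)));
    return best;
}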
You can verify that the time complexity of the above solution is exponential. Following is a
partial recursion tree for T(4, 3) in the above recurrence.
T(4, 3)
/ / \ ...
/... /...
T(1, 1) T(1, 1)
We can observe that many subproblems, like T(1, 1) above, are being solved
again and again. Because this problem has both optimal substructure and overlapping
subproblems, we can solve it using dynamic programming, either by the top-down memoized
method or the bottom-up tabular method.
// bottom up tabular dp; sum(arr, i, j) returns arr[i] + ... + arr[j]
int partition(int arr[], int n, int k) {
    vector<vector<int>> dp(k + 1, vector<int>(n + 1, 0)); // dp table
    // base cases
    // k=1: one painter paints everything
    for (int i = 1; i <= n; i++)
        dp[1][i] = sum(arr, 0, i - 1);
    // n=1: a single board
    for (int i = 1; i <= k; i++)
        dp[i][1] = arr[0];
// 2 to k partitions
for (int i = 2; i <= k; i++) { // 2 to n boards
for (int j = 2; j <= n; j++) {
// track minimum
int best = INT_MAX;
// i-1 th separator before position arr[p = 1…j]
for (int p = 1; p <= j; p++)
best = min(best, max(dp[i - 1][p], sum(arr, p, j - 1)));
dp[i][j] = best;
}
}
return dp[k][n];
}
The time complexity of the above program is O(k · n³). It can easily be brought down to
O(k · n²) by precomputing the cumulative sums in an array, thus avoiding repeated calls to
the sum function:
// base cases; sum[i] holds arr[0] + ... + arr[i-1], precomputed once
for (int i = 1; i <= n; i++)
    dp[1][i] = sum[i];
for (int i = 1; i <= k; i++)
    dp[i][1] = arr[0];
Now, let’s look at this problem with the view of binary search. We know that the invariant
of binary search has two main parts:
1) a target value that we are searching for, and
2) a low-to-high range of candidate values that is guaranteed to contain it.
We also know that the values in this range must be in sorted order. Here our target value is
the maximum sum of a contiguous section in the optimal allocation of boards. Now how can
we apply binary search to this? We can fix the possible low-to-high range for the target value
and narrow down our search to get the optimal allocation.
We can see that the highest possible value in this range is the sum of all the elements in the
array and this happens when we allot 1 painter all the sections of the board. The lowest
possible value of this range is the maximum value of the array max, as in this allocation we
can allot max to one painter and divide the other sections such that the cost of them is less
than or equal to max and as close to max as possible. Now, if we use x painters in
the above scenarios, it is obvious that as the value in the range increases, the value of x
decreases, and vice versa. From this we can find the target value at which x = k, using a helper
function to find x: the minimum number of painters required when the maximum length of
section a painter can paint is given.
int partition(int arr[], int n, int k) {
int low = getMax(arr, n);
int high = getSum(arr, n);
while (low < high) {
int mid = low + (high - low) / 2;
int requiredPainters = numberOfPainters(arr, n, mid);
// find better optimum in lower half. Here mid is included
// because we may not get anything better.
if (requiredPainters <= k)
high = mid;
// find better optimum in upper half. Here mid is excluded
// because it gives required Painters > k, which is invalid
else
low = mid + 1;
}
return low;
}
Q. 5 Aggressive Cows Problem. Farmer John has built a new long barn, with N (2 <= N
<= 100,000) stalls. The stalls are located along a straight line at positions x1, ..., xN (0 <= xi
<= 1,000,000,000). His C (2 <= C <= N) cows don't like this barn's layout and become
aggressive towards each other once put into a stall. To prevent the cows from hurting each
other, FJ wants to assign the cows to the stalls, such that the minimum distance between any
two of them is as large as possible. What is the largest minimum distance?
Input:
t – the number of test cases, then t test cases follows.
* Line 1: Two space-separated integers: N and C
* Lines 2 … N+1: Line i+1 contains an integer stall location, xi
Output:
For each test case output one integer: the largest minimum distance.
Example:
1
5 3
1
2
8
4
9
Ans = 3
Explanation: FJ can put his 3 cows in the stalls at positions 1, 4 and 8, resulting in a minimum
distance of 3.
Solution: It is clear, that we shall first sort the stalls’ positions. Now, we need to put C cows
in these N stalls. Let’s work on above example to understand our approach.
1 2 4 8 9
Now, let’s say we have just 2 cows. The minimum distance possible is 0 (you put both the
cows in same stall) and the maximum distance possible is 9 – 1 = 8 (you put both of them in
extreme stalls). But now we have ‘C’ cows. Here it’s 3 in our example. So, in brute force, we
check – Can we place 3 cows at distance 1 away from each other? If yes, then can we place 3
cows at distance 2 away from each other? If yes, then can we do so for distance 3? We keep
on doing this till 8. We then print the maximum possible distance.
But instead of incrementing the distance by 1 and checking it again, we can use
binary search here. Note that we are not applying binary search to any array or sequence;
we are using it to reach our optimal guess in the fewest possible steps. We check for possible
distance = 4. If it is not feasible, we search from 0 to 3; if it is, we search from 5 to 8. This is the
abstraction of the idea of binary search.
int main() {
int t; cin >> t;
while (t--) {
int n, c; cin >> n >> c;
ll positions[n];
for (int i = 0; i < n; i++)
cin >> positions[i];
sort (positions, positions+n);
ll start = 0, end = positions[n-1] - positions[0], ans = -1;
while (start <= end) {
ll mid = start + (end-start)/2;
if (check(c, positions, n, mid)) {
ans = mid; start = mid+1;
} else
end = mid-1;
}
cout << ans << endl;
}
return 0;
}
Input:
The first line contains 1 <= T <= 20, the number of test cases. Then T test cases follow. The first line
of each test case contains N and K. The next line contains N integers, the i-th of which is the number
of candies in the i-th box.
Output:
For each test case print the required answer in a separate line.
Example:
2
3 2
3 1 4
4 1
3 2 3 9
Ans:
3
9
Solution: The idea is similar to aggressive cows. Find the possible range of your answer and
then apply binary search to find the optimal possible answer in fewest guesses.
int main() {
int t; cin >> t;
while (t--) {
int n, k; cin >> n >> k;
ll candies[n];
for (int i = 0; i < n; i++)
cin >> candies[i];
Chapter 3 – Divide and Conquer
3.1 Introduction
This is a technique for solving a problem by dividing it into smaller and smaller
subproblems until each subproblem becomes very easy to solve, and then
merging/conquering the results of the subproblems in such a way that we get back the
result of the original problem. Most of the time, it is the merge step which is tricky
to think about. We have already used many ‘divide and conquer’ algorithms by now (in
Data Structures and in this book). Here is a quick review:
a) Binary Search: Here you keep on dividing your sequence into halves until your
subproblem ends up to one cell of the array. This is a very basic DAC algorithm
as it doesn’t require a tricky merge step.
b) Mergesort: This is a classic example of DAC. The algorithm divides the array in
two halves, recursively sorts them and finally merges the two sorted halves.
c) Quicksort: The algorithm first places the pivot element at its correct position and
then divides the problem into left and right subarray. Here again, there is no
explicit merge step.
In this chapter, we will be seeing few of the very famous problems which were solved
efficiently using DAC approach.
3.2 High-precision computation
The problem of this section: computing irrational numbers, say √2, up to ‘d’ digits of
precision. We will come back to this problem in a while. For now, let’s talk about something
completely different – Catalan numbers. The definition of the n-th Catalan number is
C_n = (1/(n+1)) · C(2n, n) = (2n)! / ((n+1)! · n!)

Equivalently, C_0 = 1 and C_{n+1} = Σ_{i=0}^{n} C_i · C_{n−i} for n ≥ 0.
1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, 208012, 742900, 2674440, 9694845,
35357670, 129644790, 477638700, 1767263190, 6564120420, …
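As a quick aside, a sketch that computes these numbers directly from the recurrence (function name ours):

#include <vector>
#include <iostream>
using namespace std;

// C[0] = 1; C[n+1] = sum over i of C[i] * C[n-i]   (O(n^2) DP)
vector<long long> catalanNumbers(int count) {
    vector<long long> C(count, 0);
    C[0] = 1;
    for (int n = 0; n + 1 < count; n++)
        for (int i = 0; i <= n; i++)
            C[n + 1] += C[i] * C[n - i];
    return C;
}

int main() {
    for (long long c : catalanNumbers(15)) cout << c << " ";
    cout << endl;   // 1 1 2 5 14 42 132 429 1430 4862 ...
    return 0;
}

Catalan numbers show up in many counting problems: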
a) If you have ‘n’ pairs of parentheses, then there are Cn ways of writing expressions
in which the parentheses are correctly matched. For e.g., n = 3 gives Cn = 5 and
the 5 ways are: ((())) ()(()) ()()() (())() (()()).
b) The number of full binary trees with n+1 leaf nodes is nth Catalan number.
c) The number of BSTs with n nodes is nth Catalan number.
d) Cn is the number of monotonic lattice paths along the edges of a grid with n × n
square cells, which do not pass above the diagonal. A monotonic path is one which
starts in the lower left corner, finishes in the upper right corner, and consists
entirely of edges pointing rightwards or upwards.
e) A convex polygon with n + 2 sides can be cut into triangles by connecting vertices
with non-crossing line segments (a form of polygon triangulation). The number of
triangles formed is n and the number of different ways that this can be achieved
is Cn.
So, we can see, Catalan numbers are everywhere. The reason we studied a little about
Catalan numbers is because we will see them once again in context of irrational
numbers.
Coming back to our problem of finding irrational numbers up to ‘d’ digits of precision,
say √2 for example, we know we can’t rely on the computer’s built-in arithmetic, because it
only gives us the precision which can be stored in 64 bits. So, what we can do is – use
numerical methods for finding the roots of the equation x² − 2 = 0 and then approximate
the root to ‘d’ digits of accuracy. And the numerical method which we shall be using
is Newton’s method. According to this method, the successive approximation of the root
at the i-th iteration is given by
at ith iteration is given by,
𝑓(𝑥 )
𝑥 =𝑥 −
𝑓 (𝑥 )
𝑥 − 𝑎 𝑥 + 𝑎/𝑥
𝑥 =𝑥 − =
2𝑥 2
One thing to notice is – we need to compute x_{i+1} to ‘d’ digits of precision, and we require
a division operation here, a/x_i. Again, the computer can’t perform such accurate
division because the word length of the ALU is just 64 bits. So, we need to somehow
carry out this division while preserving the accuracy. To do so, we first need to
understand how the ALU carries out division in modern chips. Inside a chip, a division
operation is performed using multiplication as a subroutine. So, in order to perform
efficient division, we need to perform efficient multiplication first.
Efficient Multiplication:
The schoolbook multiplication of two numbers, each ‘n’ bits long, takes O(n²) time. But
it seems that we can do better. Let’s solve this by considering a problem.
Given two binary strings that represent value of two integers, find the product of two
strings. For example, if the first bit string is “1100” and second bit string is “1010”,
output should be 120.
Approach:
A naïve approach is to follow the process we study in school: one by one, take all bits
of the second number and multiply them with all bits of the first number, finally adding all
the partial products. This algorithm takes O(n²) time.
Using Divide and Conquer, we can multiply two integers in lesser time complexity. We
divide the given numbers in two halves. Let the given numbers be X and Y. For
simplicity let us assume that n is even.
The product XY can be written as follows, where X_L, X_R (and Y_L, Y_R) are the left and right halves:

XY = (X_L · 2^{n/2} + X_R)(Y_L · 2^{n/2} + Y_R)
   = 2^n · X_L·Y_L + 2^{n/2} · (X_L·Y_R + X_R·Y_L) + X_R·Y_R

If we take a look at the above formula, there are four multiplications of size n/2, so we
basically divided the problem of size n into four subproblems of size n/2. But that
doesn’t help, because the solution of the recurrence T(n) = 4T(n/2) + O(n) is O(n²). The tricky
part of this algorithm is to change the middle two terms to some other form so that
only one extra multiplication suffices. The following is the tricky expression for the
middle two terms:

X_L·Y_R + X_R·Y_L = (X_L + X_R)(Y_L + Y_R) − X_L·Y_L − X_R·Y_R

XY = 2^n · X_L·Y_L + 2^{n/2} · [(X_L + X_R)(Y_L + Y_R) − X_L·Y_L − X_R·Y_R] + X_R·Y_R

With the above trick, the recurrence becomes T(n) = 3T(n/2) + O(n), and the solution of this
recurrence is O(n^{log₂ 3}) ≈ O(n^{1.59}).
What if the lengths of the input strings are different or not even? To handle different
lengths, we append 0s at the beginning of the shorter one. To handle odd length,
we put floor(n/2) bits in the left half and ceil(n/2) bits in the right half. So, the expression for
XY changes to the following:

XY = 2^{2⌈n/2⌉} · X_L·Y_L + 2^{⌈n/2⌉} · [(X_L + X_R)(Y_L + Y_R) − X_L·Y_L − X_R·Y_R] + X_R·Y_R
The above algorithm is called Karatsuba algorithm and it can be used for any
base.
// Given two unequal sized bit strings, converts them to same length
// by adding leading 0s in the smaller string. Returns the new length.
int makeEqualLength(string &str1, string &str2) {
int len1 = str1.size();
int len2 = str2.size();
if (len1 < len2) {
for (int i = 0; i < len2 - len1; i++)
str1 = '0' + str1;
return len2;
}
else if (len1 > len2) {
for (int i = 0; i < len1 - len2; i++)
str2 = '0' + str2;
}
return len1; // If len1 >= len2
}
// Function that adds two bit sequences and returns the addition
string addBitStrings(string first, string second) {
string result; // To store the sum bits
// make the lengths same before adding
int length = makeEqualLength(first, second);
int carry = 0; // Initialize carry
// Add all bits one by one
for (int i = length-1 ; i >= 0 ; i--) {
int firstBit = first.at(i) - '0';
int secondBit = second.at(i) - '0';
// boolean expression for sum of 3 bits
int sum = (firstBit ^ secondBit ^ carry)+'0';
result = (char)sum + result;
// boolean expression for 3-bit addition
carry = (firstBit & secondBit) | (secondBit & carry) |
(firstBit & carry);
}
// if overflow, then add a leading 1
if (carry) result = '1' + result;
return result;
}
int main() {
printf ("%ld\n", multiply("1100", "1010"));
printf ("%ld\n", multiply("110", "1010"));
printf ("%ld\n", multiply("11", "1010"));
printf ("%ld\n", multiply("1", "1010"));
printf ("%ld\n", multiply("0", "1010"));
printf ("%ld\n", multiply("111", "111"));
printf ("%ld\n", multiply("11", "11"));
return 0;
}
High-precision division:
We want to compute 1/b, since any fraction a/b can then be computed as a · (1/b). So, our
focus will be on finding reciprocals.
The idea is – we will convert the division by ‘b’ into a division by ‘R’, where R is a
convenient power of 2, because division by R is easy (after all, it is again just shifting of bits
in the opposite direction). The question is – how will we make this conversion? To do so, we will
use Newton’s method again, on

f(x) = 1/x − b/R

Here, the root of this equation is x = R/b. Plugging this f(x) into Newton’s method (with
f′(x) = −1/x²),

x_{i+1} = x_i − (1/x_i − b/R) / (−1/x_i²)

x_{i+1} = 2·x_i − b·x_i²/R

Now if you look at the above equation: 2·x_i is easy; b·x_i² is easy, because we can multiply
efficiently using Karatsuba (or we can even use the naïve algorithm); subtraction is easy;
and division by R is easy. So, using the above successive formula, we can find R/b.
To get d digits of precision, we require only log d iterations. So, if O(n^α) is the
time complexity of multiplication, then the time complexity of division is O(n^α · log n).
For Karatsuba multiplication, α ≈ 1.59.
So, finally, here are the steps which we need to perform for a high-precision square root
of 2:
For √2 up to ‘d’ digits of precision, we need to solve x² − 2·10^{2d} = 0. This puts
all d digits into the integral part, so there is no decimal inaccuracy. However, storing such big
d-digit integers isn’t possible even in a ‘long long’ data type. So, for that, we need to
represent these integers as strings, by making a class called ‘BigInteger’. We then need to
redefine all the operators, so that we can do arithmetic on BigIntegers. One important
operation will be multiplying two BigIntegers. This is the same as “multiplying two string
representations of numbers”. Hey! That’s Karatsuba’s Algorithm! We then solve the above
equation using Newton’s method.
http://people.csail.mit.edu/devadas/numerics_demo/sqrt2.html
Now that you have looked at the demo of the square root of 2, let’s try to compute one more
high-precision calculation. Suppose we have a circle of diameter equal to 1 trillion (so the
radius AC = 500,000,000,000) and a chord whose half-length is 1; the demo computes

AD = AC − CD = 500,000,000,000 − √(500,000,000,000² − 1)
http://people.csail.mit.edu/devadas/numerics_demo/chord.html
Well, the only numbers you see in the answer are – just Catalan numbers! For
further explanation, refer:
https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/lecture-videos/MIT6_006F11_lec11.pdf
3.3 Fast Exponentiation
First, the recursive approach, which is a direct translation of the recursive formula
x^n = (x^{n/2})², with an extra factor of x when n is odd:
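A sketch, ignoring overflow (function name ours):

// x^n by squaring: T(n) = T(n/2) + O(1) = O(log n) multiplications.
long long fastPow(long long x, long long n) {
    if (n == 0) return 1;
    long long half = fastPow(x, n / 2);
    if (n % 2 == 0) return half * half;
    return half * half * x;
}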
However, storing such large exponentiation values isn’t possible. So, many times we
store modular results instead. Typically, we take the result modulo M = 10⁹ + 7.
Following is the iterative version of the modular exponentiation logic.
int modularExponentiation(int x, int n, int M) {
    long long result = 1, base = x % M;  // 64-bit: the products below can overflow int
    while (n > 0) {
        if (n & 1)
            result = (result * base) % M;
        base = (base * base) % M;
        n = n / 2;
    }
    return (int)result;
}
The idea of binary exponentiation is not limited to numbers. We can use the same
idea for matrix exponentiation. For a d × d matrix, this reduces the time complexity of
computing the n-th power from O(d³ · n) to O(d³ · log n).
// Helper functions
vector<vector<ll>> identity(int n) {
vector<vector<ll>> values(n, vector<ll>(n, 0));
for (int i = 0; i < n; i++)
values[i][i] = 1;
return values;
}
void printMatrix (vector<vector<ll>> mat) {
for (int i = 0; i < mat.size(); i++) {
for (int j = 0; j < mat[0].size(); j++) {
cout << mat[i][j] << " ";
}
cout << endl;
}
}
int main() {
vector<vector<ll>> matrix = {
{4, 0, -1},
{2, -3, 0},
{0, 1, -2}
};
int n = 10;
vector<vector<ll>> result = fastExponentiation(matrix, n);
printMatrix(result);
return 0;
}
F₁ = F₂ = 1
F_n = F_{n−1} + F_{n−2}
The first idea that comes to mind for calculating Fibonacci numbers is to run a for
loop. The running time of this method is O(n). This method is reasonably good
when n < 10⁸ or so. If we want n to go up to 10¹⁸, we need to switch to a faster method.
Suppose we have a vector (a matrix with one row and several columns) (F_n, F_{n+1})
and we want to multiply it by some matrix M, so that we get (F_{n+1}, F_{n+2}). Let’s call
this matrix M:

(F_n  F_{n+1}) · M = (F_{n+1}  F_{n+2})

(F_n  F_{n+1}) · [[x, y], [z, w]] = (F_{n+1}  F_{n+2})

M = [[0, 1],
     [1, 1]]

Multiplying the vector (F₁, F₂) = (1, 1) with the matrix M gets us to the next pair (1, 2):

(1  1) · M = (1  2)

And we can keep going by multiplying by M again:

(1  1) · M · M = (2  3)
Computing M^k takes O((size of M)³ · log k) time. In our problem, the size of M is 2,
so we can find the N-th Fibonacci number in O(2³ · log N) = O(log N).
Let’s take a look at a more general problem than before. A sequence A is a linear
recurrent sequence if it satisfies two properties:
a) A_i = c₁·A_{i−1} + c₂·A_{i−2} + ⋯ + c_k·A_{i−k} for i > k.
b) A₁ = a₁, A₂ = a₂, …, A_k = a_k are given integers.
We need to find A_N modulo 1000000007, where N is up to 10¹⁸ and k up to 50.
Here we will look for a solution that involves matrix multiplication right from the
start. We want to obtain a matrix M such that

(A_{i−k+1}  A_{i−k+2}  …  A_i) · M = (A_{i−k+2}  A_{i−k+3}  …  A_{i+1})
The reasoning is the same as with Fibonacci numbers: we multiply matrix with 1
row and k columns by M and get matrix with 1 row and k columns. Therefore, M
has k rows and k columns.
Then we can write down equations, which are based on the definition of matrix
multiplication:
From the first equation it is easy to see that x₂₁ = 1 and x_{i1} = 0 for i = 1, 3, 4, …, k.
From the second equation we conclude that x₃₂ = 1 and x_{i2} = 0 for i = 1, 2, 4, …, k.
Following this logic up to the (k−1)-th equation, we get x_{i,i−1} = 1 for i = 2, 3, …, k and
x_{ij} = 0 for j ≤ k − 1, j ≠ i − 1. The last equation looks like the definition
of A_i. Based on that, we get x_{ik} = c_{k−i+1}:
This code will run in O(k³ · log N) if we use fast matrix exponentiation.
The next problem is: find P_N modulo 1000000007 for N up to 10¹⁸. Surprisingly, we can
do it with matrices again: imagine we have a matrix M that advances the state vector,
extended with the running value P_i, in the same way as before.
Given a directed, unweighted graph with N vertices and an integer k. The task is
to find the number of paths of length k for each pair of vertices (u, v). Paths don’t
have to be simple i.e. vertices and edges can be visited any number of times in a
single path.
The graph is represented as adjacency matrix where G[i][j] = 1 indicates that there
is an edge from vertex i to vertex j and G[i][j] = 0 indicates no edge from i to j. It
is obvious that given adjacency matrix is the answer to the problem for the case k
= 1. It contains the number of paths of length 1 between each pair of vertices.
Let’s assume that the answer for some k is Mat_k and the answer for k + 1 is Mat_{k+1}.
It is easy to see that the formula computes nothing other than the product of the
matrices Mat_k and G. So,

Mat_{k+1} = Mat_k · G

Mat_k = G · G · … · G (k times) = G^k
Problem
PK is an astronaut who went to a planet and was very fascinated by the language of its
creatures. The language contains only lowercase English letters and is based on a simple logic
that only certain characters can follow a particular character. Now he is interested in
calculating the number of words of length L and ending at a particular character C. Help PK
to calculate this value.
Input:
The input begins with 26 lines, each containing 26 space-separated integers. The integers can be
either 0 or 1. The jth integer at ith line depicts whether jth English alphabet can follow ith English
alphabet or not.
Next line contains an integer T. T is the number of queries.
Next T lines contains a character C and an integer L.
Output:
For each query output the count of words of length L ending with the character C. Answer to
each query must be followed by newline character.
The answer may be very large so print it modulo 1000000007.
Constraints:
1 <= T <= 100
C is lowercase English alphabet.
2 <= L <= 10000000
Solution:
Complexity (per query): O(z³ · log l), where z is the number of letters in the alphabet and l is
the length of the word.
Explanation: Suppose we have a graph having each English alphabet as a vertex. There is an edge
between the i-th and j-th English alphabets if the entry a[i][j] = 1, where a is the input matrix. Now
each word in the language is simply a path from the starting alphabet to the ending alphabet. To
calculate the number of words of length l ending at a particular alphabet, we need to calculate the total
number of paths of length l − 1 ending at that alphabet. This can be found by raising the adjacency
matrix to the power l − 1. The j-th number in the i-th row of this matrix gives the number of words of
length l starting at character i and ending at character j. To find the total number of words ending at a
particular alphabet, take the sum of all the numbers in the j-th column.
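A sketch of one query, reusing the fastExponentiation() sketch from the previous section (assumed available; the modular reduction is again omitted for brevity):

// Count words of length L ending at character c (0..25): raise the
// 26x26 adjacency matrix to the power L-1 and sum column c.
ll countWords(const vector<vector<ll>>& adj, int L, int c) {
    vector<vector<ll>> paths = fastExponentiation(adj, L - 1);
    ll total = 0;
    for (int i = 0; i < 26; i++)
        total += paths[i][c];   // words starting at i, ending at c
    return total;               // take modulo 1000000007 in practice
}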
3.4 Fast Fourier Transform
This topic, in particular, is my favourite topic in the context of Divide and Conquer
strategy. It’s tricky and contains many Aha! moments. So, we will go through this
topic slowly and steadily, building the base and then finally we shall deal with little
bit of mathematics.
A(x) = a₀ + a₁x + a₂x² + ⋯ + a_{n−1}x^{n−1}
All curves drawn above can be represented by some quadratic expression in x. Similarly,
all curves with two bumps can be represented by a cubic expression in x.
So, if we plot time on the x-axis and the value of the signal on the y-axis, then a signal with
n bumps can be represented in the time domain as a polynomial of degree n + 1.
So, the above random signal has 13 bumps, so it’s a polynomial of degree 14. Representing
a random signal as a polynomial, especially one containing noise, is probably a very bad
representation, but at least it gives us a starting point for representing a
signal. We will improve our idea of the representation of signals gradually.
So, for a while, we will think of polynomial 𝐴(𝑥) as some signal varying with 𝑥. Now
usually there are 3 ways of representing a polynomial.
In above curve, there are 3 bumps; so, the degree of the polynomial curve will be 4.
Hence, we need 5 samples of this curve, to accurately represent the polynomial. So,
what we need is a set of points <(x₀, y₀), (x₁, y₁), …, (x_{n−1}, y_{n−1})> where each point
lies on the curve of the polynomial. There is no harm if we have more than n samples,
but we need at least n samples. This argument again reduces to the example that
we can’t uniquely identify a parabolic curve from fewer than 3 samples.
Now the question is, which representation of polynomials is the best? Well, we will
explore the answer to this question in context of 3 operations which we wished to
perform on polynomials; evaluation, addition and multiplication.
1. Evaluation of Polynomials
Using Horner’s rule, start with a_{n−1} and multiply it by x. Add a_{n−2} to the result and
multiply the result by x. Then add a_{n−3}, and so on, until we finally add a₀. Eventually,
we do 1 multiplication and 1 addition n times, giving us the final answer in O(n).
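A sketch of this evaluation:

// Horner's rule: a[0] + x*(a[1] + x*(a[2] + ...)), n coefficients.
double evaluate(const double a[], int n, double x) {
    double result = a[n - 1];
    for (int i = n - 2; i >= 0; i--)
        result = result * x + a[i];
    return result;
}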
2. Addition of Polynomials
Given two polynomials, 𝐴(𝑥) and 𝐵(𝑥), we want to compute 𝐶(𝑥) = 𝐴(𝑥) + 𝐵(𝑥).
In coefficients form, we simply add the corresponding coefficients, C_k = A_k + B_k, in O(n) time.
In roots form, however, if we convert to coefficients form to add, we won’t be able to convert
the result back to roots form after the addition, because it is not possible to find the roots of a
polynomial of degree more than 4 using a closed-form formula. We would have to shift to
numerical methods, and even then we would get approximate results. So, we say – addition of
two polynomials in roots form takes infinite time. For this reason, working with the roots form in
computations is near impossible.
3. Multiplication of Polynomials
Given two polynomials, 𝐴(𝑥) and 𝐵(𝑥), we want to compute 𝐶(𝑥) = 𝐴(𝑥) × 𝐵(𝑥).
(a₀ + a₁x + ⋯ + a_{n−1}x^{n−1})(b₀ + b₁x + ⋯ + b_{n−1}x^{n−1}) =
a₀b₀ + (a₀b₁ + a₁b₀)x + (a₀b₂ + a₁b₁ + a₂b₀)x² + ⋯ + a_{n−1}b_{n−1}x^{2n−2}

C_k = Σ_{j=0}^{k} A_j · B_{k−j}

The index k can go up to 2n − 2. So, the overall time required to find all C_k’s is
O(n²), as each C_k takes O(n) time.
In roots form, multiplication is very easy. If there are 𝑝 roots of A(x) and 𝑞 roots of
B(x), then C(x) will have 𝑝 + 𝑞 roots. We just need to copy roots of A(x) and B(x),
and put them in a roots vector of 𝐶(𝑥). We need to multiply corresponding constant
terms as well. This takes 𝑂(𝑛) time.
C_k = A_k × B_k   ∀k
Here is a summary of the time complexities of the various operations in the various forms;
the infinite and O(n²) entries are the inefficient operations we would like to avoid:

Operation        Coefficients   Roots      Samples
Evaluation       O(n)           O(n)       O(n²)
Addition         O(n)           infinite   O(n)
Multiplication   O(n²)          O(n)       O(n)
So, which representation is the best one? None! All are good at something and bad
at something. Which one is the worst? If we have to select one of them, probably it
will be roots form, because we can’t interconvert between root form and other forms
very well. So, we will be focusing our discussion on coefficients form and the samples
form.
Why do we care so much about polynomial multiplication? Because it’s very similar to another
operation, called ‘convolution’, which is used in almost every signal processing technique. If we
are given two vectors, we compute their convolution by reversing one vector and then computing
the dot product of the two vectors for all possible shifts of the reversed vector. So, if C[x]
represents the convolution of A[x] and B[x], then

C_k = Σ_j A_j · B_{k−j}

for all possible values of k. Here j is the shift of the reversed vector.
So, let’s start our journey of finding the conversion algorithm which takes us from
the coefficients land to the samples land and vice-versa.
Let’s just find the samples of this polynomial curve at different values of x. We know
we need n samples. Let the values at which we evaluate/sample be
<x₀, x₁, x₂, …, x_{n−1}>. So, the sample values are:

y₀ = a₀ + a₁x₀ + a₂x₀² + ⋯ + a_{n−1}x₀^{n−1}
y₁ = a₀ + a₁x₁ + a₂x₁² + ⋯ + a_{n−1}x₁^{n−1}
…
y_{n−1} = a₀ + a₁x_{n−1} + a₂x_{n−1}² + ⋯ + a_{n−1}x_{n−1}^{n−1}

In matrix form, with V the Vandermonde matrix:

$$\begin{bmatrix} 1 & x_0 & x_0^2 & \dots & x_0^{n-1}\\ 1 & x_1 & x_1^2 & \dots & x_1^{n-1}\\ \dots & \dots & \dots & \dots & \dots\\ 1 & x_{n-1} & x_{n-1}^2 & \dots & x_{n-1}^{n-1} \end{bmatrix} \begin{bmatrix} a_0\\ a_1\\ \vdots\\ a_{n-1} \end{bmatrix} = \begin{bmatrix} y_0\\ y_1\\ \vdots\\ y_{n-1} \end{bmatrix}$$

V·A = Y
Computing the Vandermonde matrix takes O(n²) time, as each row takes O(n)
amount of computation. Once we have the Vandermonde matrix, we can calculate the
samples for both polynomials by multiplying it with their coefficient vectors in O(n²)
time.
Let’s say we got the resultant sample vector C by multiplying the corresponding samples
of A and B. And now we want to convert samples back to coefficients. We can do this as

coefficients = V⁻¹ · C
But even O(n²) is of no use to us, as we can’t spend O(n²) time converting from one
form to another if we can already multiply polynomials in O(n²) in the coefficients form
directly. We need a better conversion algorithm, something of order O(n log n).
Compute A(x) efficiently for all values of x ∈ X, where A(x) is a polynomial of x and
X is a vector of positions of x at which we would like to evaluate the polynomial.
Let’s try our first attempt to make this conversion algorithm efficient.
There are two ways to divide a vector. One way is to divide at the middle as we have
seen in case of binary search, merge sort, etc. Another way is to divide the array into
two arrays – one containing the entries at even positions and one containing entries at
odd positions. We will choose the second method.
One thing you should realize is – A_even and A_odd only contain the coefficients at even and
odd positions. In no way should we assume that A_even or A_odd contain only even or
odd powers of x. They are polynomials of half the degree.

A_even(x) = a₀ + a₂x + a₄x² + ⋯ = Σ_i a_{2i} x^i
A_odd(x)  = a₁ + a₃x + a₅x² + ⋯ = Σ_i a_{2i+1} x^i

A(x) = A_even(x²) + x · A_odd(x²)
You can substitute the values to check why the above expression is true. So, if you notice,
we have divided the problem of evaluating a polynomial of degree n − 1 at x into two
smaller subproblems of evaluating polynomials of half the degree at x².
We can recursively divide the evaluation of these two subproblem polynomials into
further smaller problems till we have only one term left in the polynomial expression.
So, at the next step, A_even and A_odd will be further divided into their even and odd
parts, and they will be evaluated at x⁴.
Now let’s see how efficient this approach is.
The above algorithm works in the following way:
If you put the value of x = 3 in 𝐴(𝑥), you will get the same result.
I know what you are thinking at this moment – All that for a drop of… Wait! All that
for just finding a single sample. I know, it’s hardly making any sense why we are
doing all this. But hopefully, we will end up with some marvellous results.
For now, let’s analyse, how much work we are doing. We can write the recurrence
relation as:
T(n, |X|) = 2 · T(n/2, |X|) + O(n + |X|)
where n is the degree of the polynomial and |X| is the size of the vector X, i.e. the number of
values of x at which we need to find samples. We saw that the task of finding samples
at each value of vector X on a polynomial of degree n reduces to two subproblem
polynomials. We now have to evaluate 2 polynomials of degree n/2, but on the vector
X² that has the same size as X. So |X| doesn’t change. This gives us the 2 · T(n/2, |X|)
term. After solving the subproblems, we need to combine them: in the combine step, we do n
additions and |X| products, giving us the O(n + |X|) term.
But don’t we need only n samples anyway? Yes, in this case, yes! But what if you want more
samples than required? We already said that having more samples of a polynomial does no
harm. In this case, we can consider |X| = n.
Now, finding the total amount of work which we are doing is not clear from above
recurrence relation. To get the idea of the recurrence relation, we draw the recurrence
tree. Task at each lower level gets broken down into two subtasks.
(n, n)
/ \
(n/2, n) (n/2, n)
/ \ / \
(n/4, n) (n/4, n) (n/4, n) (n/4, n)
/ \ / \ / \ / \
...
(1, n) (1, n) (1, n) ... (1, n) (1, n) (1, n)
On the first layer, the single node represents the whole problem (n, n), whose cost, O(n²),
we already know. We broke this work into smaller problems till we reached the last layer.
In the last layer, there will be 2^{log n} = n leaf nodes, and each leaf node represents O(n)
work (since |X| is still n). So, the total work is O(2^{log n} × n) = O(n × n) = O(n²).
So, despite all this effort of Divide and Conquer, we couldn’t achieve better than 𝑂(𝑛 ).
Frustrating! Well, we have to look where is the problem?
The problem is in the size of X. If somehow, along with the degree, the size of the vector
X also reduced to half at each step, then the last level would contain 2^{log n} leaves, and each
leaf would be (1, 1), denoting O(1) work. The work at the last level would then
be O(2^{log n} × 1) = O(n), and since there are log n levels in total, the total work would be
O(n log n).
So somehow, we have to reduce the work by reducing the size of vector X. But how?
Let’s think backwards.
At the last level, we want X to contain only 1 value. Let’s say X = {1} at the last level. At
each step down, the number of elements in X gets reduced to half, and we actually
evaluate each next level on the squares of the previous level’s vector X. So, at the second-to-last
level, we need two elements in X. If X = {−1, 1}, then the last layer works on X² = {1}.
Ah! So, what we need are square roots, as every value has two square roots, and
squaring each value yields only half as many distinct squares! So, what we need in
the vector X are actually square roots.
So, we are now left with only one task – deciding what the initial vector X should be.
If you notice, at each next step the size of the vector X reduces to half. So, the first level
must have a vector X whose size is of the form 2^h, where h is the height of the recurrence
tree, so that, after every level when the size of X gets divided by 2, we eventually get only
1 element in X at the last level. In other words, X should always contain 2^k elements,
for some k, at every level.
This means that if the degree of your polynomial is 13, you can’t have 14 values in X for
samples. You will need 16 values. Again, remember what we said: having more samples
of a signal is in fact better. If the degree of the polynomial is 47, then the number of x
values required in X for samples will be 64. So, if we require 64 values in X initially,
what we need are the 64 64-th roots of unity, so that, after squaring the vector X at each level,
we end up having only {1} at the last level.
BTW, did you notice something strange? X contains the values of variable x plotted
on x-axis, at which we need to sample the polynomial. And suddenly, for the sake of
reducing the complexity of the algorithm, we are allowing X to contain complex
numbers. This means we are actually sampling polynomials at complex values! Here is
the hint: This might not make any sense in the domain of x, i.e. the time domain, if
we imagine the polynomial to be signal and variable x to be time. You can try thinking
it in frequency domain! XD.
Well, we did a lot of talking. Let’s now finally, summarize what we studied and do
some algebra related to that to derive some mathematical results.
From now onwards, I will consider that n is the number of samples I need. So
n is the size of the vector X. The degree of the polynomial is assumed to be less than or equal
to n − 1. This is because our main focus will now be on the vector X instead of the
polynomial. We also know that this n has to be some power of 2, and that X contains
values which are nothing but the n-th roots of unity. Let’s start some mathematics by
answering – how do we actually calculate roots of unity?
This is how the pattern works: the square roots of 1 are $-1$ and $1$. On the complex plane,
these two points divide the unit circle into two semicircles. The 4th roots of unity are
$\{i, -i, -1, 1\}$. These points divide the unit circle into 4 equal parts: 1 and $-1$ lie on the real axis
and $i$ and $-i$ lie on the imaginary axis. The 8th roots of unity will be the 4 above roots plus 4 other
roots at angles $\pm 45°$ and $\pm 135°$. So, these 8 roots divide the unit circle
into eight equal parts.
So, for the $n$th roots of unity, we can write $W^n = 1$. This equation should have $n$ roots.

$$W^n = 1 = \cos 2\pi k + i \sin 2\pi k = e^{2\pi i k} \;\Rightarrow\; W = e^{\frac{2\pi i k}{n}}, \quad k = 0, 1, \ldots, n-1$$
The root corresponding to $k = 1$ is termed the principal $n$th root, $W_n = e^{\frac{2\pi i}{n}}$. All other
roots are powers of it, i.e. $W_n^0, W_n^1, W_n^2, \ldots, W_n^{n-1}$.
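As a small illustration (rootsOfUnity is a name of my choosing), the nth roots of unity can be generated directly from this formula:

#include <complex>
#include <vector>
#include <cmath>
using namespace std;

vector<complex<double>> rootsOfUnity(int n) {
    const double PI = acos(-1);
    vector<complex<double>> roots(n);
    for (int k = 0; k < n; k++) {
        double ang = 2 * PI * k / n;        // angle of the k-th root, W_n^k
        roots[k] = complex<double>(cos(ang), sin(ang));
    }
    return roots;                           // roots[1] is the principal root W_n
}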
$$\begin{bmatrix}
1 & x_0 & x_0^2 & \cdots & x_0^{n-1} \\
1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n-1} & x_{n-1}^2 & \cdots & x_{n-1}^{n-1}
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{n-1} \end{bmatrix}
=
\begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{n-1} \end{bmatrix}$$

Substituting the $n$th roots of unity $x_k = W_n^k$:

$$\begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{n-1} \end{bmatrix}
=
\begin{bmatrix}
1 & W_n^0 & (W_n^0)^2 & \cdots & (W_n^0)^{n-1} \\
1 & W_n^1 & (W_n^1)^2 & \cdots & (W_n^1)^{n-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & W_n^{n-1} & (W_n^{n-1})^2 & \cdots & (W_n^{n-1})^{n-1}
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{n-1} \end{bmatrix}$$

$$y_k = a_0 + a_1 W_n^k + a_2 W_n^{2k} + \cdots + a_{n-1} W_n^{(n-1)k}$$

$$\therefore y_k = \sum_{j=0}^{n-1} a_j W_n^{jk} = \sum_{j=0}^{n-1} a_j e^{\frac{2\pi i}{n}jk}$$
So, the Discrete Fourier Transform (DFT) of a polynomial $A(x)$, or
equivalently of the vector of coefficients $\langle a_0, a_1, a_2, \ldots, a_{n-1} \rangle$, is defined as
the values of the polynomial at the points which are the $n$th roots of unity.
And the divide and conquer algorithm used to efficiently compute the DFT
sequence, without using the Vandermonde matrix, is called the FFT (Fast Fourier
Transform).
Again! Did you notice something strange? We know that a discrete-time signal is a
signal sampled at regular time intervals, and the DFT of a discrete-time signal
takes us to its frequency-domain representation. Now here we say that the discrete-time
signal is actually the coefficients of the polynomial, and the DFT gives us the samples of that
polynomial, sampled at the roots of unity. Well, it's very hard to imagine what's
happening here and form a link between these concepts. Imagine. If you can!
Try to keep your calm while reading the next statement: Till now, what we have studied
is how to efficiently convert from the coefficient representation to the sample representation.
But we still don't know how to come back from the samples world to the coefficients world. A
small part is still pending. 😉
Probably, you guessed it right. To come back to the coefficients world, what we need is the
Inverse Discrete Fourier Transform (IDFT). But I would like to take some time and
reach the result of the IDFT by starting from basics – our Vandermonde matrix! Now, it
is assumed that we have the samples and what we want is the coefficient vector.
$$V = \begin{bmatrix}
1 & W_n^0 & (W_n^0)^2 & \cdots & (W_n^0)^{n-1} \\
1 & W_n^1 & (W_n^1)^2 & \cdots & (W_n^1)^{n-1} \\
1 & W_n^2 & (W_n^2)^2 & \cdots & (W_n^2)^{n-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & W_n^{n-1} & (W_n^{n-1})^2 & \cdots & (W_n^{n-1})^{n-1}
\end{bmatrix}$$

$$\therefore V = \begin{bmatrix}
W_n^0 & W_n^0 & W_n^0 & \cdots & W_n^0 \\
W_n^0 & W_n^1 & W_n^2 & \cdots & W_n^{n-1} \\
W_n^0 & W_n^2 & W_n^4 & \cdots & W_n^{2(n-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
W_n^0 & W_n^{n-1} & W_n^{2(n-1)} & \cdots & W_n^{(n-1)(n-1)}
\end{bmatrix}, \qquad \text{i.e. } V_{jk} = W_n^{jk}$$
The inverse of the above Vandermonde matrix is given by

$$V^{-1} = \frac{\bar{V}}{n}$$

where $\bar{V}$ is the complex conjugate of $V$. The conjugate of a complex number $e^{i\theta}$ is
$e^{-i\theta}$. So, according to the above result,
$$V^{-1} = \frac{1}{n}\begin{bmatrix}
W_n^0 & W_n^0 & W_n^0 & \cdots & W_n^0 \\
W_n^0 & W_n^{-1} & W_n^{-2} & \cdots & W_n^{-(n-1)} \\
W_n^0 & W_n^{-2} & W_n^{-4} & \cdots & W_n^{-2(n-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
W_n^0 & W_n^{-(n-1)} & W_n^{-2(n-1)} & \cdots & W_n^{-(n-1)(n-1)}
\end{bmatrix}$$
Let $P = V \cdot \bar{V}$. Then the $(j, k)$ entry of $P$ is

$$P_{jk} = \sum_{m=0}^{n-1} e^{\frac{2\pi i}{n}jm} \cdot e^{-\frac{2\pi i}{n}mk} = \sum_{m=0}^{n-1} e^{\frac{2\pi i}{n}m(j-k)}$$

If $j = k$, every term equals 1, so

$$P_{jj} = \sum_{m=0}^{n-1} 1 = n$$

If $j \neq k$, this is a geometric series with ratio $e^{\frac{2\pi i}{n}(j-k)}$:

$$P_{jk} = \frac{e^{\frac{2\pi i}{n}n(j-k)} - 1}{e^{\frac{2\pi i}{n}(j-k)} - 1}$$

Now, the $n$'s in the numerator's exponent get cancelled, and since $(j - k)$ is an integer, $e^{2\pi i (j-k)} = 1$.

$$\therefore P_{jk} = 0$$

So $V \bar{V} = nI$. Hence proved,

$$V^{-1} = \frac{\bar{V}}{n}$$
Finally, we can write:

$$A = \frac{\bar{V}}{n}\, Y$$

Writing out the matrix equations, we can show that

$$a_j = \frac{1}{n} \sum_{k=0}^{n-1} y_k W_n^{-jk} = \frac{1}{n} \sum_{k=0}^{n-1} y_k e^{-\frac{2\pi i}{n}jk}$$
The problem of computing the IDFT is thus solved by the same FFT algorithm; we only have to
use $W_n^{-1}$ instead of $W_n$, and at the end divide the resulting coefficients by
$n$. Thus, the computation of the IDFT also takes $O(n \log n)$ time.
C++ Implementation:
Here we present a simple recursive implementation of the FFT and the inverse FFT,
both in one function, since the difference between the forward and the inverse FFT is
so minimal. To store the complex numbers, we use the complex type from the C++ STL.
The function gets passed a vector of coefficients, and it computes the
DFT or inverse DFT and stores the result back in this vector. The argument invert
shows whether the direct or the inverse DFT should be computed. Inside the function
we first check if the length of the vector is equal to one; if this is the case, then we don't
have to do anything. Otherwise we divide the vector a into two vectors a0 and a1 and
compute the DFT for both recursively. Then we initialize the value wn and a variable
w, which will contain the current power of wn. Then the values of the resulting DFT
are computed using the above formulas.
If the flag invert is set, then we replace wn with wn^{-1}, and each of the values of the
result is divided by 2 (since this will be done at each level of the recursion, it will
end up dividing the final values by n).
Using this function, we can create a function for multiplying two polynomials. This
function works with polynomials with integer coefficients; however, you can also adjust
it to work with other types.
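The listing below survives only partially in this copy, so the fft routine is reconstructed here to match the description above; take it as a sketch of the standard recursive radix-2 implementation rather than the original author's exact code.

#include <complex>
#include <vector>
#include <cmath>
using namespace std;

void fft(vector<complex<double>>& a, bool invert) {
    int n = a.size();
    if (n == 1)
        return;                                   // a single value is its own DFT

    vector<complex<double>> a0(n / 2), a1(n / 2);
    for (int i = 0; 2 * i < n; i++) {             // split into even/odd coefficients
        a0[i] = a[2 * i];
        a1[i] = a[2 * i + 1];
    }
    fft(a0, invert);
    fft(a1, invert);

    const double PI = acos(-1);
    double ang = 2 * PI / n * (invert ? -1 : 1);  // W_n, or W_n^{-1} for the inverse
    complex<double> w(1), wn(cos(ang), sin(ang));
    for (int i = 0; 2 * i < n; i++) {
        a[i] = a0[i] + w * a1[i];
        a[i + n / 2] = a0[i] - w * a1[i];
        if (invert) {                             // halving at every level divides by n overall
            a[i] /= 2;
            a[i + n / 2] /= 2;
        }
        w *= wn;
    }
}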
using cd = complex<double>;
const double PI = acos(-1);

vector<int> multiply(vector<int> const& a, vector<int> const& b) {
    vector<cd> fa(a.begin(), a.end()), fb(b.begin(), b.end());
    int n = 1;
    while (n < (int)(a.size() + b.size())) n <<= 1;  // pad to a power of two
    fa.resize(n); fb.resize(n);
    fft(fa, false); fft(fb, false);   // to point-value representation
    for (int i = 0; i < n; i++)
        fa[i] *= fb[i];               // multiply pointwise
    fft(fa, true);                    // back to coefficients
    vector<int> result(n);
    for (int i = 0; i < n; i++)
        result[i] = round(fa[i].real());
    return result;
}
int main() {
vector<int> a = {1, 1, 1, 1};
vector<int> b = {2, 1, 2};
vector<int> c = multiply(a, b);
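// c = {2, 3, 5, 5, 3, 2, 0, 0}: the first six entries are the coefficients
// of (1 + x + x^2 + x^3)(2 + x + 2x^2); the rest is power-of-two padding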
return 0;
}
There are many programming problems which can be solved using FFT, especially
problems like string matching (to be discussed in the strings section). Here is one
problem which can be solved quite simply using FFT.
We are given two arrays a[ ] and b[ ]. We have to find all possible sums a[i] + b[j], and
for each sum count how often it appears.
For example, for a = [1, 2, 3] and b = [2, 4], the sum 3 can be obtained in 1
way, the sum 4 also in 1 way, 5 in 2 ways, 6 in 1 way, and 7 in 1 way.
We construct, for the arrays $a$ and $b$, two polynomials A and B. The numbers of the
array will act as the exponents in the polynomial ($a[i] \to x^{a[i]}$), and the coefficient of
a term will be how often that number appears in the array.
Then, by multiplying these two polynomials in 𝑂(𝑛 log 𝑛) time, we get a polynomial C,
where the exponents will tell us which sums can be obtained, and the coefficients tell
us how often. To demonstrate this on the example:
$$(1x^1 + 1x^2 + 1x^3)(1x^2 + 1x^4) = 1x^3 + 1x^4 + 2x^5 + 1x^6 + 1x^7$$
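As a sketch (countSums is a name of my choosing; it assumes the values are small non-negative integers), this idea can be implemented directly on top of the multiply() routine above:

#include <algorithm>   // for max_element

vector<int> countSums(vector<int> const& a, vector<int> const& b) {
    int maxA = *max_element(a.begin(), a.end());
    int maxB = *max_element(b.begin(), b.end());
    vector<int> pa(maxA + 1, 0), pb(maxB + 1, 0);
    for (int v : a) pa[v]++;      // coefficient of x^v = occurrences of v in a
    for (int v : b) pb[v]++;
    return multiply(pa, pb);      // coefficient of x^s = ways to obtain sum s
}

For a = [1, 2, 3] and b = [2, 4], the returned vector holds 1 at indices 3, 4, 6, 7 and 2 at index 5, matching the product polynomial above.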
Chapter 4 – Greedy Technique
4.1 Introduction
The difficulty in designing greedy algorithms is to find a greedy strategy that always
produces an optimal solution to the problem. The locally optimal choices in a greedy
algorithm should also be globally optimal. It is often difficult to argue that a greedy
algorithm works.
As an example, we consider a problem where we are given a set of coins and our task
is to form a sum of money 𝑛 using the coins. The values of the coins are coins = {c1,
c2, ..., ck}, and each coin can be used as many times as we want. What is the minimum
number of coins needed?
For example, if the coins are the euro coins (in cents) {1, 2, 5, 10, 20, 50, 100, 200}
and n = 520, we need at least four coins. The optimal solution is to select coins 200 +
200 + 100 + 20 whose sum is 520.
A simple greedy algorithm to the problem always selects the largest possible coin, until
the required sum of money has been constructed. This algorithm works in the example
case, because we first select two 200 cent coins, then one 100 cent coin and finally one
20 cent coin. But does this algorithm always work? It turns out that if the coins are
the euro coins, the greedy algorithm always works, i.e., it always produces a solution
with the fewest possible number of coins.
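As a sketch, the greedy strategy for the euro coin set can be written as follows (greedyCoins is a name of my choosing):

int greedyCoins(int n) {
    int coins[] = {200, 100, 50, 20, 10, 5, 2, 1};   // largest coin first
    int count = 0;
    for (int c : coins) {
        count += n / c;      // take as many coins of this value as fit
        n %= c;
    }
    return count;            // e.g. n = 520 gives 4 (200 + 200 + 100 + 20)
}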
First, each coin 1, 5, 10, 50 and 100 appears at most once in an optimal solution,
because if the solution would contain two such coins, we could replace them by one
coin and obtain a better solution. For example, if the solution would contain coins 5 +
5, we could replace them by coin 10. In the same way, coins 2 and 20 appear at most
twice in an optimal solution, because we could replace coins 2 + 2 + 2 by coins 5 + 1
and coins 20 + 20 + 20 by coins 50 + 10. Moreover, an optimal solution cannot contain
coins 2 + 2 + 1 or 20 + 20 + 10, because we could replace them by coins 5 and 50.
Using these observations, we can show for each coin x that it is not possible to optimally
construct a sum x or any larger sum by only using coins that are smaller than x. For
example, if x = 100, the largest optimal sum using the smaller coins is 50 + 20 + 20
+ 5 + 2 + 2 = 99. Thus, the greedy algorithm that always selects the largest coin
produces the optimal solution.
This example shows that it can be difficult to argue that a greedy algorithm works,
even if the algorithm itself is simple.
In the general case, the coin set can contain any coins and the greedy algorithm does
not necessarily produce an optimal solution. We can prove that a greedy algorithm
does not work by showing a counterexample where the algorithm gives a wrong answer.
In this problem we can easily find a counterexample: if the coins are {1, 3, 4} and the
target sum is 6, the greedy algorithm produces the solution 4 + 1 + 1 while the optimal
solution is 3 + 3. It is not known if the general coin problem can be solved using any
greedy algorithm. However, as we will see in one of the upcoming chapters, in some
cases, the general problem can be efficiently solved using a dynamic programming
algorithm that always gives the correct answer.
Many scheduling problems can be solved using greedy algorithms. A classic problem is
as follows: Given n events with their starting and ending times, find a schedule that
includes as many events as possible. It is not possible to select an event partially. For
example, consider the following events:
In this case the maximum number of events is two. For example, we can select events
B and D as follows:
It is possible to invent several greedy algorithms for the problem, but which of them
works in every case?
The first idea is to select as short events as possible. In the example case this algorithm
selects the following events:
However, selecting short events is not always a correct strategy. For example, the
algorithm fails in the following case:
If we select the short event, we can only select one event. However, it would be possible
to select both long events.
Another idea is to always select the next possible event that begins as early as possible.
This algorithm selects the following events:
However, we can find a counterexample also for this algorithm. For example, in the
following case, the algorithm only selects one event:
If we select the first event, it is not possible to select any other events. However, it
would be possible to select the other two events.
The third idea is to always select the next possible event that ends as early as possible.
This algorithm selects the following events:
It turns out that this algorithm always produces an optimal solution. The reason for
this is that it is always an optimal choice to first select an event that ends as early as
possible. After this, it is an optimal choice to select the next event using the same
strategy, etc., until we cannot select any more events. One way to argue that the
algorithm works is to consider what happens if we first select an event that ends later
than the event that ends as early as possible. Now, we will have at most the same
number of choices for how to select the next event. Hence, selecting an event that
ends later can never yield a better solution, and the greedy algorithm is correct.
struct Activity {
int start, finish;
};
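// printMaxActivities is not shown in this copy of the text; what follows is a
// sketch of the usual greedy routine described above: sort the activities by
// finish time, then take every activity compatible with the last one chosen.
#include <algorithm>
#include <iostream>
using namespace std;

bool activityCompare(const Activity& a, const Activity& b) {
    return a.finish < b.finish;               // earliest finish first
}

void printMaxActivities(Activity arr[], int n) {
    sort(arr, arr + n, activityCompare);
    cout << "(" << arr[0].start << ", " << arr[0].finish << ")\n";
    int lastFinish = arr[0].finish;
    for (int i = 1; i < n; i++) {
        if (arr[i].start >= lastFinish) {     // compatible with the last selection
            cout << "(" << arr[i].start << ", " << arr[i].finish << ")\n";
            lastFinish = arr[i].finish;
        }
    }
}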
int main() {
Activity arr[] = {{5, 9}, {1, 2}, {3, 4}, {0, 6}, {5, 7}, {8, 9}};
int n = sizeof(arr)/sizeof(arr[0]);
printMaxActivities(arr, n);
return 0;
}
Problems
Q. 1 Given arrival and departure times of all trains that reach a railway station, the task is
to find the minimum number of platforms required for the railway station so that no train
waits. We are given two arrays which represent arrival and departure times of trains that stop.
Solution: The idea is to consider all events in sorted order. Once the events are in sorted
order, trace the number of trains at any time, keeping track of trains that have arrived but
not yet departed.
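The helper minPlatforms is not shown in this copy; here is a sketch of the usual two-pointer implementation of the idea above (sort both arrays, then sweep through the events in time order):

#include <algorithm>
#include <iostream>
using namespace std;

int minPlatforms(int arr[], int dep[], int n) {
    sort(arr, arr + n);
    sort(dep, dep + n);
    int platforms = 0, result = 0;
    int i = 0, j = 0;
    while (i < n) {
        if (arr[i] <= dep[j]) {    // next event is an arrival: one more platform in use
            platforms++;
            result = max(result, platforms);
            i++;
        } else {                   // next event is a departure: a platform frees up
            platforms--;
            j++;
        }
    }
    return result;                 // 3 for the example in main() below
}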
int main() {
int arr[] = { 900, 940, 950, 1100, 1500, 1800 };
int dep[] = { 910, 1200, 1120, 1130, 1900, 2000 };
int n = sizeof(arr) / sizeof(arr[0]);
cout << "Platforms Required = " << minPlatforms(arr, dep, n);
return 0;
}
Chapter 5 – Backtracking
Chapter 6 – Dynamic Programming
6.1 Introduction