FM (Flajolet-Martin) algorithm
a) h(x) = (3x + 7) mod 32, applied to the stream 4, 2, 5, 9, 1, 6, 3, 7
h(4) = 3(4) + 7 mod 32 = 19 mod 32 = 19 = (10011)
h(2) = 3(2) + 7 mod 32 = 13 mod 32 = 13 = (01101)
h(5) = 3(5) + 7 mod 32 = 22 mod 32 = 22 = (10110)
h(9) = 3(9) + 7 mod 32 = 34 mod 32 = 2 = (00010)
h(1) = 3(1) + 7 mod 32 = 10 mod 32 = 10 = (01010)
h(6) = 3(6) + 7 mod 32 = 25 mod 32 = 25 = (11001)
h(3) = 3(3) + 7 mod 32 = 16 mod 32 = 16 = (10000)
h(7) = 3(7) + 7 mod 32 = 28 mod 32 = 28 = (11100)
Trailing zeros: {0, 0, 1, 1, 1, 0, 4, 2}
R = max [trailing zeros] = 4
Output = 2^R = 2^4 = 16
b) h(x) = (x + 6) mod 32, applied to the same elements
h(4) = (4) + 6 mod 32 = 10 mod 32 = 10 = (01010)
h(2) = (2) + 6 mod 32 = 8 mod 32 = 8 = (01000)
h(5) = (5) + 6 mod 32 = 11 mod 32 = 11 = (01011)
h(9) = (9) + 6 mod 32 = 15 mod 32 = 15 = (01111)
h(1) = (1) + 6 mod 32 = 7 mod 32 = 7 = (00111)
h(6) = (6) + 6 mod 32 = 12 mod 32 = 12 = (01100)
h(3) = (3) + 6 mod 32 = 9 mod 32 = 9 = (01001)
h(7) = (7) + 6 mod 32 = 13 mod 32 = 13 = (01101)
Trailing zeros: {1, 3, 0, 0, 0, 2, 0, 0}
R = max [trailing zeros] = 3
Output = 2^R = 2^3 = 8
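The two worked examples can be checked mechanically. The following is a minimal Python sketch, not part of the original solution: the helper names trailing_zeros and fm_estimate are made up for illustration, and the 5-bit width is an assumption based on the modulus 32.

```python
def trailing_zeros(value, width=5):
    """Count trailing 0 bits; an all-zero value is counted as `width` (assumption)."""
    if value == 0:
        return width
    count = 0
    while value & 1 == 0:
        value >>= 1
        count += 1
    return count


def fm_estimate(stream, hash_fn, width=5):
    """Return (tail lengths, R, 2**R) for a single hash function."""
    tails = [trailing_zeros(hash_fn(x), width) for x in stream]
    r = max(tails)
    return tails, r, 2 ** r


stream = [4, 2, 5, 9, 1, 6, 3, 7]

# Example a): h(x) = (3x + 7) mod 32
tails_a, r_a, est_a = fm_estimate(stream, lambda x: (3 * x + 7) % 32)
# Example b): h(x) = (x + 6) mod 32
tails_b, r_b, est_b = fm_estimate(stream, lambda x: (x + 6) % 32)

print(tails_a, r_a, est_a)  # [0, 0, 1, 1, 1, 0, 4, 2] 4 16
print(tails_b, r_b, est_b)  # [1, 3, 0, 0, 0, 2, 0, 0] 3 8
```

The stream has 8 distinct elements, so in this instance example b) happens to hit the true count while example a) overestimates by a factor of 2.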
S = 1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1
h(x) = (6x + 1) mod 5
Assume |b| = 5
x    6x+1   Rem   Binary   r(a)
1    7      2     00010    1
3    19     4     00100    2
2    13     3     00011    0
1    7      2     00010    1
2    13     3     00011    0
3    19     4     00100    2
4    25     0     00000    5
3    19     4     00100    2
1    7      2     00010    1
2    13     3     00011    0
3    19     4     00100    2
1    7      2     00010    1
Here Rem = (6x + 1) mod 5 = h(x), and r(a) is the number of trailing zeros in the
binary form of Rem. The all-zero string 00000 is counted as |b| = 5.
R = max( r(a) ) = 5
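The table can be regenerated with a few lines of Python. This is a sketch under the example's own assumptions (|b| = 5, and an all-zero remainder counted as tail length 5). The name tail_length is hypothetical. The final step, applying the 2^R rule, is not shown in the table.

```python
WIDTH = 5  # |b| from the example


def tail_length(value, width=WIDTH):
    """Trailing zeros of `value` written as a `width`-bit string (00000 -> 5)."""
    bits = format(value, f"0{width}b")
    return len(bits) - len(bits.rstrip("0"))


stream = [1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1]

rows = []
for x in stream:
    raw = 6 * x + 1          # value before the modulus
    rem = raw % 5            # h(x) = (6x + 1) mod 5
    rows.append((x, raw, rem, format(rem, "05b"), tail_length(rem)))

R = max(row[4] for row in rows)
estimate = 2 ** R

for row in rows:
    print(*row)
print("R =", R, "estimate =", estimate, "true distinct =", len(set(stream)))
```

With R = 5 the 2^R rule gives 32, while the stream contains only 4 distinct elements. The gap comes entirely from the single all-zero remainder for x = 4.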
We may want to know how many distinct elements have appeared in a stream.
For example, we may wish to know how many distinct users have visited a website so
far, or in the last 2 hours.
If distinct elements must be counted for many streams at once, keeping every
element seen so far in main memory becomes a challenge.
The FM algorithm gives a memory-efficient way to estimate the number of distinct
elements in a stream.
The estimate is obtained by hashing the elements of the universal set to a bit
string that is sufficiently long.
The bit string must be long enough that there are more possible hash values than
there are elements in the universal set.
Whenever we apply a hash function h to a stream element a, the bit string h(a) will
end in some number of 0s, possibly none.
Call this number the tail length for a and h.
Let R be the maximum tail length of any a seen so far in the stream.
Then we use 2^R as the estimate of the number of distinct elements seen in the
stream.
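A short calculation suggests why 2^R is a sensible estimate. It assumes that hash values behave like uniformly random bit strings, an idealization not stated above. The symbol m (the true number of distinct elements) and the threshold r are introduced only for this sketch.

```latex
\begin{align*}
\Pr[\text{tail length of } h(a) \ge r] &= 2^{-r}, \\
\Pr[\text{none of } m \text{ distinct elements has tail length} \ge r]
  &= (1 - 2^{-r})^{m} \approx e^{-m \, 2^{-r}}.
\end{align*}
```

When m is much larger than 2^r, the second probability is close to 0, so some element almost surely has tail length at least r. When m is much smaller than 2^r, it is close to 1. The maximum tail length R therefore tends to land where 2^R is of the same order as m.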
Consider the stream:
S = 1, 2, 1, 3
Let the hash function be h(x) = (2x + 2) mod 4.
Applying the hash function to each element gives a remainder, which is written in
binary to read off its tail length.
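The remainders for this small stream can be computed with a short Python sketch. Two choices here are assumptions rather than part of the original example: remainders are written with 2 bits (enough for values mod 4), and an all-zero remainder is counted as tail length 2. The name tail_length is hypothetical.

```python
WIDTH = 2  # assumption: values mod 4 fit in 2 bits


def h(x):
    """Hash function from the example: (2x + 2) mod 4."""
    return (2 * x + 2) % 4


def tail_length(value, width=WIDTH):
    """Trailing zeros of `value` written as a `width`-bit string (00 -> 2)."""
    bits = format(value, f"0{width}b")
    return len(bits) - len(bits.rstrip("0"))


stream = [1, 2, 1, 3]

# One row per stream element: (x, h(x), binary, tail length)
table = [(x, h(x), format(h(x), f"0{WIDTH}b"), tail_length(h(x))) for x in stream]
R = max(row[3] for row in table)

for row in table:
    print(*row)
print("R =", R, "estimate =", 2 ** R, "true distinct =", len(set(stream)))
```

Because 2x + 2 is always even, this hash function can only produce the remainders 0 and 2, so it spreads the elements poorly. Under the assumptions above the estimate is 4 for a stream with 3 distinct elements.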
The estimate may be too large or too small depending on the hash function.
We may apply multiple hash functions and combine their estimates to get a more
accurate value.
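One way to combine estimates, described in textbook treatments of this algorithm, is to split the hash functions into small groups, average the estimates within each group, and take the median of the group averages. The Python sketch below illustrates that idea. It is not a method stated in the text above. The coefficient pairs beyond the first two, the group size, and the helper names are all assumptions made for illustration.

```python
from statistics import mean, median

WIDTH = 5  # hash values are taken mod 32, so they fit in 5 bits


def tail_length(value, width=WIDTH):
    """Trailing zeros of `value` as a `width`-bit string (all zeros -> width)."""
    bits = format(value, f"0{width}b")
    return len(bits) - len(bits.rstrip("0"))


def fm_estimate(stream, a, b, modulus=32):
    """Single-hash estimate 2**R for h(x) = (a*x + b) mod modulus."""
    r = max(tail_length((a * x + b) % modulus) for x in stream)
    return 2 ** r


# (3, 7) and (1, 6) are the hash functions from examples a) and b);
# the remaining coefficient pairs are hypothetical.
coefficients = [(3, 7), (1, 6), (5, 3), (7, 1), (9, 4), (11, 2)]
stream = [4, 2, 5, 9, 1, 6, 3, 7]

estimates = [fm_estimate(stream, a, b) for a, b in coefficients]

GROUP_SIZE = 2  # assumption: an arbitrary small group size
groups = [estimates[i:i + GROUP_SIZE]
          for i in range(0, len(estimates), GROUP_SIZE)]
combined = median([mean(group) for group in groups])

print("per-hash estimates:", estimates)
print("combined estimate:", combined, "true distinct:", len(set(stream)))
```

Averaging alone is sensitive to a single very large 2^R, and a plain median can only ever return a power of 2. Grouping first and then taking the median is a compromise between the two.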