
fm algorithm

The Flajolet-Martin (FM) Algorithm is used to estimate the number of distinct elements in a stream by hashing elements to integers and analyzing the longest sequence of trailing zeros in their binary representation. The maximum trailing zero count, R, is used to estimate the number of distinct elements as 2^R. The document provides examples using different hash functions and illustrates how to compute the estimates based on the hash values obtained from the stream elements.

Uploaded by

Anbumozhy T.S.

Give a problem using the Flajolet-Martin (FM) Algorithm to count distinct elements in a stream.


To estimate the number of different elements appearing in a stream, we can
hash elements to integers interpreted as binary numbers. If R is the largest
number of trailing 0's seen in the hash value of any stream element, then
2^R is an estimate of the number of different elements.
Eg. Stream: 4, 2, 5, 9, 1, 6, 3, 7
Hash function, h(x) = (ax + b) mod 32
a) h(x) = (3x + 7) mod 32
b) h(x) = (x + 6) mod 32

a) h(x) = 3x + 7 mod 32
h(4) = 3(4) + 7 mod 32 = 19 mod 32 = 19 = (10011)
h(2) = 3(2) + 7 mod 32 = 13 mod 32 = 13 = (01101)
h(5) = 3(5) + 7 mod 32 = 22 mod 32 = 22 = (10110)
h(9) = 3(9) + 7 mod 32 = 34 mod 32 = 2 = (00010)
h(1) = 3(1) + 7 mod 32 = 10 mod 32 = 10 = (01010)
h(6) = 3(6) + 7 mod 32 = 25 mod 32 = 25 = (11001)
h(3) = 3(3) + 7 mod 32 = 16 mod 32 = 16 = (10000)
h(7) = 3(7) + 7 mod 32 = 28 mod 32 = 28 = (11100)
Trailing zeros {0, 0, 1, 1, 1, 0, 4, 2}
R = max [Trailing Zero] = 4
Output = 2^R = 2^4 = 16

b) h(x) = x + 6 mod 32
h(4) = (4) + 6 mod 32 = 10 mod 32 = 10 = (01010)
h(2) = (2) + 6 mod 32 = 8 mod 32 = 8 = (01000)
h(5) = (5) + 6 mod 32 = 11 mod 32 = 11 = (01011)
h(9) = (9) + 6 mod 32 = 15 mod 32 = 15 = (01111)
h(1) = (1) + 6 mod 32 = 7 mod 32 = 7 = (00111)
h(6) = (6) + 6 mod 32 = 12 mod 32 = 12 = (01100)
h(3) = (3) + 6 mod 32 = 9 mod 32 = 9 = (01001)
h(7) = (7) + 6 mod 32 = 13 mod 32 = 13 = (01101)
Trailing zeros {1, 3, 0, 0, 0, 2, 0, 0}
R = max [Trailing Zero] = 3
Output = 2^R = 2^3 = 8
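
The two worked examples above can be checked with a short script. This is only a minimal sketch: the function names trailing_zeros and fm_estimate are illustrative and not part of the original notes, and treating an all-zero hash value as having 5 trailing zeros (the full bit-string width) is an assumption.

```python
def trailing_zeros(value: int, width: int = 5) -> int:
    """Count trailing 0 bits of value.

    Assumption: an all-zero value is counted as `width` trailing zeros.
    """
    if value == 0:
        return width
    count = 0
    while value & 1 == 0:
        value >>= 1
        count += 1
    return count


def fm_estimate(stream, a: int, b: int, modulus: int = 32, width: int = 5):
    """Return (tail lengths, R, 2**R) for h(x) = (a*x + b) mod modulus."""
    tails = [trailing_zeros((a * x + b) % modulus, width) for x in stream]
    r = max(tails)
    return tails, r, 2 ** r


stream = [4, 2, 5, 9, 1, 6, 3, 7]
print(fm_estimate(stream, a=3, b=7))  # hash function (a): R = 4, estimate 16
print(fm_estimate(stream, a=1, b=6))  # hash function (b): R = 3, estimate 8
```

The stream has 8 distinct elements, so hash function (b) happens to give the exact count here, while hash function (a) overestimates by a factor of 2.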

S = 1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1
h(x) = (6x + 1) mod 5
Assume |b| = 5 (hash values are written as 5-bit strings)

x    6x + 1    h(x)    Binary    r(a)
1    7         2       00010     1
3    19        4       00100     2
2    13        3       00011     0
1    7         2       00010     1
2    13        3       00011     0
3    19        4       00100     2
4    25        0       00000     5
3    19        4       00100     2
1    7         2       00010     1
2    13        3       00011     0
3    19        4       00100     2
1    7         2       00010     1

R = max( r(a) ) = 5
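
The rows of this example can be regenerated with the sketch below. The helper names tail_length and table_rows are illustrative, not from the original notes. Counting the all-zero value 00000 as 5 trailing zeros follows what the table does for x = 4.

```python
WIDTH = 5  # bit-string length, |b| = 5 in the example


def tail_length(value: int, width: int = WIDTH) -> int:
    """Number of trailing 0 bits; an all-zero value counts as `width`."""
    if value == 0:
        return width
    # value & -value isolates the lowest set bit; its position is the tail length.
    return (value & -value).bit_length() - 1


def table_rows(stream):
    """Build (x, 6x + 1, h(x), binary string, r(a)) for h(x) = (6x + 1) mod 5."""
    rows = []
    for x in stream:
        raw = 6 * x + 1
        rem = raw % 5
        rows.append((x, raw, rem, format(rem, f"0{WIDTH}b"), tail_length(rem)))
    return rows


stream = [1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1]
rows = table_rows(stream)
for row in rows:
    print(*row)

R = max(row[4] for row in rows)
print("R =", R)  # R = 5, driven entirely by the all-zero hash of x = 4
```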
• We may want to know how many different elements have appeared in the stream.
• For example, we may wish to know how many distinct users have visited a website so far,
or in the last 2 hours.
• If distinct elements must be counted across many streams, keeping all the data in
main memory is a challenge.
• The FM algorithm gives a memory-efficient way to estimate the number of distinct elements in a stream.
• It is possible to estimate the number of distinct elements by hashing the elements of
the universal set to a bit string that is sufficiently long.
• The bit string must be long enough that there are more possible results
of the hash function than there are elements in the universal set.
• Whenever we apply a hash function h to a stream element a, the bit string h(a) will
end in some number of 0s, possibly none.
• Call this number the tail length for a and h.
• Let R be the maximum tail length of any a seen so far in the stream.
• Then we use 2^R as the estimate of the number of distinct elements seen in the
stream.
• Consider a stream:
S = {1, 2, 1, 3}
Let the hash function be h(x) = (2x + 2) mod 4

• Applying the hash function gives the remainders 0, 2, 0, 0, represented in binary as follows:

000, 010, 000, 000, considering a bit string length of 3.


• The maximum tail length R will be 3 (from 000, counted as three trailing 0s).
• The estimated number of distinct elements will be 2^R = 2^3 = 8, against a true count of 3.

• Here the estimates may be too high or too low, depending on the hash function.
• We may apply multiple hash functions and combine the estimates to get a more
accurate value.
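
The last point can be sketched as a one-pass counter that stores only the maximum tail length per hash function. The class name FMCounter is illustrative. Taking the median of the per-hash estimates is an assumption, since the notes do not say how the estimates should be combined. The two hash functions and the stream are the ones from the first worked example.

```python
from statistics import median


class FMCounter:
    """Tracks the maximum tail length R for one hash h(x) = (a*x + b) mod m."""

    def __init__(self, a: int, b: int, modulus: int = 32, width: int = 5):
        self.a, self.b, self.modulus, self.width = a, b, modulus, width
        self.max_tail = 0  # R so far; only this integer is kept in memory

    def add(self, x: int) -> None:
        h = (self.a * x + self.b) % self.modulus
        # Assumption: an all-zero hash counts as `width` trailing zeros.
        tail = self.width if h == 0 else (h & -h).bit_length() - 1
        self.max_tail = max(self.max_tail, tail)

    def estimate(self) -> int:
        return 2 ** self.max_tail


# One counter per hash function: (3x + 7) mod 32 and (x + 6) mod 32.
counters = [FMCounter(3, 7), FMCounter(1, 6)]
for x in [4, 2, 5, 9, 1, 6, 3, 7]:
    for counter in counters:
        counter.add(x)

estimates = [counter.estimate() for counter in counters]
combined = median(estimates)  # assumed combination rule
print(estimates, combined)  # [16, 8] 12.0
```

With only two hash functions the median is just the midpoint of the two estimates. A practical setup would use many more hash functions, over a hash range much larger than the number of distinct elements.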
