Streaming Algorithm
MIT
Piotr Indyk
Data Streams
– Need to maintain a compressed version of the matrix x
• Style: algorithmic/theoretical…
– Background in linear algebra and probability
Topics
• Streaming model. Estimating distinct elements (L0 norm)
8 2 1 9 1 9 2 4 6 3 9 4 2 3 4 2 3 8 5 2 5 6 ...
Counting Distinct Elements
• Stream elements: numbers from {1...m}
• Goal: estimate the number of distinct elements DE in
the stream
– Up to a factor of 1±ε
– With probability at least 1-P
• Simpler goal: for a given T>0, provide an algorithm
which, with probability 1-P:
– Answers YES, if DE> (1+ε)T
– Answers NO, if DE< (1-ε)T
• Run, in parallel, the algorithm with
T = 1, 1+ε, (1+ε)^2, ..., n
– Total space multiplied by log_{1+ε} n ≈ log(n)/ε
– Probability of failure multiplied by the same factor
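The reduction above can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: `threshold_grid`, `estimate_from_tests`, and the idealised `exact_tester` are hypothetical names introduced here, and the tester simply compares the true DE with T instead of running a streaming test.

```python
import math

def threshold_grid(n, eps):
    """Thresholds T = 1, (1+eps), (1+eps)^2, ..., capped at n."""
    grid = []
    t = 1.0
    while t < n:
        grid.append(t)
        t *= 1.0 + eps
    grid.append(float(n))
    return grid

def exact_tester(de, t):
    """Idealised stand-in for the YES/NO test (assumption: no error)."""
    return de > t

def estimate_from_tests(de, n, eps):
    """Run the test for every threshold; report the largest T answered YES."""
    best = 0.0
    for t in threshold_grid(n, eps):
        if exact_tester(de, t):
            best = t
    return best

grid = threshold_grid(1000, 0.1)
# The number of parallel copies is about log_{1+eps}(n), i.e. roughly log(n)/eps.
print(len(grid), math.log(1000) / math.log(1.1))
print(estimate_from_tests(6, 1000, 0.1))
```

With a real tester each of the parallel copies can fail, which is why the failure probability is multiplied by the number of thresholds.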
Vector Interpretation
Stream: 8 2 1 9 1 9 2 4 4 9 4 2 5 4 2 5 8 5 2 5
Vector x:
index: 1 2 3 4 5 6 7 8 9
count: 2 5 0 4 4 0 0 2 3
• Initially, x=0
• Insertion of i is interpreted as
x_i = x_i + 1
• Want to estimate DE(x) = ||x||_0
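As a small worked example of the vector interpretation, the sketch below applies the stream shown above to an all-zero vector and counts its nonzero coordinates. The helper names `apply_stream` and `l0_norm` are illustrative, and storing x explicitly is only for demonstration, since a streaming algorithm would not keep the whole vector.

```python
def apply_stream(stream, m):
    """Start from x = 0 and apply x_i = x_i + 1 for each arriving i in 1..m."""
    x = [0] * (m + 1)  # index 0 is unused so that x[i] matches coordinate i
    for i in stream:
        x[i] += 1
    return x

def l0_norm(x):
    """Number of nonzero coordinates, i.e. ||x||_0."""
    return sum(1 for v in x if v != 0)

stream = [8, 2, 1, 9, 1, 9, 2, 4, 4, 9, 4, 2, 5, 4, 2, 5, 8, 5, 2, 5]
x = apply_stream(stream, 9)
print(x[1:])       # [2, 5, 0, 4, 4, 0, 0, 2, 3]
print(l0_norm(x))  # 6
```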
Estimating DE(x)
Vector x (coordinates 1 to 9), with the coordinates in the set S marked by + (example shown for T=4)
• Algorithm A answers:
– YES, if Sum_S(x) > 0
– NO, if Sum_S(x) = 0
• Analysis: [plot of Pr against DE]
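A single YES/NO test of this kind might look as follows. The way S is drawn is an assumption: the sketch includes each coordinate independently with probability 1/T, which is consistent with the 1/e threshold used in the analysis but is not spelled out in the surviving text. `make_set` and `algorithm_a` are hypothetical names.

```python
import random

def make_set(m, T, rng):
    """Assumed sampler: each coordinate 1..m joins S with probability 1/T."""
    return {i for i in range(1, m + 1) if rng.random() < 1.0 / T}

def algorithm_a(stream, S):
    """Maintain Sum_S(x) over the stream; answer YES (True) iff it is > 0."""
    total = 0
    for i in stream:
        if i in S:
            total += 1  # insertion of i adds 1 to x_i, hence to Sum_S(x)
    return total > 0

rng = random.Random(0)
S = make_set(9, 4, rng)
stream = [8, 2, 1, 9, 1, 9, 2, 4, 4, 9, 4, 2, 5, 4, 2, 5, 8, 5, 2, 5]
print(sorted(S), algorithm_a(stream, S))
```

Apart from the description of S, the only state kept while reading the stream is the single counter `total`.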
Estimating DE(x) ctd.
• We have Algorithm A, where Pr denotes the probability that Sum_S(x) = 0:
– If DE > (1+ε)T, then Pr < 1/e - ε/3
– If DE < (1-ε)T, then Pr > 1/e + ε/3
• Algorithm B:
– Select sets S_1, …, S_k, with k = O(log(1/P)/ε^2)
– Let Z = number of sums Sum_{S_j}(x) that are equal to 0
– By the Chernoff bound, with probability > 1-P:
• If DE > (1+ε)T, then Z < k/e
• If DE < (1-ε)T, then Z > k/e
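The amplification step can be sketched as below. Two things are assumptions rather than statements from the slides: each S_j is drawn by including every coordinate independently with probability 1/T, and the final answer is YES exactly when Z < k/e (the decision rule implied by the two cases above). The function name `algorithm_b` and its parameters are illustrative.

```python
import math
import random

def algorithm_b(stream, m, T, k, rng):
    """Run k independent copies of the single-set test and compare Z with k/e."""
    # Assumed sampler: coordinate i joins S_j with probability 1/T.
    sets = [{i for i in range(1, m + 1) if rng.random() < 1.0 / T}
            for _ in range(k)]
    sums = [0] * k  # sums[j] tracks Sum_{S_j}(x)
    for i in stream:
        for j, S in enumerate(sets):
            if i in S:
                sums[j] += 1
    z = sum(1 for s in sums if s == 0)  # Z = number of sums equal to 0
    return z < k / math.e               # True means YES

rng = random.Random(0)
many_distinct = list(range(1, 201))  # DE = 200, far above T = 4
few_distinct = [1] * 10              # DE = 1, far below T = 50
print(algorithm_b(many_distinct, 200, 4, 200, rng))
print(algorithm_b(few_distinct, 200, 50, 200, rng))
```

In this sketch k is passed in directly; the text's choice k = O(log(1/P)/ε^2) leaves the constant unspecified, so none is hard-coded here.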
[Alon-Matias-Szegedy’96, Feigenbaum-Kannan-Strauss-Viswanathan’99,
Indyk’00, Coppersmith-Kumar’04, Ganguly’04,
Bar-Yossef-Jayram-Kumar-Sivakumar’02, ’03, Saks-Sun’03, Indyk-Woodruff’05]