01 Streaming
John Augustine
Jan 16, 2020
Problem Formulation
Counting Distinct Elements in a Stream
• Stream of $n$ numbers from $\{1, 2, \ldots, m\}$.
• Query: how many different numbers?
• Goal: memory $O(\log m)$ bits.

Cloud: Material not covered in class but required to know (and therefore asked in tests/quizzes/exams) will be mentioned in a cloud like this. You are free to search other sources (Internet, textbook).

Deterministic solutions?
• $n = m + 1$ stream elements require $m$ bits of memory.
• Proof:
  • Suppose that after $m$ stream items the algorithm uses only $m - 1$ bits.
  • $2^m - 1$ possible (nonempty) subsets could have been seen, but there are only $2^{m-1}$ states for the memory.
  • Thus, some two subsets $S_1$ and $S_2$ are represented by the same state.
  • If those subsets have different sizes: error.
  • Otherwise, if they have the same size, then what if the $(m+1)$-th item is from $S_1 \setminus S_2$?
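For reference, here is a minimal sketch (not from the slides) of the trivial deterministic solution that the bound above says cannot be substantially beaten for exact answers: keep one bit per possible value, i.e., $m$ bits. The class name is illustrative.

```python
class ExactDistinctCounter:
    """Trivial deterministic counter: one bit per possible value (m bits total).
    The lower-bound argument above says no deterministic algorithm can do
    much better than this if it must answer exactly."""

    def __init__(self, m):
        self.seen = bytearray((m + 7) // 8)  # m-bit bitmap over {1, ..., m}

    def process(self, x):
        byte, bit = (x - 1) // 8, (x - 1) % 8
        self.seen[byte] |= (1 << bit)

    def query(self):
        # Count the set bits, i.e., the number of distinct values observed.
        return sum(bin(b).count("1") for b in self.seen)
```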
This and other subsequent figures & screen clips are from Blum, Hopcroft, and Kannan unless specified otherwise.
Algorithm
• Assume availability of a two-universal hash function $h: \{1, 2, \ldots, m\} \to \{1, 2, \ldots, M\}$, where $M > m$:
  • $\Pr[h(a) = x] = 1/M$.
  • $\Pr[h(a) = x \wedge h(b) = y] = 1/M^2$ (for $a \neq b$).
• Algorithm: hash each value in the stream and only remember the smallest hash value $s$. On being queried, report $\hat{d} = M/s$.

Cloud: How to get 2-universal hash functions? Why is it 2-universal? Why 6?

Claim:
Let $b_1, b_2, \ldots, b_d$ be the $d$ distinct items encountered in the stream. Then
$$\Pr\left[\frac{d}{6} \le \hat{d} \le 6d\right] \ge \frac{2}{3} - \frac{d}{M} > \frac{1}{2}$$
(when $M \gg d$).
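Below is a minimal, illustrative Python sketch of this estimator (not from the slides). The class name `DistinctCounter` is my own, and the hash $h(x) = ((ax + b) \bmod p) \bmod M$ with a prime $p > m$ is only an approximately 2-universal stand-in for the hash family assumed above.

```python
import random

class DistinctCounter:
    """Single-copy sketch of the min-hash distinct-count estimator above."""

    def __init__(self, m, M, p=(1 << 61) - 1, rng=random):
        # p is a prime larger than m; ((a*x + b) mod p) mod M is the standard
        # Carter-Wegman style construction (only approximately 2-universal).
        assert p > m
        self.M = M
        self.p = p
        self.a = rng.randrange(1, p)
        self.b = rng.randrange(0, p)
        self.s = M + 1  # smallest hash value seen so far

    def _h(self, x):
        # Hash into {1, ..., M}.
        return ((self.a * x + self.b) % self.p) % self.M + 1

    def process(self, x):
        self.s = min(self.s, self._h(x))

    def query(self):
        # Report d_hat = M / s, as in the algorithm above.
        return self.M / self.s
```

A single such estimator is only within a factor of 6 of $d$ with probability a bit above 1/2, as the claim states; the homework below is about boosting that probability.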
Analysis: let $Y$ be the number of indices $i$ with $h(b_i) < 6M/d$ (so that $E[Y]$ is about $6$). Then
$$\Pr\left[s \ge \frac{6M}{d}\right] = \Pr\left[\forall i,\; h(b_i) \ge \frac{6M}{d}\right] = \Pr[Y = 0] \le \Pr\big[\,|Y - E[Y]| \ge E[Y]\,\big] \le \frac{\mathrm{Var}(Y)}{E[Y]^2} \le \frac{1}{E[Y]} \le \frac{1}{6}.$$
The final result follows by the union bound, which says
$$\Pr[E_1 \cup E_2 \cup \cdots] \le \sum_i \Pr[E_i].$$

Homework
How to boost the probability and achieve the following claim? For any $c > 0$,
$$\Pr\left[\frac{d}{6} \le \hat{d} \le 6d\right] \ge 1 - \frac{1}{m^c}.$$
Hint: Use an appropriate number of repetitions and use the median value. For the analysis (i.e., proving the above claim), use Chernoff bounds. (You will only need Equation (5).)
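The hint's median trick looks roughly like the sketch below (my own illustrative code, reusing the `DistinctCounter` sketch above). The number of repetitions `k` is left as a parameter, since choosing it (something on the order of $c \log m$) and proving the claim via Chernoff bounds is the homework.

```python
import statistics

def boosted_estimate(stream, m, M, k):
    """Median-of-k boosting sketch for the distinct-count estimator.

    Each independent copy lands in [d/6, 6d] with probability > 1/2, so the
    median of k copies lands there except with probability exponentially
    small in k (via the Chernoff bound, Equation (5))."""
    counters = [DistinctCounter(m, M) for _ in range(k)]
    for x in stream:
        for c in counters:
            c.process(x)
    return statistics.median(c.query() for c in counters)
```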
Chernoff Bounds (Mitzenmacher and Upfal)
Let $X_1, X_2, \ldots, X_n$ be independent binomial trials with $\Pr[X_i = 1] = p$. Let $X = X_1 + X_2 + \cdots + X_n$ and $\mu = E[X] = np$. Then the following hold.
1. For any $\delta > 0$,
$$\Pr[X \ge (1+\delta)\mu] \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}. \qquad (1)$$
2. For $0 < \delta \le 1$,
$$\Pr[X \ge (1+\delta)\mu] \le e^{-\mu\delta^2/3}. \qquad (2)$$
Furthermore, for $0 < \delta < 1$,
$$\Pr[X \le (1-\delta)\mu] \le \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}, \qquad (4)$$
$$\Pr[X \le (1-\delta)\mu] \le e^{-\mu\delta^2/2}. \qquad (5)$$
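As a quick numerical sanity check (not part of the slides), the snippet below compares the empirical lower-tail probability of a binomial sum against the bound of Equation (5); the parameter values are arbitrary illustrative choices.

```python
import math
import random

def chernoff_lower_tail_check(n=1000, p=0.5, delta=0.1, trials=5000, seed=0):
    """Compare Pr[X <= (1 - delta) * mu] (estimated by simulation) with the
    Chernoff bound exp(-mu * delta^2 / 2) from Equation (5)."""
    rng = random.Random(seed)
    mu = n * p
    threshold = (1 - delta) * mu
    hits = 0
    for _ in range(trials):
        x = sum(1 for _ in range(n) if rng.random() < p)  # X = sum of n Bernoulli(p) trials
        if x <= threshold:
            hits += 1
    empirical = hits / trials
    bound = math.exp(-mu * delta * delta / 2)
    return empirical, bound

# Example: the empirical tail probability should not exceed the bound.
# print(chernoff_lower_tail_check())
```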
Frequency Moments
• For each $s \in \{1, 2, \ldots, m\}$, let $f_s$ be the number of occurrences of $s$ in the stream; the $p$-th frequency moment is $F_p = \sum_s f_s^p$.
• $p = \infty$ captures the frequency of the most frequent element.

Alon, Matias, and Szegedy (AMS)
The AMS Algorithm
• For each $s \in \{1, 2, \ldots, m\}$, let $x_s$ be $\pm 1$ with probability $\tfrac{1}{2}$ each.
• Maintain $a = \sum_s x_s f_s$ (add $x_s$ to $a$ each time $s$ arrives).

Claim: $E[a^2] = \sum_s f_s^2 = F_2$.
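Below is an illustrative Python sketch of this estimator (my own code, not the slides' presentation). For brevity it draws the signs $x_s$ fully independently and stores them in a table; the actual AMS analysis only needs 4-wise independence, which lets the $x_s$ be generated on the fly from an $O(\log m)$-bit seed.

```python
import random

class AMSF2Sketch:
    """Single-copy sketch of the AMS second-moment (F2) estimator above."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.signs = {}   # lazily chosen x_s in {-1, +1} for each element s
        self.a = 0        # running value of a = sum_s x_s * f_s

    def _sign(self, s):
        if s not in self.signs:
            self.signs[s] = self.rng.choice((-1, 1))
        return self.signs[s]

    def process(self, s):
        # Each arrival of s contributes x_s, so after the stream a = sum_s x_s f_s.
        self.a += self._sign(s)

    def estimate_f2(self):
        # By the claim above, E[a^2] = sum_s f_s^2 = F2.
        return self.a * self.a
```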
Claim:
$$\Pr\big[\,|X - E[X]| > \epsilon E[X]\,\big] \le \frac{\mathrm{Var}(X)}{\epsilon^2 E[X]^2} \le \delta$$

Cloud: If one of $s$, $t$, $u$, or $v$ is different from the others, the expectation of that term vanishes. Why?
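For reference, one way to see the step the cloud asks about, assuming (as in the AMS analysis) that the $x_s$ are 4-wise independent with $E[x_s] = 0$: if, say, $s \notin \{t, u, v\}$, then
$$E[x_s x_t x_u x_v] = E[x_s]\, E[x_t x_u x_v] = 0,$$
since 4-wise independence makes $x_s$ independent of $(x_t, x_u, x_v)$ and each $x_s$ has zero mean; the same argument applies whichever index is the odd one out.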