[Vldb 2013] skyline operator on anti correlated distributions

Skyline Operator
on Anti-correlated Distribution
Proceedings of the VLDB Endowment, Vol. 6, No. 9 (2013)
Haichuan Shang, Masaru Kitsuregawa
Presenter:
WooSung Choi
(ws_choi@korea.ac.kr)
DataKnow. Lab
Korea UNIV.

Preliminaries
• Formal definition of Dominates (≺)
 Given a set of d-dimensional points 𝑇,
 we say that a point t1 ∈ 𝑇 DOMINATES another point t2 ∈ 𝑇
 if and only if
 $\forall i \in \{1, 2, \dots, d\},\ t_1[i] \le t_2[i]$, and
 $\exists j \in \{1, 2, \dots, d\},\ t_1[j] < t_2[j]$
 This is denoted by t1 ≺ t2
 (simply put, t1 is unambiguously preferred over t2)
Definition from http://www.comp.nus.edu.sg/~atung/publication/k_dominant.pdf
Note that
the meaning of ‘dominates’ may differ
depending on the application
(Image source: www.caranddriver.com)
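A minimal Python sketch of the dominance test, assuming every dimension is to be minimized (matching the ≤ and < conditions above). The function name `dominates` and the car values are hypothetical illustrations; for an application where larger values are better, the comparisons would flip.

```python
from typing import Sequence


def dominates(t1: Sequence[float], t2: Sequence[float]) -> bool:
    """Return True if t1 dominates t2 (t1 ≺ t2), assuming smaller is better."""
    if len(t1) != len(t2):
        raise ValueError("points must have the same dimensionality")
    # Condition 1: t1 is no worse than t2 in every dimension i.
    no_worse_everywhere = all(a <= b for a, b in zip(t1, t2))
    # Condition 2: t1 is strictly better than t2 in at least one dimension j.
    strictly_better_somewhere = any(a < b for a, b in zip(t1, t2))
    return no_worse_everywhere and strictly_better_somewhere


# Hypothetical cars as (price in $1000s, fuel use in L/100 km); lower is better.
print(dominates((20, 7), (25, 7)))  # True: cheaper, same fuel use
print(dominates((20, 7), (20, 7)))  # False: identical cars do not dominate
print(dominates((20, 9), (25, 7)))  # False: cheaper but uses more fuel
```

Both conditions are needed: without the strict one, every point would dominate itself and its duplicates.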

Formal Definition (Skyline)
• The Skyline operator
 Input: a set of objects $P = \{p_1, p_2, \dots, p_N\}$
 $SKYLINE(P) = \{p_i \mid p_i \in P \text{ and } \nexists\, p^* \in P \text{ s.t. } p^* \prec p_i\}$
[Figure: example points A to G on the x and y axes, with the dominating area of B marked]
Common misconceptions
Wrong: “B ∈ Output since B ≺ C, D, F”
Correct: “B ∈ Output since no other point p ∈ P satisfies p ≺ B”
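The set definition can be transcribed almost literally. The sketch below (helper names and sample points are hypothetical; smaller values are assumed better) keeps a point exactly when no point dominates it, and also checks the misconception: a point can belong to the skyline without dominating anything.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    # p ≺ q: no worse in every dimension, strictly better in at least one.
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    # SKYLINE(P) = { p_i in P : no p* in P with p* ≺ p_i }.
    # Comparing a point with itself is harmless: p ≺ p is always False.
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical 2-d points (x, y); smaller is better in both dimensions.
pts = [(1, 9), (3, 4), (5, 5), (9, 1), (7, 8)]
print(skyline(pts))  # [(1, 9), (3, 4), (9, 1)]

# (9, 1) dominates no other point, yet it is in the skyline:
# membership depends only on not being dominated.
print(any(dominates((9, 1), q) for q in pts))  # False
```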

 Suppose the given set contains n objects: $D = \{o_1, o_2, \dots, o_n\}$
 Algorithm: Naïve 1
  S ← ∅
  for each object o_x ∈ D
    boolean isDominated = false
    for each object o_y ∈ D
      if o_y is o_x (the same object) OR ¬(o_y ≺ o_x) then continue;
      else
        isDominated = true;
        break;
    if ¬isDominated then S ← S ∪ {o_x}
  return S
Naïve approach
Nested-loop structure
Computational cost: $O(n^2)$
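A runnable version of the nested loop, with a counter added to make the quadratic cost visible. Assumptions: smaller is better, the self-comparison is skipped by position rather than by value (so duplicate points are handled), and the counter, the function name `naive_skyline`, and the test data are illustrative additions, not part of the slide.

```python
from typing import List, Sequence, Tuple


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    # p ≺ q under the smaller-is-better convention.
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def naive_skyline(objects: List[Sequence[float]]) -> Tuple[List[Sequence[float]], int]:
    """Nested-loop skyline (Naïve 1); also returns the number of dominance tests."""
    result: List[Sequence[float]] = []
    tests = 0
    for x, o_x in enumerate(objects):
        is_dominated = False
        for y, o_y in enumerate(objects):
            if x == y:  # skip comparing an object with itself
                continue
            tests += 1
            if dominates(o_y, o_x):
                is_dominated = True
                break  # one dominator is enough to discard o_x
        if not is_dominated:
            result.append(o_x)
    return result, tests


# Worst case: all points lie on x + y = n, so no point dominates another.
n = 100
line_points = [(i, n - i) for i in range(n)]
sky, tests = naive_skyline(line_points)
print(len(sky), tests)  # 100 9900, i.e. n * (n - 1) tests
```

When no point dominates another, the early `break` never fires and every object scans all n - 1 others, which is the O(n²) worst case.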

Related Work: Summary
• Worst-case Analysis (2.1)
 Worst-case complexity on arbitrary data distributions
 $\Omega(n \log n)$ [16], $O\left(\frac{N}{B} \log_{M/B}^{d-2} \frac{N}{B}\right)$ [12]
• Elimination Category (2.2)
 Average-case complexity, assuming independent dimensions
 Idea: Eliminate non-skyline objects quickly!
 BNL[7], SFS[9], LESS[12], …
 O(dnm)[20], where 𝑚 is the skyline cardinality
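To illustrate the elimination idea, here is a simplified in-memory sketch in the spirit of BNL [7]. It assumes an unbounded window (the original BNL works with a bounded window and temporary files) and smaller-is-better dimensions; the name `bnl_unbounded` is hypothetical and this is not the exact algorithm of [7].

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    # p ≺ q under the smaller-is-better convention.
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def bnl_unbounded(points: List[Point]) -> List[Point]:
    """BNL-style elimination with an unbounded in-memory window (simplified)."""
    window: List[Point] = []
    for p in points:
        # Eliminate p as soon as any current candidate dominates it.
        if any(dominates(w, p) for w in window):
            continue
        # p survives: evict the candidates it dominates, then keep p.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


pts = [(1, 9), (3, 4), (5, 5), (9, 1), (7, 8)]
print(bnl_unbounded(pts))  # [(1, 9), (3, 4), (9, 1)]
```

Each incoming point is tested against the current window, so when the window stays close to the skyline size m, the work is roughly n·m dominance tests of O(d) each; that is the intuition behind a bound of the form O(dnm).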

Why Does Anti-Correlation Matter?

Anti-Correlated (2)
• A relationship in which
 the value in one dimension increases as the values in the other
dimensions decrease
• Skyline queries
are used to find the set of non-dominated data points
for multi-criteria decision making
• Data in the real world
 tends to be anti-correlated

Anti-Correlated (3)
• Anti-correlation significantly limits the practical
usefulness of existing algorithms
• and creates a demand for effective mathematical
models and efficient algorithms for anti-correlated data
O(dnm)[20], where 𝑚 is the skyline cardinality
𝑚 tends to increase on anti-correlated distributions
These existing algorithms fall back to $O(dn^2)$
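A small experiment that makes the growth of m concrete. Assumptions: two dimensions, smaller is better, and a hypothetical "perfectly anti-correlated" dataset in which every point lies on x + y = 1; this is not the paper's anti-correlated model. Because no point on that line dominates another, m equals n there, while independent uniform data keeps m small (for two independent continuous dimensions the expected skyline size is the harmonic number H_n, about 6.8 for n = 500).

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def dominates(p: Point, q: Point) -> bool:
    # p ≺ q under the smaller-is-better convention.
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline_size(points: List[Point]) -> int:
    # Window-based elimination (simplified BNL); returns the skyline size m.
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return len(window)


rng = random.Random(42)
n = 500
uniform = [(rng.random(), rng.random()) for _ in range(n)]
# Hypothetical extreme: every point on x + y = 1 (perfect anti-correlation).
anti = [(x, 1.0 - x) for x in (rng.random() for _ in range(n))]

m_uniform, m_anti = skyline_size(uniform), skyline_size(anti)
print(m_uniform, m_anti)  # m_anti is always n; m_uniform is small
```

With m = n, the O(dnm) cost of an elimination-style algorithm degenerates to O(dn²).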

What Does This Work Set Out to Do?
Contributions

Contribution
• 1) General model for the anti-correlated distribution
• 2) Polynomial Estimation of the lower bound of the
expected value of skyline cardinality
• 3) A “Determination and Elimination Framework” for
efficient skyline computation on anti-correlated
distributions

3. PRELIMINARIES
Definition & Expectation of Skyline Cardinality

Model: Anti-Correlated Distribution
[Figure: scatter plots of three datasets: Uniform, Anti c=1, and Anti c=0.1]
1) General model for the anti-correlated distribution

1K Tuples
[Figure: the same three datasets (Uniform, Anti c=1, Anti c=0.1); skyline cardinality: Uniform 12, Anti c=1 57, Anti c=0.1 116]

1K Tuples
[Figure: the Anti c=1 dataset; observed skyline cardinality 57]
$S_{2,1000,1} \approx \sqrt{1000\pi} - 1 \approx 55.0499122$
2) Polynomial estimation of the lower bound of the expected value of skyline cardinality
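Where the estimate 55.05 comes from: specializing Theorem 3 (on the Generalization slide) to d = 2 leaves only the k = 1 and k = 2 terms. The step $\Gamma(n)/\Gamma(n+\tfrac12) \approx n^{-1/2}$ is the standard large-n approximation of a gamma-function ratio.

```latex
\begin{aligned}
S_{2,n,1} &= \binom{1}{0}\frac{n\,\Gamma(\tfrac12)\,\Gamma(n)}{\Gamma(n+\tfrac12)}
           - \binom{1}{1}\frac{n\,\Gamma(1)\,\Gamma(n)}{\Gamma(n+1)} \\
          &= \frac{\sqrt{\pi}\; n\,\Gamma(n)}{\Gamma(n+\tfrac12)} - 1
           \;\approx\; \sqrt{\pi n} - 1
           \qquad \left(\text{using } \tfrac{\Gamma(n)}{\Gamma(n+1/2)} \approx n^{-1/2}\right) \\
S_{2,1000,1} &\approx \sqrt{1000\pi} - 1 \approx 56.0499 - 1 = 55.0499
\end{aligned}
```

The second term is exactly 1 because $n\,\Gamma(n)/\Gamma(n+1) = n \cdot (n-1)!/n! = 1$.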

Generalization
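• Theorem 3
 The expected value $S_{d,n,c}$ of the skyline cardinality satisfies
 $S_{d,n,1} \le S_{d,n,c} \le S_{d,n,0} = n$, where
 $S_{d,n,1} = \sum_{k=1}^{d} (-1)^{k-1} \binom{d-1}{k-1} \frac{n\,\Gamma(k/d)\,\Gamma(n)}{\Gamma(n + k/d)} \approx \sum_{k=1}^{d} (-1)^{k-1} \binom{d-1}{k-1}\, \Gamma(k/d)\, n^{1-k/d}$ when $d \ge 2$
• where $\Gamma(n) = \int_0^\infty t^{n-1} e^{-t}\, dt$ is the gamma function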
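A Python sketch that evaluates both expressions of Theorem 3, the exact sum and its polynomial approximation, so the two can be compared numerically. The function names `s_exact` and `s_approx` are hypothetical; log-gamma is used in the exact form because `math.gamma(n)` overflows a float for n above roughly 171.

```python
from math import comb, exp, gamma, lgamma


def s_exact(d: int, n: int) -> float:
    """S_{d,n,1} = sum_k (-1)^(k-1) C(d-1,k-1) n Γ(k/d) Γ(n) / Γ(n + k/d)."""
    total = 0.0
    for k in range(1, d + 1):
        a = k / d
        # Γ(n) / Γ(n + a) in log space, since Γ(n) alone overflows for large n.
        gamma_ratio = exp(lgamma(n) - lgamma(n + a))
        total += (-1) ** (k - 1) * comb(d - 1, k - 1) * n * gamma(a) * gamma_ratio
    return total


def s_approx(d: int, n: int) -> float:
    """Polynomial form: sum_k (-1)^(k-1) C(d-1,k-1) Γ(k/d) n^(1 - k/d)."""
    return sum(
        (-1) ** (k - 1) * comb(d - 1, k - 1) * gamma(k / d) * n ** (1 - k / d)
        for k in range(1, d + 1)
    )


print(round(s_approx(2, 1000), 4))  # 55.0499
print(s_exact(2, 1000))             # slightly above the approximation
```

For d = 2 and n = 1000 the approximation reproduces the 55.05 estimate from the 1K-tuple slide. As a sanity check, the exact form gives 1 for a single tuple (n = 1) and for one dimension (d = 1), as a skyline should.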
O(dnm)[20], where 𝑚 is the skyline cardinality
𝑚 tends to increase on anti-correlated distributions
These existing algorithms: $O(dn^{(2d-1)/d})$, approaching $O(dn^2)$ as d grows
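A short derivation of the exponent on this slide, assuming the leading (k = 1) term of the polynomial approximation in Theorem 3 dominates for large n. Since $S_{d,n,1}$ is the lower bound on the expected skyline cardinality, substituting it for m shows how large the O(dnm) cost becomes on anti-correlated data.

```latex
\begin{aligned}
\mathbb{E}[m] \;\ge\; S_{d,n,1}
  &\approx \Gamma\!\left(\tfrac{1}{d}\right) n^{1-\frac{1}{d}}
   - (d-1)\,\Gamma\!\left(\tfrac{2}{d}\right) n^{1-\frac{2}{d}} + \cdots \\
d\,n\,m &\gtrsim d\,n \cdot \Gamma\!\left(\tfrac{1}{d}\right) n^{\frac{d-1}{d}}
   = \Gamma\!\left(\tfrac{1}{d}\right) d\, n^{\frac{2d-1}{d}},
   \qquad \frac{2d-1}{d} = 2 - \frac{1}{d} \;\to\; 2
\end{aligned}
```

For example, the exponent is 3/2 for d = 2 and 9/5 for d = 5, so the cost moves toward $dn^2$ as the dimensionality grows.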

Pearson Correlation Coefficient
or Covariance-Based Model

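Covariance
• In probability theory and statistics, covariance
is a value that indicates how strongly two random variables vary together
• If one of the two variables tends to increase when the other
tends to increase, the covariance
is positive
• Conversely, if one variable tends to decrease when the other
tends to increase, the covariance is
negative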
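The sign of the covariance can be checked on a toy example. The sketch computes the sample covariance and the Pearson coefficient (covariance divided by the product of the standard deviations) by hand; the function names and the data are hypothetical, chosen so that one variable falls as the other rises.

```python
from math import sqrt
from typing import Sequence


def covariance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample covariance of x and y (divides by n - 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson coefficient: covariance scaled by both standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical values: as x rises, y falls (anti-correlation).
x = [10, 20, 30, 40]
y = [40, 30, 20, 10]
print(covariance(x, y))         # -166.66666666666666 (negative)
print(round(pearson(x, y), 6))  # -1.0
```

The covariance alone depends on the units of x and y; the Pearson coefficient normalizes it to the range [-1, 1], with -1 meaning perfectly anti-correlated.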
