Skyline Operator
on Anti-correlated Distribution
Proceedings of the VLDB Endowment, Vol. 6, No. 9 (2013)
Haichuan Shang, Masaru Kitsuregawa
Presenter:
WooSung Choi
(ws_choi@korea.ac.kr)
DataKnow. Lab
Korea UNIV.
Background
Related work
Preliminaries
• Formal definition of Dominates (≺)
 Given a set of d-dimensional points T
 We say that a point t1 ∈ T DOMINATES another point t2 ∈ T
 if and only if
 ∀ i ∈ {1, 2, 3, …, d}: t1[i] ≤ t2[i], and
 ∃ j ∈ {1, 2, 3, …, d}: t1[j] < t2[j]
 This is denoted by t1 ≺ t2
 (simply put, t1 is trivially preferred to t2)
Definition from http://www.comp.nus.edu.sg/~atung/publication/k_dominant.pdf
Note that
the meaning of ‘dominates’ may differ
according to type of application
www.caranddriver.com
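The dominance test above can be sketched in Python. This is a minimal illustration that assumes smaller values are preferred in every dimension, as in the definition on this slide; the function name `dominates` is chosen here for illustration and does not come from the paper.

```python
from typing import Sequence


def dominates(t1: Sequence[float], t2: Sequence[float]) -> bool:
    """Return True if t1 dominates t2, assuming smaller is better everywhere."""
    if len(t1) != len(t2):
        raise ValueError("points must have the same dimensionality")
    # Condition 1: t1 is no worse than t2 in every dimension.
    no_worse = all(a <= b for a, b in zip(t1, t2))
    # Condition 2: t1 is strictly better than t2 in at least one dimension.
    strictly_better = any(a < b for a, b in zip(t1, t2))
    return no_worse and strictly_better
```

Note that a point never dominates itself, and two points that each win in a different dimension do not dominate one another. If an application prefers larger values in some dimension, the comparison for that dimension would have to be flipped.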
Formal definition (skyline)
• The Skyline operator
 Input: a set of objects P = {p1, p2, …, pN}
 SKYLINE(P) = {pi | pi ∈ P and ∄ p* ∈ P s.t. p* ≺ pi}
[Figure: points A to G plotted against the x axis and y axis, with a region labeled "Dominating Area(B)"]
Common misconceptions
“B ∈ Output since B ≺ C, D, F”: wrong
“B ∈ Output since no other point p ∈ P satisfies p ≺ B”: correct
 Suppose there are n objects in the given set
 D = {o1, o2, …, on}
 Algorithm: Naïve 1
 S = ∅
 for each object ox ∈ D
  boolean isDominated = false
  for each object oy ∈ D
   if oy ≠ ox AND oy ≺ ox then
    isDominated = true;
    break;
  if !isDominated then S = S ∪ {ox}
Naïve approach
Nested loop structure
Computational cost: O(n^2)
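The nested-loop approach on this slide can be written as runnable Python. This is a sketch under the same smaller-is-better assumption as the dominance definition; the names `dominates` and `naive_skyline` are illustrative, not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse in all dimensions, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def naive_skyline(points: Sequence[Point]) -> List[Point]:
    """Compare every object against every other object: O(n^2) dominance tests."""
    skyline: List[Point] = []
    for i, ox in enumerate(points):
        is_dominated = False
        for j, oy in enumerate(points):
            # Skip the object itself; stop at the first dominating object.
            if i != j and dominates(oy, ox):
                is_dominated = True
                break
        if not is_dominated:
            skyline.append(ox)
    return skyline


if __name__ == "__main__":
    data = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
    print(naive_skyline(data))  # prints [(1, 5), (2, 3), (4, 1)]
```

Here (3, 4) is dropped because (2, 3) dominates it, and (5, 5) is dropped because (1, 5) dominates it. Each dominance test costs O(d), so the total cost is O(dn^2) in the worst case.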
Motivation
Data Distribution
Data Distribution?
Related Work: Summary
• Worst-case Analysis (2.1)
 worst case complexity on arbitrary data distributions
 $\Omega(n \log n)$ [16], $O\left(\frac{N}{B} \log_{M/B}^{d-2} \frac{N}{B}\right)$ [12]
• Elimination Category (2.2)
 Average Complexity with dimensional independence
 Idea: Eliminate non-skyline objects quickly!
 BNL[7], SFS[9], LESS[12], …
 O(dnm)[20], where m is the skyline cardinality
Why does anti-correlation matter?
Anti-Correlated (2)
•A relationship in which
 the value in one dimension increases as the values in the other
dimensions decrease
•Skyline Queries
are used to find a set of non-dominated data points
for Multi-Criteria Decision Making
•Data in the real world
 is more likely to be anti-correlated
Anti-Correlated (3)
• Anti-correlation significantly limits the practical
usage of the existing algorithms
• and creates a demand for effective mathematical
models and efficient algorithms on anti-correlated data
O(dnm)[20], where m is the skyline cardinality
m tends to increase on anti-correlated distributions
These existing algorithms fall back to O(dn^2)
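To make the growth of m concrete, the following sketch compares two extreme 2-dimensional data sets: points on a descending line (perfectly anti-correlated) and points on an ascending line (perfectly correlated). The data sets and the helper names are hypothetical illustrations, not the paper's model or experiments.

```python
def dominates(a, b):
    """a dominates b, assuming smaller is better in every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_size(points):
    """Count the points that no other point dominates (quadratic check)."""
    return sum(
        1
        for i, p in enumerate(points)
        if not any(j != i and dominates(q, p) for j, q in enumerate(points))
    )


n = 200
# Perfectly anti-correlated: x rises exactly as y falls (all on x + y = n).
anti = [(i, n - i) for i in range(n)]
# Perfectly correlated: x and y rise together (all on y = x).
corr = [(i, i) for i in range(n)]

print(skyline_size(anti), skyline_size(corr))  # prints 200 1
```

On the descending line no point dominates any other, so m = n and an O(dnm) algorithm performs O(dn^2) work. On the ascending line a single point dominates everything else, so m = 1.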
What does this research set out to do?
Contribution
• 1) General model for the anti-correlated distribution
• 2) Polynomial Estimation of the lower bound of the
expected value of skyline cardinality
• 3) a “Determination and Elimination Framework” for
efficient computation of skyline on anti-correlated
distribution
3. PRELIMINARIES
Definition & Expectation of Skyline Cardinality
Model: Anti-Correlated Distribution
[Figure: three scatter plots of the data distributions, with panels titled "Uniform", "Anti c=1", and "Anti c=0.1"]
1) General model for the anti-correlated distribution
1K Tuples
[Figure: the same three panels, "Uniform", "Anti c=1", and "Anti c=0.1", annotated with the values 12, 57, and 116 respectively]
1K Tuples
[Figure: scatter plot of the "Anti c=1" panel, annotated with the value 57]
$S_{2,1000,1} \approx \sqrt{1000\pi} - 1 = 55.0499122$
2) Polynomial estimation of the lower bound of the expected value of skyline cardinality
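The estimate on this slide is easy to check numerically. The sketch below evaluates the closed form sqrt(pi * n) - 1 for d = 2 and c = 1; the function name `approx_skyline_2d` is made up for this illustration.

```python
import math


def approx_skyline_2d(n: int) -> float:
    """Estimate of S_{2,n,1}: sqrt(pi * n) - 1."""
    return math.sqrt(math.pi * n) - 1.0


if __name__ == "__main__":
    # For n = 1000 this evaluates to roughly 55.05, close to the value 57
    # annotated on the "Anti c=1" plot.
    print(round(approx_skyline_2d(1000), 4))
```

The estimate grows with the square root of n, so multiplying the number of tuples by 100 multiplies the estimated skyline size by about 10.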
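Generalization
• Theorem 3
 The expected value $S_{d,n,c}$ of the skyline cardinality satisfies
 $S_{d,n,1} \le S_{d,n,c} \le S_{d,n,0} = n$
 $S_{d,n,1} = \sum_{k=1}^{d} (-1)^{k-1} \binom{d-1}{k-1} \, n \, \frac{\Gamma(k/d)\,\Gamma(n)}{\Gamma(n + k/d)}$
 $\approx \sum_{k=1}^{d} (-1)^{k-1} \binom{d-1}{k-1} \, \Gamma(k/d) \, n^{1 - k/d}$, when d ≥ 2
• where $\Gamma(x) = \int_0^{\infty} e^{-t} \, t^{x-1} \, dt$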
O(dnm)[20], where 𝑚 is the skyline cardinality
𝑚 tends to increase on anti-correlated distribution
These existing algorithms: $O(d\,n^{(2d-1)/d}) \sim O(dn^2)$
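The two expressions in Theorem 3 can be compared numerically. The sketch below evaluates the Gamma-ratio form through `math.lgamma` (to avoid overflow in Γ(n) for large n) and the polynomial approximation through `math.gamma`; both function names are assumptions made for this illustration, and `math.comb` requires Python 3.8 or later.

```python
import math


def skyline_bound_exact(d: int, n: int) -> float:
    """Sum over k of (-1)^(k-1) C(d-1,k-1) n Γ(k/d) Γ(n) / Γ(n + k/d)."""
    total = 0.0
    for k in range(1, d + 1):
        x = k / d
        # Work in log space: Γ(n) overflows a float for n beyond about 171.
        log_term = math.log(n) + math.lgamma(x) + math.lgamma(n) - math.lgamma(n + x)
        sign = 1.0 if (k - 1) % 2 == 0 else -1.0
        total += sign * math.comb(d - 1, k - 1) * math.exp(log_term)
    return total


def skyline_bound_approx(d: int, n: int) -> float:
    """Sum over k of (-1)^(k-1) C(d-1,k-1) Γ(k/d) n^(1 - k/d)."""
    return sum(
        (-1) ** (k - 1) * math.comb(d - 1, k - 1) * math.gamma(k / d) * n ** (1 - k / d)
        for k in range(1, d + 1)
    )


if __name__ == "__main__":
    for d in (2, 3, 4):
        print(d, skyline_bound_exact(d, 1000), skyline_bound_approx(d, 1000))
```

The k = 1 term, Γ(1/d) · n^(1 − 1/d), dominates the sum for large n. Substituting m ≈ n^(1 − 1/d) into O(dnm) gives O(d · n^(2 − 1/d)) = O(d · n^((2d−1)/d)), which approaches O(dn^2) as d grows.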
Pearson Correlation Coefficient
or covariance based model
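Covariance
• In probability theory and statistics, covariance is a value that
indicates the degree of correlation between two random variables
• If one of the two variables tends to increase when the other
also tends to increase, the covariance is positive
• Conversely, if one variable tends to increase while the other
tends to decrease, the covariance is negative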
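The sign behavior described above can be seen on a small example. The sketch below computes the sample covariance by hand (dividing by n − 1); the function name and the sample data are hypothetical.

```python
from typing import Sequence


def sample_covariance(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample covariance: sum((x - mean_x) * (y - mean_y)) / (n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    products = ((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


if __name__ == "__main__":
    price = [1.0, 2.0, 3.0, 4.0]
    rising = [2.0, 4.0, 6.0, 8.0]    # increases together with price
    falling = [8.0, 6.0, 4.0, 2.0]   # decreases as price increases
    print(sample_covariance(price, rising) > 0)   # prints True
    print(sample_covariance(price, falling) < 0)  # prints True
```

A negative covariance between two dimensions is the situation the slides call anti-correlated: improving one attribute tends to worsen the other.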
