07 Kdtrees
Lecture 7:
BST Range Search
k-d Trees
Binary Space Partitioning Trees
Nearest-Neighbor Search

Instead of finding an exact match, find all items whose keys fall between a range of values, e.g., between m and z, inclusive

Example applications:

[Figure: BST T with root k; k's children are d and p; d's children are a and f; p's children are m and z; m's children are l and n.]
void
rangesearch(Link root, Key searchrange[],
            Key subtreerange[], List results)

1. if root is in search range,
   add root to results
2. compute range of left subtree
3. if search range covers all or
   part of left subtree, search left
4. compute range of right subtree
5. if search range covers all or
   part of right subtree, search right
6. return results

(Other traversal orders are also ok)

Example: rangesearch(T, [m-z], [a-z], results)
(T's range is "whole universe")

[Figure: tree T annotated with subtree ranges: k [a-z]; d [a-j], p [l-z]; m [l-o], z [q-z]; l [l-l], n [n-o].]

is k in [m-z]?
does [a-j] overlap [m-z]?
does [l-z] overlap [m-z]? search p's subtree
  is p in [m-z]? add p to results
  does [l-o] overlap [m-z]? search m's subtree
    is m in [m-z]? add m to results
    does [l-l] overlap [m-z]?
    does [n-o] overlap [m-z]? search n's subtree
      is n in [m-z]? add n to results
  does [q-z] overlap [m-z]? search z's subtree
    is z in [m-z]? add z to results
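The six steps above can be sketched in C++ roughly as follows. This is a minimal sketch, not the course's reference code: the Node layout, the use of char keys, std::vector in place of the slides' List, and the helper name overlaps are all assumptions made for illustration.

```cpp
#include <vector>

// Hypothetical types: the slides only name Link and Key.
using Key = char;
struct Node {
    Key key;
    Node* left;
    Node* right;
};
using Link = Node*;

enum { MIN = 0, MAX = 1 };

// True if the closed ranges a and b share at least one key.
bool overlaps(const Key a[2], const Key b[2]) {
    return !(a[MAX] < b[MIN] || a[MIN] > b[MAX]);
}

void rangesearch(Link root, const Key searchrange[2],
                 const Key subtreerange[2], std::vector<Key>& results) {
    if (root == nullptr) return;

    // Step 1: add the root if it lies in the search range.
    if (root->key >= searchrange[MIN] && root->key <= searchrange[MAX])
        results.push_back(root->key);

    // Steps 2-3: the left subtree covers [subtree MIN, key - 1].
    Key leftrange[2] = { subtreerange[MIN], static_cast<Key>(root->key - 1) };
    if (leftrange[MIN] <= leftrange[MAX] && overlaps(searchrange, leftrange))
        rangesearch(root->left, searchrange, leftrange, results);

    // Steps 4-5: the right subtree covers [key + 1, subtree MAX].
    Key rightrange[2] = { static_cast<Key>(root->key + 1), subtreerange[MAX] };
    if (rightrange[MIN] <= rightrange[MAX] && overlaps(searchrange, rightrange))
        rangesearch(root->right, searchrange, rightrange, results);
}
```

On the example tree with search range [m-z] and subtree range [a-z], this visits the same subtrees as the trace above and collects p, m, n, z in that order.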
BST Range Search: Support Functions
1. if root is in search range, i.e.,
   root->key <= searchrange[MAX], and
   root->key >= searchrange[MIN]
   add node to results

2. compute subtree's range: replace upper (lower) bound
   of left (right) subtree's range by root->key-1 (+1)

3. if search range covers all or part of subtree's range,
   search subtree
   • each subtree covers a range of key values
   • compute overlap between subtree's range and search range
   • no overlap if either
     searchrange[MAX] < subtreerange[MIN] or
     searchrange[MIN] > subtreerange[MAX]

BST Range Search: Other Details
How to express range when the keys are floats?
• be careful with numerical precision and floating point errors [one of this week's discussion topics]

How to support duplicate keys?
• be consistent about using ≤ or <
• if ≤, the range for the left subtree would be closed, e.g., [-∞, 0], and the range for the right subtree half open, e.g., (0, +∞]
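With floating-point keys, root->key-1 and root->key+1 no longer give the neighbouring key values. One possible way to keep the same closed-range overlap test, assuming duplicates are sent to the left subtree (the ≤ convention), is to start the right subtree's range at the next representable double above the key. The Range struct and the function names below are hypothetical, chosen only for this sketch.

```cpp
#include <cmath>
#include <limits>

// Closed interval [lo, hi]; a hypothetical helper type for this sketch.
struct Range { double lo, hi; };

// No overlap if either range ends before the other begins.
bool overlaps(const Range& a, const Range& b) {
    return !(a.hi < b.lo || a.lo > b.hi);
}

// Keys <= root go left, so the left subtree covers [lo, key].
Range leftRange(const Range& subtree, double key) {
    return { subtree.lo, key };
}

// The right subtree covers (key, hi]. The open lower bound is written as a
// closed one by stepping to the smallest double strictly greater than key.
Range rightRange(const Range& subtree, double key) {
    return { std::nextafter(key, std::numeric_limits<double>::infinity()),
             subtree.hi };
}
```

With a root key of 0 and a subtree range of [-∞, +∞], a search for exactly 0 overlaps only the left range, which is where any duplicates of 0 would have been inserted.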
[Figures: k-d tree construction over points such as (20,20), (50,30), (60,20) and (90,60); BSP tree with divider planes 0, 1a, 1b and 2; scene with objects a through n split at lines such as y=30 and y=45.]

Blue draws a straight line from itself to e
Now it needs to know [...] line
Each internal node holds a divider plane
What is the brute force [...] obstructing objects?
[...] go to the right subtree
• repeat for the two partitions, cycling through the coordinates
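Cycling through the coordinates can be illustrated with a small insertion routine. This is a sketch under assumptions: it stores one data point in every node (a common k-d tree variant), uses 2-D points, and sends ties to the left; the names KdNode, Point and insert are not from the slides.

```cpp
#include <array>
#include <memory>

constexpr int K = 2;                      // number of dimensions
using Point = std::array<double, K>;      // {x, y}

struct KdNode {
    Point pt;
    std::unique_ptr<KdNode> left, right;
    explicit KdNode(const Point& p) : pt(p) {}
};

// Insert p below root. The depth decides which coordinate is compared:
// x at depth 0, y at depth 1, x again at depth 2, and so on.
void insert(std::unique_ptr<KdNode>& root, const Point& p, int depth = 0) {
    if (!root) {
        root = std::make_unique<KdNode>(p);
        return;
    }
    int dim = depth % K;
    if (p[dim] <= root->pt[dim])
        insert(root->left, p, depth + 1);
    else
        insert(root->right, p, depth + 1);
}
```

Inserting (50,30), (20,20), (90,60) and (60,20) in that order puts (20,20) to the left of the root by x, (90,60) to the right by x, and (60,20) to the left of (90,60) by y.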
Collision Detection with BSP
[Figure: BSP tree with splits x=30, y=30 and x=52 and leaf regions A, D, E, F, G holding objects a through n.]

Blue searches the BSP and found that at x=30, e is to its [...]
[...] a, b, g, h, i, j, k
[...] can ignore objects l, n
Blue is in the left side of the x=52 partition, and only object d is on the same side
Blue found out that c is in its way to e
So it needs to navigate around c
To get to e, it needs to find if there are other objects in its way
[...] needs to compute [...] to compute intersection with?
[...] go to e
Nearest Neighbor Search
Returning back to A, current (new) search range no longer overlaps A's right child, prune the right child, return result [wikipedia]

In this example, the k-d tree is formed by splitting planes, similar to an axis-aligned BSP, hence the internal nodes do not always hold data points [Sellarès]

Since the root doesn't hold a data point, the search range remains at ∞ as we search the right side of the k-d tree where the reference point is [Sellarès]

As we traverse to the second level of the k-d tree, we still cannot reduce the search range [Sellarès]

We've reached a leaf node and found two data points that match our query (assume all data points do), reduce the search range to the minimum distance between the reference point and the two data points [Sellarès]

Returning back to the parent node, we found that the search range overlaps the range of the right subtree, search the right subtree [Sellarès]

Found a closer neighbor in the left branch of the right subtree, further reduce the search range [Sellarès]

Using current search range and each subtree's range, prune parts of the tree that could NOT include the nearest neighbor [Sellarès]
k Nearest Neighbors Search
To find k nearest neighbors, maintain k current best instead of just the best [different k from k-d tree]
Branches are only eliminated when they can't hold any neighbor closer than the farthest of the k current best [Sellarès]
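The shrink-and-prune idea can be sketched for a single nearest neighbor as below. Note the assumptions: every node stores a data point (unlike the splitting-plane tree in the Sellarès figures), and the far subtree is pruned by comparing the distance to the splitting line against the current best distance, which is a simpler stand-in for the bounding-box overlap test described above. All names are hypothetical.

```cpp
#include <array>
#include <limits>
#include <memory>

using Point = std::array<double, 2>;

struct KdNode {
    Point pt;
    std::unique_ptr<KdNode> left, right;
    explicit KdNode(const Point& p) : pt(p) {}
};

// Same cycling insert as a basic 2-D k-d tree: x at even depth, y at odd.
void insert(std::unique_ptr<KdNode>& root, const Point& p, int depth = 0) {
    if (!root) { root = std::make_unique<KdNode>(p); return; }
    int dim = depth % 2;
    insert(p[dim] <= root->pt[dim] ? root->left : root->right, p, depth + 1);
}

double sqdist(const Point& a, const Point& b) {
    double dx = a[0] - b[0], dy = a[1] - b[1];
    return dx * dx + dy * dy;
}

// best and bestSq hold the nearest point found so far and its squared
// distance; callers start with best = nullptr and bestSq = infinity.
void nearest(const KdNode* node, const Point& q, int depth,
             const KdNode*& best, double& bestSq) {
    if (!node) return;

    // A closer point shrinks the search range.
    double d = sqdist(node->pt, q);
    if (d < bestSq) { bestSq = d; best = node; }

    int dim = depth % 2;
    double diff = q[dim] - node->pt[dim];
    const KdNode* nearSide = diff <= 0 ? node->left.get() : node->right.get();
    const KdNode* farSide  = diff <= 0 ? node->right.get() : node->left.get();

    // Search the side containing the reference point first.
    nearest(nearSide, q, depth + 1, best, bestSq);

    // Prune the far side unless the splitting line lies within the
    // current search range.
    if (diff * diff <= bestSq)
        nearest(farSide, q, depth + 1, best, bestSq);
}
```

For k nearest neighbors, the single best/bestSq pair would be replaced by a collection of the k current best, with the pruning test comparing against the farthest of them.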
where the lat and lon are floats and name and tag are single words

To simplify the assignment, we limit acceptable latitude to between 0° and 90° and we limit acceptable longitude to between 0° and -180° (covering North America)

42.2982 -83.7200 GreatPlainsBurger fastfood
42.3033 -83.7053 Evergreen restaurant
42.2846 -83.7451 Zingerman's restaurant
42.2797 -83.7496 WestendGrill restaurant
42.2785 -83.7413 CometCoffee cafe
42.2808 -83.7486 KaiGarden restaurant
42.2845 -83.7463 Yamato restaurant
42.2909 -83.7178 UMCU bank
42.2806 -83.7497 GrizzlyPeak pub
42.2806 -83.7493 CafeZola restaurant
42.2810 -83.7486 Vinology restaurant
42.3047 -83.7090 Kroger supermarket
42.3030 -83.7066 TCF bank
42.2830 -83.7467 NoThai fastfood
42.2803 -83.7479 ArborBrewingCompany pub
42.3048 -83.7083 AABank bank
42.2780 -83.7449 CreditUnion bank
42.2827 -83.7470 CafeVerde cafe
42.2804 -83.7497 Sweetwaters cafe
42.2780 -83.7449 UMCU bank
42.2795 -83.7438 TCF bank
42.2828 -83.7485 Heidelberg pub
42.2792 -83.7409 PotBelly fastfood
Problem Specification
You can assume there will be no duplicate records (all four fields being the same) in the database

The lat/lon may be duplicated, but as long as the name and/or the tag is different, records with the same lat/lon are not considered duplicates

We limit the lat/lon precision to 4 decimal places (±0.0001°), which translates to about 11 m longitudinal distance at the equator and about 5.56 m longitudinal distance at latitude 60°

The database is followed by a single blank line and one or more queries

There are three types of queries (not ordered):
1. exact-match query, led by an '@' sign:
   @ 42.2982 -83.7200
   @ 42.2984 -83.7195
   @ 42.2980 -83.7109
2. range-match query, led by an 'r':
   r 42.2806 -83.7493 0.0004
   r 42.2812 -83.7521 0.002
3. nearest-neighbor query, led by an 'n':
   n 42.2785 -83.7461 bank
   n 42.2785 -83.7461 bookstore
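The 11 m and 5.56 m figures follow from the arc length of 0.0001° of longitude, scaled by the cosine of the latitude. The sketch below reproduces the arithmetic; the spherical Earth model and the mean radius of 6371 km are assumptions not stated in the slides, and the names are hypothetical.

```cpp
#include <cmath>

// Assumed mean Earth radius in metres (spherical model, not from the slides).
constexpr double kEarthRadiusM = 6371000.0;
constexpr double kPi = 3.14159265358979323846;

// Metres spanned by deltaDeg degrees of longitude at latitude latDeg.
// Circles of latitude shrink by cos(latitude) away from the equator.
double lonMetres(double deltaDeg, double latDeg) {
    const double rad = kPi / 180.0;
    return kEarthRadiusM * std::cos(latDeg * rad) * (deltaDeg * rad);
}
```

Under these assumptions, lonMetres(0.0001, 0.0) is roughly 11.1 m and lonMetres(0.0001, 60.0) is half of that, since cos 60° = 0.5.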
Bounding box search range covers the circle whose radius is the distance between the reference point and the nearest neighbor
$\mathrm{distance}(A, B) = \sqrt{(B_x - A_x)^2 + (B_y - A_y)^2}$
[Sellarès]

Bounding box is shrunk as a nearer neighbor is found

A data point is considered a "neighbor" only if its tag matches that of the queried tag
[Sellarès]
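The relationship between the distance formula and the bounding box can be sketched as follows. The Point and Box structs and the function names dist, boundingBox and contains are assumptions for illustration only.

```cpp
#include <cmath>

struct Point { double x, y; };
struct Box { double minX, maxX, minY, maxY; };

// Euclidean distance between A and B, as in the formula above.
double dist(const Point& a, const Point& b) {
    return std::sqrt((b.x - a.x) * (b.x - a.x) + (b.y - a.y) * (b.y - a.y));
}

// Smallest axis-aligned box that covers the circle of radius r around ref.
Box boundingBox(const Point& ref, double r) {
    return { ref.x - r, ref.x + r, ref.y - r, ref.y + r };
}

// Closed-interval containment test on both axes.
bool contains(const Box& b, const Point& p) {
    return p.x >= b.minX && p.x <= b.maxX && p.y >= b.minY && p.y <= b.maxY;
}
```

The box is a superset of the circle, so a point inside the box (such as a corner) can still be farther away than the current nearest neighbor and must be checked with dist before it replaces the current best. Each time a nearer neighbor is found, calling boundingBox again with the smaller radius shrinks the search range.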
Output
Given the above database and queries, the output of your program should be:

Query                           Output
@ 42.2982 -83.7200              42.2982 -83.7200 GreatPlainsBurger fastfood
@ 42.2984 -83.7195              42.2984 -83.7195 Panera bakery
                                42.2984 -83.7195 Qdoba fastfood
@ 42.2980 -83.7109              No record found
r 42.2806 -83.7493 0.0004       42.2806 -83.7493 CafeZola restaurant
                                42.2806 -83.7497 GrizzlyPeak pub
                                42.2804 -83.7497 Sweetwaters cafe
r 42.2812 -83.7521 0.002        No record found
n 42.2785 -83.7461 bank         42.2780 -83.7449 CreditUnion bank
n 42.2785 -83.7461 bookstore    No record found
n 42.3033 -83.7078 bank         42.3030 -83.7066 TCF bank
n 42.3034 -83.7078 bank         42.3036 -83.7090 UMCU bank

Location-Based Search
Finding exact matches:
• must be implemented using hashing
• figure out: what hash function to use
• figure out: how to resolve collisions
• assume your program will be used all over the acceptable region

Finding range and nearest neighbor matches:
• must be implemented using k-d tree
• search range forms a rectangle/bounding box
• no need to implement node removal
• may assume database is not sorted
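One possible starting point for the exact-match side, offered only as a sketch: because lat/lon are limited to 4 decimal places, each coordinate can be scaled to an integer so that equality is exact, and the two integers can be packed into a single key. The names Record, toFixed, makeKey and Index are hypothetical, and std::unordered_multimap is used here purely as a stand-in for whatever hash table and collision strategy the assignment requires.

```cpp
#include <cmath>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical record payload; lat/lon live in the key.
struct Record { std::string name, tag; };

// Scale by 10^4 so coordinates given to 4 decimal places become integers
// and can be compared without floating-point error.
std::int64_t toFixed(double deg) {
    return std::llround(deg * 10000.0);
}

// Pack the two fixed-point coordinates into one 64-bit key. Both values fit
// in 32 bits for latitudes in [0, 90] and longitudes in [-180, 0].
std::uint64_t makeKey(double lat, double lon) {
    std::uint64_t hi = static_cast<std::uint32_t>(toFixed(lat));
    std::uint64_t lo = static_cast<std::uint32_t>(toFixed(lon));
    return (hi << 32) | lo;
}

// A multimap keeps several records that share the same lat/lon.
using Index = std::unordered_multimap<std::uint64_t, Record>;
```

Looking up makeKey(42.2984, -83.7195) in such an index would return both the Panera and Qdoba records from the sample output, while a location with no entry yields an empty range, corresponding to "No record found".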
k-d Tree
0.-1(0) 42.2893 -83.7391 NorthsideGrill restaurant
1.0(1) 42.2831 -83.7485 TheBrokenEgg restaurant
2.0(0) 42.2806 -83.7497 GrizzlyPeak pub
3.0(1) 42.2804 -83.7497 Sweetwaters cafe
4.1(0) 42.2797 -83.7496 WestendGrill restaurant
5.1(1) 42.2806 -83.7493 CafeZola restaurant
3.1(1) 42.2810 -83.7486 Vinology restaurant
4.0(0) 42.2808 -83.7486 KaiGarden restaurant
4.1(0) 42.2828 -83.7485 Heidelberg pub
2.1(0) 42.2785 -83.7413 CometCoffee cafe
3.0(1) 42.2780 -83.7449 CreditUnion bank
4.0(0) 42.2780 -83.7449 UMCU bank
4.1(0) 42.2780 -83.7409 Ashley's pub
3.1(1) 42.2845 -83.7463 Yamato restaurant
4.0(0) 42.2803 -83.7479 ArborBrewingCompany pub
5.1(1) 42.2830 -83.7467 NoThai fastfood
6.0(0) 42.2827 -83.7470 CafeVerde cafe
4.1(0) 42.2795 -83.7438 TCF bank
5.0(1) 42.2792 -83.7409 PotBelly fastfood
5.1(1) 42.2846 -83.7451 Zingerman's restaurant
1.1(1) 42.3036 -83.7090 UMCU bank
2.0(0) 42.2982 -83.7200 GreatPlainsBurger fastfood
3.0(1) 42.2909 -83.7178 UMCU bank
3.1(1) 42.2984 -83.7195 Qdoba fastfood
4.0(0) 42.2984 -83.7195 Panera bakery
4.1(0) 42.3047 -83.7090 Kroger supermarket
2.1(0) 42.3033 -83.7053 Evergreen restaurant
3.0(1) 42.3030 -83.7066 TCF bank
3.1(1) 42.3048 -83.7083 AABank bank

PA2 Grading Criteria
Working, efficient solution (75%):
• autograder will use -O3 compile flag for timing, so make sure your Makefile also uses the -O3 flag

Test cases (20%)

Code is readable, well-documented (5%):
• pay attention to PA1 grade report and avoid being penalized for the same stylistic issues
Files Organization
How would you organize your code into files?

Alternative 1: main.cpp (NOT)

Alternative 2: main.cpp, hash.h, hash.cpp, kdtree.h, kdtree.cpp, array.h, array.cpp, linkedlist.h, linkedlist.cpp, location.h, location.cpp (NOT)

Alternative 3: lbs281.cpp, adts.h, hash.h, kdtree.h, location.h, location.cpp

Your choice would be different, but try not to split it up into too many files!

Time Requirements
How long does it take to do PA2?

Task                            Lines of Code   % Total Time
design (and writing spec)       n/a             (2 days)
parse input (incl. unit test)   42              4
hashing (incl. unit test)       132             27
kdtree insert                   88              7
kdtree range search             142             23
kdtree nearest neighbor         130             38
whole globe (excl. nn)          102             +28