Spatial Indexing I: Point Access Methods
Grid File
Hashing methods for multidimensional
points (extension of Extensible hashing)
Idea: Use a grid to partition the
space; each cell is associated with one
page
Two-disk-access principle (exact match)
[Figure: grid file with linear scales along the X and Y axes]
Grid File Search
directory
access the cell directory to retrieve the bucket address (may
Range Queries:
use linear scales to determine the index into the cell directory.
buckets to visit.
Access the buckets.
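The search steps can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the grid file as published: the class name `GridFile`, the one-bucket-per-cell directory, and the absence of bucket splitting are all assumptions made to keep the example short. In a real grid file several cells may share a bucket, and the directory and buckets live on disk.

```python
from bisect import bisect_right


class GridFile:
    """Toy grid file: linear scales + cell directory + buckets (all in memory)."""

    def __init__(self, x_scale, y_scale):
        # Linear scales: sorted split positions along each axis (assumed layout).
        self.x_scale = x_scale
        self.y_scale = y_scale
        nx, ny = len(x_scale) + 1, len(y_scale) + 1
        # Cell directory: one bucket id per cell. Assumption: every cell
        # gets its own bucket, and buckets never overflow or split.
        self.directory = [[i * ny + j for j in range(ny)] for i in range(nx)]
        self.buckets = {i * ny + j: [] for i in range(nx) for j in range(ny)}

    def cell(self, x, y):
        # The linear scales map coordinates to an index into the directory.
        return bisect_right(self.x_scale, x), bisect_right(self.y_scale, y)

    def insert(self, x, y):
        i, j = self.cell(x, y)
        self.buckets[self.directory[i][j]].append((x, y))

    def exact_match(self, x, y):
        i, j = self.cell(x, y)
        bucket_id = self.directory[i][j]          # directory access
        return (x, y) in self.buckets[bucket_id]  # bucket access

    def range_query(self, x_lo, x_hi, y_lo, y_hi):
        i_lo, j_lo = self.cell(x_lo, y_lo)
        i_hi, j_hi = self.cell(x_hi, y_hi)
        # Collect the distinct buckets to visit, then filter their points.
        ids = {self.directory[i][j]
               for i in range(i_lo, i_hi + 1)
               for j in range(j_lo, j_hi + 1)}
        return sorted(p for b in ids for p in self.buckets[b]
                      if x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi)
```

An exact-match lookup touches the directory once and one bucket once, which is where the two-disk-access principle comes from when both are stored on disk.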
Grid File Insertions
Size O(n)
Depth O(log n)
Construction time O(n log n)
Construction of kd-trees
The complete kd-tree
Region of node v
the subtree(v)
Otherwise:
Search(right(v),R)
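A minimal sketch of kd-tree construction and range search in Python follows. The names `Node`, `build` and `search` are hypothetical. Two simplifications are assumed: each node stores one point (the slides' variant may keep points only in leaves), and the build re-sorts at every level, so it runs in O(n log^2 n) rather than the O(n log n) obtained with presorted lists. The search prunes by the split value only and does not include the "report the whole subtree when region(v) lies inside R" shortcut.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    point: tuple              # the splitting point stored at this node
    axis: int                 # 0 = split on x, 1 = split on y
    left: Optional["Node"]    # points with coordinate <= split value
    right: Optional["Node"]   # points with coordinate >= split value


def build(points, depth=0):
    """Split on the median, alternating x and y with the depth."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build(pts[:mid], depth + 1),
                build(pts[mid + 1:], depth + 1))


def search(v, R, out=None):
    """Report all points inside R = ((x_lo, x_hi), (y_lo, y_hi))."""
    if out is None:
        out = []
    if v is None:
        return out
    if all(R[a][0] <= v.point[a] <= R[a][1] for a in (0, 1)):
        out.append(v.point)
    lo, hi = R[v.axis]
    split = v.point[v.axis]
    if lo <= split:           # R may overlap the left half-space
        search(v.left, R, out)
    if split <= hi:           # R may overlap the right half-space
        search(v.right, R, out)
    return out
```

Median splits keep the two subtrees within one point of each other in size, which is what gives the logarithmic depth.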
Query time analysis
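[Figure: LSD-tree example. A kd-tree directory with an internal part and an external part, with split nodes x:x1, y:y1, y:y2, x:x2, x:x3, y:y3, y:y4, pointing to buckets N1 to N8; alongside, the corresponding partition of the plane by the split lines x1, x2, x3 and y1, y2, y3, y4.]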
LSD-tree: main points
Split strategies:
Data dependent
Distribution dependent
Paging algorithm
Two types of splits: bucket splits and
internal node splits
PAMs
Point Access Methods
Multidimensional Hashing: Grid File
Exponential growth of the directory
Hierarchical methods: kd-tree based
Storing in external memory is tricky
Space Filling Curves: Z-ordering
Map points from 2-dimensions to 1-dimension.
Use a B+-tree to index the 1-dimensional
points
Z-ordering
Basic assumption: Finite precision in the
representation of each co-ordinate, K bits (2^K
values)
The address space is a square (image) and
represented as a 2^K x 2^K array
Each element is called a pixel
Z-ordering
Impose a linear ordering on the pixels
of the image: this gives a 1-dimensional problem
[Figure: 4 x 4 grid with both axes labelled 00, 01, 10, 11; pixel A at (x, y) = (01, 11), pixel B at (01, 01)]
ZA = shuffle(xA, yA) = shuffle("01", "11") = 0111 = (7)10
ZB = shuffle("01", "01") = 0011
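The shuffle operation is plain bit interleaving, with the x bit placed before the y bit at each position. A small Python sketch, where `z_value` is a hypothetical helper that takes integer pixel coordinates and the precision K:

```python
def shuffle(x_bits: str, y_bits: str) -> str:
    """Interleave two equal-length bit strings: x1 y1 x2 y2 ..."""
    if len(x_bits) != len(y_bits):
        raise ValueError("both coordinates must use the same number of bits")
    return "".join(xb + yb for xb, yb in zip(x_bits, y_bits))


def z_value(x: int, y: int, k: int) -> int:
    """z-value of pixel (x, y) when each coordinate has k bits."""
    return int(shuffle(format(x, f"0{k}b"), format(y, f"0{k}b")), 2)


print(shuffle("01", "11"))  # pixel A: prints 0111
print(z_value(1, 3, 2))     # pixel A as an integer: prints 7
```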
Z-ordering
Given a point (x, y) and the precision K
find the pixel for the point and then
compute the z-value
Given a set of points, use a B+-tree to
index the z-values
A range (rectangular) query in 2-d is
mapped to a set of ranges in 1-d
Queries
Find the z-values that are contained in the
query, and then the ranges
[Figure: 4 x 4 grid with both axes labelled 00, 01, 10, 11, showing query rectangles QA and QB]
QA: range [4, 7]
QB: ranges [2, 3] and [8, 9]
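One simple way to obtain the 1-d ranges is to enumerate the pixels of the query rectangle, compute their z-values, and merge consecutive values. The sketch below does exactly that; it is a brute-force illustration (cost proportional to the query area), not an efficient decomposition. The rectangle coordinates used for QA and QB are inferred from the ranges given in the figure and should be read as assumptions.

```python
def z_value(x: int, y: int, k: int) -> int:
    """Interleave the k bits of x and y, x bit first."""
    z = 0
    for i in reversed(range(k)):
        z = (z << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1)
    return z


def z_ranges(x_lo, x_hi, y_lo, y_hi, k):
    """Map a rectangle (inclusive bounds) to a sorted list of z-value ranges."""
    zs = sorted(z_value(x, y, k)
                for x in range(x_lo, x_hi + 1)
                for y in range(y_lo, y_hi + 1))
    ranges = []
    for z in zs:
        if ranges and z == ranges[-1][1] + 1:
            ranges[-1][1] = z          # extend the current range
        else:
            ranges.append([z, z])      # start a new range
    return [tuple(r) for r in ranges]


# Assumed rectangles: QA = x in [0, 1], y in [2, 3]; QB = x in [1, 2], y in [0, 1]
print(z_ranges(0, 1, 2, 3, k=2))  # [(4, 7)]
print(z_ranges(1, 2, 0, 1, k=2))  # [(2, 3), (8, 9)]
```

Each resulting range becomes one range scan on the B+-tree of z-values.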
Hilbert Curve
We want points that are close in 2-d to
be close in 1-d
Note that in 2-d there are 4 neighbors
for each point, whereas in 1-d there are only 2.
Z-curve has some "jumps" that we
would like to avoid
Hilbert curve avoids the jumps:
recursive definition
Hilbert Curve - example
It has been shown that in general Hilbert is better than
the other space filling curves for retrieval [Jag90]
Hi: the order-i Hilbert curve, for a 2^i x 2^i array
[Figure: H1, H2, ..., H(n+1)]
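The recursive definition can be turned into an iterative mapping from a pixel to its position along the curve. The sketch below follows the widely used rotate-and-flip formulation; the function name `hilbert_index` and the orientation of the curve (starting at the origin) are choices made here, not taken from the slides.

```python
def hilbert_index(x: int, y: int, order: int) -> int:
    """Position of pixel (x, y) along the order-`order` Hilbert curve
    on a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        # Each quadrant contributes s*s cells; quadrants are visited
        # in the order (0,0), (0,1), (1,1), (1,0).
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Walk the order-1 curve: every step moves to a neighboring pixel.
cells = sorted((hilbert_index(x, y, 1), (x, y)) for x in range(2) for y in range(2))
print([c for _, c in cells])
```

Unlike the z-curve, consecutive positions on this curve are always adjacent pixels, which is the "no jumps" property.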
Handling Regions
A region breaks into one or more pieces, each one
with a different z-value
Works for raster representations (pixels)
We try to minimize the number of pieces in the
representation: precision/space overhead trade-off
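[Figure: 4 x 4 grid with both axes labelled 00, 01, 10, 11, showing a region R made of two pixels and a region G covering one quadrant]
ZR1 = 0010 = (2)
ZR2 = 1000 = (8)
ZG = 11 ("11" is the common prefix)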
Z-ordering for Regions
Break the space into 4 equal quadrants: level-1
blocks
Level-i block: one of the four equal quadrants of a
level-(i-1) block
Pixel: level-K block; image: level-0 block
For a level-i block: all its pixels share the same
2i-bit prefix, which is the z-value of the block
Quadtree
Object is recursively divided into blocks until:
Blocks are homogeneous
Pixel level
[Figure: 4 x 4 grid with both axes labelled 00, 01, 10, 11, showing an object decomposed into the blocks 11, 1001 and 1011]
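The recursive subdivision can be sketched as follows. The function `decompose` is a hypothetical helper: it takes the object as a set of integer pixel coordinates and returns the z-values (bit strings) of the maximal homogeneous blocks, using the same x-before-y bit order as shuffle. Counting pixels by scanning each block is done for brevity and is not how an efficient implementation would test homogeneity.

```python
def decompose(pixels, k, x0=0, y0=0, level=0, prefix=""):
    """z-values of the homogeneous blocks covering `pixels` on a 2^k x 2^k image."""
    size = 1 << (k - level)
    inside = sum((x, y) in pixels
                 for x in range(x0, x0 + size)
                 for y in range(y0, y0 + size))
    if inside == 0:
        return []                  # block lies entirely outside the object
    if inside == size * size:
        return [prefix]            # homogeneous block (or a single pixel)
    half = size // 2
    blocks = []
    for xb in (0, 1):              # visit the four quadrants in z-order
        for yb in (0, 1):
            blocks += decompose(pixels, k,
                                x0 + xb * half, y0 + yb * half,
                                level + 1, prefix + f"{xb}{yb}")
    return blocks


# A full quadrant collapses to one level-1 block; two extra pixels stay at pixel level.
obj = {(2, 2), (2, 3), (3, 2), (3, 3), (2, 1), (3, 1)}
print(decompose(obj, k=2))
```

A block at level i comes out with a 2i-bit z-value, so shorter strings correspond to larger blocks.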
Region Quadtrees
Implementations
FL (Fixed Length)
FD (Fixed length-Depth)
VL (Variable length)
Use a B+-tree to index the z-values and
answer range queries
Linear Quadtree (LQ)
Assume we use n bits in each dimension (x, y), so
we have 2^n x 2^n pixels
For each object O, compute the z-values of this
object: z1, z2, z3, …, zk (each value can have
between 0 and 2n bits)
For each value zi we append at the end the level
of this value (level = |zi|/2, the number of bit pairs in zi)
We pad zi with zeros to 2n bits and append the level in l bits,
creating a value with 2n+l bits for each z-value,
and we insert it into a B+-tree (l = log2(h))
Example with n = 3 (z-value, level = key):
A: 00, 01 = 00000001
B: 0110, 10 = 01100010
C: 111000, 11 = 11100011
[Figure: Morton blocks A, B and C]
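The key construction in the example can be reproduced with a short helper. The name `lq_key` and the explicit `level_bits` parameter are assumptions for illustration; the level is taken to be the number of bit pairs in the z-value, which is what the three example rows show.

```python
def lq_key(z: str, n: int, level_bits: int) -> str:
    """Linear-quadtree key: z-value padded with zeros to 2n bits,
    followed by the block level encoded in `level_bits` bits."""
    if len(z) % 2 != 0 or len(z) > 2 * n:
        raise ValueError("z-value must have an even length of at most 2n bits")
    level = len(z) // 2
    if level >= 1 << level_bits:
        raise ValueError("level does not fit in level_bits")
    return z.ljust(2 * n, "0") + format(level, f"0{level_bits}b")


for name, z in [("A", "00"), ("B", "0110"), ("C", "111000")]:
    print(name, lq_key(z, n=3, level_bits=2))
```

Because all keys have the same length, they can be compared directly as B+-tree keys, and the trailing level distinguishes a large block from the first small block inside it.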
http://en.wikipedia.org/wiki/How_Long_Is_the_Coast_of_Britain%3F_Statistical_Self-Similarity_and_Fractional_Dimension
Unit: 200 km, 100 km and 50 km in length.
The resulting coastline is about 2350 km, 2775 km and 3425 km
https://commons.wikimedia.org/wiki/File:Britain-fractal-coastline-50km.png#/media/File:Britain-fractal-coastline-combined.jpg
z-ordering - analysis
Q: Should we decompose a region to full
detail (and store in B-tree)?
A: NO! approximation with 1-5 pieces/z-values
is best [Orenstein90]
z-ordering - analysis
Q: how to measure the ‘goodness’ of a curve?
A: e.g., avg. # of runs, for range queries
[Figure: the same range query drawn on two curves, covered by 4 runs on one and 3 runs on the other]
(#runs ~ #disk accesses on B-tree)
z-ordering - analysis
Q: So, is Hilbert really better?
A: 27% fewer runs, for 2-d (similar for 3-d)
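The "average number of runs" measure is easy to compute by brute force on a small grid. The sketch below compares the z-curve and the Hilbert curve over all placements of a fixed-size query rectangle. The helper names, the 32 x 32 grid, and the 2 x 2 query shape are arbitrary choices for illustration; the experiment is not meant to reproduce the 27% figure, which depends on the query mix used in the cited study.

```python
def z_index(x, y, k):
    """z-value by bit interleaving, x bit first."""
    z = 0
    for i in reversed(range(k)):
        z = (z << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1)
    return z


def hilbert_index(x, y, k):
    """Position along the order-k Hilbert curve (rotate-and-flip formulation)."""
    n = 1 << k
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def count_runs(indices):
    """Number of maximal runs of consecutive integers."""
    s = sorted(indices)
    return 1 + sum(b - a > 1 for a, b in zip(s, s[1:]))


def avg_runs(index_fn, k, w, h):
    """Average number of runs over all placements of a w x h query."""
    n = 1 << k
    total = count = 0
    for x0 in range(n - w + 1):
        for y0 in range(n - h + 1):
            total += count_runs(index_fn(x, y, k)
                                for x in range(x0, x0 + w)
                                for y in range(y0, y0 + h))
            count += 1
    return total / count


print("z-curve:", avg_runs(z_index, 5, 2, 2))
print("Hilbert:", avg_runs(hilbert_index, 5, 2, 2))
```

Since each run corresponds roughly to one range scan on the B+-tree, a lower average suggests fewer disk accesses for that query shape.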