Spatial Analysis and Modeling (GIST 4302/5302) : Guofeng Cao Department of Geosciences Texas Tech University
Outline of This Week
• Last week, we learned:
– Spatial point pattern analysis (PPA)
– Focus on the location distribution of events
– Measuring clustering (spatial autocorrelation) in
point patterns
• This week, we will learn:
– How to measure and detect clusters/spatial
autocorrelation in areal data (regional data)
Spatial Autocorrelation
• Spatial autocorrelation is everywhere
– Spatial point pattern
• K, G functions
• Kernel functions
– Areal/lattice (this topic)
– Geostatistical data (next topic)
Spatial Autocorrelation of Areal
Data
Spatial Autocorrelation
• Tobler’s first law of geography
• Spatial auto/cross correlation
– e.g., high values surrounded by nearby low values
[Figure: grocery store density]
Spatial Neighbors
• Contiguity-based neighbors
– Zones i and j are neighbors if zone i is contiguous with or
adjacent to zone j
– But what constitutes contiguity?
• Distance-based neighbors
– Zones i and j are neighbors if the distance between them
is less than the threshold distance
– But what distance do we use?
Contiguity-based Spatial Neighbors
• Sharing a border or boundary
– Rook: sharing a border
– Queen: sharing a border or a point
Which one to use?
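The difference between the rook and queen rules is easiest to see on a regular grid. The sketch below is only an illustration under that assumption (real zones are irregular polygons); the function name `grid_neighbors` and the 3 x 3 grid are hypothetical.

```python
def grid_neighbors(n_rows, n_cols, rule="rook"):
    """Return {cell: sorted list of neighbor cells} for a regular grid.

    Cells are (row, col) tuples. 'rook' counts cells sharing an edge;
    'queen' also counts cells that touch only at a corner point.
    """
    if rule not in ("rook", "queen"):
        raise ValueError("rule must be 'rook' or 'queen'")
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # shared edges
    if rule == "queen":
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]  # shared corners
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            neighbors[(r, c)] = sorted(
                (r + dr, c + dc)
                for dr, dc in offsets
                if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols
            )
    return neighbors


rook = grid_neighbors(3, 3, "rook")
queen = grid_neighbors(3, 3, "queen")
# Compare the neighbor counts of the center cell under the two rules.
print(len(rook[(1, 1)]), len(queen[(1, 1)]))
```

Every rook neighbor is also a queen neighbor, so the queen rule never yields fewer neighbors than the rook rule.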
Higher-Order Contiguity
[Figure: higher-order contiguity; surviving labels: 1st order, nearest neighbor]
Distance-based Neighbors
• How to measure distance between
polygons?
• Distance metrics
– 2D Cartesian distance (projected data)
– 3D spherical distance/great-circle distance
(lat/long data)
• Haversine formula
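As a rough illustration of the great-circle option, the haversine formula can be sketched as follows. The function name `haversine_km` and the mean Earth radius of 6371 km are assumptions made for this example; the sphere itself is an approximation of the Earth's shape.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two lat/long points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above 1 for antipodal points.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


# A quarter of the way around the equator.
print(round(haversine_km(0.0, 0.0, 0.0, 90.0), 1))
```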
Distance-based Neighbors
• k-nearest neighbors
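A minimal sketch of k-nearest-neighbor weights, assuming each polygon is represented by a point (for example its centroid) in projected coordinates. The function name `knn_weights` and the coordinates are hypothetical.

```python
import math


def knn_weights(coords, k):
    """Binary k-nearest-neighbor weights: W[i][j] = 1 if j is among i's k nearest units."""
    n = len(coords)
    if not 0 < k < n:
        raise ValueError("k must be between 1 and n - 1")
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        # Rank the other units by distance to unit i; ties are broken by index.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(coords[i], coords[j]), j),
        )
        for j in others[:k]:
            W[i][j] = 1
    return W


centroids = [(0, 0), (1, 0), (3, 0), (7, 0)]  # hypothetical centroids
W = knn_weights(centroids, k=1)
for row in W:
    print(row)
```

Every unit gets exactly k neighbors, but the relation need not be symmetric: here the third unit's nearest neighbor is the second, while the second unit's nearest neighbor is the first.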
A Simple Example for the Rook Case
• The matrix contains:
– 1 if two zones share a border
– 0 if they do not share a border
Row Standardization
Zone layout:
A B C
D E F
Divide each number by the row sum.

Total number of neighbors (some zones have more than others):
     A    B    C    D    E    F    Row sum
A    0    1    0    1    0    0    2
B    1    0    1    0    1    0    3
C    0    1    0    0    0    1    2
D    1    0    0    0    1    0    2
E    0    1    0    1    0    1    3
F    0    0    1    0    1    0    2

Row standardized (usually use this):
     A    B    C    D    E    F    Row sum
A    0.0  0.5  0.0  0.5  0.0  0.0  1
B    0.3  0.0  0.3  0.0  0.3  0.0  1
C    0.0  0.5  0.0  0.0  0.0  0.5  1
D    0.5  0.0  0.0  0.0  0.5  0.0  1
E    0.0  0.3  0.0  0.3  0.0  0.3  1
F    0.0  0.0  0.5  0.0  0.5  0.0  1
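Row standardization of the six-zone rook example can be sketched as follows. The helper name `row_standardize` is hypothetical, and leaving a zone with no neighbors as a row of zeros is one possible convention rather than the only one.

```python
def row_standardize(W):
    """Divide each row of a weights matrix by its row sum.

    Rows that sum to zero (units with no neighbors) are left as all zeros.
    """
    result = []
    for row in W:
        total = sum(row)
        result.append([w / total if total else 0.0 for w in row])
    return result


# Rook contiguity for the layout  A B C / D E F  (row and column order: A..F).
W = [
    [0, 1, 0, 1, 0, 0],  # A
    [1, 0, 1, 0, 1, 0],  # B
    [0, 1, 0, 0, 0, 1],  # C
    [1, 0, 0, 0, 1, 0],  # D
    [0, 1, 0, 1, 0, 1],  # E
    [0, 0, 1, 0, 1, 0],  # F
]
W_std = row_standardize(W)
for label, row in zip("ABCDEF", W_std):
    print(label, [round(w, 2) for w in row])
```

After standardization each row sums to 1, so the weighted sum of neighboring values becomes a neighborhood average.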
General Spatial Weights Based on
Distance
• Decay functions of distance
– The most common choice is the inverse (reciprocal) of the distance
between locations i and j ($w_{ij} = 1/d_{ij}$)
– Other functions are also used:
• inverse of squared distance ($w_{ij} = 1/d_{ij}^2$), or
• negative exponential ($w_{ij} = e^{-d_{ij}}$ or $w_{ij} = e^{-d_{ij}^2}$)
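The decay functions listed above can be sketched as a small weight builder. The names `decay_weight` and `distance_weights`, the `kind` labels, and the choice to reject zero distances are assumptions made for this illustration.

```python
import math


def decay_weight(d, kind="inverse"):
    """Weight for a positive distance d under one of the listed decay functions."""
    if d <= 0:
        raise ValueError("distance must be positive")
    if kind == "inverse":
        return 1.0 / d
    if kind == "inverse_square":
        return 1.0 / d ** 2
    if kind == "exp":
        return math.exp(-d)
    if kind == "exp_square":
        return math.exp(-d ** 2)
    raise ValueError(f"unknown decay function: {kind}")


def distance_weights(coords, kind="inverse"):
    """Full weight matrix from point coordinates; the diagonal stays 0."""
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i][j] = decay_weight(math.dist(coords[i], coords[j]), kind)
    return W


points = [(0, 0), (3, 4), (6, 8)]  # hypothetical projected coordinates
W_inv = distance_weights(points, "inverse")
print([[round(w, 3) for w in row] for row in W_inv])
```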
Distance-based Spatial Weight Matrix
Zone layout:
A B C
D E F

     A    B    C    D    E    F
A    0    2    0    2    1    0
B    2    0    2    1    2    1
C    0    2    0    0    1    2
D    2    1    0    0    2    0
E    1    2    1    2    0    2
F    0    1    2    0    2    0
Measures of Spatial
Autocorrelation
Global Measures and Local Measures
• Global Measures
– A single value which applies to the entire data set
• The same pattern or process occurs over the entire
geographic area
• An average for the entire area
• Local Measures
– A value calculated for each observation unit
• Different patterns or processes may occur in different
parts of the region
• A unique number for each location
• Global measures usually can be decomposed
into a combination of local measures
Global Measures and Local Measures
• Global Measures
– Moran’s I
• Local Measures
– Local Moran’s I
Moran’s I
• The most common measure of Spatial Autocorrelation
• Use for points or polygons
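$$I = \frac{N \sum_{i}\sum_{j} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\left(\sum_{i}\sum_{j} w_{ij}\right)\sum_{i}(x_i - \bar{x})^2}$$
• Where:
$N$ is the number of observations (points or polygons)
$\bar{x}$ is the mean of the variable
$x_i$ is the variable value at a particular location
$x_j$ is the variable value at another location
$w_{ij}$ is a weight indexing the location of i relative to j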
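A minimal sketch of the global Moran's I calculation from the quantities defined above. The function name `morans_i`, the four-zone chain, and the data values are hypothetical; plain lists keep the example dependency-free.

```python
def morans_i(x, W):
    """Global Moran's I for values x and an n x n spatial weights matrix W."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    s0 = sum(sum(row) for row in W)  # sum of all weights
    cross = sum(W[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    ss = sum(d * d for d in dev)  # sum of squared deviations
    return (n / s0) * (cross / ss)


# Four zones in a row; each zone neighbors the zones next to it (binary weights).
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend = [1, 2, 3, 4]        # similar values sit next to each other
alternating = [1, 4, 1, 4]  # high values sit next to low values
print(morans_i(trend, W), morans_i(alternating, W))
```

The smoothly trending values give a positive I and the alternating values give a negative I, matching the clustered versus uniform interpretation.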
Moran’s I
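• Varies on a scale from –1 through 0* to +1
– –1: uniform/dispersed
– 0*: no spatial autocorrelation
– +1: clustered
*Technically, the value expected under no spatial autocorrelation is –1/(n-1), which approaches 0 as n grows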
Moran's I and the correlation coefficient (here $N = n$, the number of observations)
• Correlation coefficient between two variables $x$ and $y$:
$$r = \frac{\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x})/n}{\sqrt{\dfrac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n}}\;\sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}}}$$
• Spatial autocorrelation (Moran's I), with spatial weights $w_{ij}$:
$$I = \frac{N\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\left(\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\right)\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}(x_i - \bar{x})(x_j - \bar{x})\Big/\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}}{\sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}}\;\sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}}}$$
• Note the similarity of the numerator (top) to the measures of spatial association discussed earlier: Moran's I has the form of the correlation coefficient if we view $y_i$ as being the value of $x$ for the neighboring polygon
Source: Ron Briggs of UT Dallas
Moran Scatter Plots
We can draw a scatter diagram between these two variables (in
standardized form): X and lag-X (or W_X)
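• Q1 (values [+], nearby values [+]): H-H, positive SA
• Q2 (values [-], nearby values [+]): L-H, negative SA
• Q3 (values [-], nearby values [-]): L-L, positive SA
• Q4 (values [+], nearby values [-]): H-L, negative SA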
Moran Scatterplot: Example
Statistical Significance Tests for Moran’s I
• Based on the normal frequency distribution with
Test Statistic for Normal Frequency Distribution
[Figure: normal curve centered at 0 (technically –1/(n-1)); reject the null at 5% beyond ±1.96 (2.5% in each tail) and at 1% beyond ±2.58]
Null Hypothesis: no spatial autocorrelation
*Moran's I = 0
Alternative Hypothesis: spatial autocorrelation exists
*Moran's I ≠ 0 (I > 0 for clustering)
Reject Null Hypothesis if Z test statistic > 1.96 (or < -1.96)
---less than a 5% chance of seeing a value this extreme if there is no
spatial autocorrelation in the population
---95% confident that spatial autocorrelation exists
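The Z test can be sketched as below. The slides do not give the standard error, so this example assumes the usual variance of Moran's I under the normality assumption (built from the weight sums S0, S1, and S2); the function name `moran_z_normal`, the four-zone chain, and the observed value are hypothetical.

```python
import math


def moran_z_normal(I, W):
    """Expected value, variance, and Z score of Moran's I under normality."""
    n = len(W)
    s0 = sum(sum(row) for row in W)
    s1 = 0.5 * sum((W[i][j] + W[j][i]) ** 2 for i in range(n) for j in range(n))
    s2 = sum(
        (sum(W[i]) + sum(W[j][i] for j in range(n))) ** 2  # (row sum + column sum)^2
        for i in range(n)
    )
    expected = -1.0 / (n - 1)
    variance = (n * n * s1 - n * s2 + 3 * s0 * s0) / ((n * n - 1) * s0 * s0) - expected ** 2
    z = (I - expected) / math.sqrt(variance)
    return expected, variance, z


# Four zones in a row with binary neighbors; observed I = 1/3 (hypothetical).
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
expected, variance, z = moran_z_normal(1 / 3, W)
reject_at_5_percent = abs(z) > 1.96
print(round(expected, 3), round(variance, 3), round(z, 3), reject_at_5_percent)
```

With only four zones the normal approximation is crude; the example only illustrates the arithmetic of the test.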
Spatial Autocorrelation vs Correlation
• Spatial autocorrelation: shows the association or relationship between the same variable in "nearby" areas.
• Standard correlation: shows the association or relationship between two different variables.
Bivariate Moran Scatter Plot
[Figure: bivariate Moran scatter plot; quadrants: High/High (positive SA), Low/High (negative SA), Low/Low (positive SA), High/Low (negative SA)]
Local Measures of
Spatial Autocorrelation
Local Indicators of Spatial Association (LISA)
See:
Anselin, Luc (1995). Local Indicators of Spatial Association-LISA.
Geographical Analysis 27: 93-115.
Calculating Anselin’s LISA
• The local Moran statistic for areal unit i is:
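$$I_i = z_i \sum_{j} w_{ij} z_j$$
where $z_i$ and $z_j$ are the z-scores of the variable at units i and j, and $w_{ij}$ is the row-standardized spatial weight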
Source: Ron Briggs of UT Dallas
[Figure: map of the seven provinces labeled by code (1 Anhui, 2 Zhejiang, 3 Jiangxi, 4 Jiangsu, 5 Henan, 6 Hubei, 7 Shanghai), with a contiguity matrix table listing each province's row sum, neighbors, and illiteracy]
Source: Ron Briggs of UT Dallas
Contiguity Matrix and
Row Standardized Spatial Weights Matrix

Contiguity Matrix
Province   Code   1   2   3   4   5   6   7   Sum
Anhui      1      0   1   1   1   1   1   0   5
Zhejiang   2      1   0   1   1   0   0   1   4
Jiangxi    3      1   1   0   0   0   1   0   3
Jiangsu    4      1   1   0   0   0   0   1   3
Henan      5      1   0   0   0   0   1   0   2
Hubei      6      1   0   1   0   1   0   0   3
Shanghai   7      0   1   0   1   0   0   0   2
(Column codes: 1 Anhui, 2 Zhejiang, 3 Jiangxi, 4 Jiangsu, 5 Henan, 6 Hubei, 7 Shanghai)

Row Standardized Spatial Weights Matrix ($w_{ij}$): each entry divided by its row sum, e.g. 1/3 = 0.33
Province   Code   Anhui   Zhejiang   Jiangxi   Jiangsu   Henan   Hubei   Shanghai   Sum
Anhui      1      0.00    0.20       0.20      0.20      0.20    0.20    0.00       1
Zhejiang   2      0.25    0.00       0.25      0.25      0.00    0.00    0.25       1
Jiangxi    3      0.33    0.33       0.00      0.00      0.00    0.33    0.00       1
Jiangsu    4      0.33    0.33       0.00      0.00      0.00    0.00    0.33       1
Henan      5      0.50    0.00       0.00      0.00      0.00    0.50    0.00       1
Hubei      6      0.33    0.00       0.33      0.00      0.33    0.00    0.00       1
Shanghai   7      0.00    0.50       0.00      0.50      0.00    0.00    0.00       1

Local Moran statistic: $I_i = z_i \sum_{j} w_{ij} z_j$

Z-Scores for row Province ($z_i$) and its potential neighbors ($z_j$)
Province   z_i      Anhui   Zhejiang   Jiangxi   Jiangsu   Henan    Hubei    Shanghai
Anhui      2.101    2.101   0.387      -0.572    -0.051    -0.281   -0.171   -1.414
Zhejiang   0.387    2.101   0.387      -0.572    -0.051    -0.281   -0.171   -1.414
Jiangxi    -0.572   2.101   0.387      -0.572    -0.051    -0.281   -0.171   -1.414
Jiangsu    -0.051   2.101   0.387      -0.572    -0.051    -0.281   -0.171   -1.414
Henan      -0.281   2.101   0.387      -0.572    -0.051    -0.281   -0.171   -1.414
Hubei      -0.171   2.101   0.387      -0.572    -0.051    -0.281   -0.171   -1.414
Shanghai   -1.414   2.101   0.387      -0.572    -0.051    -0.281   -0.171   -1.414
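The local Moran calculation for the seven provinces can be sketched from the contiguity matrix and the z-scores tabulated above (parenthesized z-scores read as negative). The variable names are hypothetical. The final step assumes the z-scores were standardized with the population standard deviation, which their squares summing to about 7 is consistent with; under that assumption and row-standardized weights, the global Moran's I equals the mean of the local values.

```python
provinces = ["Anhui", "Zhejiang", "Jiangxi", "Jiangsu", "Henan", "Hubei", "Shanghai"]

# Binary contiguity matrix, rows and columns in the order of `provinces`.
contiguity = [
    [0, 1, 1, 1, 1, 1, 0],  # Anhui
    [1, 0, 1, 1, 0, 0, 1],  # Zhejiang
    [1, 1, 0, 0, 0, 1, 0],  # Jiangxi
    [1, 1, 0, 0, 0, 0, 1],  # Jiangsu
    [1, 0, 0, 0, 0, 1, 0],  # Henan
    [1, 0, 1, 0, 1, 0, 0],  # Hubei
    [0, 1, 0, 1, 0, 0, 0],  # Shanghai
]

# Z-scores of the variable for each province.
z = [2.101, 0.387, -0.572, -0.051, -0.281, -0.171, -1.414]

local_i = {}
for i, name in enumerate(provinces):
    row_sum = sum(contiguity[i])
    # Row-standardized weights turn the sum over neighbors into a neighbor average.
    lag = sum(contiguity[i][j] * z[j] for j in range(len(z))) / row_sum
    local_i[name] = z[i] * lag

global_i = sum(local_i.values()) / len(local_i)  # mean of the local statistics

for name in provinces:
    print(f"{name:9s} {local_i[name]: .3f}")
print(f"Global I (mean of local values): {global_i: .3f}")
```

A negative local value marks a province whose z-score has the opposite sign from the average of its neighbors, as with Anhui's high value among lower-valued neighbors.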
Bivariate LISA Moran Scatter Plot for GDI vs AL
Bivariate LISA
and the Correlation Coefficient
• The correlation coefficient is the relationship between two different variables in the same area
• Bivariate LISA is a correlation between two different variables in an area and in nearby areas
Consequences of Ignoring Spatial
Autocorrelation
• Correlation coefficients and coefficients of
determination appear bigger than they really are
– You think the relationship is stronger than it really is
– The variables in nearby areas affect each other
• Standard errors appear smaller than they really are
– Exaggerated precision
– You think your predictions are better than they really are,
since standard errors measure predictive accuracy
– You are more likely to conclude that a
relationship is statistically significant
Diagnostics of Spatial Dependence
• For correlation
– Calculate Moran's I for each variable and test its statistical
significance
– If Moran's I is significant, you may have a problem!
• For regression
– Calculate the residuals
– Map the residuals: do you see any spatial patterns?
– Calculate Moran's I for the residuals: is it statistically
significant?
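The regression diagnostic can be sketched as: fit the model, take the residuals, and compute Moran's I on them. The example assumes a single-predictor least-squares fit and six zones in a row; the function names and the data are hypothetical.

```python
def ols_residuals(x, y):
    """Residuals from a simple least-squares fit of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def morans_i(values, W):
    """Global Moran's I for a list of values and a weights matrix W."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in W)
    cross = sum(W[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


# Six zones in a row; each zone neighbors the zones next to it (binary weights).
n = 6
W = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

x = [1, 2, 3, 4, 5, 6]
y = [6, 8, 10, 13, 17, 21]  # hypothetical response with a spatial pattern left over

residuals = ols_residuals(x, y)
residual_i = morans_i(residuals, W)
print([round(r, 3) for r in residuals], round(residual_i, 3))
```

A residual Moran's I well above –1/(n-1) suggests that the model leaves spatial structure unexplained. This sketch only computes the statistic; judging its significance for regression residuals calls for a reference distribution that accounts for the fitted model.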
Summary
• Spatial autocorrelation of areal data
• Spatial weight matrix
• Measures of spatial autocorrelation
• Global Measure
– Moran's I
• Consequences of ignoring spatial
autocorrelation
• Significance test
• Please read O'Sullivan & Unwin, Ch. 7 and Ch. 8.1
and 8.2
• End of this topic