Spatial Analysis and Modeling (GIST 4302/5302) : Guofeng Cao Department of Geosciences Texas Tech University

Spatial Analysis and Modeling
(GIST 4302/5302)
Guofeng Cao
Department of Geosciences
Texas Tech University
Outline of This Week
• Last week, we learned:
– spatial point pattern analysis (PPA)
– focus on location distribution of events
– Measure clustering (spatial autocorrelation) in
point patterns
• This week, we will learn:
– How to measure and detect clusters/spatial
autocorrelation in areal data (regional data)
Spatial Autocorrelation
• Spatial autocorrelation is everywhere
– Spatial point pattern
• K, G functions
• Kernel functions
– Areal/lattice (this topic)
– Geostatistical data (next topic)
Spatial Autocorrelation of Areal Data
• Tobler’s first law of geography
• Spatial auto/cross correlation
– If like values tend to cluster together, then the field exhibits high positive spatial autocorrelation
– If there is no apparent relationship between attribute value and location, then there is zero spatial autocorrelation
– If like values tend to be located away from each other, then there is negative spatial autocorrelation
Positive spatial autocorrelation
Example map: 2002 population density
- high values surrounded by nearby high values
- intermediate values surrounded by nearby intermediate values
- low values surrounded by nearby low values
Source: Ron Briggs of UT Dallas

Negative spatial autocorrelation competition for space
- high values
surrounded by nearby low values Grocery store density
- intermediate values surrounded

by nearby intermediate values
- low values surrounded by
nearby high values

Measuring Spatial Autocorrelation:
the problem of measuring nearness
To measure spatial autocorrelation, we must know the nearness of our observations, as we did for the point pattern case
• Which points or polygons are near or next to other points or polygons?
– Which states are near Texas?
– How do we measure this?
It seems simple and obvious, but it is not!
Spatial Weight Matrix
• Core concept in statistical analysis of areal data
• Two steps involved:
– define which relationships between observations are to
be given a nonzero weight, i.e., define spatial
neighbors
– assign weights to the neighbors
Spatial Neighbors
• Contiguity-based neighbors
– Zones i and j are neighbors if zone i is contiguous with (adjacent to) zone j
– But what constitutes contiguity?
• Distance-based neighbors
– Zones i and j are neighbors if the distance between them is less than a threshold distance
– But what distance do we use?
Contiguity-based Spatial Neighbors
• Sharing a border or boundary
– Rook: sharing a border (an edge)
– Queen: sharing a border or a point (a vertex)
[Figure: rook and queen contiguity, hexagons, and irregular polygons]
Which one to use?
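As a minimal sketch of the two contiguity rules, the Python below builds rook and queen neighbor lists for a regular grid of square cells. The grid, the row-by-row cell numbering, and the function name `grid_neighbors` are assumptions made for illustration; real polygon layers need boundary geometry (for example from a GIS library), which this does not handle.

```python
def grid_neighbors(nrows, ncols, rule="rook"):
    """Neighbor lists for a regular nrows x ncols grid of square cells.

    Cells are numbered row by row starting at 0. Returns a dict mapping
    each cell index to a sorted list of its neighbors' indices.
    """
    if rule == "rook":
        # Rook: neighbors share an edge (north, south, west, east).
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif rule == "queen":
        # Queen: neighbors share an edge or a corner (adds the 4 diagonals).
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("rule must be 'rook' or 'queen'")

    neighbors = {}
    for r in range(nrows):
        for c in range(ncols):
            neighbors[r * ncols + c] = sorted(
                (r + dr) * ncols + (c + dc)
                for dr, dc in offsets
                # Skip offsets that fall off the edge of the grid.
                if 0 <= r + dr < nrows and 0 <= c + dc < ncols
            )
    return neighbors


rook = grid_neighbors(3, 3, "rook")
queen = grid_neighbors(3, 3, "queen")
print("centre cell:", rook[4], queen[4])  # [1, 3, 5, 7] [0, 1, 2, 3, 5, 6, 7, 8]
print("corner cell:", rook[0], queen[0])  # [1, 3] [1, 3, 4]
```

Corner and edge cells get fewer neighbors than interior cells under both rules, which is one source of the edge effects that come up later in this topic.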
Higher-Order Contiguity
[Figure: 1st-order contiguity (nearest neighbor) and 2nd-order contiguity (next nearest neighbor) for the rook, hexagon, and queen cases]
Distance-based Neighbors
• How do we measure the distance between polygons?
• Distance metrics
– 2D Cartesian distance (projected data)
– Spherical distance, i.e., great-circle distance (lat/long data)
• Haversine formula
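A sketch of the Haversine formula for the great-circle distance between two lat/long points. The spherical Earth with a mean radius of 6371 km is an assumption (the real Earth is slightly ellipsoidal), and `haversine_km` is a hypothetical helper name.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)


def haversine_km(lat1, lon1, lat2, lon2, radius_km=EARTH_RADIUS_KM):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Guard against tiny floating-point overshoot above 1 before asin.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


# One degree of longitude along the equator.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))  # 111.19
```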
• k-nearest neighbors
Source: Bivand and Pebesma and Gomez-Rubio
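A minimal sketch of k-nearest-neighbor selection, assuming each areal unit is represented by a centroid in projected (x, y) coordinates; the centroids below are hypothetical.

```python
import math


def knn_neighbors(points, k):
    """k nearest neighbors of each point by Euclidean distance.

    `points` is a list of (x, y) centroids in projected coordinates.
    Ties are broken by the lower point index.
    """
    neighbors = {}
    for i, (xi, yi) in enumerate(points):
        ranked = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        neighbors[i] = [j for _, j in ranked[:k]]
    return neighbors


# Hypothetical polygon centroids, for illustration only.
pts = [(0, 0), (1, 0), (3, 0), (0, 2), (5, 5)]
nb = knn_neighbors(pts, k=2)
print(nb)  # {0: [1, 3], 1: [0, 2], 2: [1, 0], 3: [0, 1], 4: [2, 3]}
```

Every unit gets exactly k neighbors, so no unit is left without neighbors, but the relation need not be symmetric: unit 4 lists unit 2 as a neighbor while unit 2 does not list unit 4.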
• Threshold distance (buffer)
Source: Bivand and Pebesma and Gomez-Rubio
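A threshold-distance (buffer) rule, sketched under the assumption that each unit is represented by a projected (x, y) centroid; the coordinates and the 2.5-unit threshold are hypothetical.

```python
import math


def distance_band_neighbors(points, threshold):
    """Units whose centroids are within `threshold` (inclusive) are neighbors."""
    neighbors = {}
    for i, (xi, yi) in enumerate(points):
        neighbors[i] = [
            j for j, (xj, yj) in enumerate(points)
            if j != i and math.hypot(xi - xj, yi - yj) <= threshold
        ]
    return neighbors


# Hypothetical centroids in projected units, for illustration only.
pts = [(0, 0), (1, 0), (3, 0), (0, 2), (5, 5)]
nb = distance_band_neighbors(pts, threshold=2.5)
print(nb)  # {0: [1, 3], 1: [0, 2, 3], 2: [1], 3: [0, 1], 4: []}
```

Unlike k-nearest neighbors, the distance-band relation is symmetric, but unit 4 ends up with no neighbors at all. Choosing a threshold at least as large as the largest nearest-neighbor distance guarantees every unit at least one neighbor.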
Neighbor/Connectivity Histogram
Source: Bivand and Pebesma and Gomez-Rubio
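A connectivity histogram tabulates how many units have 0, 1, 2, ... neighbors. The sketch below does this for hypothetical rook neighbors on a 2 x 3 grid of cells labeled A-F.

```python
from collections import Counter

# Hypothetical rook neighbors for a 2 x 3 grid laid out as
#   A B C
#   D E F
neighbors = {
    "A": ["B", "D"], "B": ["A", "C", "E"], "C": ["B", "F"],
    "D": ["A", "E"], "E": ["B", "D", "F"], "F": ["C", "E"],
}

# Connectivity histogram: number of units with each neighbor count.
counts = Counter(len(nbrs) for nbrs in neighbors.values())
for n_neighbors in sorted(counts):
    print(f"{n_neighbors} neighbors: {'#' * counts[n_neighbors]} ({counts[n_neighbors]})")
# 2 neighbors: #### (4)
# 3 neighbors: ## (2)
```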
Spatial Weight Matrix
• Spatial weights can be seen as a list of weights indexed by a list of neighbors
• If zone j is not a neighbor of zone i, the weight w_ij is set to zero
– The weight matrix can be illustrated as an image
– It is a sparse matrix
A Simple Example for the Rook Case
• The matrix contains a:
– 1 if the two units share a border
– 0 if they do not share a border
4 areal units (A and B on top, C and D below; neighbors meet along a common border) give a 4x4 matrix:
         A  B  C  D
     A   0  1  1  0
W =  B   1  0  0  1
     C   1  0  0  1
     D   0  1  1  0
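The neighbor-list (sparse) view and the matrix view hold the same information. This sketch, with a hypothetical helper `weights_from_neighbors`, expands the rook neighbor lists of the four-unit example into the dense binary matrix W.

```python
def weights_from_neighbors(labels, neighbors):
    """Dense binary weight matrix: W[i][j] = 1 if labels[j] is a neighbor of labels[i]."""
    index = {label: k for k, label in enumerate(labels)}
    W = [[0] * len(labels) for _ in labels]
    for label, nbrs in neighbors.items():
        for nb in nbrs:
            W[index[label]][index[nb]] = 1
    return W


labels = ["A", "B", "C", "D"]
# Rook neighbors for the layout  A B / C D  (shared borders only).
neighbors = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
W = weights_from_neighbors(labels, neighbors)
for label, row in zip(labels, W):
    print(label, row)
# Contiguity is symmetric, so W equals its transpose.
print(all(W[i][j] == W[j][i] for i in range(4) for j in range(4)))  # True
```

Tables such as the US states list store only the nonzero entries; for large n the dense matrix is almost entirely zeros.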
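Sparse Contiguity Matrix for US States (obtained from Anselin's web site; see the PowerPoint for the link)
Columns: state name, FIPS code, number of neighbors (Ncount), and FIPS codes of the neighbors (N1-N8)
Name Fips Ncount N1 N2 N3 N4 N5 N6 N7 N8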
Alabama 1 4 28 13 12 47
Arizona 4 5 35 8 49 6 32
Arkansas 5 6 22 28 48 47 40 29
California 6 3 4 32 41
Colorado 8 7 35 4 20 40 31 49 56
Connecticut 9 3 44 36 25
Delaware 10 3 24 42 34
District of Columbia 11 2 51 24
Florida 12 2 13 1
Georgia 13 5 12 45 37 1 47
Idaho 16 6 32 41 56 49 30 53
Illinois 17 5 29 21 18 55 19
Indiana 18 4 26 21 17 39
Iowa 19 6 29 31 17 55 27 46
Kansas 20 4 40 29 31 8
Kentucky 21 7 47 29 18 39 54 51 17
Louisiana 22 3 28 48 5
Maine 23 1 33
Maryland 24 5 51 10 54 42 11
Massachusetts 25 5 44 9 36 50 33
Michigan 26 3 18 39 55
Minnesota 27 4 19 55 46 38
Mississippi 28 4 22 5 1 47
Missouri 29 8 5 40 17 21 47 20 19 31
Montana 30 4 16 56 38 46
Nebraska 31 6 29 20 8 19 56 46
Nevada 32 5 6 4 49 16 41
New Hampshire 33 3 25 23 50
New Jersey 34 3 10 36 42
New Mexico 35 5 48 40 8 4 49
New York 36 5 34 9 42 50 25
North Carolina 37 4 45 13 47 51
North Dakota 38 3 46 27 30
Ohio 39 5 26 21 54 42 18
Oklahoma 40 6 5 35 48 29 20 8
Oregon 41 4 6 32 16 53
Pennsylvania 42 6 24 54 10 39 36 34
Rhode Island 44 2 25 9
South Carolina 45 2 13 37
South Dakota 46 6 56 27 19 31 38 30
Tennessee 47 8 5 28 1 37 13 51 21 29
Texas 48 4 22 5 35 40
Utah 49 6 4 8 35 56 32 16
Vermont 50 3 36 25 33
Virginia 51 6 47 37 24 54 11 21
Washington 53 2 41 16
West Virginia 54 5 51 21 24 39 42
Wisconsin 55 4 26 17 19 27
Wyoming 56 6 49 16 31 8 46 30
Style of Spatial Weight Matrix
• Binary
– a weight of unity for each neighbor relationship
• Row standardization
– Symmetry not guaranteed
– Can be interpreted as allowing the calculation of average values across neighbors
• General spatial weights based on distances
Binary vs. Row Standardization
Layout of the six units:
A B C
D E F
Binary (total number of neighbors; some have more than others):
     A    B    C    D    E    F    Row sum
A    0    1    0    1    0    0    2
B    1    0    1    0    1    0    3
C    0    1    0    0    0    1    2
D    1    0    0    0    1    0    2
E    0    1    0    1    0    1    3
F    0    0    1    0    1    0    2
Row standardized (usually use this): divide each number by the row sum
     A     B     C     D     E     F     Row sum
A    0.0   0.5   0.0   0.5   0.0   0.0   1
B    0.33  0.0   0.33  0.0   0.33  0.0   1
C    0.0   0.5   0.0   0.0   0.0   0.5   1
D    0.5   0.0   0.0   0.0   0.5   0.0   1
E    0.0   0.33  0.0   0.33  0.0   0.33  1
F    0.0   0.0   0.5   0.0   0.5   0.0   1
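Row standardization of the six-unit binary matrix, as a short sketch. Leaving a zero row for a unit with no neighbors is an assumption of this sketch; software packages handle such "islands" in different ways.

```python
def row_standardize(W):
    """Divide each row by its row sum; rows with no neighbors are left as zeros."""
    out = []
    for row in W:
        s = sum(row)
        out.append([w / s if s else 0.0 for w in row])
    return out


labels = ["A", "B", "C", "D", "E", "F"]
# Binary rook weights for the layout  A B C / D E F.
W = [
    [0, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 0],
]
W_std = row_standardize(W)
for label, row in zip(labels, W_std):
    print(label, [round(w, 2) for w in row], "sum =", round(sum(row), 10))
# Row standardization breaks symmetry: w_AB = 0.5 but w_BA = 1/3.
print(W_std[0][1], W_std[1][0])
```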
General Spatial Weights Based on Distance
• Decay functions of distance
– The most common choice is the inverse (reciprocal) of the distance between locations i and j ($w_{ij} = 1/d_{ij}$)
– Other functions are also used
• inverse of squared distance ($w_{ij} = 1/d_{ij}^2$), or
• negative exponential ($w_{ij} = e^{-d_{ij}}$ or $w_{ij} = e^{-d_{ij}^2}$)
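A sketch of distance-decay weight matrices for hypothetical projected centroids. It assumes all points are distinct (d_ij > 0 for i ≠ j) and sets w_ii = 0; the `kind` names are made up for this example.

```python
import math


def decay_weights(points, kind="inverse"):
    """Distance-decay weight matrix for (x, y) points; w_ii = 0 by convention."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(points[i], points[j])  # Euclidean distance d_ij
            if kind == "inverse":
                W[i][j] = 1.0 / d
            elif kind == "inverse_squared":
                W[i][j] = 1.0 / d ** 2
            elif kind == "exp":
                W[i][j] = math.exp(-d)
            elif kind == "exp_squared":
                W[i][j] = math.exp(-d ** 2)  # -d ** 2 means -(d ** 2)
            else:
                raise ValueError(f"unknown kind: {kind}")
    return W


pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]  # hypothetical centroids
W_inv = decay_weights(pts, "inverse")
W_sq = decay_weights(pts, "inverse_squared")
print([round(w, 3) for w in W_inv[0]])  # [0.0, 1.0, 0.333]
print([round(w, 3) for w in W_sq[0]])   # [0.0, 1.0, 0.111]
```

All four forms decrease as distance grows; they differ in how quickly distant units lose influence.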
Distance-based Spatial Weight Matrix
A B C
D E F
A B C D E F
A 0 2 0 2 1 0
B 2 0.0 2 1 2 1
C 0 2 0 0 1 2
D 2 1 0 0 2 0
E 1 2 1 2 0 2
F 0 1 2 0 2 0
24
Measures of Spatial Autocorrelation
Global Measures and Local Measures
• Global Measures
– A single value which applies to the entire data set
• The same pattern or process occurs over the entire
geographic area
• An average for the entire area
• Local Measures
– A value calculated for each observation unit
• Different patterns or processes may occur in different
parts of the region
• A unique number for each location
• Global measures usually can be decomposed
into a combination of local measures
Global Measures and Local Measures
• Global Measures
– Moran’s I
• Local Measures
– Local Moran’s I
Moran’s I
• The most common measure of spatial autocorrelation
• Used for points or polygons
Patrick Alfred Pierce Moran (1917-1988)
Formula for Moran’s I
$$ I = \frac{n \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\left(\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\right)\sum_{i=1}^{n}(x_i - \bar{x})^2} $$
• Where:
n is the number of observations (points or polygons)
$\bar{x}$ is the mean of the variable
$x_i$ is the variable value at a particular location
$x_j$ is the variable value at another location
$w_{ij}$ is a weight indexing the location of i relative to j
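A direct, unoptimized translation of the formula into Python, with W stored as plain nested lists. The four units in a row and the two value patterns are hypothetical test data.

```python
def morans_i(x, W):
    """Global Moran's I for values x and an n x n weight matrix W (nested lists)."""
    n = len(x)
    mean = sum(x) / n
    d = [xi - mean for xi in x]                       # deviations x_i - x_bar
    s0 = sum(sum(row) for row in W)                   # sum of all weights
    num = sum(W[i][j] * d[i] * d[j] for i in range(n) for j in range(n))
    den = sum(di * di for di in d)
    return (n / s0) * (num / den)


# Four units in a row (1-2-3-4), binary contiguity.
W_line = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 2, 3, 4], W_line))   # smooth trend: about 0.333
print(morans_i([1, 0, 1, 0], W_line))   # alternating: about -1.0
```

The smooth trend gives positive spatial autocorrelation and the alternating pattern gives strong negative spatial autocorrelation, matching the verbal definitions at the start of this topic.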
Moran’s I
• Varies on a scale from –1 through 0* to +1 (approximately; the exact bounds depend on the weight matrix)
*technically, the value expected with no spatial autocorrelation is –1/(n–1), not 0
–1: high negative spatial autocorrelation; 0*: no spatial autocorrelation; +1: high positive spatial autocorrelation
• Can also be used as an index for dispersed/random/clustered patterns
[Figure: dispersed (uniform), random, and clustered patterns]
Source: Briggs, Henan University 2010

Moran’s I and Correlation Coefficient
• Correlation coefficient [-1, 1]
– Relationship between two different variables
• Moran’s I (roughly [-1, 1])
– Spatial autocorrelation; often involves only one (spatially indexed) variable
– Correlation between observations of a spatial variable at location X and the spatial lag of X, formed by averaging all the observations at the neighbors of X
Correlation coefficient:
$$ r = \frac{\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x})/n}{\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2/n}\,\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2/n}} $$
Spatial autocorrelation (Moran’s I):
$$ I = \frac{n \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\left(\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\right)\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}(x_i - \bar{x})(x_j - \bar{x}) \Big/ \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2/n}\,\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2/n}} $$
Note the similarity of the numerator (top) to the measures of spatial association discussed earlier if we view $y_i$ as being the x value of the neighboring polygon, $x_j$ (see next slide).
Source: Ron Briggs of UT Dallas
Comparing the two formulas term by term:
• The spatial weights $w_{ij}$ appear in Moran’s I where the correlation coefficient has none; they select which polygons count as neighbors
• $y_i$ in the correlation coefficient becomes $x_j$: y is the x value for the neighboring polygon
• Dividing by $\sum_i \sum_j w_{ij}$ plays the role of dividing by n
• Because only one variable is involved, both standard deviations in the denominator are those of x
The result is Moran’s I.
Moran Scatter Plots
We can draw a scatter diagram between these two variables (in standardized form): X and lag-X (or WX).
The slope of the regression line through this scatter diagram is Moran’s I (when W is row-standardized).
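A sketch checking that the slope of the Moran scatter plot equals Moran's I when W is row-standardized. The four units in a row and their values are hypothetical; z scores use the divisor-n standard deviation.

```python
def standardize(x):
    n = len(x)
    mean = sum(x) / n
    sd = (sum((v - mean) ** 2 for v in x) / n) ** 0.5
    return [(v - mean) / sd for v in x]


def spatial_lag(W, z):
    """(Wz)_i = sum_j w_ij z_j: the neighbor average when W is row-standardized."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in W]


def ols_slope(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


# Four units in a row (1-2-3-4), row-standardized contiguity weights.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
z = standardize([1, 2, 3, 4])
lag = spatial_lag(W, z)
slope = ols_slope(z, lag)   # regression of lag-X on X
moran = (len(z) / sum(map(sum, W))) * sum(a * b for a, b in zip(z, lag)) / sum(a * a for a in z)
print(round(slope, 4), round(moran, 4))  # 0.4 0.4
```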
Moran Scatter Plots
Locations of positive spatial association (“I’m similar to my neighbors”):
– Q1 (values [+], nearby values [+]): H-H
– Q3 (values [-], nearby values [-]): L-L
Locations of negative spatial association (“I’m different from my neighbors”):
– Q2 (values [-], nearby values [+]): L-H
– Q4 (values [+], nearby values [-]): H-L
[Figure: scatter plot quadrants: Low/High (negative SA) upper left, High/High (positive SA) upper right, Low/Low (positive SA) lower left, High/Low (negative SA) lower right]
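Classifying a unit into a scatter plot quadrant only needs the signs of its z score and of its spatial lag. Treating an exact zero as "High" is an arbitrary assumption of this sketch. The sample values are the z scores and neighbor sums from the seven-province example later in this topic.

```python
def moran_quadrant(z_i, lag_i):
    """Quadrant of a unit in the Moran scatter plot (zero counted as 'High')."""
    if z_i >= 0 and lag_i >= 0:
        return "H-H"   # Q1: positive spatial association
    if z_i < 0 and lag_i >= 0:
        return "L-H"   # Q2: negative spatial association
    if z_i < 0 and lag_i < 0:
        return "L-L"   # Q3: positive spatial association
    return "H-L"       # Q4: negative spatial association


# (z_i, sum_j w_ij z_j) pairs taken from the seven-province example.
for name, z_i, lag_i in [("Anhui", 2.101, -0.137),
                         ("Zhejiang", 0.387, 0.016),
                         ("Jiangxi", -0.572, 0.772)]:
    print(name, moran_quadrant(z_i, lag_i))
# Anhui H-L, Zhejiang H-H, Jiangxi L-H
```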
Moran Scatterplot: Example
Statistical Significance Tests for Moran’s I
• Based on the normal frequency distribution, with
$$ Z = \frac{I - E(I)}{S_{error}(I)} $$
where I is the calculated value of Moran’s I from the sample, E(I) is the expected value if random, and $S_{error}(I)$ is the standard error
• Statistical significance test
– Monte Carlo test, as we did for spatial point pattern analysis
– Permutation test
• Non-parametric
• Data-driven; makes no distributional assumption about the data
• Implemented in GeoDa
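A sketch of a permutation (randomization) test: the weights stay fixed while the observed values are reshuffled across locations, and the observed I is compared with the resulting reference distribution. The one-sided alternative (I larger than expected), 999 permutations, the seed, and the data are all assumptions of this example.

```python
import random


def morans_i(x, W):
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    s0 = sum(sum(row) for row in W)
    num = sum(W[i][j] * d[i] * d[j] for i in range(n) for j in range(n) if W[i][j])
    return (n / s0) * num / sum(v * v for v in d)


def permutation_test(x, W, permutations=999, seed=0):
    """Observed Moran's I and a one-sided pseudo p-value for positive autocorrelation."""
    rng = random.Random(seed)
    observed = morans_i(x, W)
    values = list(x)
    as_large = 0
    for _ in range(permutations):
        rng.shuffle(values)                 # random relabeling of locations
        if morans_i(values, W) >= observed:
            as_large += 1
    # The observed arrangement counts as one of the possible arrangements.
    return observed, (as_large + 1) / (permutations + 1)


# Hypothetical data: 20 units in a row with a smooth trend (strong clustering).
n = 20
W = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
x = list(range(n))
I_obs, p = permutation_test(x, W)
print(round(I_obs, 4), p)
```

The pseudo p-value (M + 1)/(R + 1), where M counts permutations at least as extreme as the observed value and R is the number of permutations, can never be smaller than 1/(R + 1), so finer resolution needs more permutations. Because the permutations are random, results vary slightly from run to run unless the seed is fixed.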
Test Statistic for Normal Frequency Distribution
[Figure: standard normal curve centered at 0 (*technically –1/(n–1)); rejection regions beyond ±1.96 (5% level, 2.5% in each tail) and beyond ±2.58 (1% level)]
Null Hypothesis: no spatial autocorrelation
*Moran’s I = 0 (*technically –1/(n–1))
Alternative Hypothesis: spatial autocorrelation exists
*Moran’s I ≠ 0 (Moran’s I > 0 for a one-sided test of clustering)
Reject Null Hypothesis if Z test statistic > 1.96 (or < -1.96)
---if there were no spatial autocorrelation, a Z value this extreme would occur less than 5% of the time
---i.e., the spatial autocorrelation is statistically significant at the 5% level
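The Z test needs E(I) and the standard error. One standard set of formulas, under the normality assumption, uses the weight sums S0, S1 and S2 and is sketched below; the variance under the randomization assumption differs and is not shown. The four-unit data set is hypothetical.

```python
import math


def morans_i_ztest(x, W):
    """Moran's I with E(I) = -1/(n-1) and a z score under the normality assumption."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    s0 = sum(sum(row) for row in W)                                   # S0
    s1 = 0.5 * sum((W[i][j] + W[j][i]) ** 2
                   for i in range(n) for j in range(n))               # S1
    s2 = sum((sum(W[i]) + sum(W[j][i] for j in range(n))) ** 2
             for i in range(n))                                       # S2
    I = (n / s0) * sum(W[i][j] * d[i] * d[j]
                       for i in range(n) for j in range(n)) / sum(v * v for v in d)
    expected = -1.0 / (n - 1)
    var = (n * n * s1 - n * s2 + 3 * s0 * s0) / ((n * n - 1) * s0 * s0) - expected ** 2
    z = (I - expected) / math.sqrt(var)
    return I, expected, z


W_line = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
I, expected, z = morans_i_ztest([1, 2, 3, 4], W_line)
print(round(I, 4), round(expected, 4), round(z, 3))  # 0.3333 -0.3333 1.732
```

Even this perfectly smooth trend gives Z ≈ 1.73 < 1.96, so the null hypothesis is not rejected at the 5% level: with only four units the test has very little power.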
Spatial Autocorrelation vs Correlation
• Spatial autocorrelation: shows the association or relationship between the same variable in “nearby” areas.
• Standard correlation: shows the association or relationship between two different variables.
Bivariate Moran Scatter Plot
[Figure: bivariate Moran scatter plot with quadrants High/High (positive SA), Low/High (negative SA), Low/Low (positive SA), and High/Low (negative SA)]
Local Measures of Spatial Autocorrelation
Local Indicators of Spatial Association (LISA)
• Local versions of Moran’s I

• Moran’s I is most commonly used, and the local version
is often called Anselin’s LISA, or just LISA
See: Luc Anselin (1995). Local Indicators of Spatial Association-LISA. Geographical Analysis 27: 93-115
Local Indicators of Spatial Association (LISA)
• The statistic is calculated for each areal unit in the data
• For each polygon, the index is calculated based on its neighboring polygons (e.g., those with which it shares a border)
• Because a measure is available for each polygon, these can be mapped to indicate how spatial autocorrelation varies over the study region
• Each index has an associated test statistic, so we can also map which of the polygons has a statistically significant relationship with its neighbors, and show the type of relationship
Example:
Calculating Anselin’s LISA
• The local Moran statistic for areal unit i is:
$$ I_i = z_i \sum_j w_{ij} z_j $$
where $z_i$ is the original variable $x_i$ in “standardized form”,
$$ z_i = \frac{x_i - \bar{x}}{SD_x} $$
or it can be in “deviation form”, $x_i - \bar{x}$, and $w_{ij}$ is the spatial weight.
The summation over j is across each row i of the spatial weights matrix.
An example follows.
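A sketch of the local Moran statistic in standardized form, assuming a row-standardized W and z scores computed with the divisor-n standard deviation (as in the worked example that follows). The four-unit data set is hypothetical.

```python
def local_morans_i(x, W):
    """Local Moran's I_i = z_i * sum_j w_ij z_j, with z standardized (divisor n)."""
    n = len(x)
    mean = sum(x) / n
    sd = (sum((v - mean) ** 2 for v in x) / n) ** 0.5
    z = [(v - mean) / sd for v in x]
    # Row i of W picks out the neighbors of unit i.
    return [z[i] * sum(W[i][j] * z[j] for j in range(n)) for i in range(n)]


W_line = [  # row-standardized contiguity, four units in a row
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
lisa = local_morans_i([1, 2, 3, 4], W_line)
print([round(v, 3) for v in lisa])      # [0.6, 0.2, 0.2, 0.6]
print(round(sum(lisa) / len(lisa), 3))  # 0.4, the global Moran's I for this W
```

Under these two assumptions the mean of the local values equals the global Moran's I, an instance of a global measure decomposing into local ones.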
Example using seven China provinces
--caution: “edge effects” will strongly influence the results because we have a very small number of observations
Contiguity Matrix (1 = shares a border)
Code  Province   1  2  3  4  5  6  7   Sum  Neighbors      Illiteracy (%)
1     Anhui      0  1  1  1  1  1  0   5    2, 3, 4, 5, 6  14.49
2     Zhejiang   1  0  1  1  0  0  1   4    1, 3, 4, 7     9.36
3     Jiangxi    1  1  0  0  0  1  0   3    1, 2, 6        6.49
4     Jiangsu    1  1  0  0  0  0  1   3    1, 2, 7        8.05
5     Henan      1  0  0  0  0  1  0   2    1, 6           7.36
6     Hubei      1  0  1  0  1  0  0   3    1, 3, 5        7.69
7     Shanghai   0  1  0  1  0  0  0   2    2, 4           3.97
Columns 1-7: Anhui, Zhejiang, Jiangxi, Jiangsu, Henan, Hubei, Shanghai
[Figure: map of the seven provinces labeled by code]
Contiguity Matrix and Row Standardized Spatial Weights Matrix
Contiguity matrix: as in the table above. Each entry of the row standardized matrix is the contiguity value divided by the row sum (e.g., 1/3 for each neighbor of a province with three neighbors).
Row Standardized Spatial Weights Matrix
Code  Province   Anhui  Zhejiang  Jiangxi  Jiangsu  Henan  Hubei  Shanghai  Sum
1     Anhui      0.00   0.20      0.20     0.20     0.20   0.20   0.00      1
2     Zhejiang   0.25   0.00      0.25     0.25     0.00   0.00   0.25      1
3     Jiangxi    0.33   0.33      0.00     0.00     0.00   0.33   0.00      1
4     Jiangsu    0.33   0.33      0.00     0.00     0.00   0.00   0.33      1
5     Henan      0.50   0.00      0.00     0.00     0.00   0.50   0.00      1
6     Hubei      0.33   0.00      0.33     0.00     0.33   0.00   0.00      1
7     Shanghai   0.00   0.50      0.00     0.50     0.00   0.00   0.00      1
Calculating standardized (z) scores
$$ z_i = \frac{x_i - \bar{x}}{SD_x} $$
Deviations from mean and z scores
Province   X       X - Mean   (X - Mean)²   z
Anhui      14.49    6.29      39.55          2.101
Zhejiang    9.36    1.16       1.34          0.387
Jiangxi     6.49   -1.71       2.93         -0.572
Jiangsu     8.05   -0.15       0.02         -0.051
Henan       7.36   -0.84       0.71         -0.281
Hubei       7.69   -0.51       0.26         -0.171
Shanghai    3.97   -4.23      17.90         -1.414
Sum        57.41    0.00      62.71
Mean and standard deviation
Mean = 57.41 / 7 = 8.20
Variance = 62.71 / 7 = 8.96
SD = √8.96 = 2.99
Calculating LISA: $I_i = z_i \sum_j w_{ij} z_j$
Row standardized spatial weights $w_{ij}$: as in the matrix above. Z scores: $z_i$ for the row province and $z_j$ for each of its potential neighbors.
Spatial weight matrix multiplied by z scores (cell-by-cell multiplication, $w_{ij} z_j$; 0 = not a neighbor):
Province   z_i     Anhui  Zhejiang  Jiangxi  Jiangsu  Henan   Hubei   Shanghai  Sum w_ij z_j  LISA    LISA from GeoDa
Anhui       2.101  0      0.077     -0.114   -0.010   -0.056  -0.034  0         -0.137        -0.289  -0.248
Zhejiang    0.387  0.525  0         -0.143   -0.013   0       0       -0.353     0.016         0.006   0.005
Jiangxi    -0.572  0.700  0.129     0        0        0       -0.057  0          0.772        -0.442  -0.379
Jiangsu    -0.051  0.700  0.129     0        0        0       0       -0.471     0.358        -0.018  -0.016
Henan      -0.281  1.050  0         0        0        0       -0.085  0          0.965        -0.271  -0.233
Hubei      -0.171  0.700  0         -0.191   0        -0.094  0       0          0.416        -0.071  -0.061
Shanghai   -1.414  0      0.194     0        -0.025   0       0       0          0.168        -0.238  -0.204
Source: Ron Briggs of UT Dallas

Results
Moran’s I = -0.1889
Raw data: I expected Anhui to be High-Low! (high illiteracy surrounded by low)
[Figure: raw data map (Low to High) and LISA cluster map showing Low-High]
Significance levels are calculated by simulations. They may differ each time the software is run.
Province   Illiteracy %   LISA    Significance
Anhui      14.49          -0.25   0.12
Zhejiang    9.36           0.01   0.46
Jiangxi     6.49          -0.38   0.04
Jiangsu     8.05          -0.02   0.32
Henan       7.36          -0.23   0.14
Hubei       7.69          -0.06   0.28
Shanghai    3.97          -0.20   0.37
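The hand calculation for the seven provinces can be reproduced in a few lines. The neighbor lists are transcribed from the contiguity matrix. The last printed column multiplies each LISA by (n - 1)/n = 6/7, which matches the "LISA from GeoDa" column to three decimals; that would be consistent with GeoDa standardizing with the sample (divisor n - 1) standard deviation, but this is an inference from the numbers, not from GeoDa's documentation.

```python
provinces = ["Anhui", "Zhejiang", "Jiangxi", "Jiangsu", "Henan", "Hubei", "Shanghai"]
illiteracy = [14.49, 9.36, 6.49, 8.05, 7.36, 7.69, 3.97]
# Neighbor lists from the contiguity matrix (codes 1-7 -> indices 0-6).
neighbors = {
    0: [1, 2, 3, 4, 5],
    1: [0, 2, 3, 6],
    2: [0, 1, 5],
    3: [0, 1, 6],
    4: [0, 5],
    5: [0, 2, 4],
    6: [1, 3],
}

n = len(illiteracy)
mean = sum(illiteracy) / n
sd = (sum((v - mean) ** 2 for v in illiteracy) / n) ** 0.5   # divisor n, as in the slides
z = [(v - mean) / sd for v in illiteracy]

# Row-standardized weights: each neighbor of i gets 1 / (number of neighbors of i).
lag = [sum(z[j] for j in neighbors[i]) / len(neighbors[i]) for i in range(n)]
lisa = [z[i] * lag[i] for i in range(n)]

for name, zi, li, Ii in zip(provinces, z, lag, lisa):
    print(f"{name:9s} z={zi:7.3f}  sum w*z={li:7.3f}  LISA={Ii:7.3f}  x(n-1)/n={Ii * (n - 1) / n:7.3f}")

# With row-standardized W and divisor-n z scores, global Moran's I is the mean LISA.
print("Moran's I =", round(sum(lisa) / n, 4))  # -0.1889
```

The global value, -0.1889, is close to its expectation under no spatial autocorrelation, E(I) = -1/(n - 1) = -0.167.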
Example: Nepal Data
Bivariate LISA Moran Scatter Plot for GDI vs AL
• Moran’s I is the correlation between X and Lag-X, the same variable but in nearby areas
– Univariate Moran’s I
• Bivariate Moran’s I is a correlation between X and a different variable in nearby areas
[Figure: Moran significance map for GDI vs. AL]
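One common way to compute a bivariate Moran's I, as the slope of the bivariate Moran scatter plot, is sketched below. It assumes a row-standardized W and standardizes both variables with the divisor-n standard deviation; the data are hypothetical, and the GDI and AL values are not used.

```python
def standardize(v):
    n = len(v)
    mean = sum(v) / n
    sd = (sum((a - mean) ** 2 for a in v) / n) ** 0.5
    return [(a - mean) / sd for a in v]


def bivariate_morans_i(x, y, W):
    """sum_i z_x,i * (W z_y)_i / n: x at each unit against y in its neighbors."""
    zx, zy = standardize(x), standardize(y)
    n = len(zx)
    lag_y = [sum(w * b for w, b in zip(row, zy)) for row in W]
    return sum(a * b for a, b in zip(zx, lag_y)) / n


W_line = [  # row-standardized contiguity, four units in a row
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
print(round(bivariate_morans_i([1, 2, 3, 4], [1, 2, 3, 4], W_line), 4))  # 0.4
print(round(bivariate_morans_i([1, 2, 3, 4], [4, 3, 2, 1], W_line), 4))  # -0.4
```

When the second variable is the first one, the statistic reduces to the univariate Moran's I (0.4 for this row-standardized line); reversing the second variable flips the sign. When W is not symmetric, swapping x and y can give a different value.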
Bivariate LISA and the Correlation Coefficient
• The correlation coefficient is the relationship between two different variables in the same area
• Bivariate LISA is a correlation between two different variables in an area and in nearby areas
Consequences of Ignoring Spatial Autocorrelation
• Correlation coefficients and coefficients of determination appear bigger than they really are
– You think the relationship is stronger than it really is
– The variables in nearby areas affect each other
• Standard errors appear smaller than they really are
– Exaggerated precision
– You think your predictions are better than they really are, since standard errors measure predictive accuracy
– You are more likely to conclude that a relationship is statistically significant
Diagnostic of Spatial Dependence
• For correlation
– Calculate Moran’s I for each variable and test its statistical significance
– If Moran’s I is significant, you may have a problem!
• For regression
– Calculate the residuals
– Map the residuals: do you see any spatial patterns?
– Calculate Moran’s I for the residuals: is it statistically significant?
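A sketch of the regression diagnostic: fit an ordinary least squares line, then compute Moran's I on its residuals. The eight units in a row, the binary contiguity weights, and the data (where y contains a spatial trend that x does not explain) are all hypothetical.

```python
def ols_residuals(x, y):
    """Residuals from the simple regression y = a + b x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    a0 = my - b * mx
    return [c - (a0 + b * a) for a, c in zip(x, y)]


def morans_i(v, W):
    n = len(v)
    mean = sum(v) / n
    d = [a - mean for a in v]
    s0 = sum(sum(row) for row in W)
    num = sum(W[i][j] * d[i] * d[j] for i in range(n) for j in range(n))
    return (n / s0) * num / sum(a * a for a in d)


# Hypothetical data: 8 units in a row; y has a spatial trend that x does not explain.
n = 8
W = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
x = [1, 2, 1, 2, 1, 2, 1, 2]
y = [xi + i for i, xi in enumerate(x)]
resid = ols_residuals(x, y)
print([round(r, 3) for r in resid])  # [-3.0, -3.0, -1.0, -1.0, 1.0, 1.0, 3.0, 3.0]
print(round(morans_i(resid, W), 4))  # 0.7143
```

The residuals run from negative at one end of the line to positive at the other, and their Moran's I is strongly positive: exactly the pattern the residual map and residual Moran's I are meant to flag. Significance would still need to be assessed, for example with a permutation test.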
Summary
• Spatial autocorrelation of areal data
• Spatial weight matrix
• Measures of spatial autocorrelation
• Global Measure
– Moran’s I
• Consequences of ignoring spatial autocorrelation
• Significance test
• Please read O’Sullivan & Unwin Ch. 7 and Ch. 8.1 and 8.2
• End of this topic
