
Spatial Analysis and Modeling (GIST 4302/5302) : Guofeng Cao Department of Geosciences Texas Tech University


Spatial Analysis and Modeling

(GIST 4302/5302)
Guofeng Cao
Department of Geosciences
Texas Tech University
Outline of This Week
• Last week, we learned:
– spatial point pattern analysis (PPA)
– focus on location distribution of events
– Measure the clustering (spatial autocorrelation) in point patterns
• This week, we will learn:
– How to measure and detect clusters/spatial
autocorrelation in areal data (regional data)
Spatial Autocorrelation
• Spatial autocorrelation is everywhere
– Spatial point pattern
• K, G functions
• Kernel functions
– Areal/lattice (this topic)
– Geostatistical data (next topic)

Spatial Autocorrelation of Areal
Data

Spatial Autocorrelation
• Tobler’s first law of geography
• Spatial auto/cross correlation

– If like values tend to cluster together, then the field exhibits high positive spatial autocorrelation
– If there is no apparent relationship between attribute value and location, then there is zero spatial autocorrelation
– If like values tend to be located away from each other, then there is negative spatial autocorrelation
Positive spatial autocorrelation
Example: 2002 population density
- high values surrounded by nearby high values
- intermediate values surrounded by nearby intermediate values
- low values surrounded by nearby low values

Source: Ron Briggs of UT Dallas


Negative spatial autocorrelation
Example: grocery store density (competition for space)
- high values surrounded by nearby low values
- intermediate values surrounded by nearby intermediate values
- low values surrounded by nearby high values

Source: Ron Briggs of UT Dallas


Measuring Spatial Autocorrelation:
the problem of measuring nearness
To measure spatial autocorrelation, we must know the nearness of our observations, as we did for the point pattern case
• Which points or polygons are near or next to other points or polygons?
– Which states are near Texas?
– How to measure this?
Seems simple and obvious, but it is not!
Spatial Weight Matrix
• Core concept in statistical analysis of areal data
• Two steps involved:
– define which relationships between observations are to
be given a nonzero weight, i.e., define spatial
neighbors
– assign weights to the neighbors

Spatial Neighbors
• Contiguity-based neighbors
– Zones i and j are neighbors if zone i is contiguous with (adjacent to) zone j
– But what constitutes contiguity?
• Distance-based neighbors
– Zones i and j are neighbors if the distance between them is less than the threshold distance
– But what distance do we use?
Contiguity-based Spatial Neighbors
• Sharing a border or boundary
– Rook: sharing a border
– Queen: sharing a border or a point

[Figure: rook and queen neighbors on a square grid; neighbors for hexagons and for irregular polygons]

Which to use?
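The rook and queen rules can be sketched for a regular grid as follows. This is a minimal illustration, not the method used in the slides: the function name `grid_contiguity` and the row-by-row cell numbering are assumptions made for the example.

```python
import numpy as np

def grid_contiguity(n_rows, n_cols, rule="rook"):
    """Binary contiguity matrix for a regular grid (cells numbered row by row)."""
    n = n_rows * n_cols
    W = np.zeros((n, n), dtype=int)
    for r in range(n_rows):
        for c in range(n_cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a cell is not its own neighbor
                    # Rook keeps shared borders only; queen also keeps shared corner points.
                    if rule == "rook" and dr != 0 and dc != 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        W[r * n_cols + c, rr * n_cols + cc] = 1
    return W

rook = grid_contiguity(3, 3, "rook")
queen = grid_contiguity(3, 3, "queen")
# Center cell of a 3x3 grid: 4 rook neighbors, 8 queen neighbors
print(rook[4].sum(), queen[4].sum())
```

On a square grid the queen rule always yields at least as many neighbors as the rook rule, which is one reason the choice affects later statistics.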
Higher-Order Contiguity

[Figure: 1st-order (nearest neighbor) and 2nd-order (next nearest neighbor) contiguity for the rook, hexagon, and queen cases]
Distance-based Neighbors
• How to measure distance between
polygons?
• Distance metrics
– 2D Cartesian distance (projected data)
– 3D spherical distance/great-circle distance
(lat/long data)
• Haversine formula

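For lat/long data, the Haversine formula mentioned above can be sketched as below. The function name `haversine_km` and the mean Earth radius of 6371 km are assumptions for this example; the slides do not specify either.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two lat/long points given in degrees.

    radius_km is an assumed mean Earth radius, not a value from the slides.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# A quarter of the way around the equator
quarter = haversine_km(0.0, 0.0, 0.0, 90.0)
print(round(quarter, 1))
```

For projected data, ordinary 2D Cartesian distance between centroids is usually sufficient and cheaper.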
Distance-based Neighbors
• k-nearest neighbors

Source: Bivand and Pebesma and Gomez-Rubio


Distance-based Neighbors
• threshold distance (buffer)

Source: Bivand and Pebesma and Gomez-Rubio


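The two distance-based rules (k-nearest neighbors and threshold distance) can be compared with a small sketch. The centroid coordinates and the helper names below are hypothetical, and distances are plain 2D Cartesian, so the example assumes projected data.

```python
import numpy as np

def pairwise_distances(coords):
    """Euclidean distance between every pair of points (projected coordinates)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def knn_neighbors(coords, k):
    """Each zone gets exactly its k closest zones as neighbors."""
    d = pairwise_distances(coords)
    np.fill_diagonal(d, np.inf)  # a zone is not its own neighbor
    nearest = np.argsort(d, axis=1)[:, :k]
    W = np.zeros(d.shape, dtype=int)
    W[np.arange(len(coords))[:, None], nearest] = 1
    return W

def threshold_neighbors(coords, max_dist):
    """Zones are neighbors if their distance is less than max_dist."""
    d = pairwise_distances(coords)
    W = (d < max_dist).astype(int)
    np.fill_diagonal(W, 0)
    return W

# Hypothetical zone centroids along a line
centroids = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [7.0, 0.0]])
W_knn = knn_neighbors(centroids, k=1)
W_thr = threshold_neighbors(centroids, max_dist=2.5)
print(W_knn.sum(axis=1))  # every zone has exactly k neighbors
print(W_thr.sum(axis=1))  # neighbor counts vary; the last zone is isolated
```

The sketch shows the usual trade-off: k-nearest neighbors guarantees every zone has neighbors but is generally not symmetric, while a threshold distance is symmetric but can leave isolated zones.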
Neighbor/Connectivity
Histogram

Source: Bivand and Pebesma and Gomez-Rubio


Spatial Weight Matrix
• Spatial weights can be seen as a list of
weights indexed by a list of neighbors
• If zone j is not a neighbor of zone i, the weight Wij is set to zero
– The weight matrix can be illustrated as an image
– Sparse matrix
A Simple Example for Rook case
• Matrix contains a:
– 1 if share a border
– 0 if do not share a border

4 areal units:
A B
C D

4x4 matrix W (1 = common border):
    A B C D
A   0 1 1 0
B   1 0 0 1
C   1 0 0 1
D   0 1 1 0
Sparse Contiguity Matrix for US States -- obtained from Anselin's web site (see powerpoint for link)
Name Fips Ncount N1 N2 N3 N4 N5 N6 N7 N8
Alabama 1 4 28 13 12 47
Arizona 4 5 35 8 49 6 32
Arkansas 5 6 22 28 48 47 40 29
California 6 3 4 32 41
Colorado 8 7 35 4 20 40 31 49 56
Connecticut 9 3 44 36 25
Delaware 10 3 24 42 34
District of Columbia 11 2 51 24
Florida 12 2 13 1
Georgia 13 5 12 45 37 1 47
Idaho 16 6 32 41 56 49 30 53
Illinois 17 5 29 21 18 55 19
Indiana 18 4 26 21 17 39
Iowa 19 6 29 31 17 55 27 46
Kansas 20 4 40 29 31 8
Kentucky 21 7 47 29 18 39 54 51 17
Louisiana 22 3 28 48 5
Maine 23 1 33
Maryland 24 5 51 10 54 42 11
Massachusetts 25 5 44 9 36 50 33
Michigan 26 3 18 39 55
Minnesota 27 4 19 55 46 38
Mississippi 28 4 22 5 1 47
Missouri 29 8 5 40 17 21 47 20 19 31
Montana 30 4 16 56 38 46
Nebraska 31 6 29 20 8 19 56 46
Nevada 32 5 6 4 49 16 41
New Hampshire 33 3 25 23 50
New Jersey 34 3 10 36 42
New Mexico 35 5 48 40 8 4 49
New York 36 5 34 9 42 50 25
North Carolina 37 4 45 13 47 51
North Dakota 38 3 46 27 30
Ohio 39 5 26 21 54 42 18
Oklahoma 40 6 5 35 48 29 20 8
Oregon 41 4 6 32 16 53
Pennsylvania 42 6 24 54 10 39 36 34
Rhode Island 44 2 25 9
South Carolina 45 2 13 37
South Dakota 46 6 56 27 19 31 38 30
Tennessee 47 8 5 28 1 37 13 51 21 29
Texas 48 4 22 5 35 40
Utah 49 6 4 8 35 56 32 16
Vermont 50 3 36 25 33
Virginia 51 6 47 37 24 54 11 21
Washington 53 2 41 16
West Virginia 54 5 51 21 24 39 42
Wisconsin 55 4 26 17 19 27
Wyoming 56 6 49 16 31 8 46 30
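A sparse neighbor list like the one above can be read into a dictionary keyed by FIPS code. The sketch below is only an illustration: `parse_neighbor_line` is a hypothetical helper, and it assumes each line follows the "Name Fips Ncount N1 N2 ..." layout of the table header, with state names that may contain spaces.

```python
def parse_neighbor_line(line):
    """Split 'Name Fips Ncount N1 N2 ...' into name, FIPS code, count, and neighbor list."""
    tokens = line.split()
    # The name ends where the first purely numeric token begins.
    first_num = next(i for i, t in enumerate(tokens) if t.isdigit())
    name = " ".join(tokens[:first_num])
    fips, ncount, *nbrs = (int(t) for t in tokens[first_num:])
    return name, fips, ncount, nbrs

rows = [
    "Texas 48 4 22 5 35 40",
    "Louisiana 22 3 28 48 5",
    "District of Columbia 11 2 51 24",
]

neighbors = {}
for row in rows:
    name, fips, ncount, nbrs = parse_neighbor_line(row)
    # Sanity check: the stated count should match the number of listed neighbors.
    assert ncount == len(nbrs), f"{name}: Ncount does not match the neighbor list"
    neighbors[fips] = nbrs

print(neighbors[48])  # FIPS codes of the states listed as neighbors of Texas
```

Storing only the neighbors of each zone is what makes the matrix "sparse": the zero weights are never written down.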
Style of Spatial Weight Matrix
• Row (binary)
– a weight of unity for each neighbor relationship
• Row standardization
– each weight is divided by its row sum, so the weights in every row sum to 1
– symmetry not guaranteed
– can be interpreted as allowing the calculation of average values across neighbors
• General spatial weights based on distances
Row vs. Row standardization

Layout of zones:
A B C
D E F

Row (total number of neighbors; some have more than others)
    A B C D E F   Row Sum
A   0 1 0 1 0 0   2
B   1 0 1 0 1 0   3
C   0 1 0 0 0 1   2
D   1 0 0 0 1 0   2
E   0 1 0 1 0 1   3
F   0 0 1 0 1 0   2

Row standardized (divide each number by the row sum; usually use this)
    A   B   C   D   E   F     Row Sum
A   0.0 0.5 0.0 0.5 0.0 0.0   1
B   0.3 0.0 0.3 0.0 0.3 0.0   1
C   0.0 0.5 0.0 0.0 0.0 0.5   1
D   0.5 0.0 0.0 0.0 0.5 0.0   1
E   0.0 0.3 0.0 0.3 0.0 0.3   1
F   0.0 0.0 0.5 0.0 0.5 0.0   1
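Row standardization of the six-zone example can be reproduced in a few lines. This is a sketch that assumes every zone has at least one neighbor, so no row sum is zero.

```python
import numpy as np

# Binary rook contiguity for zones A..F laid out as
#   A B C
#   D E F
W = np.array([
    [0, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 0],
], dtype=float)

row_sums = W.sum(axis=1, keepdims=True)  # number of neighbors per zone
W_std = W / row_sums                     # assumes no zone is without neighbors

print(row_sums.ravel())
print(np.round(W_std, 2))
```

Note that `W_std` is no longer symmetric: A gives B a weight of 1/2, while B gives A a weight of 1/3, because B has more neighbors.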
General Spatial Weights Based on
Distance
• Decay functions of distance
– Most common choice is the inverse (reciprocal) of the distance between locations i and j ($w_{ij} = 1/d_{ij}$)
– Other functions also used
• inverse of squared distance ($w_{ij} = 1/d_{ij}^2$), or
• negative exponential ($w_{ij} = e^{-d_{ij}}$ or $w_{ij} = e^{-d_{ij}^2}$)
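The decay functions listed above can be sketched as one helper. The function name `decay_weights`, the `kind` labels, and the small distance matrix are hypothetical; the sketch also assumes all zones are at distinct locations, so off-diagonal distances are positive.

```python
import numpy as np

def decay_weights(d, kind="inverse"):
    """Turn a distance matrix into spatial weights; the diagonal stays zero."""
    d = np.asarray(d, dtype=float)
    W = np.zeros_like(d)
    off = ~np.eye(d.shape[0], dtype=bool)  # skip the diagonal, where d = 0
    if kind == "inverse":
        W[off] = 1.0 / d[off]
    elif kind == "inverse_squared":
        W[off] = 1.0 / d[off] ** 2
    elif kind == "neg_exp":
        W[off] = np.exp(-d[off])
    else:
        raise ValueError(f"unknown decay function: {kind}")
    return W

# Hypothetical distances between three zones
d = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 4.0],
              [2.0, 4.0, 0.0]])

print(decay_weights(d, "inverse"))
print(decay_weights(d, "inverse_squared"))
```

Squaring the distance makes the weights fall off faster, so distant zones contribute less than under the plain inverse.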
Distance-based Spatial Weight Matrix

Layout of zones:
A B C
D E F

    A B C D E F
A   0 2 0 2 1 0
B   2 0 2 1 2 1
C   0 2 0 0 1 2
D   2 1 0 0 2 0
E   1 2 1 2 0 2
F   0 1 2 0 2 0
Measure of Spatial
Autocorrelation

Global Measures and Local Measures
• Global Measures
– A single value which applies to the entire data set
• The same pattern or process occurs over the entire
geographic area
• An average for the entire area
• Local Measures
– A value calculated for each observation unit
• Different patterns or processes may occur in different
parts of the region
• A unique number for each location
• Global measures usually can be decomposed
into a combination of local measures
Global Measures and Local Measures
• Global Measures
– Moran’s I
• Local Measures
– Local Moran’s I

Moran’s I
• The most common measure of Spatial Autocorrelation
• Use for points or polygons

Patrick Alfred Pierce Moran (1917-1988)


Formula for Moran’s I
$$I = \frac{N \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\left(\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\right) \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

• Where:
N is the number of observations (points or polygons); the sums run to n = N
$\bar{x}$ is the mean of the variable
$x_i$ is the variable value at a particular location
$x_j$ is the variable value at another location
$w_{ij}$ is a weight indexing location of i relative to j
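The formula translates almost term by term into NumPy. The sketch below uses the four-unit rook matrix from the earlier example; the attribute values are hypothetical and chosen so the results are easy to check by hand.

```python
import numpy as np

def morans_i(x, W):
    """Global Moran's I for values x and spatial weight matrix W."""
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    dev = x - x.mean()                              # x_i - mean
    num = len(x) * (W * np.outer(dev, dev)).sum()   # N * sum_ij w_ij dev_i dev_j
    den = W.sum() * (dev ** 2).sum()                # (sum_ij w_ij) * sum_i dev_i^2
    return num / den

# Rook contiguity for four units laid out as  A B / C D
W = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]])

checkerboard = [1, 0, 0, 1]    # every unit differs from both of its neighbors
print(morans_i(checkerboard, W))  # -1.0
```

A checkerboard arrangement is the extreme case of negative spatial autocorrelation, which is why it reaches -1 here.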
Moran’s I
• Varies on a scale between -1 through 0* to +1
– -1: high negative spatial autocorrelation
– 0*: no spatial autocorrelation
– +1: high positive spatial autocorrelation
*technically the value for no spatial autocorrelation is -1/(n-1)

Can also use it as an index for dispersion/random/cluster patterns.
[Figure: dispersed (uniform) pattern, random pattern, clustered pattern]

Briggs Henan University 2010


Moran’s I and Correlation Coefficient

• Correlation Coefficient [-1, 1]
– Relationship between two different variables
• Moran’s I [-1, 1]
– Spatial autocorrelation; often involves one (spatially indexed) variable only
– Correlation between the observations of a spatial variable at location X and the spatial lag of X, formed by averaging all the observations at the neighbors of X
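Correlation Coefficient:
$$r = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) / n}{\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2 / n}\ \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / n}}$$

Note the similarity of the numerator (top) to the measures of spatial association discussed earlier if we view $y_i$ as being the $x_i$ for the neighboring polygon (see next slide).

Spatial auto-correlation:
$$I = \frac{N \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\left(\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\right) \sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - \bar{x})(x_j - \bar{x}) \Big/ \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / n}\ \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / n}}$$

Source: Ron Briggs of UT Dallas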
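Correlation Coefficient:
$$r = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) / n}{\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2 / n}\ \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / n}}$$

Moran’s I (the spatial weights $w_{ij}$ take the place of 1/n in the numerator; $y_i$ is the $x_i$ for the neighboring polygon):
$$I = \frac{N \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\left(\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\right) \sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - \bar{x})(x_j - \bar{x}) \Big/ \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / n}\ \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / n}}$$

Source: Ron Briggs of UT Dallas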
Moran Scatter Plots
We can draw a scatter diagram between these two variables (in
standardized form): X and lag-X (or W_X)

The slope of this regression line is Moran’s I
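The claim that the regression slope equals Moran's I can be checked numerically. The sketch below assumes row-standardized weights (the case in which the equality holds exactly) and uses hypothetical attribute values on the six-zone A..F layout.

```python
import numpy as np

# Binary rook contiguity for zones A..F (A B C / D E F), then row standardized
W = np.array([
    [0, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 0],
], dtype=float)
W = W / W.sum(axis=1, keepdims=True)

x = np.array([10.0, 8.0, 3.0, 9.0, 7.0, 2.0])  # hypothetical values
z = (x - x.mean()) / x.std()                   # standardized form
lag = W @ z                                    # lag-X: average z of the neighbors

# Least-squares line through the Moran scatter plot (lag on the y axis)
slope, intercept = np.polyfit(z, lag, 1)

# Moran's I written for row-standardized weights (sum of all weights = n)
morans_i = (z * lag).sum() / (z ** 2).sum()

print(slope, morans_i)
```

With weights that are not row standardized the two numbers generally differ by the factor n divided by the sum of all weights.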
Moran Scatter Plots
Locations of positive spatial association (“I’m similar to my neighbors”).
Q1 (values [+], nearby values [+]): H-H
Q3 (values [-], nearby values [-]): L-L

Locations of negative spatial association (“I’m different from my neighbors”).
Q2 (values [-], nearby values [+]): L-H
Q4 (values [+], nearby values [-]): H-L

[Figure quadrants: upper left Low/High (negative SA); upper right High/High (positive SA); lower left Low/Low (positive SA); lower right High/Low (negative SA)]
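The quadrant rule can be written as a small function of a standardized value and its spatial lag. The function name `quadrant` is hypothetical, and treating a value of exactly zero as "high" is an arbitrary assumption of this sketch. The sample pairs are the z-scores and neighbor averages (SumWijZj) of three provinces from the seven-province example later in these slides.

```python
def quadrant(z_i, lag_i):
    """Label a point of the Moran scatter plot: own value first, neighbors second."""
    if z_i >= 0:  # zero counted as high (an assumption of this sketch)
        return "H-H" if lag_i >= 0 else "H-L"
    return "L-H" if lag_i >= 0 else "L-L"

# (z_i, average z of neighbors) from the seven-province example
samples = {
    "Anhui": (2.101, -0.137),
    "Zhejiang": (0.387, 0.016),
    "Jiangxi": (-0.572, 0.772),
}
labels = {name: quadrant(z, lag) for name, (z, lag) in samples.items()}
print(labels)
```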
Moran Scatterplot: Example

Statistical Significance Tests for Moran’s I
• Based on the normal frequency distribution with
$$Z = \frac{I - E(I)}{S_{error}(I)}$$
Where: I is the calculated value for Moran’s I from the sample
E(I) is the expected value if random
$S_{error}(I)$ is the standard error
• Statistical significance test
– Monte Carlo test, as we did for spatial pattern analysis
– Permutation test
• Non-parametric
• Data-driven, no assumption of the data
• Implemented in GeoDa
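A permutation test can be sketched by shuffling the observed values across locations and recomputing Moran's I each time. This is an illustration under stated assumptions, not GeoDa's implementation: the 5x5 grid, the values, the number of permutations, the fixed seed, and the two-sided pseudo p-value are all choices made for the example.

```python
import numpy as np

def morans_i(x, W):
    """Global Moran's I for values x and spatial weight matrix W."""
    dev = np.asarray(x, dtype=float) - np.mean(x)
    return len(dev) * (W * np.outer(dev, dev)).sum() / (W.sum() * (dev ** 2).sum())

def permutation_test(x, W, n_perm=999, seed=0):
    """Compare the observed I with I values from randomly reshuffled data."""
    rng = np.random.default_rng(seed)
    observed = morans_i(x, W)
    sims = np.array([morans_i(rng.permutation(x), W) for _ in range(n_perm)])
    # Two-sided pseudo p-value: share of permutations at least as far from
    # the permutation mean as the observed value (observed itself counts once).
    extreme = np.abs(sims - sims.mean()) >= abs(observed - sims.mean())
    p_value = (extreme.sum() + 1) / (n_perm + 1)
    return observed, sims.mean(), p_value

# Hypothetical data: a smooth trend on a 5x5 grid with rook contiguity
cells = [(r, c) for r in range(5) for c in range(5)]
W = np.array([[1.0 if abs(r1 - r2) + abs(c1 - c2) == 1 else 0.0
               for (r2, c2) in cells] for (r1, c1) in cells])
x = np.array([r + c for r, c in cells], dtype=float)

observed, sim_mean, p_value = permutation_test(x, W)
print(observed, sim_mean, p_value)
```

The mean of the permuted values should sit close to -1/(n-1), the expected value under no spatial autocorrelation, and the p-value changes slightly if the seed or number of permutations changes.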
Test Statistic for Normal Frequency Distribution

[Figure: normal curve centered at 0 (*technically -1/(n-1)); 2.5% in each tail beyond -1.96 and 1.96 (reject null at 5%); 2.54 marked for rejecting the null at 1%]

Null Hypothesis: no spatial autocorrelation
*Moran’s I = 0
Alternative Hypothesis: spatial autocorrelation exists
*Moran’s I > 0
Reject Null Hypothesis if Z test statistic > 1.96 (or < -1.96)
---if there were no spatial autocorrelation, a Z this extreme would occur less than 5% of the time
---we conclude, at the 5% significance level, that spatial autocorrelation exists
Spatial Autocorrelation vs Correlation
Spatial Autocorrelation: shows the association or relationship between the same variable in “near-by” areas.
Standard Correlation: shows the association or relationship between two different variables.
Bivariate Moran Scatter Plot

[Figure quadrants: upper left Low/High (negative SA); upper right High/High (positive SA); lower left Low/Low (positive SA); lower right High/Low (negative SA)]
Local Measures of
Spatial Autocorrelation

Local Indicators of Spatial Association (LISA)

• Local versions of Moran’s I


• Moran’s I is most commonly used, and the local version
is often called Anselin’s LISA, or just LISA

See:
Luc Anselin 1995 Local Indicators of Spatial
Association-LISA Geographical Analysis 27: 93-115

Local Indicators of Spatial Association (LISA)

• The statistic is calculated for each areal unit in the data
• For each polygon, the index is calculated based on the neighboring polygons with which it shares a border
• A measure is available for each polygon, so these can be mapped to indicate how spatial autocorrelation varies over the study region
• Each index has an associated test statistic, so we can also map which polygons have a statistically significant relationship with their neighbors, and show the type of relationship
Example:

Calculating Anselin’s LISA
• The local Moran statistic for areal unit i is:
$$I_i = z_i \sum_j w_{ij} z_j$$
where $z_i$ is the original variable $x_i$ in “standardized form”, $z_i = (x_i - \bar{x}) / SD_x$,
or it can be in “deviation form”, $x_i - \bar{x}$,
and $w_{ij}$ is the spatial weight
The summation $\sum_j$ is across each row i of the spatial weights matrix.
An example follows
Example using seven China provinces
--caution: “edge effects” will strongly influence the results because we have a very small number of observations
Source: Ron Briggs of UT Dallas
Contiguity Matrix
(columns 1-7 are Anhui, Zhejiang, Jiangxi, Jiangsu, Henan, Hubei, Shanghai)

Province   Code   1  2  3  4  5  6  7   Sum   Neighbors   Illiteracy
Anhui       1     0  1  1  1  1  1  0    5    6 5 4 3 2     14.49
Zhejiang    2     1  0  1  1  0  0  1    4    7 4 3 1        9.36
Jiangxi     3     1  1  0  0  0  1  0    3    6 2 1          6.49
Jiangsu     4     1  1  0  0  0  0  1    3    7 2 1          8.05
Henan       5     1  0  0  0  0  1  0    2    6 1            7.36
Hubei       6     1  0  1  0  1  0  0    3    1 3 5          7.69
Shanghai    7     0  1  0  1  0  0  0    2    2 4            3.97

[Map: the seven provinces labeled with codes 1-7]
Source: Ron Briggs of UT Dallas
Contiguity Matrix and
Row Standardized Spatial Weights Matrix
Contiguity Matrix
Province   Code   Anhui  Zhejiang  Jiangxi  Jiangsu  Henan  Hubei  Shanghai   Sum
Anhui       1       0       1        1        1       1      1       0        5
Zhejiang    2       1       0        1        1       0      0       1        4
Jiangxi     3       1       1        0        0       0      1       0        3
Jiangsu     4       1       1        0        0       0      0       1        3
Henan       5       1       0        0        0       0      1       0        2
Hubei       6       1       0        1        0       1      0       0        3
Shanghai    7       0       1        0        1       0      0       0        2

Row Standardized Spatial Weights Matrix (each entry divided by the row sum, e.g. 1/3 = 0.33)
Province   Code   Anhui  Zhejiang  Jiangxi  Jiangsu  Henan  Hubei  Shanghai   Sum
Anhui       1     0.00    0.20     0.20     0.20    0.20   0.20    0.00       1
Zhejiang    2     0.25    0.00     0.25     0.25    0.00   0.00    0.25       1
Jiangxi     3     0.33    0.33     0.00     0.00    0.00   0.33    0.00       1
Jiangsu     4     0.33    0.33     0.00     0.00    0.00   0.00    0.33       1
Henan       5     0.50    0.00     0.00     0.00    0.00   0.50    0.00       1
Hubei       6     0.33    0.00     0.33     0.00    0.33   0.00    0.00       1
Shanghai    7     0.00    0.50     0.00     0.50    0.00   0.00    0.00       1
Source: Ron Briggs of UT Dallas
Calculating standardized (z) scores
$$z_i = \frac{x_i - \bar{x}}{SD_x}$$

Deviations from Mean and z scores.
Province     X      X-Xmean   (X-Xmean)^2     z
Anhui      14.49     6.29       39.55        2.101
Zhejiang    9.36     1.16        1.34        0.387
Jiangxi     6.49    -1.71        2.93       -0.572
Jiangsu     8.05    -0.15        0.02       -0.051
Henan       7.36    -0.84        0.71       -0.281
Hubei       7.69    -0.51        0.26       -0.171
Shanghai    3.97    -4.23       17.90       -1.414

Mean and Standard Deviation
Sum        57.41     0.00       62.71
Mean       57.41 / 7 = 8.20
Variance   62.71 / 7 = 8.96
SD         √8.96 = 2.99
Source: Ron Briggs of UT Dallas
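The z-scores in the table can be reproduced with NumPy. The sketch assumes, as the slide's arithmetic does, that the variance divides by n (population form) rather than n - 1.

```python
import numpy as np

# Illiteracy values: Anhui, Zhejiang, Jiangxi, Jiangsu, Henan, Hubei, Shanghai
x = np.array([14.49, 9.36, 6.49, 8.05, 7.36, 7.69, 3.97])

dev = x - x.mean()                      # deviations from the mean
variance = (dev ** 2).sum() / len(x)    # population variance: divide by n
sd = np.sqrt(variance)
z = dev / sd

print(round(x.mean(), 2), round(variance, 2), round(sd, 2))
print(np.round(z, 3))
```

Dividing by n - 1 instead would shrink every z-score by the same factor, which matters when comparing hand calculations with software output.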
Calculating LISA
$$I_i = z_i \sum_j w_{ij} z_j$$

Row Standardized Spatial Weights Matrix ($w_{ij}$)
Province   Code   Anhui  Zhejiang  Jiangxi  Jiangsu  Henan  Hubei  Shanghai
Anhui       1     0.00    0.20     0.20     0.20    0.20   0.20    0.00
Zhejiang    2     0.25    0.00     0.25     0.25    0.00   0.00    0.25
Jiangxi     3     0.33    0.33     0.00     0.00    0.00   0.33    0.00
Jiangsu     4     0.33    0.33     0.00     0.00    0.00   0.00    0.33
Henan       5     0.50    0.00     0.00     0.00    0.00   0.50    0.00
Hubei       6     0.33    0.00     0.33     0.00    0.33   0.00    0.00
Shanghai    7     0.00    0.50     0.00     0.50    0.00   0.00    0.00

Z-Scores for row Province ($z_i$) and its potential neighbors ($z_j$)
Province     Zi      Anhui  Zhejiang  Jiangxi  Jiangsu   Henan   Hubei  Shanghai
Anhui       2.101    2.101   0.387    -0.572   -0.051   -0.281  -0.171   -1.414
Zhejiang    0.387    2.101   0.387    -0.572   -0.051   -0.281  -0.171   -1.414
Jiangxi    -0.572    2.101   0.387    -0.572   -0.051   -0.281  -0.171   -1.414
Jiangsu    -0.051    2.101   0.387    -0.572   -0.051   -0.281  -0.171   -1.414
Henan      -0.281    2.101   0.387    -0.572   -0.051   -0.281  -0.171   -1.414
Hubei      -0.171    2.101   0.387    -0.572   -0.051   -0.281  -0.171   -1.414
Shanghai   -1.414    2.101   0.387    -0.572   -0.051   -0.281  -0.171   -1.414

Spatial Weight Matrix multiplied by Z-Score Matrix (cell by cell multiplication), $w_{ij} z_j$
Province     Zi      Anhui  Zhejiang  Jiangxi  Jiangsu   Henan   Hubei  Shanghai  SumWijZj   LISA    LISA from GeoDa
Anhui       2.101    0       0.077    -0.114   -0.010   -0.056  -0.034    0        -0.137   -0.289   -0.248
Zhejiang    0.387    0.525   0        -0.143   -0.013    0       0       -0.353     0.016    0.006    0.005
Jiangxi    -0.572    0.700   0.129     0        0        0      -0.057    0         0.772   -0.442   -0.379
Jiangsu    -0.051    0.700   0.129     0        0        0       0       -0.471     0.358   -0.018   -0.016
Henan      -0.281    1.050   0         0        0        0      -0.085    0         0.965   -0.271   -0.233
Hubei      -0.171    0.700   0        -0.191    0       -0.094   0        0         0.416   -0.071   -0.061
Shanghai   -1.414    0       0.194     0       -0.025    0       0        0         0.168   -0.238   -0.204

Source: Ron Briggs of UT Dallas
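The whole hand calculation can be condensed into a few matrix operations. This is a sketch under stated assumptions: the hand-calculated LISA column is reproduced with z-scores that divide by n, while the "LISA from GeoDa" column appears consistent with z-scores that divide by n - 1; that second point is an inference from the numbers, not something the slides state.

```python
import numpy as np

names = ["Anhui", "Zhejiang", "Jiangxi", "Jiangsu", "Henan", "Hubei", "Shanghai"]
x = np.array([14.49, 9.36, 6.49, 8.05, 7.36, 7.69, 3.97])  # illiteracy
C = np.array([  # contiguity matrix from the slides
    [0, 1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1, 0],
    [1, 0, 1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 0, 0],
], dtype=float)

W = C / C.sum(axis=1, keepdims=True)   # row-standardized weights

z = (x - x.mean()) / x.std()           # np.std divides by n by default
lisa = z * (W @ z)                     # I_i = z_i * sum_j w_ij z_j

z_sample = (x - x.mean()) / x.std(ddof=1)   # divide by n - 1 (assumed GeoDa-style)
lisa_sample = z_sample * (W @ z_sample)

for name, a, b in zip(names, lisa, lisa_sample):
    print(f"{name:9s} {a:7.3f} {b:7.3f}")
```

The two columns differ only by the constant factor (n - 1)/n, so they rank and sign the provinces identically.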


Results
Moran’s I = -0.1889
Raw Data
[Map: illiteracy by province, shaded from Low to High; LISA cluster map with a Low-High label]
I expected Anhui to be High-Low! (high illiteracy surrounded by low)
Significance levels are calculated by simulations. They may differ each time software is run.

Province   Illiteracy %   LISA   Significance
Anhui         14.49      -0.25      0.12
Zhejiang       9.36       0.01      0.46
Jiangxi        6.49      -0.38      0.04
Jiangsu        8.05      -0.02      0.32
Henan          7.36      -0.23      0.14
Hubei          7.69      -0.06      0.28
Shanghai       3.97      -0.20      0.37
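As a check on how the global and local measures relate, the following sketch of a derivation assumes row-standardized weights (each row of $w_{ij}$ sums to 1, so the sum of all weights is n) and z-scores computed with the standard deviation that divides by n, as in the hand calculation:

```latex
I = \frac{n \sum_i \sum_j w_{ij} (x_i - \bar{x})(x_j - \bar{x})}
         {\left(\sum_i \sum_j w_{ij}\right) \sum_i (x_i - \bar{x})^2}
  = \frac{n \, SD_x^2 \sum_i z_i \sum_j w_{ij} z_j}{n \cdot n \, SD_x^2}
  = \frac{1}{n} \sum_i I_i
```

With the hand-calculated LISA values of the seven provinces, (-0.289 + 0.006 - 0.442 - 0.018 - 0.271 - 0.071 - 0.238) / 7 = -1.323 / 7, which is approximately -0.189.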
Example: Nepal Data

Bivariate LISA Moran Scatter Plot for GDI vs AL

• Moran’s I is the correlation between X and Lag-X--the same variable but in nearby areas
– Univariate Moran’s I
• Bivariate Moran’s I is a correlation between X and a different variable in nearby areas.
Moran Significance Map for GDI vs. AL
Bivariate LISA
and the Correlation Coefficient
• Correlation Coefficient is the
relationship between two
different variables in the same
area
• Bivariate LISA is a correlation
between two different
variables in an area and in
nearby areas.

Consequences of Ignoring Spatial
Autocorrelation
• Correlation coefficients and coefficients of determination appear bigger than they really are
– You think the relationship is stronger than it really is
– The variables in nearby areas affect each other
• Standard errors appear smaller than they really are
– Exaggerated precision
– You think your predictions are better than they really are, since standard errors measure predictive accuracy
– More likely to conclude that a relationship is statistically significant
Diagnostic of Spatial Dependence
• For correlation
– calculate Moran’s I for each variable and test its statistical
significance
– If Moran’s I is significant, you may have a problem!
• For regression
– Calculate the residuals
– Map the residuals: do you see any spatial patterns?
– Calculate Moran’s I for the residuals: is it statistically significant?
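The regression diagnostic can be sketched on synthetic data. Everything below is hypothetical: a 5x5 grid with rook contiguity, a response that depends on both the column and the row position, and a regression that leaves the row effect out so that the residuals inherit a spatial pattern.

```python
import numpy as np

def morans_i(x, W):
    """Global Moran's I for values x and spatial weight matrix W."""
    dev = np.asarray(x, dtype=float) - np.mean(x)
    return len(dev) * (W * np.outer(dev, dev)).sum() / (W.sum() * (dev ** 2).sum())

# Hypothetical 5x5 grid with rook contiguity
cells = [(r, c) for r in range(5) for c in range(5)]
W = np.array([[1.0 if abs(r1 - r2) + abs(c1 - c2) == 1 else 0.0
               for (r2, c2) in cells] for (r1, c1) in cells])
rows = np.array([r for r, _ in cells], dtype=float)
cols = np.array([c for _, c in cells], dtype=float)

# Response depends on both column and row, but the model only uses the column
y = 1.0 + 2.0 * cols + rows
X = np.column_stack([np.ones(len(cols)), cols])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta
i_residuals = morans_i(residuals, W)
print(beta, i_residuals)
```

A clearly positive Moran's I in the residuals, as here, signals that the model is missing a spatially structured effect; its significance can then be assessed with a permutation test.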
Summary
• Spatial autocorrelation of areal data
• Spatial weight matrix
• Measures of spatial autocorrelation
• Global Measure
– Moran’s I
• Consequences of ignoring spatial autocorrelation
• Significance test
• Please read O’S & Unwin Ch. 7 and Ch. 8.1
and 8.2
• End of this topic
