Section4 Exercise1 Detect Patterns
Exercise
Detect patterns
Section 4 Exercise 1
03/2020
Spatial Data Science MOOC
Time to complete
45 minutes
Introduction
Statistical cluster analysis can help you minimize the subjectivity in your maps by identifying
meaningful clusters in your data. The Hot Spot Analysis and Outlier Analysis tools use
statistics to detect spatial patterns in your data, but each provides slightly different information
about these patterns.
Hot Spot Analysis uses the Getis-Ord Gi* statistic to identify statistically significant spatial
clusters of high values (hot spots) and low values (cold spots).
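To make the statistic more concrete, the following minimal sketch computes a Gi* z-score for each feature. It assumes binary weights (1 for every feature within a fixed distance band, including the feature itself) and planar coordinates. The function name getis_ord_gi_star and the sample data are illustrative only and are not part of ArcGIS.

```python
import math

def getis_ord_gi_star(values, coords, band):
    """Return a Gi* z-score per feature (binary weights, self included)."""
    n = len(values)
    mean = sum(values) / n
    # Global standard deviation of the analysis field.
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for xi, yi in coords:
        # Weight is 1 when a feature falls inside the distance band, else 0.
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= band else 0.0
             for xj, yj in coords]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        numerator = sum(wj * v for wj, v in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        # A zero denominator means no variation or every feature is a neighbor.
        scores.append(numerator / denominator if denominator else 0.0)
    return scores

# Ten features on a line; the three on the left have high values.
coords = [(i, 0) for i in range(10)]
values = [10, 10, 10, 1, 1, 1, 1, 1, 1, 1]
scores = getis_ord_gi_star(values, coords, band=1)
print([round(z, 2) for z in scores])
```

A large positive z-score marks a feature whose neighborhood sum is higher than expected (a candidate hot spot), and a large negative z-score marks a candidate cold spot.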
Outlier Analysis uses the Anselin Local Moran's I statistic to identify statistically significant
clusters of high and low values and to detect spatial outliers, or features with values
significantly dissimilar from their neighbors.
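As a rough illustration of how clusters and outliers differ, the sketch below computes a Local Moran's I value for each feature and labels it by comparing the feature's deviation from the mean with the summed deviation of its neighbors. It assumes binary distance-band weights and planar coordinates, and it omits significance testing entirely. The function name local_moran and the labels returned are hypothetical.

```python
import math

def local_moran(values, coords, band):
    """Return (local I, label) per feature; labels are HH, LL, HL, LH, or NA."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    results = []
    for i, (xi, yi) in enumerate(coords):
        # Neighbors are all other features within the distance band.
        nbrs = [j for j, (xj, yj) in enumerate(coords)
                if j != i and math.hypot(xi - xj, yi - yj) <= band]
        # Variance term computed from every feature except i.
        s2 = sum(dev[j] ** 2 for j in range(n) if j != i) / (n - 1)
        lag = sum(dev[j] for j in nbrs)
        local_i = dev[i] / s2 * lag if s2 else 0.0
        if dev[i] > 0 and lag > 0:
            label = "HH"   # high value among high neighbors (cluster)
        elif dev[i] < 0 and lag < 0:
            label = "LL"   # low value among low neighbors (cluster)
        elif dev[i] > 0 and lag < 0:
            label = "HL"   # high value among low neighbors (outlier)
        elif dev[i] < 0 and lag > 0:
            label = "LH"   # low value among high neighbors (outlier)
        else:
            label = "NA"
        results.append((local_i, label))
    return results

# One low value (index 4) surrounded by high values.
coords = [(i, 0) for i in range(10)]
values = [10, 10, 10, 10, 1, 10, 10, 10, 10, 10]
for value, (local_i, label) in zip(values, local_moran(values, coords, band=1)):
    print(value, round(local_i, 3), label)
```

A positive local I indicates a feature that resembles its neighbors, and a negative local I indicates a feature that differs from them, which is what makes it a potential spatial outlier.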
ArcGIS provides traditional and optimized statistical cluster analysis tools. The optimized
statistical cluster analysis tools interrogate your data to provide smart default values,
optimizing the analysis workflow. The traditional statistical cluster analysis tools allow you
more flexibility in defining the spatial relationships in your data, providing you with more
control of your analysis. In this exercise, you will use the optimized statistical cluster analysis
tools to explore the spatial patterns in the data.
Exercise scenario
Supplemental Nutrition Assistance Program (SNAP) is a federal program that helps families
buy nutritional food to maintain their health and well-being. In this exercise, you will complete
a Hot Spot Analysis and Outlier Analysis to find meaningful patterns of high and low SNAP
participation. This information can help decision makers distribute resources in a more
efficient and equitable way, ensuring that healthy food is accessible to all SNAP recipients.
e Click OK.
Your ArcGIS Pro project includes a map of the counties in the contiguous United States. Each
county is symbolized by the rate of the population that participated in SNAP during 2016.
g Click Run.
The result of your analysis is a layer displaying hot spots in three shades of red and cold spots
in three shades of blue. The varying shades correspond to three confidence levels (90, 95, and
99 percent), indicating how confident you can be that these patterns are meaningful and not the
result of random chance. You will review the analysis details to ensure that the parameters were
appropriate for your question.
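The relationship between a z-score and the three shades can be sketched as a simple binning function. This example assumes the standard two-tailed critical values of 1.65, 1.96, and 2.58 for 90, 95, and 99 percent confidence. The optimized tool also applies a False Discovery Rate correction, which this sketch leaves out, so the function name gi_bin and its thresholds are a simplification rather than the tool's exact logic.

```python
def gi_bin(z_score):
    """Map a z-score to a bin: +/-3 (99%), +/-2 (95%), +/-1 (90%), 0 (not significant)."""
    magnitude = abs(z_score)
    if magnitude >= 2.58:
        level = 3
    elif magnitude >= 1.96:
        level = 2
    elif magnitude >= 1.65:
        level = 1
    else:
        level = 0
    # Positive bins are hot spots; negative bins are cold spots.
    return level if z_score > 0 else -level

for z in (2.29, -3.0, 1.7, 0.5):
    print(z, gi_bin(z))
```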
If you choose the default values for the Optimized Hot Spot Analysis tool, review
the geoprocessing details to identify the default parameter values. Ensure that
these values are appropriate for the scale of your analysis.
_______________________________________________________________________________
The tool chose a default distance band of approximately 200 kilometers (km) based on the
average distance to the 30 nearest neighbors. This default is a good place to start exploring your
data, but it may not represent the scale at which you want to analyze patterns in your dataset.
In this example, a 200 km distance band is too large because you want to analyze more local
patterns in SNAP participation. You will reduce the distance band to 100 km to detect more
local patterns in this county-level dataset.
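The idea behind a default based on the average distance to a number of nearest neighbors can be sketched as follows. This is a minimal illustration that assumes planar coordinates and straight-line distances; the function name average_knn_distance is hypothetical, and the tool's own calculation may differ in its details.

```python
import math

def average_knn_distance(coords, k):
    """Average, over all features, of the mean distance to the k nearest neighbors."""
    per_feature = []
    for i, (xi, yi) in enumerate(coords):
        # Sorted distances from feature i to every other feature.
        distances = sorted(math.hypot(xi - xj, yi - yj)
                           for j, (xj, yj) in enumerate(coords) if j != i)
        nearest = distances[:k]
        per_feature.append(sum(nearest) / len(nearest))
    return sum(per_feature) / len(per_feature)

# Four features spaced one unit apart on a line.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(average_knn_distance(coords, k=1))  # 1.0
print(average_knn_distance(coords, k=2))  # 1.25
```

A larger neighbor count produces a larger distance band, which smooths the result toward regional patterns. A smaller band, such as the 100 km used here, highlights more local patterns.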
e Under Distance Band, type 100, and then next to Feet, click the down arrow and choose
Kilometers.
f Click Run.
Reducing the size of the distance band identified more detailed patterns. This scale is more
appropriate for this particular analysis.
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
The results of this statistical analysis provide a measure of confidence that can help you
identify areas with clusters of high SNAP participation. You can use this information to
investigate these areas and their access to stores that accept SNAP and carry healthy foods.
The Performance Adjustment field defines the number of permutations to create a random
distribution. The tool will then compare your data's spatial distribution with the randomly
generated values. To balance precision and processing time, you will leave the default. For
more information regarding permutations, see ArcGIS Pro Help: How Cluster and Outlier
Analysis (Anselin Local Moran's I) works.
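The role of permutations can be illustrated with a simplified pseudo p-value calculation. The sketch below assumes a conditional permutation approach: the feature's own value stays fixed, the neighboring values are repeatedly replaced with a random draw from the rest of the dataset, and the observed neighbor sum is compared with the permuted sums. The function name pseudo_p_value, the default of 199 permutations, and the sample data are illustrative assumptions, not the tool's settings.

```python
import random

def pseudo_p_value(other_devs, neighbor_count, observed_lag,
                   permutations=199, seed=0):
    """Share of random neighborhoods at least as extreme as the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    as_extreme = 0
    for _ in range(permutations):
        # Draw a random set of neighbors from all other features.
        lag = sum(rng.sample(other_devs, neighbor_count))
        if observed_lag >= 0:
            as_extreme += lag >= observed_lag
        else:
            as_extreme += lag <= observed_lag
    # Adding 1 counts the observed arrangement itself.
    return (as_extreme + 1) / (permutations + 1)

# Deviations from the mean for all other features: two are very high.
others = [-1.0] * 20 + [5.0, 5.0]
print(pseudo_p_value(others, 2, observed_lag=10.0))   # both neighbors very high
print(pseudo_p_value(others, 2, observed_lag=-2.0))   # both neighbors typical
```

More permutations allow smaller pseudo p-values and more stable results, at the cost of longer processing time. With 199 permutations, for example, the smallest possible pseudo p-value is 1/200.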
When you compare the results of a Hot Spot Analysis and an Outlier Analysis, use
the same distance band in both analyses.
g Click Run.
Note: The permutations in the Optimized Outlier Analysis tool compare your data values to a
set of randomly generated values. This means that your results may vary slightly from the
preceding graphic.
The bright red and blue features represent spatial outliers. Features with high values
surrounded by areas with low values are called High-Low outliers and are displayed in red.
Features with low values surrounded by areas with high values are called Low-High outliers
and are displayed in blue. The pink and light blue colors indicate clusters of features with
statistically significantly high values (pink) and statistically significantly low values (light blue).
These clusters typically align with the hot and cold spots from the Optimized Hot Spot
Analysis tool.
e In the map, click and drag to the left, to the right, or up and down to compare the Hot
Spot Analysis and Outlier Analysis results.
Using Hot Spot Analysis and Outlier Analysis, you located statistically significant clusters of
high SNAP participation. This information can help in the allocation of SNAP resources to
areas of higher food insecurity. The results can help drive the decision to distribute
resources in a more efficient and equitable way.
f At the top of the map view, next to Pattern Detection, click the X to close the map.
1. In the Notebook, from the Top Ribbon, click the Add button.
2. Search ArcGIS Online for the following layers:
• Analysis_field="SNAPRate"
• Output_name="SNAPHotSpots_<your initials and today's date>"
• Distance_band=100
• Distance_band_unit='kilometers'
8. Assign the analysis result to a variable named HS_result.
9. Run the code cell.
10. In a new code cell, create a map in the notebook that displays HS_result.
• Analysis_field="SNAPRate"
• Output_name="SNAPOutlier_<your initials and today's date>"
• Distance_band=100
• Distance_band_unit='kilometers'
4. Assign the analysis result to a variable named OA_result.
5. Run the code cell.
6. In a new code cell, create a map in the notebook that displays OA_result using
OA_result['outliers_result_layer'].
Use the Lesson Forum to post your questions, observations, and syntax examples. Be sure to
include the #stretch hashtag in the posting title.
2. What statistically significant spatial patterns can you detect from this analysis?
Generally, the southeastern areas of the contiguous United States have statistically
significantly high SNAP participation, and the north central areas of the contiguous
United States have statistically significantly low SNAP participation.