Exercise: Explore Data Using Data Visualization Techniques
Section 1 Exercise 2
November 3, 2020
Spatial Data Science MOOC
Time to complete
80 minutes
Introduction
Data visualization helps you digest information by using symbols to visually represent
quantities and categories. You can quickly make comparisons and perceive relative
proportions, patterns, relationships, and trends. Data visualization is important throughout the
analysis process, from exploring your data, to interpreting your results, to communicating your
findings. There are various data visualization techniques available in ArcGIS. In this exercise,
you will use these techniques to explore your data and look for any interesting relationships
that may be useful in a predictive analysis.
Exercise scenario
Because voting is voluntary in the United States, the level of voter participation (referred to as
"voter turnout") has a significant impact on the election results and resulting public policy.
Modeling voter turnout, and understanding where low turnout is prevalent, can inform
outreach efforts to increase voter participation. In this exercise, you will use various
visualization techniques to explore relationships and patterns of voter turnout and to identify
potential variables to use in your predictive analysis.
c In the bottom-left corner of the Start page, click Open Another Project.
Note: If you have configured ArcGIS Pro to start without a project template or with a default
project, you will not see the Start page. On the Project tab, click Open, and then click Open
Another Project.
A Data Visualization map tab opens to a gray basemap with a map layer that contains the
2016 election results for each county.
b In the attribute table, scroll to the right to familiarize yourself with the variables included
in the dataset.
c Scroll back to the left, if necessary, and then right-click the Voter_Turnout attribute column
and choose Statistics.
e In the Chart Properties pane, under Variable, check the Show Normal Distribution box.
The overlays appear on the histogram. Using these overlays, you can see that the histogram
has a fairly symmetrical bell shape with nearly identical mean and median values. This
outcome indicates that the Voter_Turnout variable is approximately normally distributed. In a
perfectly normal distribution, the mean, median, and mode are equal, so most
values fall near the average in the center of the distribution, with fewer and fewer values
appearing as you move farther from the center into the left and right tails.
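The same symmetry check can be sketched numerically. The following Python example is only an illustration: the turnout values are made up rather than read from the CountyElections2016 layer, and describe_distribution is a hypothetical helper name. It compares the mean and median and computes a skewness value, which is close to zero for a symmetrical distribution.

```python
import statistics

def describe_distribution(values):
    """Return the mean, median, and population skewness of a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    stdev = statistics.pstdev(values)
    # Skewness as the average cubed z-score: near 0 when the tails are balanced.
    skewness = sum(((v - mean) / stdev) ** 3 for v in values) / n
    return {"mean": mean, "median": median, "skewness": skewness}

# Hypothetical turnout proportions (assumption: not values from the exercise data).
turnout = [0.45, 0.52, 0.55, 0.58, 0.59, 0.60, 0.63, 0.66, 0.73]
summary = describe_distribution(turnout)
print(summary)
```

A mean and median that nearly match, together with a skewness near zero, support the same reading that you make visually from the histogram overlays.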
You can also use the histogram to interactively select features on the map based on their
voter turnout values.
g In the chart window, drag a box around the bins with the lowest voter turnout values in
the left tail of the histogram, as shown in the following graphic.
The counties with the lowest voter turnout are highlighted in the histogram and in the map.
You can review these records in the attribute table.
i Drag the attribute table tab to the center of the map until the docking target appears, as
shown in the following graphic.
The attribute table is represented by a blue shadow, and docking targets appear in the center
of the map view and at the edges of the application window. Each target represents an area
where the window can be positioned. The blue shadow displays where the attribute table will
be docked when you release the click.
j Pause over the right docking target, and then release the click to dock the attribute table
to the right of the map and above the histogram.
k At the bottom of the attribute table window, click the Show Selected Records button.
The attribute table shows the records for the selected counties. You can review these records
in the table to verify their voter turnout values.
l In the right tail of the histogram, drag a box around the bins with the four highest voter
turnout values, as shown in the following graphic.
The counties with the highest voter turnout are selected in the histogram, map, and table.
n Close the attribute table and the chart window by clicking the Close button on each
window tab, as specified in the following graphic.
You have visualized the distribution of voter turnout values and identified where the lowest
and highest values fall on the map. Next, you will use layer symbology to visualize the spatial
distribution of voter turnout across the country.
b In the Symbology pane, under Primary Symbology, click the down arrow and choose
Graduated Colors.
Graduated Colors classifies the data into different ranges based on the values of a specified
attribute field. Each class is assigned a shade of color to show the relative difference between
the feature values. To learn more about graduated colors, see ArcGIS Pro Help: Graduated
colors.
You will specify the field and color ramp so that the symbology represents ranges of voter
turnout.
f Below Classes, drag the pane divider (as specified in the following graphic) down to see
the Color Scheme parameter.
h In the Color Scheme window, check the Show Names box, as specified in the following
graphic.
j In the Symbology pane, below the pane divider, click the Histogram tab.
Note: You may need to widen the Symbology pane to better see the histogram.
By applying a Graduated Colors symbology, you created what is commonly referred to as a
choropleth, or thematic, map. Choropleth maps typically visualize low-to-high values using
light-to-dark colors. Because you are using the Standard Deviation classification method, which
classifies values based on how far they are from the average, you applied a diverging color
scheme. A diverging color scheme uses two contrasting hues to distinguish values below the
average from values above it. On the Histogram tab, you can see how the distribution of values
corresponds to the classes of color. The counties with below-average voter turnout are
represented in shades of purple, and the counties with above-average voter turnout are
represented in shades of green.
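The idea behind a standard deviation classification can be sketched in a few lines of Python. This is a simplified illustration, not the exact procedure that ArcGIS Pro uses: it assumes a class interval of one standard deviation centered on the mean, uses the population standard deviation, and works on made-up turnout values. The names BREAKS, LABELS, and classify are hypothetical.

```python
import statistics
from bisect import bisect_right

# Assumed class breaks, in standard deviations (SD) from the mean.
BREAKS = [-1.5, -0.5, 0.5, 1.5]
LABELS = ["< -1.5 SD", "-1.5 to -0.5 SD", "-0.5 to 0.5 SD", "0.5 to 1.5 SD", "> 1.5 SD"]

def classify(values):
    """Assign each value to a class based on its distance from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    # bisect_right finds which interval between breaks the z-score falls into.
    return [LABELS[bisect_right(BREAKS, (v - mean) / stdev)] for v in values]

# Hypothetical turnout proportions (assumption: not values from the exercise data).
turnout = [0.45, 0.52, 0.55, 0.58, 0.59, 0.60, 0.63, 0.66, 0.73]
classes = classify(turnout)
for value, label in zip(turnout, classes):
    print(value, label)
```

In the map, the classes below the mean would take the purple shades and the classes above the mean would take the green shades.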
k In the Symbology pane, next to Color Scheme, click the Color Scheme Options button
and choose Apply To Fill And Outline.
Combining the Graduated Colors symbology with the state layer overlay, you can begin to
get a sense of how states compare to each other in terms of voter turnout. States like West
Virginia (WV) and Tennessee (TN) stand out as having low voter turnout, and states like
Colorado (CO) and Minnesota (MN) stand out as having high voter turnout. You will use a bar
chart to summarize and compare voter turnout by state.
a In the Contents pane, right-click CountyElections2016, point to Create Chart, and view
the available options.
d Click Apply.
The bar chart summarizes county voter turnout values by state. Each bar represents a state,
and the height of the bar corresponds to the mean voter turnout value. For more information
about bar charts, see ArcGIS Pro Help: Bar chart.
e At the bottom of the Chart Properties pane, under Sort, click the down arrow and choose
Y-Axis Descending.
Sorting the bars by value makes it easier to visually rank the states from highest to lowest
voter turnout.
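The summary that the bar chart performs (mean of county values per state, sorted in descending order) can be sketched in Python as follows. The state and turnout pairs are hypothetical, and mean_turnout_by_state is an illustrative helper name, not part of ArcGIS Pro.

```python
from collections import defaultdict
from statistics import fmean

def mean_turnout_by_state(records):
    """Average county turnout per state, sorted from highest to lowest mean."""
    by_state = defaultdict(list)
    for state, turnout in records:
        by_state[state].append(turnout)
    means = {state: fmean(values) for state, values in by_state.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)

# Hypothetical (state, county turnout) pairs, not values from the exercise data.
records = [
    ("MN", 0.74), ("MN", 0.70),
    ("WV", 0.48), ("WV", 0.52),
    ("CO", 0.68), ("CO", 0.72),
]
ranked = mean_turnout_by_state(records)
for state, mean in ranked:
    print(state, round(mean, 2))
```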
f In the chart window, select the three states with the lowest average voter turnout.
Hawaii, West Virginia, and Tennessee have the lowest average voter turnout, which confirms
what you observed in the map. The bar chart summarizes the voter turnout for each state into
a single average value. Within each state, however, there can be quite a bit of variation in
voter turnout. To examine the individual county voter turnout within each state, you can use a
filtered bar chart.
b Click Apply.
c At the bottom of the Chart Properties pane, under Sort, click the down arrow and choose
Y-Axis Descending.
d In the chart window, on the toolbar next to Filter, click the Filter By Selection button.
Filter By Selection filters the chart to only show selected features. Because no features are
selected yet, the bar chart is empty.
e Dock the state bar chart window to the right of the map, above the county bar chart
window.
You now have a bar chart visualizing average voter turnout by state and a bar chart visualizing
individual county voter turnout values of selected features.
The county bar chart populates to show the individual county values within Georgia. Each bar
in the county bar chart corresponds to a single feature on the map, so the colors of the bars
match the map symbology.
You can use this interactive selection to see the range of individual county values within each
state. To compare the county values to the national average voter turnout value of 0.59
(identified in the histogram), you will add a guide to your chart.
g Click the county bar chart window tab to activate that chart.
The Chart Properties pane updates to show county information.
h At the top of the Chart Properties pane, click the Guides tab.
k For Line Color, click the line and choose a bright blue color.
A line appears in the county bar chart marking the national average voter turnout value.
Guides allow you to reference or highlight significant values or thresholds in your charts.
m In the state bar chart, click other states to see how their county voter turnout values vary
within the state and to compare the state's average voter turnout to the national average.
Note: If you previously used the Zoom Mode button, click the Select Interaction Mode
button so that you can select other states.
You have used bar charts and Filter By Selection to explore state voter turnout averages and
to examine the individual county voter turnout values for each state.
a Create a box plot for the CountyElections2016 layer using the following parameters:
b Click Apply.
The box plot chart allows you to visualize and compare the entire distribution of county voter
turnout values for each state. Box plots split numeric values into four groups, each holding
about a quarter of the values, and visualize five key statistics for each distribution: minimum,
first quartile, median, third quartile, and maximum. The whiskers extending from the boxes
span from the minimum value to the maximum value, not counting any values drawn separately as
outliers, illustrating the range of typical values found in each state. The boxes span from
the first quartile to the third quartile, illustrating the range of the middle half of values, or the
interquartile range (IQR). The IQR indicates the size of spread, or variability, in voter turnout
values in each state. For more information about box plots, see ArcGIS Pro Help: Box plot.
The ToolTip displays the key voter turnout statistics for the state. Texas has a relatively low
voter turnout average as a state. However, there is a wide range of county voter turnout
values, spanning from approximately 0.32 to approximately 0.88. The counties with voter
turnout values that are very different from the state average are considered outliers and are
displayed as dots beyond the plot's whiskers.
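The statistics behind a box plot can be sketched with the Python standard library. This example rests on several assumptions: the turnout values are made up, the quartiles use the "inclusive" interpolation method, and outliers are flagged with the common rule of 1.5 times the IQR beyond the box, which may not match the exact settings of the chart. The helper name box_plot_stats is hypothetical.

```python
import statistics

def box_plot_stats(values, k=1.5):
    """Return quartiles, whisker ends, and outliers using a k * IQR fence."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        # Whiskers stop at the most extreme values that are not outliers.
        "whisker_low": inliers[0], "whisker_high": inliers[-1],
        "outliers": outliers,
    }

# Hypothetical county turnout values for one state (not the exercise data).
state_turnout = [0.32, 0.48, 0.50, 0.52, 0.54, 0.55, 0.57, 0.60, 0.88]
stats = box_plot_stats(state_turnout)
print(stats)
```

Under these assumptions, the two extreme values fall outside the fences and would be drawn as dots, while the whiskers end at the nearest values inside the fences.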
d In the box plot chart, select the Texas outliers, as shown in the following graphic.
b In the Chart Properties pane, under Numeric Fields, check the box for the following fields:
• Voter_Turnout
• 2019 Median Age
• 2019 Per Capita Income
• Own A Selfie Stick : Percent
• 2019 Education: High School/No Diploma : Percent (Note: Scroll down to the second
group of education field options.)
• Buying American Is Not Important To Me : Percent
A scatter plot matrix is a grid of scatter plots, also referred to as mini-plots, used to visualize
bivariate relationships between combinations of variables. Each scatter plot in the matrix
visualizes the relationship between a pair of variables, allowing many relationships to be
explored in one chart. A histogram visualizing the distribution of each individual variable can
also be included in the matrix. For more information about scatter plot matrices, see ArcGIS
Pro Help: Scatter plot matrix.
d In the Chart Properties pane, check the Show Linear Trend box.
A linear trend line is added to each scatter plot in the matrix. The direction of the trend line
indicates whether the variables have a positive or negative relationship, and the R-squared
(R²) value indicates the strength of the relationship. For more information about scatter plots,
see ArcGIS Pro Help: Scatter plot.
The mini-plots in the matrix are now visualized with a color gradient corresponding to the
strength of the R-squared value. You can select any mini-plot to view the relationship in more
detail using the larger preview plot. While every pairwise combination of variables is plotted
in the matrix, you are specifically interested in how each variable relates to voter turnout. The
column of mini-plots on the far left includes the relationships between voter turnout and the
other variables.
f In the scatter plot matrix, select the mini-plot comparing Voter_Turnout and 2019 Median
Age.
Median age has a positive relationship with voter turnout, where a higher median age
corresponds to a higher voter turnout. However, the R-squared value for this trend is 0.15,
which means that median age alone can only explain about 15 percent of the variability in the
voter turnout values.
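To make the trend line and R-squared value concrete, the following Python sketch fits an ordinary least-squares line to a handful of points and reports the share of variability that the line explains. The median age and turnout values are invented for illustration, and linear_trend is a hypothetical helper name.

```python
def linear_trend(x, y):
    """Fit y = slope * x + intercept by least squares and return R-squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, R-squared equals the squared correlation.
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical county values (assumption: not taken from the exercise data).
median_age = [30, 35, 40, 45, 50]
turnout = [0.50, 0.58, 0.55, 0.62, 0.60]
slope, intercept, r_squared = linear_trend(median_age, turnout)
print(f"slope={slope:.4f}, R-squared={r_squared:.2f}")
```

A positive slope corresponds to a positive relationship, and an R-squared value of 0.15, for example, would mean that the line accounts for about 15 percent of the variability in the dependent variable.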
g Select the mini-plot comparing Voter_Turnout and 2019 Per Capita Income.
Per capita income also has a positive relationship with voter turnout, where a higher per
capita income corresponds to a higher voter turnout. The R-squared value for this trend is
0.34, which means that per capita income can explain about 34 percent of the variability in
voter turnout values. Within the preview plot, you can see where some of the points deviate
from the trend. You can investigate those points using a selection.
h In the preview plot, select points that deviate from the trend to see where they fall on the
map, as shown in the following graphic.
The selected counties are highlighted on the map. To see if the variable relationships vary
spatially, you will filter the chart by the map extent.
a In the scatter plot matrix window, on the toolbar next to Filter, click the Filter By Extent
button.
b From the Map tab, in the Navigate group, click Bookmarks and choose the WV, VA, MD
bookmark.
Note: The R-squared values will vary based on the size of your map and chart windows.
The chart updates to calculate the relationships between the variables of the counties visible
in the map extent.
If you compare the R-squared values at a national scale to this local scale, you can see that the
relationships between voter turnout and per capita income and between voter turnout and
owning a selfie stick have strengthened. However, the relationship between voter turnout and
median age has disappeared.
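The effect of filtering by extent can be sketched in Python by recomputing R-squared for only the features inside a bounding box. This is a deliberately exaggerated toy example: the coordinates, income values, and turnout values are invented, counties are treated as points, and the helper names r_squared and r_squared_in_extent are hypothetical.

```python
def r_squared(x, y):
    """Squared correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy ** 2) / (sxx * syy)

def r_squared_in_extent(points, extent):
    """Recompute R-squared using only the points inside (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = extent
    visible = [p for p in points
               if xmin <= p["lon"] <= xmax and ymin <= p["lat"] <= ymax]
    value = r_squared([p["income"] for p in visible],
                      [p["turnout"] for p in visible])
    return value, len(visible)

# Hypothetical points: turnout rises with income in the east and falls in the west.
points = [
    {"lon": -80, "lat": 38, "income": 20, "turnout": 0.50},
    {"lon": -79, "lat": 38, "income": 30, "turnout": 0.55},
    {"lon": -78, "lat": 39, "income": 40, "turnout": 0.60},
    {"lon": -77, "lat": 39, "income": 50, "turnout": 0.65},
    {"lon": -120, "lat": 38, "income": 20, "turnout": 0.65},
    {"lon": -119, "lat": 38, "income": 30, "turnout": 0.60},
    {"lon": -118, "lat": 38, "income": 40, "turnout": 0.55},
    {"lon": -117, "lat": 38, "income": 50, "turnout": 0.50},
]
national, national_count = r_squared_in_extent(points, (-180, -90, 180, 90))
local, local_count = r_squared_in_extent(points, (-81, 37, -76, 40))
print(f"national R-squared={national:.2f}, local R-squared={local:.2f}")
```

In this toy data, the opposing regional trends cancel at the national extent, while the eastern extent alone shows a strong linear relationship. Real data is rarely this clear-cut, but the same mechanism explains why R-squared values change as you zoom and pan.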
c Zoom and pan around the map to explore how variable relationships vary by scale and
location.
The changes in R-squared values indicate that the linear relationships between the variables
vary spatially. In the next step, you will explore and quantify different types of local
relationships using the Local Bivariate Relationships tool.
d Close the scatter plot matrix window and the Chart Properties pane.
c Open the Local Bivariate Relationships (Spatial Statistics Tools) tool and set the following
parameters:
e In the Contents pane, drag the US_States layer above the VoterTurnout_PerCapitaIncome
layer.
The Local Bivariate Relationships tool identifies not only linear relationships but also concave/
convex and other undefined complex relationships. The colors in the map correspond to the
type of relationship found in that area. Based on this output, the relationship between per
capita income and voter turnout varies by location. In some areas, there is no statistically
significant relationship between the two variables. In most areas, however, there is a
statistically significant positive linear relationship to voter turnout, where voter turnout
increases as per capita income increases.
f Click one of the counties symbolized as Positive Linear to open a pop-up window
visualizing the relationship.
g Scroll down in the pop-up window or expand it until you see the scatter plot.
You can use the pop-up windows to review a scatter plot of the selected county and its
neighbors, indicating the strength and type of the local variable relationship.
h When you are finished reviewing the information, close the pop-up window.
i Click a county symbolized as Convex to examine the shape of the relationship in the
scatter plot.
You have verified that voter turnout is related to per capita income across the majority of the
country. Based on this information, per capita income will likely be useful in your prediction
model. To see how the relationship between voter turnout and education varies spatially, you
will run Local Bivariate Relationships again.
k From the Geoprocessing pane, in the Local Bivariate Relationships tool, set the following
parameters:
l Click Run.
m In the Contents pane, drag the US_States layer above the VoterTurnout_HSDiploma layer.
Generally, the percent of the population with a high school diploma does not have a
statistically significant relationship with voter turnout. However, there is a statistically
significant negative linear relationship in most of Maine.
Based on this information, this variable would likely be useful if you were to create a
prediction model of voter turnout in Maine. This example demonstrates how the scale of your
analysis—in this case, country versus state—can impact which variables are relevant.
q If you would like to continue with this analysis, proceed to the optional stretch goal;
otherwise, close the map and the Geoprocessing pane, and then save the project and exit
ArcGIS Pro.
Use the Lesson Forum to post your questions, observations, and syntax examples. Be sure to
include the #stretch hashtag in the posting title.
When you are finished, close the map and Geoprocessing pane, and then save the project
and exit ArcGIS Pro.