Data Visualization and Communication Introduction
Data visualization is the process of presenting data visually, using graphs, charts, and other visuals. Organizing
data into a visual pattern helps you analyze and evaluate it and generate reports.
• Analyze the concept of data visualization and its role in data analysis and report generation
• Evaluate different methods of data visualization and determine the appropriate method to use
based on the type of data
• Apply knowledge of data visualization methods to create and interpret tables, graphs, and charts
• Synthesize insights from data visualizations to make data-driven decisions and communicate
findings effectively
• Skills 4.2a and 4.3a: Create and derive conclusions from visualizations that compare one or more
categories of data
• Skills 4.2b and 4.3b: Create and derive conclusions from visualizations that show how individual
parts make up the whole
• Skills 4.2c and 4.3c: Create and derive conclusions from visualizations that analyze trends
• Skills 4.2d and 4.3d: Create and derive conclusions from visualizations that determine the
distribution of data
• Skills 4.2e and 4.3e: Create and derive conclusions from visualizations that analyze the
relationship between sets of values
Data reporting is the process of collecting and organizing raw data and representing it with a suitable
visualization to analyze the data. There are different visualization methods to organize and represent data
for analysis. This section explains how data can be organized by using tables and charts, and how to work
with disaggregated data.
Tables are one of the simplest ways to organize data, and they can be typed directly into Excel. Table 4-1
shows student data organized in a table:
Table 4-1
ID Name Score
1182 James 80
3701 Matthew 66
3853 Robert 69
4461 Joseph 87
4641 Thomas 75
6001 Mike 85
6637 Anee 76
6701 Alen 88
8159 John 82
9225 Daniel 63
In Table 4-1, the students’ ID numbers, names, and scores are organized in rows and columns, which makes
the data easier to interpret and compare than it would be in running text.
R can also be used to organize data into a table. First, you load each column into a vector with the c()
command, and then use the data.frame() command to combine the vectors into a table (a data frame). One
way to do this is shown in the following code and Figure 4-1:
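# Load each column of Table 4-1 into a vector
ID <- c(1182, 3701, 3853, 4461, 4641, 6001, 6637, 6701, 8159, 9225)
Name <- c("James", "Matthew", "Robert", "Joseph", "Thomas", "Mike", "Anee", "Alen",
"John", "Daniel")
Score <- c(80, 66, 69, 87, 75, 85, 76, 88, 82, 63)
# Combine the vectors into a data frame, then print it
ScoreTable <- data.frame(ID, Name, Score)
ScoreTable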
Figure 4-1
Charts are another way of representing data. Charts can use different colors and patterns to distinguish
values, which makes the data easier to analyze. Common types of charts include column charts, bar charts,
pie charts, and line charts. The students’ scores can be represented as an Excel chart by using the
following steps.
After entering the data from Table 4-1, highlight the data and headers (here A1:C11), click Insert, and
choose the 2-D Column chart. Next, click Select Data and change the chart data range to
“=Sheetx!$B$1:$C$11”, where x is the number of the sheet that contains the data, so that the ID column is
not plotted. Then click OK, as in Figure 4-2.
Figure 4-2
Data labels can be added to this chart by clicking Add Chart Element, then Data Labels, and finally Center.
The chart that is generated is shown in Figure 4-3.
Figure 4-3
In R, a basic chart can be made after loading the data into a data frame by using the barplot() command,
as below.
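A minimal sketch of such a barplot() call is shown here, assuming the Table 4-1 data (it rebuilds ScoreTable so it runs on its own). The ylim range, the label text, and the mtext() line position are illustrative assumptions rather than the values behind Figure 4-4, and names.arg is the full name of the argument that the next paragraph calls names.

```r
# Rebuild the Table 4-1 data frame so this sketch runs on its own
ScoreTable <- data.frame(
  ID = c(1182, 3701, 3853, 4461, 4641, 6001, 6637, 6701, 8159, 9225),
  Name = c("James", "Matthew", "Robert", "Joseph", "Thomas",
           "Mike", "Anee", "Alen", "John", "Daniel"),
  Score = c(80, 66, 69, 87, 75, 85, 76, 88, 82, 63)
)

# Column chart of scores; ylim and label text are illustrative choices
mids <- barplot(
  height = ScoreTable$Score,     # response variable (y-axis values)
  names.arg = ScoreTable$Name,   # predictor variable (x-axis categories)
  ylim = c(0, 100),              # y-axis runs from 0 to 100
  ylab = "Score",                # y-axis label
  xlab = "",                     # x-axis label left blank
  space = 0,                     # no gap between adjacent bars
  las = 2                        # axis labels perpendicular to each axis
)

# Add the x-axis label in the bottom margin, below the rotated names
mtext("Student Name", side = 1, line = 4)
```

barplot() invisibly returns the bar midpoints, which can be handy for placing extra text on the chart.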
In these commands, height is the response variable (the y-axis values), names is the predictor variable
(the x-axis categories), ylim sets the range of the y-axis, ylab gives the y-axis label, the space argument
positions the bars directly next to one another, and the las argument rotates the x-axis labels to vertical.
Because the rotated names are long, the xlab argument is left blank, and the mtext() command adds an
axis label in the margin below the chart. The result is shown in Figure 4-4:
Figure 4-4
Both tables and charts can be used to visualize data, and the right choice depends on the purpose of the
analysis. If the purpose is to sort or search the data, or to look up exact values, tables work well. Charts,
however, are better suited to interpreting the data visually, for example to spot patterns or compare
categories.
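As a sketch of the sorting and searching that tables are good for, the following R code works directly on the Table 4-1 data frame using base R's order() and subset() functions and logical indexing. The cut-off score of 80 and the looked-up ID are arbitrary choices for illustration.

```r
# Rebuild the Table 4-1 data frame so this sketch runs on its own
ScoreTable <- data.frame(
  ID = c(1182, 3701, 3853, 4461, 4641, 6001, 6637, 6701, 8159, 9225),
  Name = c("James", "Matthew", "Robert", "Joseph", "Thomas",
           "Mike", "Anee", "Alen", "John", "Daniel"),
  Score = c(80, 66, 69, 87, 75, 85, 76, 88, 82, 63)
)

# Sort: rows ordered from highest to lowest score
sorted <- ScoreTable[order(ScoreTable$Score, decreasing = TRUE), ]
print(sorted)

# Search: students scoring above an arbitrary cut-off of 80
high <- subset(ScoreTable, Score > 80)
print(high)

# Search: look up a single student's record by ID
print(ScoreTable[ScoreTable$ID == 6637, ])
```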
Example: The following data report of a retail shop is represented using both a table and a chart.
First, detailed information about the items, unit price, units sold, purchase date, revenue, total cost, and
profit can be organized and formatted in a table.
In Excel, this is accomplished by entering the data in rows and columns and highlighting the dataset. On
the Home tab, click Format as Table and choose the desired style. Then check the data range and select
the “My table has headers” check box, as shown in Figure 4-5:
Figure 4-5
This produces an easy-to-read table of retail data, as shown in Table 4-2 and provided as an Excel file
named “Table 4_2_Retail Data.xlsx” in the course downloads.
Table 4-2
Items             Unit Price ($)  Units Sold  Purchase Date  Revenue ($)  Total Cost ($)  Profit ($)
Vegetables        150             2550        2/2/2022       450000       382500          67500
Fruits            200             3000        2/3/2022       750000       600000          150000
Grains            125             2250        4/2/2023       400250       281250          119000
Dairy             350             5000        1/15/2022      2550000      1750000         800000
Cosmetics         500             4000        3/22/2023      2750000      2000000         750000
Toys              120             3500        12/1/2022      550000       420000          130000
Stationery Items  50              3250        6/2/2022       245000       162500          82500
Using this table, you can easily compare the profit for each item by sorting the dataset on the Profit
column from highest to lowest (or vice versa). However, the relationships among cost, profit, and
revenue are hard to see without a graphical representation.
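The same sorting step can be sketched in R. The following code enters the revenue, total cost, and profit columns of Table 4-2 into a data frame (the names Retail and TotalCost are assumptions), checks that each profit equals revenue minus total cost, and sorts the items from most to least profitable.

```r
# Revenue, total cost, and profit columns from Table 4-2 (names are assumptions)
Retail <- data.frame(
  Items = c("Vegetables", "Fruits", "Grains", "Dairy",
            "Cosmetics", "Toys", "Stationery Items"),
  Revenue = c(450000, 750000, 400250, 2550000, 2750000, 550000, 245000),
  TotalCost = c(382500, 600000, 281250, 1750000, 2000000, 420000, 162500),
  Profit = c(67500, 150000, 119000, 800000, 750000, 130000, 82500)
)

# Check that every profit value equals revenue minus total cost
stopifnot(all(Retail$Profit == Retail$Revenue - Retail$TotalCost))

# Sort from highest to lowest profit
ByProfit <- Retail[order(Retail$Profit, decreasing = TRUE), ]
print(ByProfit[, c("Items", "Profit")])
```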
A chart makes the relationships between these key economic measurements visible. For instance, we can
assess how profit levels relate to revenue levels for this retail shop. By asking questions like this and
answering them with a graphical representation, we can begin to draw conclusions about how these
economic measurements are related to each other, if they are related at all.
In Excel, highlight the dataset and, on the Insert tab, choose 2-D Column. Right-click the chart that
appears and choose Select Data to open the window in Figure 4-6. Then, under Horizontal (Category) Axis
Labels, click Edit and choose the cells containing the labels (here A2:A8).
Figure 4-6
The resulting chart makes it much easier to visually inspect trends and comparisons in data, as shown
in Figure 4-7:
Figure 4-7
In R, a chart like this can be created by rearranging the data slightly and using the ggplot2 graphics
package. First, the data is entered in a "long" layout: one column holds all of the Revenue, Cost, and Profit
amounts, and a second column indicates which of these categories each amount belongs to. This code can be
run in your browser at rdrr.io or in RGui installed on your machine. The generated table is shown in Figure 4-8.
# Long layout: each item appears three times, once per measure
Items <- rep(c("Vegetables", "Fruits", "Grains", "Dairy", "Cosmetics", "Toys",
"Stationery Items"), times = 3)
UnitPrice <- rep(c(150, 200, 125, 350, 500, 120, 50), times = 3)
UnitsSold <- rep(c(2550, 3000, 2250, 5000, 4000, 3500, 3250), times = 3)
# Amounts from Table 4-2: revenue, then total cost, then profit
Dollars <- c(450000, 750000, 400250, 2550000, 2750000, 550000, 245000,
382500, 600000, 281250, 1750000, 2000000, 420000, 162500,
67500, 150000, 119000, 800000, 750000, 130000, 82500)
# Second column naming the measure for each amount ("Category" is an assumed name)
Category <- rep(c("Revenue", "Cost", "Profit"), each = 7)
StoreTable <- data.frame(Items, UnitPrice, UnitsSold, Dollars, Category)
StoreTable
Figure 4-8
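The long layout above is typed in by hand. As an alternative sketch, if the amounts are already stored in the wide layout of Table 4-2 (one column per measure), base R's stack() function can produce the same arrangement. The data frame and column names used here (Wide, Long, Revenue, Cost, Profit, Dollars, Category) are assumptions chosen to match the example.

```r
# Wide layout: one row per item, one column per measure (names are assumptions)
Wide <- data.frame(
  Items = c("Vegetables", "Fruits", "Grains", "Dairy",
            "Cosmetics", "Toys", "Stationery Items"),
  Revenue = c(450000, 750000, 400250, 2550000, 2750000, 550000, 245000),
  Cost = c(382500, 600000, 281250, 1750000, 2000000, 420000, 162500),
  Profit = c(67500, 150000, 119000, 800000, 750000, 130000, 82500)
)

# stack() places the three measure columns one after another:
# 'values' holds the amounts and 'ind' names the column each came from
Long <- stack(Wide[, c("Revenue", "Cost", "Profit")])
names(Long) <- c("Dollars", "Category")

# Repeat the item names once per measure so each amount keeps its item
Long$Items <- rep(Wide$Items, times = 3)
print(Long)
```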
Immediately after creating the table, you can build the chart with the ggplot2 package, starting with the
code below.
library(ggplot2)
If you are using RGui and you get an error, you might need to install the ggplot2 package using the
following code. Fortunately, this line of code only needs to be run once.
install.packages("ggplot2")
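A minimal sketch of a ggplot() call matching the description that follows is shown here. It rebuilds the long-format table so it runs on its own; the Category column name is an assumption, and geom_bar() with stat = "identity" is used so that the bar heights come directly from the Dollars values rather than from row counts.

```r
library(ggplot2)

# Long-format retail data (same amounts as StoreTable; 'Category' is an assumed name)
StoreTable <- data.frame(
  Items = rep(c("Vegetables", "Fruits", "Grains", "Dairy",
                "Cosmetics", "Toys", "Stationery Items"), times = 3),
  Dollars = c(450000, 750000, 400250, 2550000, 2750000, 550000, 245000,
              382500, 600000, 281250, 1750000, 2000000, 420000, 162500,
              67500, 150000, 119000, 800000, 750000, 130000, 82500),
  Category = rep(c("Revenue", "Cost", "Profit"), each = 7)
)

# fill = one bar per category, y = response variable, x = predictor variable;
# position = "dodge" places each item's bars side by side
p <- ggplot(StoreTable, aes(fill = Category, y = Dollars, x = Items)) +
  geom_bar(position = "dodge", stat = "identity")
print(p)
```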
In the ggplot command, you pass in the data frame, then map the fill (the categories that get separate bars),
the y values (response variable), and the x values (predictor variable). The "dodge" position then plots the
bars for each category next to one another at each value of x, as shown in Figure 4-9:
Figure 4-9
From these charts (R or Excel), you can see that the profit for the two highest-cost categories (Dairy and
Cosmetics) is much higher than the profit for the lower-cost categories. Insights like this can help the shop
decide where to focus its retail efforts. A column chart is an appropriate visualization in this scenario
because it lets the audience compare profit, cost, and revenue (sales) across multiple categories by
comparing the heights of the bars. For example, the business can see that, although cosmetics generate
more sales than dairy, the higher cost of cosmetics makes dairy a slightly more profitable product.
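The dairy and cosmetics comparison can be checked against Table 4-2 by computing profit as revenue minus total cost:

```latex
\begin{aligned}
\text{Profit}_{\text{Dairy}}     &= \$2{,}550{,}000 - \$1{,}750{,}000 = \$800{,}000 \\
\text{Profit}_{\text{Cosmetics}} &= \$2{,}750{,}000 - \$2{,}000{,}000 = \$750{,}000
\end{aligned}
```

Cosmetics bring in $200,000 more revenue than dairy, but their total cost is $250,000 higher, so dairy ends up $50,000 more profitable.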
Disaggregate data
Disaggregated data is aggregate data (sums, totals, averages, rates, etc.) that retains information about
the subgroups (gender, age, economic status, etc.) behind these aggregated measures. Analyzing
disaggregated data lets you keep the simplicity of summarized (aggregate) data metrics while still being
able to compare these measurements between and within subgroups.
A large dataset can have a number of factors or attributes for each data point. For example, in Table 4-3a,
the aggregated data provides information on the average annual income of individuals based solely on
their country. By disaggregating this data, however, you can retain information about additional factors
such as skill level and gender. This disaggregated view shows how income varies with these specific
factors, giving a more detailed analysis of income patterns and disparities.
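To illustrate the difference in R, the following sketch uses a small, made-up income dataset (not the actual values behind Table 4-3a; the column names Country, Gender, SkillLevel, and AnnualIncome are also assumptions). The aggregate() function first averages income by country alone, then by country, gender, and skill level, which keeps the subgroup detail.

```r
# Made-up records for illustration only (not the Table 4-3a data);
# two people per country, gender, and skill-level combination
Incomes <- data.frame(
  Country = rep(c("US", "Mexico"), each = 8),
  Gender = rep(rep(c("Female", "Male"), each = 4), times = 2),
  SkillLevel = rep(c("Skilled", "Skilled", "Unskilled", "Unskilled"), times = 4),
  AnnualIncome = c(58000, 62000, 29000, 31000, 68000, 72000, 33000, 35000,
                   11500, 12500, 5500, 6500, 12000, 13000, 6500, 7500)
)

# Aggregated view: one average per country
ByCountry <- aggregate(AnnualIncome ~ Country, data = Incomes, FUN = mean)
print(ByCountry)

# Disaggregated view: one average per country, gender, and skill-level combination
BySubgroup <- aggregate(AnnualIncome ~ Country + Gender + SkillLevel,
                        data = Incomes, FUN = mean)
print(BySubgroup)
```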
Table 4-3a
Table 4-3b
Code Sector
afs Accommodation, Food, and Service Activities
atp Air Transport
b_t Beverages and Tobacco Products
bph Basic Pharmaceutical Products
c_b Sugar Cane, Sugar Beet
Plotting and analyzing the data in the complete table is complicated because of the volume of data that
is available. In a disaggregated dataset, the data is broken down into smaller, more manageable subsets
based on specific criteria. It allows you to extract detailed insights by diving deeper into specific groups or
categories.
In Excel, highlight the entire dataset and, on the Insert tab, click the PivotTable button. When the
PivotTable Fields pane appears, drag the Country and Sector fields to the Rows box, Gender and Skill Level
to the Columns box, and the Annual Income field to the Values box. Then click the Annual Income field in
the Values box, choose Value Field Settings, and select the Average option, as in Figure 4-10:
Figure 4-10
You can divide the data into subgroups by using the filter arrows that the pivot table adds to its headers.
Click the down arrow in the Sector heading and choose afs (Accommodation, Food, and Service Activities),
as in Figure 4-11. The resulting table is shown in Figure 4-12:
Figure 4-11
Figure 4-12
This data can be displayed graphically, as in Figure 4-13. The filtered dataset is provided below. We can see
from this dataset that, in every subgroup, US workers earn a far higher annual income than workers in
Mexico. Furthermore, the difference between female and male skilled workers is minimal in Mexico but
large in the US.
Figure 4-13
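A rough R counterpart to the filtered pivot table in Figure 4-12 is sketched below, again with made-up numbers and assumed column names (the Sector values use the codes from Table 4-3b). subset() keeps only the afs sector, and tapply() lays out average annual income with countries as rows and gender and skill-level combinations as columns, similar to the pivot table layout.

```r
# Made-up records for illustration only (not the Table 4-3a data);
# one record per afs subgroup, plus two "atp" rows per country to be filtered out
Incomes <- data.frame(
  Country = rep(c("US", "Mexico"), each = 6),
  Sector = rep(c("afs", "afs", "afs", "afs", "atp", "atp"), times = 2),
  Gender = rep(c("Female", "Male"), times = 6),
  SkillLevel = rep(c("Skilled", "Skilled", "Unskilled", "Unskilled",
                     "Skilled", "Skilled"), times = 2),
  AnnualIncome = c(41000, 52000, 24000, 27000, 90000, 95000,
                   9000, 9200, 5000, 5600, 20000, 21000)
)

# Keep only the afs (Accommodation, Food, and Service Activities) sector
AFS <- subset(Incomes, Sector == "afs")

# Pivot-style table: average income by country (rows) and
# gender/skill-level combination (columns)
Pivot <- tapply(AFS$AnnualIncome,
                list(AFS$Country, paste(AFS$Gender, AFS$SkillLevel)),
                mean)
print(Pivot)
```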