Lesson 05 Data Visualization
Data Visualization
Learning Objectives
ABC is an organization that runs a social media platform. The organization wants
to present the platform data to its senior management and clients. The
presentation is supposed to provide insights into the users and their likes.
The organization decides to present the data with charts that show the required
data and convey the meaning.
To do so, it must explore different data visualization tools and techniques and
create the presentation using these.
Introduction to Data Visualization
Discussion
Visual aids: maps, charts, tables, and graphs
Data Visualization Tools
They provide an easy way to understand datasets in terms of data patterns, such as:
Trends
Presence of outliers
These tools help reinforce descriptive information and enhance assimilation by readers.
Newspapers Magazines
Information technology advancements have also increased the volume and variety of data.
Discussion
A bar chart is a graphical representation of data attributes using rectangular bars with heights or lengths
proportional to the values they represent.
For example, the heights may be proportional to the frequencies of occurrence.
Bar Chart
The number of plastic bags supplied for each blood group is shown in the table below:
Blood group   No. of bags supplied
A             26
B             21
O             37
Total         84
Bar Chart
Shown below is the bar chart representing the data from the table.
[Figure: bar chart; x-axis: Blood Group (A, B, O); y-axis: No. of bags supplied; the tallest bar is O at 37]
The x-axis denotes the blood group, and the y-axis denotes the number of bags supplied.
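The proportionality between bar length and value can be sketched in a few lines of Python. This is a minimal text-based illustration, not a plotting-library recipe; the function name `text_bar_chart` and the 40-character width are assumptions made for the example.

```python
# Blood-group data from the table: group -> number of bags supplied.
bags_supplied = {"A": 26, "B": 21, "O": 37}


def text_bar_chart(data, width=40):
    """Return one text bar per category, scaled so the largest value spans `width` characters."""
    largest = max(data.values())
    lines = []
    for label, value in data.items():
        # Bar length is proportional to the value it represents.
        bar_length = round(value / largest * width)
        lines.append(f"{label} | {'#' * bar_length} {value}")
    return lines


chart_lines = text_bar_chart(bags_supplied)
print("\n".join(chart_lines))
```

Because every bar is scaled by the same factor, the ratio of any two bar lengths matches the ratio of the underlying counts, up to rounding.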
Pie Chart
Example: The pie chart for the blood group data is shown below.
Blood group   No. of bags supplied   Share of pie
A             26                     31%
B             21                     25%
O             37                     44%
Total         84
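The sector percentages follow directly from the counts: each sector's share is its count divided by the total. The sketch below recomputes them and also derives the sector angle, assuming a full circle of 360 degrees; the dictionary layout is an illustrative choice.

```python
# Blood-group data: group -> number of bags supplied.
bags_supplied = {"A": 26, "B": 21, "O": 37}
total = sum(bags_supplied.values())

pie_sectors = {}
for group, count in bags_supplied.items():
    share = count / total  # fraction of the whole pie
    pie_sectors[group] = {
        "percent": round(share * 100),
        "angle_degrees": share * 360,
    }

for group, sector in pie_sectors.items():
    print(f"{group}: {sector['percent']}% of the pie, "
          f"{sector['angle_degrees']:.1f} degrees")
```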
Histogram
[Figure: histogram; x-axis: Class Interval; y-axis: Frequency]
Histograms visually display data points organized into specified ranges (class intervals) determined by the user.
Histogram
Example: Consider a frequency distribution table that shows daily wages and the number of workers
earning them:
Daily wages   Number of workers
10–20         10
20–30         18
30–40         11
40–50         5
50–60         4
60–70         2
Total         50
Histogram
Shown below is the histogram that represents the data from the table.
[Figure: histogram; x-axis: Daily wages (10 to 70); y-axis: Number of workers; bar heights 10, 18, 11, 5, 4, 2]
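As a rough sketch, the histogram can be rebuilt from the frequency table alone. The code below prints one text bar per class interval and picks out the modal class (the interval with the highest frequency); the tuple layout and variable names are assumptions made for the example.

```python
# (lower bound, upper bound, number of workers) from the frequency table.
wage_classes = [
    (10, 20, 10),
    (20, 30, 18),
    (30, 40, 11),
    (40, 50, 5),
    (50, 60, 4),
    (60, 70, 2),
]

total_workers = sum(freq for _, _, freq in wage_classes)
# The modal class is the interval with the tallest bar.
modal_class = max(wage_classes, key=lambda wage_class: wage_class[2])

for lower, upper, freq in wage_classes:
    print(f"{lower}-{upper} | {'#' * freq} {freq}")
print(f"Total workers: {total_workers}")
print(f"Modal class: {modal_class[0]}-{modal_class[1]}")
```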
Box Plot Chart
A box plot, also known as a box chart or box and whisker chart, is a graphical
representation that showcases groups of numerical data based on their
quartiles.
Box plots summarize the spread of data using five values: the minimum, first quartile,
median, third quartile, and maximum.
Box Plot Chart: Example
Determine the maximum, minimum, median, first, second, and third quartiles for the following dataset:
23, 42, 12, 10, 15, 14, and 9.
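One way to work this example is with Python's standard `statistics` module. Note that several quartile conventions exist; the sketch below relies on the default "exclusive" method of `statistics.quantiles`, which is an assumption and may differ slightly from the convention intended in the exercise.

```python
import statistics

data = [23, 42, 12, 10, 15, 14, 9]
ordered = sorted(data)

# With n=4, quantiles() returns the three quartile cut points.
# The default method is "exclusive"; other conventions can give other values.
q1, q2, q3 = statistics.quantiles(ordered, n=4)

summary = {
    "minimum": ordered[0],
    "first quartile": q1,
    "median (second quartile)": q2,
    "third quartile": q3,
    "maximum": ordered[-1],
}
for name, value in summary.items():
    print(f"{name}: {value}")
```

For this dataset the sorted values are 9, 10, 12, 14, 15, 23, 42, so the median is the fourth value, 14, and under this convention the first and third quartiles are 10 and 23.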
Scatter Plot
The relationship between two variables can be determined by observing whether the data points
are scattered across the graph or form a band.
A band indicates that the variables are related.
Scattered data indicates that the variables are unrelated.
Scatter Plot
The following data represents the sales of two products at a retail outlet over a 10-day period:
Product 1   10  15  21  27  28  33  41  44  51  52
Product 2   15  19  27  30  35  39  46  60  58  59
Scatter Plot
Shown below is the scatter plot chart depicting the data given in the table.
[Figure: scatter plot; x-axis: Sales of product 1; y-axis: Sales of product 2]
Values of sales for the first product are shown on the X-axis and sales for the second product
are shown on the Y-axis.
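A visual band can be complemented by a number. The sketch below computes the Pearson correlation coefficient for the two sales series; this measure is not part of the lesson and is brought in here only as an illustration of how strongly the points line up. The helper name `pearson_r` is an assumption.

```python
import math

# Daily sales from the table.
product_1 = [10, 15, 21, 27, 28, 33, 41, 44, 51, 52]
product_2 = [15, 19, 27, 30, 35, 39, 46, 60, 58, 59]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


r = pearson_r(product_1, product_2)
print(f"Pearson r = {r:.3f}")
```

A value close to 1 corresponds to points falling in a narrow, upward-sloping band; a value near 0 corresponds to points scattered across the chart.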
Scatter Plot
A narrow band indicates a relationship between the two variables, such as the sales of two products on
different days.
[Figure: scatter plot in which the points form a narrow band; x-axis: Sales of product 1; y-axis: Sales of product 2]
Scatter Plot
Here, the plotted values are scattered across the chart and do not fall in a band, which indicates that the variables are unrelated.
[Figure: scatter plot in which the points are scattered; x-axis: Sales of product 1; y-axis: Sales of product 2]
The bubble plot is an extension of the scatter plot used to identify relationships among three
numerical variables.
Bubble Plot
It is a data visualization that uses bubbles to represent data points, with the size of the bubble
indicating a third dimension of data.
Example: Plot a bubble chart using the three variables in the given data
Variable 1 78 80 88 78 70 75
Variable 2 82 79 77 74 72 76
Variable 3 87 79 77 80 78 74
Bubble Plot
The size of the bubbles should be proportional to the third variable's value.
[Figure: bubble plot of the three variables; y-axis ticks from 70 to 84; x-axis ticks from 0 to 120]
The first variable varies over a broader range, and hence the Y-axis is spread over a wider range.
The reduced spread of the second variable results in a narrower range on the X-axis.
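The sizing rule can be sketched as follows. This example assumes that "size" means bubble area (so the radius grows with the square root of the value) and uses a hypothetical maximum area of 400 arbitrary units; neither choice comes from the lesson.

```python
import math

# The three variables from the example table.
variable_1 = [78, 80, 88, 78, 70, 75]
variable_2 = [82, 79, 77, 74, 72, 76]
variable_3 = [87, 79, 77, 80, 78, 74]

MAX_AREA = 400.0  # hypothetical area of the largest bubble, in arbitrary units


def bubble_areas(values, max_area=MAX_AREA):
    """Scale values so that bubble area is proportional to each value."""
    largest = max(values)
    return [value / largest * max_area for value in values]


areas = bubble_areas(variable_3)
# If a drawing tool expects radii, derive them from the areas.
radii = [math.sqrt(area / math.pi) for area in areas]

for first, second, area in zip(variable_1, variable_2, areas):
    print(f"position = ({first}, {second}), bubble area = {area:.1f}")
```

Scaling the area rather than the radius keeps visual impressions honest: a value twice as large gets a bubble that covers twice the surface, not four times.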
Interpretation of the Charts
Identifying Attributes in Charts
Values that occur more frequently in the attributes can be easily identified.
[Figure: bar chart and pie chart of the number of bags supplied by blood group: A 26 (31%), B 21 (25%), O 37 (44%)]
The heights of bars in a bar chart and the areas of sectors in pie charts are proportional to
the frequencies of the data they represent.
Presence or Absence of a Pattern
Here, the pattern is obtained as the frequency initially increases, reaches a peak, and
then gradually decreases.
When a line is drawn through the center of a symmetric histogram, its two halves
are identical.
For a symmetric, single-peaked histogram, the mean, median, and mode are identical,
and all fall at the center of the distribution.
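A quick check of this property, using a small made-up frequency distribution (the values and frequencies below are hypothetical and chosen only because they are symmetric with a single peak):

```python
import statistics

# Hypothetical symmetric distribution: value -> frequency.
frequencies = {1: 1, 2: 2, 3: 3, 4: 2, 5: 1}

# Expand the table into individual observations.
observations = [
    value for value, freq in frequencies.items() for _ in range(freq)
]

mean = statistics.mean(observations)
median = statistics.median(observations)
mode = statistics.mode(observations)

print(f"mean = {mean}, median = {median}, mode = {mode}")
```

All three measures land on the central value, 3, because the frequencies rise to a peak and fall away identically on both sides.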
Degree of Symmetry
The similarity or diversity of the set of observed values for a specific variable
is described by measures of spread.
A data point that deviates significantly from the other values in a dataset
is referred to as an outlier.
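As an illustration, spread and outliers can be examined with the standard library. The sketch below reuses the dataset from the box plot example and flags values lying more than two sample standard deviations from the mean. The threshold of 2 is a hypothetical cut-off chosen for the example; other rules exist and can reach different conclusions on the same data.

```python
import statistics

data = [23, 42, 12, 10, 15, 14, 9]

# Two simple measures of spread.
value_range = max(data) - min(data)
std_dev = statistics.stdev(data)  # sample standard deviation

mean = statistics.mean(data)
Z_THRESHOLD = 2.0  # hypothetical cut-off, not prescribed by the lesson

# Flag values whose distance from the mean exceeds the threshold.
outliers = [x for x in data if abs(x - mean) / std_dev > Z_THRESHOLD]

print(f"range = {value_range}")
print(f"standard deviation = {std_dev:.2f}")
print(f"flagged as outliers: {outliers}")
```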
A variety of charts have been developed and used owing to varied requirements in different situations.
Bar and pie charts are used to represent qualitative data, while histograms and box plots are used for
quantitative data.
Histogram
They visually represent the distribution of numerical data as well as any skewness present in the data.
The box plot incorporates information on extremes and quartiles, which helps to identify outliers.
Bar and Pie Charts
Bar and pie charts can be used to display multiple attributes of a characteristic.
[Figure: bar chart of Series 1, Series 2, and Series 3 across Category 1 to Category 4, and pie chart of 1st Qtr to 4th Qtr]
When two or more datasets with different characteristics need simultaneous study, the
following charts are helpful:
[Figure: scatter plot and bubble plot shown side by side]
The bubble plot incorporates three attributes, while the scatter plot incorporates just two.
Uses of Charts
Dispersion
Central tendency
They also enable the user to identify the relationships between the variables.
Charts for Quality Control
The following set of quality improvement activities has been successfully implemented by
numerous businesses across different industries.
Number of rooms
Revenue generated
Decide the Variables
A strategically planned differential pricing system could ensure good room occupancy.
The use of data on rooms occupied helps to determine room occupancy.
The use of data on monetary values helps to assess financial performance.
There are two ways to assess the quality of wires produced in terms of diameters:
Histograms are used to present data from measurements.
Bar charts are used to display data based on the classification.
Identify the Heterogeneity in Datasets
Determine the factors that could make datasets heterogeneous and avoid using such data directly.
Identify the Heterogeneity in Datasets
[Figure: bar chart comparing Room 1 to Room 4 across Season 1, Season 2, and Season 3]
For the viewer to have an idea of the reliability of the information portrayed, the
following terms can be included:
Period of data collection
Sample size used
Data Collection and Chart Construction
After a thorough understanding of the context and the scope of the study, plans should be
synchronized for:
Data visualization is recognized as the process of displaying data to provide insights that will support
better decisions, that is, telling the story behind the data.
Case Study: Deciding Variables
The coach focused on a few frequently occurring faults and offered suggestions.
When the coach observed the player again, the frequency of faults in the player's performance had decreased.
The coach wanted the statistician to present the results in an illustrative way to promote his
services to potential players.
The statistician felt bar diagrams could be a powerful way of highlighting this data.
Case Study: Deciding Variables
In the visualization, the two rectangles for each fault were placed side by side.
The differences in the height for each type of fault communicated the improvement.
Case Study: Data Visualization
Case Study
Investigate the use of data visualization to analyze the number of journals and
publications by the physics departments of three universities
University A
University B
University C
Tasks to Perform
The table shows the data collected from the three universities:
University code A B C A B A C
No. of publications 3 2 7 6 7 6 9
University code A B A B C A C
Journal code IV IV V V V VI VI
No. of publications 3 2 7 6 7 6 9
Tasks to Perform
Use data visualization to construct bar diagrams and derive insights from the collected data
Attribute Data
University code
Journal code
Solution
Effective data visualization enables the analysis of the number of journals used by each University to
disseminate its research work.
Frequency Distribution Table I
The first frequency distribution table has the number of journals used by each University to
disseminate the research as shown:
University code   Frequency (no. of journals)
A                 6
B                 4
C                 4
Total             14
Bar Chart for Number of Journals Used
It is evident from the bar chart that University A used the highest number of journals to disseminate its
research.
[Figure: bar chart; x-axis: University code (A, B, C); y-axis: Frequency]
Frequency Distribution Table II
The second frequency distribution table shows the total number of publications by each university
across all the journals:
University code   Frequency (no. of publications)
A                 31
B                 17
C                 32
Total             80
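Frequency Distribution Tables I and II can be reproduced from the collected data by pairing each university code with its publication count. The sketch below does this with the standard library; the variable names are assumptions, and only the university codes and publication counts from the data table are used.

```python
from collections import Counter, defaultdict

# University code and number of publications for each of the 14 records.
university_codes = ["A", "B", "C", "A", "B", "A", "C",
                    "A", "B", "A", "B", "C", "A", "C"]
publications = [3, 2, 7, 6, 7, 6, 9,
                3, 2, 7, 6, 7, 6, 9]

# Table I: number of records (journals used) per university.
journals_per_university = Counter(university_codes)

# Table II: total publications per university.
publications_per_university = defaultdict(int)
for code, count in zip(university_codes, publications):
    publications_per_university[code] += count

for code in sorted(journals_per_university):
    print(f"{code}: {journals_per_university[code]} journals, "
          f"{publications_per_university[code]} publications")
```

The same grouping pattern, counting records for one table and summing a measure for the other, underlies each frequency distribution table in this case study.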
Bar Chart for Number of Publications
This bar chart clearly shows that University C has the highest number of publications.
[Figure: bar chart; x-axis: University code (A, B, C); y-axis: no. of publications]
Comparison of Journals and Publications
The following bar charts clearly show that the total number of papers published by University A was
fewer than that of University C.
[Figures: bar chart for the number of journals and bar chart for the number of publications, each by university (A, B, C)]
Frequency Distribution Table III
The third frequency distribution table depicts the total number of universities with publications in each
journal.
Journal code   Frequency (no. of universities)
I              3
II             3
III            1
IV             2
V              3
VI             2
Total          14
Bar Chart for Number of Campuses Using All Journals
The bar chart illustrates the number of universities with publications in each journal,
indicating how widely each journal is used for research dissemination.
[Figure: bar chart; x-axis: Journal code (I to VI); y-axis: no. of universities]
Inference
The bar graph reveals that the first, second, and fifth journals have publications from all universities.
Other journals have publications from fewer universities.
Frequency Distribution Table IV
The frequency distribution table highlighting the number of publications in each journal is as shown:
Journal code   Frequency (no. of publications)
I              12
II             22
III            6
IV             8
V              25
VI             11
Total          84
Bar Chart for Number of Publications in Each Journal
The bar chart displayed below illustrates the number of publications and the utilization of each journal
for research dissemination.
[Figure: bar chart; x-axis: Journal code (I to VI); y-axis: no. of publications]
Data Visualization
The four frequency distribution tables and the respective charts together illustrate that the data chosen
must be based on the:
Issues to be addressed
Insights to be derived
Data Visualization
The data selected for the examination should facilitate insightful decision-making and provide
comprehension of its effects.
When statistics and data visualization are integrated, exploratory data analysis improves and users can
make significant discoveries.
Key Takeaways
Outliers in data are values that significantly differ from most other
observations in a dataset.
Knowledge Check
Knowledge Check 1
Which of the following displays many pairs of observations to highlight the relationship
between the two sets of data?
A. Bar chart
B. Box plot
C. Scatter plot
D. Bubble plot
A scatter plot displays many pairs of observations to highlight the relationship between the two sets
of data.
Knowledge Check 2
Which of the following is used to identify the relationship between three numerical
variables?
A. Box chart
B. Bar plot
C. Bubble plot
D. Scatter plot
A bubble plot is used to identify the relationships between three numerical variables.
Knowledge Check 3
___________________ are rectangles of equal widths that represent a set of attribute data.
A. Bar charts
B. Pie charts
C. Histograms
D. Box plots
Bar charts are rectangles of equal widths that represent a set of attribute data.
Thank You