Lesson 08 Data Visualization With Python
You are a Sales Manager in a leading global organization. The organization plans to study the sales details of each product across all regions and countries to identify the product with the highest sales in each region. This research will enable the organization to increase the manufacture of that product in that particular region.
Data Visualization
The data involved in this research might be huge and complex. Manually analyzing such large volumes of numeric data is difficult and time-consuming.
Considerations of Data Visualization
Clarity includes ensuring that the dataset is complete and relevant. This enables the Data Scientist to use the new patterns obtained from the data in the relevant places.
Accuracy includes ensuring that you use an appropriate graphical representation to convey the intended message.
Efficiency includes using efficient visualization techniques that highlight all the data points.
Factors of Data Visualization
There are some basic factors that one needs to be aware of before visualizing the data:
• Data types and scale: Choose the type of data; for example, numeric or categorical.
Python Data Visualization Libraries
Python data visualization libraries include matplotlib, seaborn, bokeh, folium, vispy, pygal, and networkx.
Python’s Matplotlib
Using Python's matplotlib, visualizing large and complex data becomes easy.
There are several advantages of using matplotlib to visualize data. They are as follows:
Matplotlib Architecture
Matplotlib is organized into three layers:
• Scripting Layer (pyplot): Comprised mainly of pyplot, a scripting interface that is lighter than the Artist layer.
• Artist Layer (Artist): Comprised of one main object, Artist.
  Title, lines, tick labels, and images all correspond to individual Artist instances.
  There are two types of Artist objects:
  1. Primitive: Line2D, Rectangle, Circle, and Text
  2. Composite: Axis, Tick, Axes, and Figure
  Each composite artist may contain other artists.
• Back-End Layer (FigureCanvas, Renderer, Event): Comprised of three built-in abstract interface classes:
  1. FigureCanvas: Encompasses the area onto which the figure is drawn
  2. Renderer: Knows how to draw on the FigureCanvas
  3. Event: Handles user inputs such as keyboard strokes and mouse clicks
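The three layers can be seen from ordinary pyplot code. The following is a minimal sketch, assuming only matplotlib is installed (the plotted values are arbitrary): it creates a plot through the scripting layer, then inspects the Artist objects and the canvas behind it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive back end, so the script runs without a display
import matplotlib.pyplot as plt
from matplotlib.artist import Artist
from matplotlib.axes import Axes
from matplotlib.axis import Axis
from matplotlib.backend_bases import FigureCanvasBase
from matplotlib.figure import Figure
from matplotlib.lines import Line2D
from matplotlib.text import Text

# Scripting layer: one pyplot call builds the Figure and its Axes.
fig, ax = plt.subplots()
line = ax.plot([0, 1, 2, 3], [0, 1, 4, 9])[0]
title = ax.set_title("First Plot")

# Artist layer: composite artists (Figure, Axes, Axis) hold other artists,
# including primitives such as Line2D and Text.
print(isinstance(fig, Figure), isinstance(ax, Axes), isinstance(ax.xaxis, Axis))
print(isinstance(line, Line2D), isinstance(title, Text))
print(all(isinstance(a, Artist) for a in (fig, ax, ax.xaxis, line, title)))
print(ax in fig.axes, line in ax.get_lines())

# Back-end layer: every Figure is attached to a FigureCanvas that renders it.
print(isinstance(fig.canvas, FigureCanvasBase))
fig.canvas.draw()  # the canvas has its renderer draw every artist
```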
[Figure: Anatomy of a plot titled "First Plot": title, legend, grid, Y-axis labeled "Numbers", and X-axis labeled "Range".]
Steps to Create a Plot
Step 01: Import the required libraries (pyplot).
Step 02: Plot the numbers.
[Figure: The resulting "First Plot", with Numbers on the Y-axis and Range on the X-axis.]
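The two steps can be sketched as follows. This is a minimal example, assuming illustrative values for the numbers; it also adds the title, axis labels, legend, and grid shown in the plot anatomy.

```python
import matplotlib
matplotlib.use("Agg")  # lets the script run without a display; omit this line in Jupyter
import matplotlib.pyplot as plt  # Step 01: import the required library

# Step 02: plot the numbers (illustrative values only).
numbers = [0.3, 0.5, 0.4, 0.8, 0.7, 1.0, 0.9, 1.1]
plt.plot(range(len(numbers)), numbers, label="Numbers")

# Label the parts of the plot.
plt.title("First Plot")
plt.xlabel("Range")
plt.ylabel("Numbers")
plt.legend()
plt.grid(True)
plt.savefig("first_plot.png")  # in an interactive session, plt.show() displays it instead
```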
Create Your First Plot Using Matplotlib
Objective: Use the given FIFA 19 dataset, which contains detailed attributes for every player registered in the FIFA 19 database, to load the data and create a plot of Name against Potential for 10 players.
A leading global organization wants to know how many people visit its website during a particular time period. This analysis helps it control and monitor the website traffic.
A 2D plot with (X, Y) values, where X is the time and Y is the number of users, shows this traffic.
[Figure: "Website traffic" line plot, Number of users vs. Hours (6 to 18).]
Controlling Line Patterns and Colors
[Figure: "Website traffic" line plot with a customized line pattern and color, Number of users vs. Hours.]
Set Axis, Labels, and Legend Property
Using matplotlib, you can also set the desired axis ranges, axis labels, and legend to make the result easier to interpret.
[Figure: "Website traffic" plot with a "Web traffic" legend, Number of users vs. Hours.]
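Line patterns, colors, axis ranges, labels, and the legend can all be set on one plot. This is a sketch under the assumption of made-up hourly user counts; the keyword arguments (color, linestyle, marker, linewidth, alpha, label) are standard matplotlib line properties.

```python
import matplotlib
matplotlib.use("Agg")  # omit in Jupyter
import matplotlib.pyplot as plt

# Illustrative traffic figures; replace them with real counts from your web logs.
hours = [6, 8, 10, 12, 14, 16, 18]
users = [250, 700, 1300, 1750, 1400, 900, 450]

fig, ax = plt.subplots()
# Line pattern and color: dashed red line with circle markers, partly transparent.
ax.plot(hours, users, color="red", linestyle="--", marker="o",
        linewidth=2, alpha=0.8, label="Web traffic")

# Axis ranges, labels, title, and legend.
ax.axis([6, 18, 0, 2000])  # [xmin, xmax, ymin, ymax]
ax.set_xlabel("Hours")
ax.set_ylabel("Number of users")
ax.set_title("Website traffic")
ax.legend(loc="upper right")
fig.savefig("website_traffic.png")
```

The alpha value (0 is fully transparent, 1 is opaque) controls the line's transparency.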
Create a Line Plot for Football Analytics
Objective: Use the given FIFA 19 dataset to create a line plot between Name and Sliding Tackle of 10 players. Also,
set the axis, labels, and legend property of the plot.
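The annotate() method is used to annotate the graph. Its parameters, such as the annotation text, xy (the point being annotated), xytext (the position of the text), and arrowprops (the style of the connecting arrow), control how the annotation appears.
[Figure: "Website traffic" on Monday with a "Web traffic" legend, Number of users vs. Hrs.]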
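A minimal sketch of annotate(), assuming illustrative Monday traffic values, that marks the point of maximum traffic (the same idea applies to annotating the maximum ShotPower in the exercise below):

```python
import matplotlib
matplotlib.use("Agg")  # omit in Jupyter
import matplotlib.pyplot as plt

hours = [6, 8, 10, 12, 14, 16, 18]
monday_users = [300, 800, 1200, 1700, 1400, 1000, 500]  # illustrative values

peak = monday_users.index(max(monday_users))  # position of the maximum value

fig, ax = plt.subplots()
ax.plot(hours, monday_users, label="Web traffic")
note = ax.annotate(
    f"Peak: {monday_users[peak]} users",                 # annotation text
    xy=(hours[peak], monday_users[peak]),                # point being annotated
    xytext=(hours[peak] + 1, monday_users[peak] + 150),  # where the text is placed
    arrowprops=dict(arrowstyle="->"),                    # arrow from the text to the point
)
ax.set_ylim(0, 2000)
ax.set_xlabel("Hrs")
ax.set_ylabel("Number of users")
ax.set_title("Monday")
ax.legend()
fig.savefig("monday_annotated.png")
```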
Multiple Plots
[Figure: "Website traffic" for Monday, Tuesday, and Wednesday plotted on the same axes, Number of users vs. Hrs.]
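Several lines can share one set of axes by calling plot() once per series. The sketch below assumes illustrative counts for the three days and distinguishes them with line styles and a legend.

```python
import matplotlib
matplotlib.use("Agg")  # omit in Jupyter
import matplotlib.pyplot as plt

hours = [6, 8, 10, 12, 14, 16, 18]
traffic = {  # illustrative counts per day
    "Monday": [300, 800, 1200, 1700, 1400, 1000, 500],
    "Tuesday": [200, 650, 1100, 1500, 1600, 900, 400],
    "Wednesday": [350, 900, 1300, 1450, 1200, 1100, 600],
}

fig, ax = plt.subplots()
for (day, users), style in zip(traffic.items(), ["-", "--", ":"]):
    ax.plot(hours, users, linestyle=style, linewidth=2, label=day)  # one line per day

ax.set_xlabel("Hrs")
ax.set_ylabel("Number of users")
ax.set_title("Website traffic")
ax.legend()  # the labels passed to plot() identify each day
fig.savefig("website_traffic_days.png")
```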
Create a Plot with Annotation
Objective: Use the given FIFA 19 dataset to create a plot of ShotPower of the first ten players. Also, annotate the
point of maximum ShotPower.
Objective: Use the given FIFA 19 dataset to create multiple plots of skills of 15 players. Use labels, legend, colors,
and linewidth to visualize the plot.
Reading a dataset
Retrieving fifteen columns from the DataFrame
For example, subplot(2,1,1) and subplot(2,1,2) divide the grid into two vertically stacked plots, while subplot(2,2,1), subplot(2,2,2), subplot(2,2,3), and subplot(2,2,4) divide it into four plots.
Layout
Layout and spacing adjustments are two important factors to consider when creating subplots.
Use the plt.subplots_adjust() method with the hspace and wspace parameters to adjust the vertical and horizontal distances between the subplots; the top, bottom, left, and right parameters move the edges of the grid within the figure.
[Figure: Subplot grid showing hspace, wspace, top, and bottom.]
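A sketch of a 2 x 2 subplot grid with adjusted spacing, assuming arbitrary straight-line data. The spacing values are illustrative; plt.subplots(2, 2) is an alternative that returns the figure and all four Axes in one call.

```python
import matplotlib
matplotlib.use("Agg")  # omit in Jupyter
import matplotlib.pyplot as plt

x = list(range(10))
fig = plt.figure(figsize=(8, 6))
for position in range(1, 5):
    ax = plt.subplot(2, 2, position)  # 2 rows, 2 columns, this cell
    ax.plot(x, [position * value for value in x], label=f"y = {position}x")
    ax.set_title(f"subplot(2, 2, {position})")
    ax.legend()

# hspace/wspace: gaps between subplots, as fractions of the average Axes height/width.
# top/bottom: positions of the grid edges, as fractions of the figure height.
plt.subplots_adjust(hspace=0.5, wspace=0.3, top=0.9, bottom=0.1)
fig.savefig("subplots_grid.png")
```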
Create Multiple Subplots Using plt.subplots
Objective: Use the given FIFA 19 dataset to create four subplots analyzing skills such as ball control, strength, penalties, and interceptions of ten players. Also, add a legend to each plot.
Access: To execute the practice, follow these steps:
• Go to the PRACTICE LABS tab on your LMS
• Click the START LAB button
• Click the LAUNCH LAB button to start the lab
Types of Plots
This lesson covers the following types of plots: histogram, scatter plot, heat map, pie chart, error bar, area plot, word cloud, bar chart, box plot, and waffle chart.
Types of Plots: Histogram
Histograms are graphical representations of the frequency distribution of numerical data. A histogram is a kind of bar chart in which each bar covers an interval (bin) of values.
matplotlib provides the hist() function to create histograms.
Advantages of histogram charts:
• Display the number of values within a specified interval
• Are suitable for large datasets, as the values can be grouped into intervals
[Figure: Histogram of Age, with Frequency on the Y-axis and the bins marked.]
Create a Stacked Histogram
Objective: Use the given FIFA 19 dataset to create a stacked histogram of attributes such as potential and composure for 10 players. Use a legend to distinguish the potential and composure plots.
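A stacked histogram draws one dataset on top of another with hist(stacked=True). This is a sketch, assuming the dataset has columns named Potential and Composure; the bin count is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

def stacked_histogram(df, columns=("Potential", "Composure"), n=10, bins=5):
    """Stack histograms of `columns` for the first `n` rows; return the bar heights and edges."""
    subset = df[list(columns)].head(n)
    fig, ax = plt.subplots()
    counts, edges, _ = ax.hist(
        [subset[c].to_numpy() for c in columns],  # one dataset per attribute
        bins=bins,
        stacked=True,        # draw the second attribute on top of the first
        label=list(columns),
    )
    ax.set_xlabel("Rating")
    ax.set_ylabel("Number of players")
    ax.legend()              # tells potential and composure apart
    return fig, counts, edges

# Usage (file name from this lesson; column names assumed):
# fifa = pd.read_csv("fifa-data.csv")
# fig, counts, edges = stacked_histogram(fifa)
```

With stacked=True, the heights returned for the last dataset are cumulative, so they count both attributes.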
Types of Plots: Scatter Plot
A scatter plot is used to graphically display the relationship between variables. matplotlib provides the scatter() method to create scatter plots and to control the appearance of each point.
Advantages of scatter plot:
• Shows the correlation between variables
• Is suitable for large datasets
• Makes it easy to find clusters
• Represents each piece of data as a point on the plot
df_total
year    total
1980    99137
1981    110563
1982    104271
1983    75550
1984    73417
...     ...
Create a Scatter Plot of Pretest Scores and Posttest Scores
Objective: Create a DataFrame from the following data: 'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'], 'female': [0, 1, 1, 0, 1], 'age': [42, 52, 36, 24, 73],
'preTestScore': [4, 24, 31, 2, 3],
'postTestScore': [25, 94, 57, 62, 70]
Draw a scatter plot of preTestScore and postTestScore, with the size of each point determined by age.
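A sketch of this exercise using the data given above. The scale factor that turns age into marker area (5) is an arbitrary choice made for readability.

```python
import matplotlib
matplotlib.use("Agg")  # omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

raw_data = {
    "first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
    "female": [0, 1, 1, 0, 1],
    "age": [42, 52, 36, 24, 73],
    "preTestScore": [4, 24, 31, 2, 3],
    "postTestScore": [25, 94, 57, 62, 70],
}
df = pd.DataFrame(raw_data)

fig, ax = plt.subplots()
points = ax.scatter(
    df["preTestScore"], df["postTestScore"],
    s=df["age"] * 5,  # marker area grows with age; the factor 5 is an arbitrary scale
    alpha=0.6,
)
ax.set_xlabel("preTestScore")
ax.set_ylabel("postTestScore")
ax.set_title("Pretest vs. posttest scores (point size = age)")
fig.savefig("test_scores_scatter.png")
```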
Types of Plots: Heat Map
A heat map is a way to visualize two-dimensional data. Because a heat map encodes each value as a color, patterns in the data are often faster to spot than in other types of plots.
Advantages of heat map:
• Draws attention to risk-prone areas
• Uses the entire dataset to draw meaningful insights
• Is used for cluster analysis and can deal with large datasets
Create a Heat Map to Analyze the Sepal Width, Petal Length,
and Petal Width of an Iris Dataset
Objective: Use iris.csv to create a heat map analyzing the sepal width, petal length, and petal width. Show the plot values using annotations.
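One way to approach this is an annotated heat map of the correlations between the chosen columns. This sketch uses only matplotlib (imshow plus one text label per cell); seaborn's heatmap(..., annot=True) is a one-call alternative. Plotting correlations rather than raw values is a design choice, and the iris column names in the usage comment are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

def annotated_heatmap(df, columns):
    """Draw the correlation matrix of `columns` as a heat map with each value written in its cell."""
    corr = df[columns].corr()
    fig, ax = plt.subplots()
    image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(columns)))
    ax.set_xticklabels(columns, rotation=45, ha="right")
    ax.set_yticks(range(len(columns)))
    ax.set_yticklabels(columns)
    # Annotations: write each correlation value at the center of its cell.
    for i in range(len(columns)):
        for j in range(len(columns)):
            ax.text(j, i, f"{corr.iat[i, j]:.2f}", ha="center", va="center")
    fig.colorbar(image, ax=ax)
    fig.tight_layout()
    return fig, ax, corr

# Usage (column names are assumptions; check them in your copy of iris.csv):
# iris = pd.read_csv("iris.csv")
# annotated_heatmap(iris, ["sepal.width", "petal.length", "petal.width"])
```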
Types of Plots: Pie Chart
Pie charts are used to show percentage or proportional data.
matplotlib provides the pie() method to create pie charts.
Create a Pie Chart
Objective: Use BigMartSalesData.csv to plot a pie chart of the sales by country for the year 2011. Find the country that contributes the highest sales.
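A sketch of a pie chart drawn from a pandas Series indexed by country, which also returns the country with the largest share. The column names in the usage comment (Year, Country, Amount) are hypothetical; check them against BigMartSalesData.csv.

```python
import matplotlib
matplotlib.use("Agg")  # omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

def plot_sales_pie(sales_by_country, title):
    """Draw a pie chart of a Series indexed by country and return the top country."""
    fig, ax = plt.subplots()
    ax.pie(
        sales_by_country.to_numpy(),
        labels=sales_by_country.index.tolist(),
        autopct="%1.1f%%",  # print each slice's percentage
        startangle=90,
    )
    ax.set_title(title)
    ax.axis("equal")        # keep the pie circular
    return fig, sales_by_country.idxmax()

# Usage (hypothetical column names):
# sales = pd.read_csv("BigMartSalesData.csv")
# sales_2011 = sales[sales["Year"] == 2011].groupby("Country")["Amount"].sum()
# fig, top_country = plot_sales_pie(sales_2011, "Sales by country, 2011")
```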
Types of Plots: Error Bar
An error bar is used to graphically represent the variability of data, such as the standard deviation around a mean. It is used mainly to show the uncertainty (error) in each value. Comparing error bars can help indicate whether two groups of data differ meaningfully, which builds confidence in the data analysis.
Advantages of error bars:
• Shows the variability in data and indicates the errors
• Depicts precision in the data analysis
• Demonstrates how well a function or model fits the data used in the analysis
• Describes the underlying data
Create an Error Bar
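A sketch of errorbar() comparing two groups, assuming illustrative measurements. Each bar spans one sample standard deviation above and below the group mean; using the standard deviation (rather than, say, a confidence interval) is a choice made for this example.

```python
import statistics

import matplotlib
matplotlib.use("Agg")  # omit in Jupyter
import matplotlib.pyplot as plt

# Illustrative measurements for two groups.
groups = {
    "Group A": [12, 14, 11, 13, 15],
    "Group B": [18, 20, 17, 21, 19],
}
names = list(groups)
means = [statistics.mean(v) for v in groups.values()]
stdevs = [statistics.stdev(v) for v in groups.values()]  # sample standard deviation

positions = list(range(len(names)))
fig, ax = plt.subplots()
container = ax.errorbar(
    positions, means,
    yerr=stdevs,  # bar extends one standard deviation above and below each mean
    fmt="o",      # draw the means as points with no connecting line
    capsize=6,    # short horizontal caps at the ends of each bar
)
ax.set_xticks(positions)
ax.set_xticklabels(names)
ax.set_ylabel("Measured value")
ax.set_title("Mean with one standard deviation")
fig.savefig("error_bars.png")
```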
Types of Plots: Area Plot
An area plot (also known as an area chart or area graph) is based on the line plot. It is commonly used to represent cumulated totals using numbers or percentages over time.
Area Chart to Display the Skills of the Players
Objective: Use the fifa-data.csv dataset to create an area chart of skills such as SlidingTackle and StandingTackle of the players.
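pandas can draw an area chart directly from DataFrame columns with plot.area(), which stacks the areas by default. This is a sketch, assuming columns named Name, SlidingTackle, and StandingTackle, and plotting only the first ten players to keep the chart readable.

```python
import matplotlib
matplotlib.use("Agg")  # omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

def plot_tackle_area(df, n=10, skills=("SlidingTackle", "StandingTackle")):
    """Stacked area chart of two tackling skills for the first `n` players."""
    subset = df[["Name", *skills]].head(n).reset_index(drop=True)
    ax = subset[list(skills)].plot.area(alpha=0.5, figsize=(10, 4))  # stacked by default
    ax.set_xticks(range(len(subset)))
    ax.set_xticklabels(subset["Name"].tolist(), rotation=45, ha="right")
    ax.set_ylabel("Skill rating (stacked)")
    ax.set_title("Tackling skills of the players")
    ax.figure.tight_layout()
    return ax

# Usage (column names assumed):
# fifa = pd.read_csv("fifa-data.csv")
# plot_tackle_area(fifa)
```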
Types of Plots: Word Cloud
A word cloud is a depiction of the frequency of different words in some textual data: the more often a word appears, the larger it is drawn.
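The counting step behind a word cloud can be shown with the standard library alone. This sketch uses a short sample text and a small, hand-picked stop-word set (both assumptions for illustration) and computes the word frequencies that a word cloud turns into font sizes.

```python
import re
from collections import Counter

text = """Data visualization makes data easier to understand. Visualization
turns data into charts, and charts reveal patterns in data."""

# Lowercase, split into words, and drop very common filler words.
STOPWORDS = {"and", "in", "into", "to"}
words = re.findall(r"[a-z']+", text.lower())
frequencies = Counter(word for word in words if word not in STOPWORDS)

print(frequencies.most_common(3))
```

With the wordcloud package installed (pip install wordcloud), WordCloud().generate_from_frequencies(frequencies) renders these counts as an image, and WordCloud().generate(text) performs the counting itself.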
Create a Word Cloud of Random Data
Objective: Install the wordcloud package using pip install wordcloud and generate a word cloud from random data.
Types of Plots: Bar Chart
Unlike a histogram, a bar chart is commonly used to compare the values of a variable at a given point in time.
Create a Bar Chart
Objective: Use the fifa-data.csv dataset to create a bar chart analyzing the agility skill of any ten players.
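A sketch of a bar chart with one bar per player. It assumes the dataset has columns named Name and Agility, and it takes the first ten rows as "any ten players".

```python
import matplotlib
matplotlib.use("Agg")  # omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd  # used to load the data (see the usage comment)

def plot_agility_bar(df, n=10):
    """Bar chart of the Agility column for the first `n` players."""
    subset = df[["Name", "Agility"]].head(n)
    fig, ax = plt.subplots(figsize=(10, 4))
    bars = ax.bar(subset["Name"].tolist(), subset["Agility"].tolist(), color="tab:green")
    ax.set_xlabel("Player")
    ax.set_ylabel("Agility")
    ax.set_title(f"Agility of {len(subset)} players")
    ax.tick_params(axis="x", labelrotation=45)  # keep long names readable
    fig.tight_layout()
    return bars

# Usage (column names assumed):
# fifa = pd.read_csv("fifa-data.csv")
# plot_agility_bar(fifa)
```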
Types of Plots: Box Plot
Box plots are used for the graphical display of numerical data through their quartiles.
Create Box Plots
Objective: Use the iris.csv dataset to create box plots as follows:
1. Analyze the petal lengths of all the varieties of flowers
2. Study the distribution of several numerical variables, for example, sepal length and sepal width
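Both tasks can be sketched with matplotlib's boxplot(), which draws one box per dataset at x = 1, 2, and so on. The column names below are assumptions; check them against your copy of iris.csv.

```python
import matplotlib
matplotlib.use("Agg")  # omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Assumed column names.
SPECIES, PETAL_LENGTH = "variety", "petal.length"
SEPAL_LENGTH, SEPAL_WIDTH = "sepal.length", "sepal.width"

def iris_box_plots(iris):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

    # 1. Petal length of each variety: one box per group.
    names, groups = [], []
    for name, group in iris.groupby(SPECIES):
        names.append(name)
        groups.append(group[PETAL_LENGTH].to_numpy())
    petal = ax1.boxplot(groups)
    ax1.set_xticks(range(1, len(names) + 1))
    ax1.set_xticklabels(names)
    ax1.set_title("Petal length by variety")

    # 2. Distributions of several numerical variables side by side.
    columns = [SEPAL_LENGTH, SEPAL_WIDTH]
    sepal = ax2.boxplot([iris[c].to_numpy() for c in columns])
    ax2.set_xticks(range(1, len(columns) + 1))
    ax2.set_xticklabels(columns)
    ax2.set_title("Sepal measurements")
    return fig, petal, sepal

# Usage: fig, petal, sepal = iris_box_plots(pd.read_csv("iris.csv"))
```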
Types of Plots: Waffle Chart
A waffle chart is an interesting visualization that is normally created to display progress toward goals.
Country    Total Immigrants
Denmark    3901
Norway     2327
Sweden     5866
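matplotlib has no built-in waffle chart, but one can be sketched as a grid of colored tiles. This sketch uses the three totals from the table above and assumes a 40 x 10 grid (400 tiles), which is an arbitrary size: each country gets a number of tiles proportional to its share, and the grid is drawn with matshow().

```python
import matplotlib
matplotlib.use("Agg")  # omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Patch

totals = {"Denmark": 3901, "Norway": 2327, "Sweden": 5866}
width, height = 40, 10                # grid size: 400 tiles in total (a design choice)
total_tiles = width * height
grand_total = sum(totals.values())

# Each country gets tiles in proportion to its share of the total.
tiles = {country: round(value / grand_total * total_tiles) for country, value in totals.items()}

# Category codes 1, 2, 3 ... are filled column by column; 0 marks any tile left over after rounding.
codes = np.repeat(np.arange(1, len(tiles) + 1), list(tiles.values()))[:total_tiles]
flat = np.zeros(total_tiles, dtype=int)
flat[: len(codes)] = codes
grid = flat.reshape(width, height).T  # shape (height, width)

cmap = plt.get_cmap("tab10")
fig, ax = plt.subplots(figsize=(10, 3))
ax.matshow(grid, cmap=cmap, vmin=0, vmax=9)  # code i is drawn in color i of tab10
ax.set_xticks([])
ax.set_yticks([])
handles = [Patch(color=cmap(i), label=f"{country} ({tiles[country]} tiles)")
           for i, country in enumerate(tiles, start=1)]
ax.legend(handles=handles, loc="lower center", bbox_to_anchor=(0.5, -0.35), ncol=len(tiles))
fig.savefig("waffle_chart.png", bbox_inches="tight")
```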
Create a Waffle Chart
Objective: Use the Immigrants to Canada.csv dataset to create a waffle chart using REG as a field.
Seaborn and Regression Plots
Seaborn
Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface to draw
attractive statistical graphics.
Advantages of seaborn:
Regression Plots
A regression plot fits a regression line of a dependent variable on an independent variable and draws it over the data points.
df_total
year    total
1980    99137
1981    110563
1982    104271
1983    75550
1984    73417
...     ...
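seaborn's regplot(x="year", y="total", data=df_total) draws a scatter plot with a fitted regression line (and a confidence band) in one call. To show what that line is, this sketch fits it by least squares with NumPy's polyfit and draws it with matplotlib, using only the first five rows of df_total shown above.

```python
import matplotlib
matplotlib.use("Agg")  # omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# The first five rows of df_total (the full table continues).
df_total = pd.DataFrame({
    "year": [1980, 1981, 1982, 1983, 1984],
    "total": [99137, 110563, 104271, 75550, 73417],
})

# Least-squares straight line: total is approximately slope * year + intercept.
slope, intercept = np.polyfit(df_total["year"], df_total["total"], deg=1)

fig, ax = plt.subplots()
ax.scatter(df_total["year"], df_total["total"], label="Data")
ax.plot(df_total["year"], slope * df_total["year"] + intercept, color="red", label="Fitted line")
ax.set_xlabel("Year")
ax.set_ylabel("Total")
ax.legend()
fig.savefig("regression_plot.png")
```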
Introduction to Folium
What Is Folium?
▪ Folium is a powerful Python library that helps you create several types of
Leaflet maps.
KDE with Pandas and Seaborn
Use a diabetes dataset and a KDE (kernel density estimation) plot to visualize insights from the dataset.
[Figure: A density plot drawn over the data points.]
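# Replace the numeric target codes with class names (0 = malignant, 1 = benign).
# Assigning the result back avoids the unreliable inplace=True on a selected column.
cancer_df['Target'] = cancer_df['Target'].replace({0: 'malignant', 1: 'benign'})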
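pandas (Series.plot.kde(), which relies on SciPy) and seaborn (kdeplot()) compute a kernel density estimate for you. To show what they compute, here is a from-scratch NumPy sketch of a one-dimensional Gaussian KDE on synthetic data, using Scott's rule of thumb for the bandwidth. The data and bandwidth rule are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` at each point of `grid`."""
    samples = np.asarray(samples, dtype=float)
    if bandwidth is None:
        # Scott's rule of thumb in one dimension: h = sample std * n ** (-1/5)
        bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)
    # Place one Gaussian bump on every sample and average them at each grid point.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
samples = rng.normal(loc=120, scale=30, size=200)  # synthetic stand-in for one dataset column
grid = np.linspace(samples.min() - 100, samples.max() + 100, 2000)
density = gaussian_kde_1d(samples, grid)

fig, ax = plt.subplots()
ax.plot(grid, density, label="Density estimate")
ax.plot(samples, np.zeros_like(samples), "|", color="black", label="Data points")
ax.legend()
fig.savefig("kde_sketch.png")
```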
There are two types of variables: numerical variables and categorical variables.
Use the shape attribute to see the size of the new DataFrame.
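A sketch of splitting a DataFrame by variable type with select_dtypes() and checking the size of each part with shape. The small housing-style DataFrame is illustrative; the categorical part is selected with exclude="number" so that it also catches string columns that are not stored as object dtype.

```python
import pandas as pd

# A small illustrative DataFrame with both kinds of variables.
houses = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],         # numerical
    "SalePrice": [208500, 181500, 223500],  # numerical
    "Street": ["Pave", "Pave", "Grvl"],     # categorical
})

numerical = houses.select_dtypes(include="number")
categorical = houses.select_dtypes(exclude="number")

# shape is a (rows, columns) tuple
print(numerical.shape)    # (3, 2)
print(categorical.shape)  # (3, 1)
```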
Understanding the Main Variable
Let's understand the main variable, SalePrice, of the housing dataset.
The first thing to do is to look at its descriptive statistics:
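describe() returns the count, mean, standard deviation, minimum, quartiles, and maximum of a numerical column. This is a minimal sketch, with a handful of illustrative prices standing in for the housing dataset's SalePrice column.

```python
import pandas as pd

# Illustrative values standing in for the SalePrice column of the housing dataset.
sale_price = pd.Series([208500, 181500, 223500, 140000, 250000], name="SalePrice")

summary = sale_price.describe()  # count, mean, std, min, 25%, 50%, 75%, max
print(summary)
```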
Knowledge Check 1
Which of the following methods is used to set the title?
a. plot()
b. plt.title()
c. plot.title()
d. title()
Knowledge Check 2
Which of the following methods is used to adjust the distances between the subplots?
a. plot.subplots_adjust()
b. plt.subplots_adjust()
c. subplots_adjust()
d. plt.subplots.adjust()
Knowledge Check 3
Which of the following commands is used to display plots inline in a Jupyter notebook?
a. %matplotlib
b. %matplotlib inline
c. import matplotlib
d. import style
Knowledge Check 4
Which of the following keywords is used to decide the transparency of the plot line?
a. Legend
b. Alpha
c. Animated
d. Annotation
Alpha sets the transparency of the line when you draw a line plot or chart.
Knowledge Check 5
Which of the following plots is used to represent data in a two-dimensional manner?
a. Histogram
b. Heat Map
c. Pie Chart
d. Scatter Plot
Problem Statement:
BigMart is one of the biggest retailers in Europe and has operations across multiple countries. You are a Data Analyst in the IT team of BigMart. Invoice-wise and SKU-wise sales data for the years 2010 and 2011 has been shared with you. You need to prepare meaningful charts that showcase the various sales trends for 2010 and 2011 to the top management.
Instructions to perform the assignment:
Download the dataset “BigMartSalesData.csv”. Use the data provided to create visualizations of the trends.
Visualize the Sales Data
Steps to Perform:
• Plot total sales per month for the year 2011. How have the total sales increased over the months? Which month has the lowest sales?
• Plot total sales per month for the year 2011 in a bar chart. Is a bar chart better for visualization than a simple plot?
• Plot a pie chart of sales for the year 2010, country-wise. Which countries contribute the highest and lowest sales? Create a pandas Series of the country-wise sales, indexed by country.
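The first two steps need monthly totals for one year. This is a sketch of that aggregation and of the line and bar versions of the chart; the column names InvoiceDate and Amount are hypothetical, so check them against BigMartSalesData.csv before use.

```python
import matplotlib
matplotlib.use("Agg")  # omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

def monthly_sales(df, year, date_col="InvoiceDate", amount_col="Amount"):
    """Total of `amount_col` per calendar month of `year` (column names are hypothetical)."""
    dates = pd.to_datetime(df[date_col])
    in_year = dates.dt.year == year
    totals = df.loc[in_year, amount_col].groupby(dates[in_year].dt.month).sum()
    totals.index.name = "Month"
    return totals

def plot_monthly_sales(totals, year):
    """Draw the same monthly totals as a line plot and as a bar chart, side by side."""
    fig, (line_ax, bar_ax) = plt.subplots(1, 2, figsize=(12, 4))
    totals.plot(ax=line_ax, marker="o", title=f"Total sales per month, {year}")
    totals.plot.bar(ax=bar_ax, title=f"Total sales per month, {year} (bar chart)")
    for ax in (line_ax, bar_ax):
        ax.set_ylabel("Total sales")
    fig.tight_layout()
    return fig

# Usage:
# sales = pd.read_csv("BigMartSalesData.csv")
# totals_2011 = monthly_sales(sales, 2011)
# plot_monthly_sales(totals_2011, 2011)
# lowest_month = totals_2011.idxmin()
```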
Thank You