
Lesson 08 Data Visualization With Python

This document discusses data visualization with Python. It explains that data visualization is a technique to present data graphically and makes complex data easier to understand. Python is considered one of the best tools for data visualization due to libraries like Matplotlib. Matplotlib allows users to create various plots and graphs with high quality and customization. The document outlines the key components of Matplotlib's architecture and the basic steps for creating a plot, which include importing libraries, defining data, setting plot parameters, and displaying the plot.

Uploaded by

Sumanta Sinhatal
Copyright
© All Rights Reserved

Data Analytics with Python

Data Visualization with Python


Learning Objectives

By the end of this lesson, you will be able to:

• Explain data visualization and its importance in today’s world
• Understand why Python is considered one of the best data visualization tools
• Describe matplotlib and its data visualization features in Python
• List the types of plots and the steps involved in creating these plots

Data Visualization

Data visualization is a technique to present data in a pictorial or graphical format.

Consider an example. You are a Sales Manager in a leading global organization. The organization plans to study the sales details of each product across all regions and countries in order to identify the product that has the highest sales in a particular region. This research will enable the organization to increase the manufacture of that product in that particular region.

The data involved in this research might be huge and complex. Manual research on such large numeric data is difficult and time-consuming.

When these numeric data are plotted on a graph or converted to charts, it is easy to identify the patterns and predict the result accurately.

The main benefits of data visualization are as follows:

• Explores new patterns and reveals hidden patterns
• Simplifies complex quantitative information
• Identifies the relationship between data points and variables
• Analyzes and explores Big Data easily
Considerations of Data Visualization

Three major considerations for data visualization are:

• Clarity: ensuring that the dataset is complete and relevant. This enables the Data Scientist to use the new patterns obtained from the data in the relevant places.
• Accuracy: ensuring that you use an appropriate graphical representation to convey the intended message.
• Efficiency: using efficient visualization techniques that highlight all the data points.
Factors of Data Visualization

There are some basic factors that one needs to be aware of before visualizing the data:

• The visual effect: the use of appropriate shapes, colors, and sizes to represent the analyzed data.
• The coordinate system: helps organize the data points within the provided coordinates.
• The data types and scale: identify the type of data, for example, numeric or categorical.
• The informative interpretation: helps create visuals in an effective and easily interpretable manner using labels, titles, legends, and pointers.
Python Libraries

Python has many data visualization libraries, and new ones continue to be introduced. Some of them are:

• matplotlib
• vispy
• pygal
• bokeh
• folium
• seaborn
• networkx
Python’s Matplotlib

Using Python’s matplotlib, the data visualization of large and complex data becomes easy.

matplotlib

Python 2D plotting library


Python Libraries: Matplotlib

There are several advantages of using matplotlib to visualize data. They are as follows:

• Is a multi-platform data visualization tool; therefore, it is fast and efficient
• Can work well with many operating systems and graphics back-ends
• Has high-quality graphics and plots to print and view for a range of graphs
• With Jupyter notebook integration, the developers are free to spend their time implementing features
• Has large community support and cross-platform support as it is an open source tool
• Has full control over graphs or plot styles
Matplotlib Architecture

The matplotlib architecture has three layers:

Scripting layer (pyplot)
• Comprised mainly of pyplot, a scripting interface that is lighter than the Artist layer

Artist layer (Artist)
• Comprised of one main object: Artist
• Title, lines, tick labels, and images all correspond to individual Artist instances.
• Two types of Artist objects:
1. Primitive: Line2D, Rectangle, Circle, and Text
2. Composite: Axis, Tick, Axes, and Figure
• Each composite artist may contain other composite artists as well as primitive artists.

Back-end layer (FigureCanvas, Renderer, Event)
• Comprised of three built-in abstract interface classes:
1. FigureCanvas: Encompasses the area onto which the figure is drawn
2. Renderer: Knows how to draw on the FigureCanvas
3. Event: Handles user inputs such as keyboard strokes and mouse clicks
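The layers can be seen at work without pyplot. The following is a minimal sketch, assuming the Agg back-end (any other canvas class would do); the data values and the title text are made up for illustration.

```python
from matplotlib.artist import Artist
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

# Back-end layer: the canvas is the area onto which the figure is drawn.
fig = Figure()                  # Figure is a composite Artist
canvas = FigureCanvasAgg(fig)   # attaches an Agg canvas to the figure

# Artist layer: Axes is a composite artist that holds primitive artists.
ax = fig.add_subplot(1, 1, 1)
(line,) = ax.plot([0, 1, 2], [0, 1, 4])     # plot() returns Line2D artists
title = ax.set_title("Artist layer demo")   # set_title() returns a Text artist

# The canvas asks its renderer to draw every artist in the figure.
canvas.draw()
```

The scripting layer (pyplot) wraps these same steps, which is why later examples only need `plt.subplots()` and `plt.show()`.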
The Plot

A plot is a graphical representation of data which shows the relationship between two variables or the distribution of data.

[Figure: example line plot titled "First Plot", with callouts for the title, legend, grid, x-axis ("Range"), and y-axis ("Numbers")]
Steps to Create a Plot

You can create a plot using four simple steps:

Step 01: Import the required libraries
Step 02: Define or import the required dataset
Step 03: Set the plot parameters
Step 04: Display the created plot
Steps to Create Plot: Example

Step 01: Import the required libraries
• pyplot, to plot the numbers
• style, to set the grid style

Step 02: Define or import the required dataset
• Use the numpy random method to generate random numbers
• View the created random numbers

Step 03: Set the plot parameters
• Set the style (ggplot)
• Plot the graph
• Set the line width
• Set the legend
• Set the coordinate labels
• Set the title

Step 04: Display the created plot
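The four steps can be sketched as follows. This is an illustration, not the original slide code: the seed, the number of points, the line color, and the legend text are assumptions, and the Agg back-end line is only there so the sketch also runs as a plain script.

```python
# Step 01: import the required libraries
import matplotlib
matplotlib.use("Agg")  # headless back-end for scripts; omit in Jupyter
import matplotlib.pyplot as plt
from matplotlib import style
import numpy as np

# Step 02: define the dataset (10 random numbers; the seed is an assumption)
rng = np.random.default_rng(0)
numbers = rng.random(10)
print(numbers)  # view the created random numbers

# Step 03: set the plot parameters
style.use("ggplot")                                   # grid style
fig, ax = plt.subplots()
ax.plot(numbers, "g", label="line one", linewidth=2)  # plot, legend text, line width
ax.set_xlabel("Range")                                # coordinate labels
ax.set_ylabel("Numbers")
ax.set_title("First Plot")                            # title
ax.legend()

# Step 04: display the created plot
plt.show()
```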
[Figure: the resulting line plot, titled "First Plot", with "Range" on the x-axis and "Numbers" on the y-axis]
Create Your First Plot Using Matplotlib

Objective: Use the given FIFA 19 dataset, containing the detailed attributes for every player registered in the
latest edition of FIFA 19 database, to load the data and create a plot between Name and Potential of 10 players.

Access: To execute the practice, follow these steps:


• Go to the PRACTICE LABS tab on your LMS
• Click the START LAB button
• Click the LAUNCH LAB button to start the lab
Line Properties

The example sets two groups of properties:

• Line properties: alpha (sets the transparency of the line) and animated
• Plot graphics: linestyle, linewidth, and marker style

matplotlib also offers various line colors.

Property            Value Type
alpha               float
animated            [True | False]
antialiased or aa   [True | False]
clip_box            a matplotlib.transforms.Bbox instance
clip_on             [True | False]
clip_path           a Path instance and a Transform instance, a Patch
color or c          any matplotlib color
contains            the hit testing function
dash_capstyle       ['butt' | 'round' | 'projecting']
linestyle or ls     [ '-' | '--' | '-.' | ':' | 'steps' | ...]
linewidth or lw     float value in points
marker              [ '+' | ',' | '.' | '1' | '2' | '3' | '4' ]

Alias   Color
b       Blue
r       Red
c       Cyan
m       Magenta
g       Green
y       Yellow
k       Black
w       White
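As a brief illustration of the properties listed above, the sketch below passes several of them to `plot()` as keyword arguments, once with full names and once with the short aliases. The data values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless back-end for scripts; omit in Jupyter
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

fig, ax = plt.subplots()

# Full property names: blue, dashed, 2.5 points wide, '+' markers, half transparent
(line,) = ax.plot(x, y, color="b", linestyle="--", linewidth=2.5,
                  marker="+", alpha=0.5)

# Short aliases: c (color), ls (linestyle), lw (linewidth)
(line2,) = ax.plot(x, [v + 2 for v in y], c="r", ls=":", lw=1.0)
```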
Plot with (X,Y)

A leading global organization wants to know how many people visit its website at a particular time. This analysis helps it control and monitor the website traffic.

[Figure: 2D plot of Users against Time]

The example defines a list of users and a list of time values, and plots one against the other. Use %matplotlib inline to display or view the plot in a Jupyter notebook.

[Figure: "Website traffic" line plot; x-axis "Hours" (6 to 18), y-axis "Number of users" (0 to 1800)]
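A plot of this kind might be produced as in the sketch below. The user counts are hypothetical values chosen to resemble the figure; they are not the data behind the original slide.

```python
import matplotlib
matplotlib.use("Agg")  # headless back-end for scripts; omit in Jupyter
import matplotlib.pyplot as plt

# Hypothetical data: hour of day and number of users seen in that hour
time_hrs = list(range(7, 18))
web_customers = [123, 645, 950, 1290, 1630, 1450, 1034, 1295, 465, 205, 80]

fig, ax = plt.subplots()
ax.plot(time_hrs, web_customers)  # x values first, then y values
ax.set_title("Website traffic")
ax.set_xlabel("Hours")
ax.set_ylabel("Number of users")
```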
Controlling Line Patterns and Colors

In this example, the line color is set to blue and the line style to dashed (--).

[Figure: "Website traffic" plot drawn as a blue dashed line; x-axis "Hours", y-axis "Number of users"]
Set Axis, Labels, and Legend Property

Using matplotlib, it is also possible to set the desired axis to interpret the result.

Axis is used to define the range on the x-axis and y-axis.

[Figure: "Website traffic" plot with the axis set and a "Web traffic" legend; x-axis "Hours", y-axis "Number of users"]
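The sketch below sets the axis range, the labels, and the legend on one plot. The data and the axis limits are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless back-end for scripts; omit in Jupyter
import matplotlib.pyplot as plt

# Hypothetical website traffic data
time_hrs = list(range(7, 18))
web_customers = [123, 645, 950, 1290, 1630, 1450, 1034, 1295, 465, 205, 80]

fig, ax = plt.subplots()
ax.plot(time_hrs, web_customers, color="b", linestyle="--",
        linewidth=2.5, label="Web traffic")

# axis() takes [xmin, xmax, ymin, ymax]
ax.axis([6.5, 17.5, 50, 2000])

ax.set_title("Website traffic")
ax.set_xlabel("Hours")
ax.set_ylabel("Number of users")
legend = ax.legend()  # uses the label passed to plot()
```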
Create a Line Plot for Football Analytics

Objective: Use the given FIFA 19 dataset to create a line plot between Name and Sliding Tackle of 10 players. Also,
set the axis, labels, and legend property of the plot.

Multiple Plots and Subplots
Alpha and Annotation

Alpha is an attribute that controls the transparency of the line. The lower the alpha value, the more transparent the line is.

The annotate() method is used to annotate the graph. It has several attributes which help annotate the plot:

• “Max” denotes the annotation text,
• “ha” indicates the horizontal alignment,
• “va” indicates the vertical alignment,
• “xytext” indicates the text position,
• “xy” indicates the arrow position, and
• “arrowprops” indicates the properties of the arrow.
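The attributes above map directly onto keyword arguments of `annotate()`. The following sketch marks the peak of a hypothetical traffic series; the data, the text position, and the arrow color are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless back-end for scripts; omit in Jupyter
import matplotlib.pyplot as plt

# Hypothetical website traffic data
time_hrs = list(range(7, 18))
web_customers = [123, 645, 950, 1290, 1630, 1450, 1034, 1295, 465, 205, 80]

fig, ax = plt.subplots()
ax.plot(time_hrs, web_customers, alpha=0.4)  # lower alpha means more transparent

# Locate the maximum so the arrow can point at it
peak_users = max(web_customers)
peak_hour = time_hrs[web_customers.index(peak_users)]

ann = ax.annotate(
    "Max",                              # annotation text
    ha="center",                        # horizontal alignment
    va="bottom",                        # vertical alignment
    xytext=(8, 1500),                   # text position
    xy=(peak_hour, peak_users),         # arrow position
    arrowprops={"facecolor": "green"},  # properties of the arrow
)
```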
Multiple Plots

[Figure: "Website traffic" plot for Monday, with a "Web traffic" legend; x-axis "Hrs", y-axis "Number of users"]

To compare several days, define the web traffic data for each day and set different colors and line widths for different days.

[Figure: "Website traffic" plot with three lines for Monday, Tuesday, and Wednesday; x-axis "Hrs", y-axis "Number of users"]
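Several lines can be drawn on the same axes by calling `plot()` once per series. In the sketch below the three daily series are hypothetical, as are the chosen colors and line widths.

```python
import matplotlib
matplotlib.use("Agg")  # headless back-end for scripts; omit in Jupyter
import matplotlib.pyplot as plt

# Hypothetical web traffic data for three days
time_hrs = list(range(7, 18))
monday = [123, 645, 950, 1290, 1630, 1450, 1034, 1295, 465, 205, 80]
tuesday = [95, 680, 889, 1145, 1670, 1323, 1119, 1265, 510, 310, 110]
wednesday = [105, 630, 700, 1006, 1520, 1124, 1239, 1380, 580, 610, 230]

fig, ax = plt.subplots()

# A different color and line width for each day
ax.plot(time_hrs, monday, color="r", linewidth=1.0, label="Monday")
ax.plot(time_hrs, tuesday, color="g", linewidth=1.5, label="Tuesday")
ax.plot(time_hrs, wednesday, color="b", linewidth=2.0, label="Wednesday")

ax.axis([6.5, 17.5, 50, 2000])
ax.set_title("Website traffic")
ax.set_xlabel("Hrs")
ax.set_ylabel("Number of users")
ax.legend()
```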
Create a Plot with Annotation

Objective: Use the given FIFA 19 dataset to create a plot of ShotPower of the first ten players. Also, annotate the
point of maximum ShotPower.

Create Multiple Plots to Analyze the Skills of the Players

Objective: Use the given FIFA 19 dataset to create multiple plots of skills of 15 players. Use labels, legend, colors,
and linewidth to visualize the plot.

Create Multiple Plots to Analyze the Skills of the Players

The solution proceeds as follows:

• Read the dataset
• Retrieve fifteen columns from the dataframe
• Create plots for skills like Marking, SprintSpeed, Aggression, and Dribbling; this gives the output without the legends, xticks values, and labels
• Use a legend to indicate the multiple plots
• Set the label names
• Set the x-axis tick values; this gives the final output with the legends, xticks values, and labels
Subplots

Subplots are used to display multiple plots in the same window. With subplot, you can arrange plots in a regular grid.

The syntax for subplot is subplot(m,n,p). It divides the current window into an m-by-n grid and creates the axes for a subplot in the position specified by p.

For example:

• subplot(2,1,1) and subplot(2,1,2) divide the grid into two vertically stacked plots.
• subplot(2,2,1), subplot(2,2,2), subplot(2,2,3), and subplot(2,2,4) divide the grid into four plots in one window.
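A short sketch of both layouts follows; the plotted values are placeholders and the variable names are chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless back-end for scripts; omit in Jupyter
import matplotlib.pyplot as plt

# Two vertically stacked plots: a 2-by-1 grid, positions 1 and 2
fig_two = plt.figure()
ax_top = plt.subplot(2, 1, 1)
ax_top.plot([1, 2, 3], [1, 4, 9])
ax_bottom = plt.subplot(2, 1, 2)
ax_bottom.plot([1, 2, 3], [9, 4, 1])

# Four plots in one window: a 2-by-2 grid, positions 1 to 4
fig_four = plt.figure()
axes_four = [plt.subplot(2, 2, p) for p in range(1, 5)]
for p, ax in enumerate(axes_four, start=1):
    ax.set_title(f"subplot(2,2,{p})")
```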
Layout

Layout and Spacing adjustments are two important factors to be considered while creating subplots.

Use the plt.subplots_adjust() method with the parameters hspace and wspace to adjust the distances
between the subplots and to move them around on the grid.

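hspace controls the vertical spacing between rows of subplots (top to bottom), and wspace controls the horizontal spacing between columns of subplots.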
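For instance, the spacing of a 2-by-2 grid could be adjusted as in this sketch. The values 0.5 and 0.3 are arbitrary choices; both are expressed as fractions of the average axes height and width.

```python
import matplotlib
matplotlib.use("Agg")  # headless back-end for scripts; omit in Jupyter
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2)  # returns the figure and a 2x2 array of axes

# hspace: space between rows; wspace: space between columns
plt.subplots_adjust(hspace=0.5, wspace=0.3)
```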
Create Multiple Subplots Using plt.subplots

Objective: Use the given FIFA 19 dataset to create four subplots to analyze the skills like ball control, strength,
penalties, and interceptions of ten players. Also, add legend for each plot.
Types of Plots
Types of Plots: Histogram

You can create different types of plots using matplotlib:

• Histogram
• Scatter Plot
• Heat Map
• Pie Chart
• Error Bar
• Area Plot
• Word Cloud
• Bar Chart
• Box Plot
• Waffle Chart

Histograms are graphical representations of a probability distribution. A histogram is a kind of bar chart. Using matplotlib and its bar chart function, you can create histogram charts; matplotlib also provides a dedicated hist() method.

Advantages of histogram charts:
• Display the number of values within a specified interval
• Are suitable for large datasets as they can be grouped within the intervals

[Figure: histogram with "Age" on the x-axis and "Frequency" on the y-axis; the bars are labeled as bins]
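A histogram of ages might be drawn as in the sketch below. The ages are randomly generated, so they are purely illustrative, and the number of bins is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless back-end for scripts; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 200 random ages between 18 and 65
rng = np.random.default_rng(42)
ages = rng.integers(18, 66, size=200)

fig, ax = plt.subplots()

# hist() groups the values into bins and returns the count per bin
counts, bin_edges, patches = ax.hist(ages, bins=8, edgecolor="black")
ax.set_xlabel("Age")
ax.set_ylabel("Frequency")
```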
Create a Stacked Histogram

Objective: Use the given FIFA 19 dataset to create a stacked histogram plot of the attributes like potential and
composure of 10 players. Indicate the potential and composure plot using legend.

Types of Plots: Scatter Plot

A scatter plot is used to graphically display the relationships between variables. The scatter() method is also recommended to control a plot.

Advantages of scatter plot:
• Shows the correlation between variables
• Is suitable for large datasets
• Makes it easy to find clusters
• Represents each piece of data as a point on the plot
Example dataset (df_total):

year    total
1980    99137
1981    110563
1982    104271
1983    75550
1984    73417
...     ...
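Using only the five rows of df_total shown in this lesson (the full dataset is not reproduced here), a scatter plot could be sketched like this; the color and the axis labels are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless back-end for scripts; omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# The five rows of df_total that appear in the lesson
df_total = pd.DataFrame({
    "year": [1980, 1981, 1982, 1983, 1984],
    "total": [99137, 110563, 104271, 75550, 73417],
})

fig, ax = plt.subplots()

# One point per row: year on the x-axis, total on the y-axis
points = ax.scatter(df_total["year"], df_total["total"], color="darkblue")
ax.set_xlabel("year")
ax.set_ylabel("total")
```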
Create a Scatter Plot of Pretest Scores and Posttest Scores

Objective: Create a dataframe from following data: 'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'], 'female': [0, 1, 1, 0, 1], 'age': [42, 52, 36, 24, 73],
'preTestScore': [4, 24, 31, 2, 3],
'postTestScore': [25, 94, 57, 62, 70]
Draw a Scatterplot of preTestScore and postTestScore, with the size of each point determined by age.

Types of Plots: Heat Map

A heat map is a way to visualize two-dimensional data. Using heat maps, you can gain deeper and faster insights about data than with other types of plots.

Advantages of heat map:
• Draws attention to the risk-prone area
• Uses the entire dataset to draw meaningful insights
• Is used for cluster analysis and can deal with large datasets
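One way to draw a heat map with matplotlib alone is `imshow()`. The sketch below is an assumption about tooling (seaborn's heatmap function is another common choice), and the matrix values are random.

```python
import matplotlib
matplotlib.use("Agg")  # headless back-end for scripts; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical two-dimensional data: 4 rows by 5 columns
rng = np.random.default_rng(1)
data = rng.random((4, 5))

fig, ax = plt.subplots()
image = ax.imshow(data, cmap="YlOrRd")  # each cell is colored by its value
fig.colorbar(image, ax=ax)              # color scale next to the map

# Annotate every cell with its value
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        ax.text(j, i, f"{data[i, j]:.2f}", ha="center", va="center")
```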
Create a Heat Map to Analyze the Sepal Width, Petal Length,
and Petal Width of an Iris Dataset

Objective: Use an iris.csv to create a heat map to analyze the sepal width, petal length, and petal width. Indicate
the plot values using annotations.
Types of Plots: Pie Chart

Pie charts are used to show percentage or proportional data. matplotlib provides the pie() method to create pie charts.

Advantages of pie chart:
• Summarizes a large dataset in visual form
• Displays the relative proportions of multiple classes of data
• Makes the size of the circle proportional to the total quantity
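A minimal pie() sketch follows. The region names and sales figures are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless back-end for scripts; omit in Jupyter
import matplotlib.pyplot as plt

# Hypothetical sales by region
labels = ["Region A", "Region B", "Region C", "Region D"]
sales = [40, 30, 20, 10]

fig, ax = plt.subplots()

# autopct prints each wedge's share of the total as a percentage
wedges, texts, autotexts = ax.pie(sales, labels=labels, autopct="%1.1f%%")
ax.set_title("Share of sales")
```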
Create a Pie Chart

Objective: Use BigMartSalesData.csv to plot a pie chart of the sales of the countries for the year 2011. Find the
country which contributes to the highest sales.
Types of Plots: Error Bar

An error bar is used to graphically represent the variability of data. It is used mainly to identify errors. It builds confidence about the data analysis by revealing the statistical difference between the two groups of data.

Advantages of error bar:
• Shows the variability in data and indicates the errors
• Depicts precision in the data analysis
• Demonstrates how well a function and model are used in the data analysis
• Describes the underlying data
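The sketch below draws error bars with `errorbar()`, using the sample x, y, and y_error values from the error-bar exercise in this lesson. The marker style and cap size are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless back-end for scripts; omit in Jupyter
import matplotlib.pyplot as plt

# Sample values from the error-bar exercise in this lesson
x = [1, 2, 3, 4]
y = [20, 21, 20.5, 20.8]
y_error = [0.12, 0.13, 0.2, 0.1]

fig, ax = plt.subplots()

# yerr gives the half-length of each vertical error bar
container = ax.errorbar(x, y, yerr=y_error, fmt="o-", capsize=4)
ax.set_xlabel("x")
ax.set_ylabel("y")
```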
Create an Error Bar

Objective: Use the given data to plot the error bar:

Let the x-axis data points and y-axis data points be x = [1,2,3,4], y = [20, 21, 20.5, 20.8]
• Draw a simple plot and configure the line and markers in the plot
• Configure the axes, provide a title to the graph, and label the x axis and y axis
• Give the error bar if y_error = [0.12, 0.13, 0.2, 0.1]
• Define the width and height as figsize=(4,5), set the plot resolution with dpi=100, and change the font size to 14
Types of Plots: Area Plot

An area plot (also known as an area chart or area graph) is based on the line plot. This plot is commonly used to represent cumulative totals using numbers or percentages over time.
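One way to draw an area plot of cumulative totals is `stackplot()`, which stacks each series on top of the previous one. The years, values, and series names below are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless back-end for scripts; omit in Jupyter
import matplotlib.pyplot as plt

# Hypothetical yearly values for two series
years = [2015, 2016, 2017, 2018, 2019]
series_a = [10, 12, 15, 18, 20]
series_b = [5, 7, 8, 12, 15]

fig, ax = plt.subplots()

# The top edge of the stack shows the cumulative total over time
polys = ax.stackplot(years, series_a, series_b,
                     labels=["Series A", "Series B"], alpha=0.6)
ax.set_xlabel("Year")
ax.set_ylabel("Total")
ax.legend(loc="upper left")
```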
Area Chart to Display the Skills of the Players

Objective: Use fifa-data.csv dataset to create an area chart of the skills like SlidingTackle and StandingTackle of
the players.
Types of Plots: Word Cloud

A word cloud is a depiction of the frequency of different words in some textual data.
Create a Word Cloud of Random Data

Objective: Install word cloud using pip install wordcloud and generate a random word cloud.
Types of Plots: Bar Chart

Unlike a histogram, a bar chart is commonly used to compare the values of a variable at a given point in time.
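A bar chart comparing one value per category can be sketched with `bar()`. The player names and scores here are placeholders, not values from the FIFA 19 dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless back-end for scripts; omit in Jupyter
import matplotlib.pyplot as plt

# Hypothetical players and scores
players = ["Player A", "Player B", "Player C", "Player D", "Player E"]
agility = [90, 87, 96, 60, 79]

fig, ax = plt.subplots()
bars = ax.bar(players, agility, color="teal")  # one bar per player
ax.set_xlabel("Player")
ax.set_ylabel("Agility")
```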
Create a Bar Chart

Objective: Use fifa-data.csv dataset and create a bar chart to analyze the agility skill of any ten players.
Types of Plots: Box Plot

Box plots are used for graphical display of numerical data through their quartiles.
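The sketch below draws box plots for two randomly generated samples; the sample parameters and group names are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless back-end for scripts; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical numerical data: two samples of 100 values each
rng = np.random.default_rng(7)
group_a = rng.normal(loc=5.0, scale=1.0, size=100)
group_b = rng.normal(loc=3.0, scale=0.5, size=100)

fig, ax = plt.subplots()

# Each box spans the first to third quartile; the inner line is the median
result = ax.boxplot([group_a, group_b])
ax.set_xticklabels(["Group A", "Group B"])
```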
Create Box Plots

Objective: Use iris.csv dataset to create box plots using the following inputs:
1. Analyze the petal lengths of all the varieties of flowers
2. Study the distribution of several numerical variables, let’s say sepal length and sepal width
Types of Plots: Waffle Chart

A waffle chart is an interesting visualization that is normally created to display progress toward goals.

Country    Total Immigrants
Denmark    3901
Norway     2327
Sweden     5866
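matplotlib has no built-in waffle chart, so one approach (an assumption here, not necessarily the method used in the lab) is to build a grid of tiles with NumPy and display it with `matshow()`. The sketch uses the three totals from the table above; the 10-by-40 grid size and the colormap are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless back-end for scripts; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

countries = ["Denmark", "Norway", "Sweden"]
totals = np.array([3901, 2327, 5866])

height, width = 10, 40  # 400 tiles in total

# Number of tiles per country, proportional to its share of the total
proportions = totals / totals.sum()
tiles = np.round(proportions * height * width).astype(int)

# Repeat each category index by its tile count and fill the grid column by column.
# For other data the rounded counts may not sum to height * width and need adjusting.
waffle = np.repeat(np.arange(len(countries)), tiles).reshape(width, height).T

fig, ax = plt.subplots()
ax.matshow(waffle, cmap="coolwarm")
ax.set_xticks([])
ax.set_yticks([])
```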
Create a Waffle Chart

Objective: Use Immigrants to Canada.csv dataset to create a waffle chart using REG as a field.
Seaborn and Regression Plots
Seaborn

Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface to draw
attractive statistical graphics.

Advantages of seaborn:

• Possesses built-in themes for better visualizations
• Has built-in statistical functions which reveal hidden patterns in the dataset
• Has functions to visualize matrices of data
Regression Plots

A regression plot fits a regression line to the data to show how a dependent variable varies with one or more independent variables.

Example dataset (df_total):

year    total
1980    99137
1981    110563
1982    104271
1983    75550
1984    73417
...     ...
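With seaborn installed, a regression plot of the five df_total rows shown in this lesson could be sketched with `regplot()`. The color and marker are arbitrary, and the `np.polyfit` line is an added cross-check of the fitted slope rather than part of the plot.

```python
import matplotlib
matplotlib.use("Agg")  # headless back-end for scripts; omit in Jupyter
import numpy as np
import pandas as pd
import seaborn as sns

# The five rows of df_total that appear in the lesson
df_total = pd.DataFrame({
    "year": [1980, 1981, 1982, 1983, 1984],
    "total": [99137, 110563, 104271, 75550, 73417],
})

# regplot() draws the points, a fitted line, and a confidence band
ax = sns.regplot(x="year", y="total", data=df_total, color="green", marker="+")

# Cross-check: slope and intercept of the least-squares line
slope, intercept = np.polyfit(df_total["year"], df_total["total"], 1)
```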
Introduction to Folium
What Is Folium?

▪ Folium is a powerful Python library that helps you create several types of Leaflet maps.

▪ It enables both binding of data to a map for choropleth visualizations as well as passing visualizations as markers on the map.

▪ The library has a number of built-in tilesets from OpenStreetMap, Mapbox, and Stamen and supports custom tilesets with Mapbox API keys.
Creating a World Map
Creating a Map of Canada
Map Styles: Stamen Toner
Map Styles: Stamen Terrain
Maps with Markers
Add a Marker
Label the Marker
Kernel Density Estimate Plots

A density plot is a smooth and continuous version of a histogram estimated from the data. It shows the distribution of a numerical variable.

A kernel density estimate (KDE) is used for visualizing the probability density of a continuous variable. It depicts the probability density at different data points in a continuous variable.

[Figure: a density plot shown together with its individual kernels and the data points]
KDE with Pandas and Seaborn

The breast cancer dataset that ships with scikit-learn is used, together with KDE plots, to visualize the insights of the dataset.

About the dataset:

Target: malignant(0), benign(1)


Dimensions: 569 rows × 31 columns
One-dimensional KDE Plot

Visualize the probability distribution of a sample against a single continuous attribute.

# importing the required libraries
from sklearn import datasets
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# setting up the DataFrame: 30 feature columns plus the target
cancer = datasets.load_breast_cancer()
cancer_df = pd.DataFrame(cancer.data, columns=[
    'mean radius', 'mean texture', 'mean perimeter', 'mean area',
    'mean smoothness', 'mean compactness', 'mean concavity',
    'mean concave points', 'mean symmetry', 'mean fractal dimension',
    'radius error', 'texture error', 'perimeter error', 'area error',
    'smoothness error', 'compactness error', 'concavity error',
    'concave points error', 'symmetry error',
    'fractal dimension error', 'worst radius', 'worst texture',
    'worst perimeter', 'worst area', 'worst smoothness',
    'worst compactness', 'worst concavity', 'worst concave points',
    'worst symmetry', 'worst fractal dimension'])
cancer_df['Target'] = cancer.target

# replacing the numeric target codes with readable class names
cancer_df['Target'] = cancer_df['Target'].map({0: 'malignant', 1: 'benign'})

# plotting the KDE plot (seaborn 0.11 or later: fill=True replaces the
# deprecated shade=True; the keyword is label, in lower case)
sns.kdeplot(x=cancer_df.loc[cancer_df['Target'] == 'malignant', 'mean radius'],
            color='b', fill=True, label='malignant')
plt.xlabel('mean radius')
plt.ylabel('Probability density')
plt.legend()  # shows the label passed to kdeplot

Visualize the probability distribution of multiple samples in a single plot.

# plotting one KDE curve per class on the same axes
sns.kdeplot(x=cancer_df.loc[cancer_df['Target'] == 'malignant', 'mean radius'],
            color='b', fill=True, label='malignant')
sns.kdeplot(x=cancer_df.loc[cancer_df['Target'] == 'benign', 'mean radius'],
            color='r', fill=True, label='benign')
plt.xlabel('mean radius')
plt.ylabel('Probability density')
plt.legend()  # shows which curve belongs to which class
Two-dimensional KDE Plot

Visualize the probability distribution of a sample against multiple continuous attributes.

# setting up the samples
malignant = cancer_df.query("Target == 'malignant'")
benign = cancer_df.query("Target == 'benign'")

# plotting the KDE plot (seaborn 0.11 or later: x and y are passed by keyword;
# thresh=0.05 leaves the lowest density level unshaded, as shade_lowest=False did)
sns.kdeplot(x=malignant['mean radius'],
            y=malignant['mean texture'],
            fill=True, cmap="Reds", thresh=0.05)

Visualize the probability distribution of multiple samples in a single plot.

# plotting both samples on the same axes; the colormap identifies the sample
sns.kdeplot(x=malignant['mean radius'],
            y=malignant['mean texture'],
            fill=True, cmap="Reds", thresh=0.05)
sns.kdeplot(x=benign['mean radius'],
            y=benign['mean texture'],
            fill=True, cmap="Blues", thresh=0.05)
Analyzing Variables Individually
Types of Variables

There are two types of variables: numerical variables and categorical variables.

Numerical variables: variables for which the values are numbers


Categorical variables: variables for which the values are categories

Variable
• Numeric: Continuous, Discrete
• Categorical: Ordinal, Nominal

Analyzing Variables Individually

The housing DataFrame is redefined, and its shape attribute is used to see the size of the new DataFrame.

Understanding the Main Variable

Let’s understand the main variable, the SalePrice of the housing dataset.

The first thing to do with a numerical variable is to look at its descriptive statistics, including the range of values for the variable.

Next, understand the variable visually with a histogram.
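These steps might look like the sketch below. The housing dataset itself is not reproduced in this lesson, so the DataFrame here is a small hypothetical stand-in with a single SalePrice column; the bin count is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless back-end for scripts; omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the housing DataFrame
housing = pd.DataFrame({
    "SalePrice": [208500, 181500, 223500, 140000,
                  250000, 143000, 307000, 200000],
})

print(housing.shape)  # (rows, columns) of the DataFrame

# Descriptive statistics: count, mean, std, min, quartiles, max
summary = housing["SalePrice"].describe()
print(summary)

# Histogram of the main variable
fig, ax = plt.subplots()
counts, bin_edges, patches = ax.hist(housing["SalePrice"], bins=4)
ax.set_xlabel("SalePrice")
ax.set_ylabel("Frequency")
```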


Key Takeaways

You are now able to:

• Explain data visualization and its importance in today’s world
• Understand why Python is considered one of the best data visualization tools
• Describe matplotlib and its data visualization features in Python
• List the types of plots and the steps involved in creating these plots
Knowledge Check
Knowledge Check 1
Which of the following methods is used to set the title?

a. plot()

b. plt.title()

c. plot.title()

d. title()

The correct answer is b

plt.title() is used to set the title.


Knowledge Check 2
Which of the following methods is used to adjust the distances between the subplots?

a. plot.subplots_adjust()

b. plt.subplots_adjust()

c. subplots_adjust()

d. plt.subplots.adjust()

The correct answer is b

plt.subplots_adjust() is used to adjust the distances between the subplots.


Knowledge Check 3
Which of the following needs to be run to display the plot in a Jupyter notebook?

a. %matplotlib

b. %matplotlib inline

c. import matplotlib

d. import style

The correct answer is b

To display the plot in a Jupyter notebook, run the magic command %matplotlib inline.


Knowledge Check 4
Which of the following keywords is used to decide the transparency of the plot line?

a. Legend

b. Alpha

c. Animated

d. Annotation

The correct answer is b

Alpha decides the line transparency in the line properties when plotting a line plot or chart.
Knowledge Check 5
Which of the following plots is used to represent data in a two-dimensional manner?

a. Histogram

b. Heat Map

c. Pie Chart

d. Scatter Plot

The correct answer is b

Heat Maps are used to represent data in a two-dimensional manner.


Visualize the Sales Data

Problem Statement:
BigMart is one of the biggest retailers of Europe and has operations across
multiple countries. You are a Data Analyst in the IT team of BigMart.
Invoice and SKU wise sales data for the years 2010 and 2011 is shared with
you. You need to prepare meaningful charts to showcase the various sales
trends for 2010 and 2011, to the top management.
Instructions to perform the assignment:
Download the dataset “BigMartSalesData.csv”. Use the data provided to
create visualizations of the trends.
Steps to Perform:
• Plot Total Sales Per Month for the year 2011. How has the total sales
increased over the months? Which month has the lowest sales?
• Plot Total Sales Per Month for the year 2011 in a bar chart. Is bar chart
better to visualize than a simple plot?
• Plot a pie chart for the year 2010, country wise. Which country contributes
the highest and lowest towards sales? Create a pandas series with indexes
of the country-wise sales.
Thank You
