
Data Visualization

Uploaded by

sahilww26

Data Visualization with Python

Department of Computer Engineering and Information Technology,


College of Engineering Pune (COEP) 1
Forerunners in Technical Education
Contents
• Data Science Pipeline
• What is Data Visualization and Why is it so important?
• Categories of Data Visualization
• Characteristics of Data Visualization
• Advantages
• Data Visualization Libraries
• Matplotlib
• Seaborn

Data Science
•“Data Science is the science of analysing raw data using statistics and
machine learning techniques with the purpose of drawing conclusions
about that information”

•Data Science Pipeline : “a set of actions which changes the raw (and confusing)
data from various sources (surveys, feedback, list of purchases, votes, etc.), to
an understandable format so that we can store it and use it for analysis.”

Data Science Pipeline

The raw data undergoes different stages within a pipeline, which are:

1. Fetching/Obtaining the Data
2. Scrubbing/Cleaning the Data
3. Data Visualization
4. Modeling the Data
5. Interpreting the Data
6. Revision

Data Visualization

• Data visualization is the representation of information and data in a pictorial or
graphical format (Example: charts, graphs, and maps).
• It provides an accessible way to see and understand trends, patterns, and
outliers in data.
• Data visualization tools and technologies are essential to analyzing
massive amounts of information and making data-driven decisions.
• The concept of using pictures to understand data has been in use
for centuries.
• General types of data visualization are Charts, Tables, Graphs, Maps, and
Dashboards.

Categories of Data Visualization
Numerical Data :
•Numerical data is also known as Quantitative data.
•Numerical data is any data that represents an amount, such as the height,
weight, or age of a person.
•Numerical data visualization is the easiest way to visualize data.
•It is generally used to help others digest large data sets and raw numbers in a
way that makes them easier to interpret and act on.
•Numerical data is divided into two categories :
1. Continuous Data –
It can take any value within a range and can be measured to finer precision (Example: Height measurements).
2. Discrete Data –
It takes only separate, countable values and is not “continuous” (Example: Number of cars).
•The visualization techniques used to represent numerical data are
Charts and Numerical Values.
•Examples are Pie Charts, Bar Charts, Averages, Scorecards, etc.

Categories of Data Visualization
Categorical Data :
•Categorical data is also known as Qualitative data.
•Categorical data is any data where data generally represents groups.
•It simply consists of categorical variables that are used to represent characteristics
such as a person’s ranking, a person’s gender, etc.
•Categorical data visualization is all about depicting key themes, establishing
connections, and lending context.
•Categorical data is classified into three categories :
1. Binary Data –
In this, classification is based on two opposing positions (Example: Agrees or Disagrees).
2. Nominal Data –
In this, classification is based on attributes that have no inherent order (Example: Male or Female).
3. Ordinal Data –
In this, classification is based on the ordering of information (Example: Timeline or
processes).
•The visualization techniques used to represent categorical data are Graphics,
Diagrams, and Flowcharts. Examples are Word Clouds, Sentiment Mapping, Venn Diagrams, etc.

Characteristics of Effective Graphical Visuals :

• It shows the data clearly, in an understandable manner.
• It encourages viewers to compare different pieces of data.
• It closely integrates statistical and verbal descriptions of the data set.
• It grabs our interest, focuses our mind, and keeps our eyes on the message,
as the human brain tends to focus on visual data more than written data.
• It also helps in identifying areas that need more attention and
improvement.
• Using a graphical representation, a story can be told more efficiently. It also
takes less time to understand a picture than to understand
textual data.

Advantages of Data Visualization

1. Better Agreement
2. A Superior Method
3. Simple Sharing of Data
4. Sales Analysis
5. Finding Relations Between Events
6. Exploring Opportunities and Trends

Why is Data Visualization Important?
1. Data Visualization Discovers the Trends in Data
2. Data Visualization Provides a Perspective on the Data
3. Data Visualization Puts the Data into the Correct Context
4. Data Visualization Saves Time
5. Data Visualization Tells a Data Story

Top Data Visualisation Tools
The following are a few of the Data Visualization tools that are widely
used in the market:

•Tableau
•Looker
•Zoho Analytics
•Sisense
•IBM Cognos Analytics
•Qlik Sense
•Domo
•Microsoft Power BI
•Klipfolio
•SAP Analytics Cloud

Data Visualisation Libraries in Python
• Matplotlib
• Plotly
• ggplot
• Seaborn
• Altair
• Geoplotlib
• Bokeh

Matplotlib
• Matplotlib is an easy-to-use visualization library for Python.
• It is built on NumPy arrays, is designed to work with the broader SciPy stack, and
provides several plot types such as line, bar, scatter, histogram, etc.

• Using Matplotlib with Jupyter Notebook:
• The Jupyter Notebook is an open-source web application that allows you to
create and share documents that contain live code, equations, visualizations and
narrative text.
• Uses include data cleaning and transformation, numerical simulation, statistical
modeling, data visualization, machine learning, and much more.

• Matplotlib is one of the most popular Python packages used for data visualization.
• It is a cross-platform library for making 2D plots from data in arrays.
• To get started, make the necessary imports and prepare some data.
The plot() function draws the graph, whereas the
show() function displays the plot.

Matplotlib
Installation :
1.Install Matplotlib with pip

To install Matplotlib with pip, open a terminal window and type:

pip install matplotlib

2. Install Matplotlib with the Anaconda Prompt

To install Matplotlib, open the Anaconda Prompt and type:

conda install matplotlib

Simple Plot using Matplotlib
Example : Plotting two lists containing the X, Y coordinates for the plot.

import matplotlib.pyplot as plt


# initializing the data
x = [10, 20, 30, 40]
y = [20, 30, 40, 50]
# plotting the data
plt.plot(x, y)
# Adding the title
plt.title("Simple Plot")
# Adding the labels
plt.ylabel("y-axis")
plt.xlabel("x-axis")
plt.show()

Pyplot
• Pyplot is a Matplotlib module that provides a MATLAB-like interface.
• Pyplot provides functions that interact with the figure, i.e. create a figure, create a
plotting area in a figure, and decorate the plot with labels.

Syntax: matplotlib.pyplot.plot(*args, scalex=True, scaley=True, data=None, **kwargs)

Example:

# Python program to show pyplot module


import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.axis([0, 6, 0, 20])
plt.show()

Pyplot Parameters
• The plot() function accepts parameters that enable us to format the graph, and related pyplot functions set the axes scales and labels.
• These are listed below (assuming import matplotlib.pyplot as plt) :-
 • plt.plot(x, y): plots x and y using the default line style and color.
 • plt.axis([xmin, xmax, ymin, ymax]): sets the range of the x-axis and y-axis from the minimum to the
maximum values.
 • plt.plot(x, y, color='green', marker='o', linestyle='dashed', linewidth=2, markersize=12):
the x and y co-ordinates are marked using circular markers of size 12, joined by a green dashed
line of width 2.
 • plt.xlabel('X-axis'): names the x-axis.
 • plt.ylabel('Y-axis'): names the y-axis.
 • plt.plot(x, y, label='Sample line'): the plotted line is listed as 'Sample line' when a legend is displayed.
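
The parameters listed above can be combined in one short script. This is a minimal sketch rather than part of the original slides: the data values are made up, and the Agg backend line is an assumption that lets the script run without a display (drop it in Jupyter or on a desktop).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove in Jupyter/desktop use
import matplotlib.pyplot as plt

# made-up sample data
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# green dashed line of width 2 with circular markers of size 12
(line,) = plt.plot(x, y, color="green", marker="o", linestyle="dashed",
                   linewidth=2, markersize=12, label="Sample line")

plt.axis([0, 6, 0, 20])   # [xmin, xmax, ymin, ymax]
plt.xlabel("X-axis")      # names the x-axis
plt.ylabel("Y-axis")      # names the y-axis
plt.legend()              # displays the label passed to plot()
plt.show()
```

Keeping the object returned by plot() (here called line) is optional; it is only useful when the line needs to be inspected or restyled later.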

Pyplot Example
• Example: Electricity Power Consumption datasets of India and Bangladesh. Here, we are
using Google Public Data as a data source.

• Example 1: Linear plot

import matplotlib.pyplot as plt

# year contains the x-axis values; e_india and e_bangladesh are the
# y-axis values for plotting
year = [1972, 1982, 1992, 2002, 2012]
e_india = [100.6, 158.61, 305.54, 394.96, 724.79]
e_bangladesh = [10.5, 25.21, 58.65, 119.27, 274.87]

# plotting of x-axis (year) and y-axis (power consumption) with
# differently colored, labeled lines for the two countries
plt.plot(year, e_india, color='orange', label='India')
plt.plot(year, e_bangladesh, color='g', label='Bangladesh')

# naming of x-axis and y-axis
plt.xlabel('Years')
plt.ylabel('Power consumption in kWh')

# naming the title of the plot
plt.title('Electricity consumption per capita of India and Bangladesh')

plt.legend()
plt.show()

Example 2: Linear Plot with line formatting
import matplotlib.pyplot as plt

year = [1972, 1982, 1992, 2002, 2012]
e_india = [100.6, 158.61, 305.54, 394.96, 724.79]
e_bangladesh = [10.5, 25.21, 58.65, 119.27, 274.87]

# formatting of line style and plotting of co-ordinates
plt.plot(year, e_india, color='orange', marker='o', markersize=12,
         label='India')
plt.plot(year, e_bangladesh, color='g', linestyle='dashed', linewidth=2,
         label='Bangladesh')

plt.xlabel('Years')
plt.ylabel('Power consumption in kWh')

plt.title('Electricity consumption per capita of India and Bangladesh')

plt.legend()
plt.show()

Matplotlib
Matplotlib takes care of creating built-in defaults such as the Figure and Axes.

•Figure:
• The top-level container for all the plots, i.e. the overall window or page on which
everything is drawn.
• A box-like container that can hold one or more axes.

•Axes:
• The most basic and flexible component for creating sub-plots; each axes is one plotting area.
• A given figure may contain many axes, but a given axes can only be in one figure.
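
The Figure/Axes relationship described above can be checked directly. The sketch below is illustrative only: the variable names are made up, and the Agg backend line is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove in Jupyter/desktop use
import matplotlib.pyplot as plt

# one figure (the container) holding two axes (the plotting areas)
fig = plt.figure(figsize=(6, 3))
ax_left = fig.add_subplot(1, 2, 1)
ax_right = fig.add_subplot(1, 2, 2)

ax_left.plot([1, 2, 3], [1, 4, 9])
ax_right.plot([1, 2, 3], [9, 4, 1])

# the figure knows all of its axes ...
print(len(fig.axes))
# ... and each axes points back to the single figure it belongs to
print(ax_left.figure is fig)
plt.show()
```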

Figure Class
Syntax:
class matplotlib.figure.Figure(figsize=None, dpi=None, facecolor=None,
edgecolor=None, linewidth=0.0, frameon=None, subplotpars=None,
tight_layout=None, constrained_layout=None)

Example 1: Python program to show the Figure class

import matplotlib.pyplot as plt

# Creating a new figure with width = 5 inches
# and height = 4 inches
fig = plt.figure(figsize=(5, 4))
# Creating a new axes for the figure; the list is
# [left, bottom, width, height] as fractions of the figure
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
# Adding the data to be plotted
ax.plot([2, 3, 4, 5, 5, 6, 6], [5, 7, 1, 3, 4, 6, 8])
plt.show()

Figure Class
Example 2: Creating multiple plots

import matplotlib.pyplot as plt

# Creating a new figure with width = 5 inches
# and height = 4 inches
fig = plt.figure(figsize=(5, 4))
# Creating first axes for the figure; values are
# [left, bottom, width, height] as fractions of the figure
ax1 = fig.add_axes([0.1, 0.1, 0.8, 0.8])
# Creating second, smaller axes inside the first
ax2 = fig.add_axes([0.5, 0.5, 0.3, 0.3])
# Adding the data to be plotted
ax1.plot([2, 3, 4, 5, 5, 6, 6], [5, 7, 1, 3, 4, 6, 8])
ax2.plot([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
plt.show()

Axes Class
Syntax: matplotlib.pyplot.axes(arg=None, **kwargs)

Example 1: Python program to create an axes object

import matplotlib.pyplot as plt

# Creating the axes object with argument as
# [left, bottom, width, height], given as fractions of the figure
ax = plt.axes([0.1, 0.1, 0.8, 0.8])
plt.show()

Axes Class
Example 2: Python program to plot two datasets on one axes

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(5, 4))
# Adding the axes to the figure
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
# plotting 1st dataset on the axes (plot() returns a list of lines)
line1 = ax.plot([1, 2, 3, 4], [1, 2, 3, 4])
# plotting 2nd dataset on the same axes
line2 = ax.plot([1, 2, 3, 4], [2, 3, 4, 5])
plt.show()

Setting Limits and Tick labels
•Matplotlib automatically sets the limits and the ticks (marked points) of the x and y axes; however,
it is possible to set the limits and ticks manually.
•set_xlim() and set_ylim() functions are used to set the limits of the x-axis and y-axis
respectively.
•Similarly, set_xticklabels() and set_yticklabels() functions are used to set tick labels. Fix the tick
positions first with set_xticks() or set_yticks() so that each label has a tick to attach to.

Example: Python program to set limits and tick labels

import matplotlib.pyplot as plt
x = [3, 1, 3]
y = [3, 2, 1]
# Creating a new figure with width = 5 inches and
# height = 4 inches
fig = plt.figure(figsize=(5, 4))
# Creating first axes for the figure
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
# Adding the data to be plotted
ax.plot(x, y)
# Showing only the part of the plot between x = 1 and x = 2
ax.set_xlim(1, 2)
# Fixing six tick positions, then naming them
ax.set_xticks([1.0, 1.2, 1.4, 1.6, 1.8, 2.0])
ax.set_xticklabels(("one", "two", "three", "four", "five", "six"))
plt.show()

Multiple Plots
Method 1: Using the add_axes() method
•The add_axes() method of the figure module of the matplotlib library is used to add an axes to the
figure.
•The add_axes() method adds another plot to the same figure by creating another axes object.

Syntax: add_axes(self, *args, **kwargs)

Example: Python program to add two axes to one figure


import matplotlib.pyplot as plt
from matplotlib.figure import Figure
# Creating a new figure with width = 5 inches and height = 4 inches
fig = plt.figure(figsize =(5, 4))
# Creating first axes for the figure
ax1 = fig.add_axes([0.1, 0.1, 0.8, 0.8])
# Creating second axes for the figure
ax2 = fig.add_axes([0.5, 0.5, 0.3, 0.3])
# Adding the data to be plotted
ax1.plot([5, 4, 3, 2, 1], [2, 3, 4, 5, 6])
ax2.plot([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
plt.show()

Multiple Plots
Method 2: Using subplot() method.

This method adds another plot to the current figure at the specified grid position.

Syntax: subplot(nrows, ncols, index, **kwargs)
subplot(pos, **kwargs)
subplot(ax)

Example:
import matplotlib.pyplot as plt
# data to display on plots
x = [3, 1, 3]
y = [3, 2, 1]
z = [1, 3, 1]
plt.figure()
# 121 means 1 row, 2 columns, 1st position
plt.subplot(121)
plt.plot(x, y)
# 122 means 1 row, 2 columns, 2nd position
plt.subplot(122)
plt.plot(z, y)
plt.show()

Multiple Plots
Method 3: Using subplots() method
This function is used to create figure and multiple subplots at the same time.
Syntax: matplotlib.pyplot.subplots(nrows=1, ncols=1, sharex=False,
sharey=False, squeeze=True, subplot_kw=None, gridspec_kw=None,
**fig_kw)

Example:
import matplotlib.pyplot as plt
# Creating the figure and subplots
# according to the arguments passed (1 row, 2 columns)
fig, axes = plt.subplots(1, 2)
# plotting the first dataset in the 1st subplot
axes[0].plot([1, 2, 3, 4], [1, 2, 3, 4])
# plotting a second dataset in the 1st subplot as well
axes[0].plot([1, 2, 3, 4], [4, 3, 2, 1])
# plotting the data in the 2nd subplot only
axes[1].plot([1, 2, 3, 4], [1, 1, 1, 1])
plt.show()

Multiple Plots
Method 4: Using subplot2grid() method
•This function gives additional flexibility in creating an axes object at a specified location inside a
grid.
•It also helps in spanning the axes object across multiple rows or columns.

Syntax: plt.subplot2grid(shape, loc, rowspan=1, colspan=1)

Example:
import matplotlib.pyplot as plt
# data to display on plots
x = [3, 1, 3]
y = [3, 2, 1]
z = [1, 3, 1]
# adding the subplots: a 7 x 1 grid, each subplot spanning 2 rows
axes1 = plt.subplot2grid((7, 1), (0, 0), rowspan=2, colspan=1)
axes2 = plt.subplot2grid((7, 1), (2, 0), rowspan=2, colspan=1)
axes3 = plt.subplot2grid((7, 1), (4, 0), rowspan=2, colspan=1)
# plotting the data
axes1.plot(x, y)
axes2.plot(x, z)
axes3.plot(z, y)
plt.show()

What is a Legend?
• An area describing the elements of the graph.
• It reflects the data displayed on the graph’s Y-axis.
• It appears as a box containing a small sample of each color on the graph and a short
description of what that data means.

Syntax: matplotlib.pyplot.legend(["blue", "green"], bbox_to_anchor=(0.75, 1.15), ncol=2)

• Creating the Legend : A Legend can be created using the legend() method.
• The parameter loc of legend() is used to specify the location of the legend.
• The default value is loc="best", which places the legend where it overlaps the plotted data least.
• The strings 'upper left', 'upper right', 'lower left', 'lower right' place the legend at the
corresponding corner of the axes/figure.
• The parameter bbox_to_anchor=(x, y) of the legend() function is used to specify the coordinates of
the legend, and the parameter ncol represents the number of columns that the legend has. Its
default value is 1.
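
As an illustration of loc, bbox_to_anchor and ncol used together, here is a minimal sketch. The data, labels and anchor coordinates are made up for the example, and the Agg backend line is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove in Jupyter/desktop use
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
fig, ax = plt.subplots()
ax.plot(x, [1, 2, 3, 4], label="linear")
ax.plot(x, [1, 4, 9, 16], label="squared")

# loc names the corner of the legend box that is pinned to bbox_to_anchor;
# (0.0, 1.02) is just above the top-left of the axes, ncol=2 gives one row
legend = ax.legend(loc="lower left", bbox_to_anchor=(0.0, 1.02), ncol=2)
plt.show()
```

Passing label= to each plot() call and then calling legend() without a list keeps every legend entry tied to its own line, which is less error-prone than relying on the order of a list of names.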

What is a Legend?
Example:
import matplotlib.pyplot as plt
# data to display on plots
x = [3, 1, 3]
y = [3, 2, 1]
plt.plot(x, y)
plt.plot(y, x)
# Adding the legends
plt.legend(["blue", "orange"])
plt.show()

Creating Different Types of Plots
1. Line Graph : A line chart is used to represent the relationship between two sets of data, X
and Y, plotted on different axes. It is plotted using the plot() function.

Example:

import matplotlib.pyplot as plt


# data to display on plots
x = [3, 1, 3]
y = [3, 2, 1]
# This will plot a simple line chart
# with elements of x as x axis and y
# as y axis
plt.plot(x, y)
plt.title("Line Chart")
# Adding the legends
plt.legend(["Line"])
plt.show()

Creating Different Types of Plots
2. Bar chart
•A bar chart represents categories of data with rectangular bars whose lengths or heights
are proportional to the values they represent.
•The bar plots can be plotted horizontally or vertically.
•A bar chart describes the comparisons between the discrete categories.
•It can be created using the bar() method.

Syntax: plt.bar(x, height, width, bottom, align)

import matplotlib.pyplot as plt


# data to display on plots
x = [3, 1, 3, 12, 2, 4, 4]
y = [3, 2, 1, 4, 5, 6, 7]
# This will plot a simple bar chart
plt.bar(x, y)
# Title to the plot
plt.title("Bar Chart")
# Adding the legends
plt.legend(["bar"])
plt.show()
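
The point that bar plots can be drawn vertically or horizontally can be sketched with bar() and barh() side by side. The category names and values below are hypothetical, and the Agg backend line is an assumption so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove in Jupyter/desktop use
import matplotlib.pyplot as plt

# hypothetical discrete categories and their values
categories = ["A", "B", "C", "D"]
values = [3, 7, 5, 2]

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(8, 3))

# vertical bars: the value sets the height of each bar
vertical = ax_v.bar(categories, values)
ax_v.set_title("Vertical (bar)")

# horizontal bars: the value sets the width (length) of each bar
horizontal = ax_h.barh(categories, values)
ax_h.set_title("Horizontal (barh)")

plt.show()
```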

Creating Different Types of Plots
3. Histograms
•Histograms are used to represent the distribution of data by grouping values into ranges.
•It is a type of bar plot where the X-axis represents the bin ranges while the Y-axis
gives information about frequency.
•To create a histogram, first divide the whole range of the values into a series of
intervals (bins), then count the values which
fall into each of the intervals.
•Bins are consecutive, non-overlapping intervals of a variable.
•The hist() function is used to compute and create a histogram of x.

Syntax: matplotlib.pyplot.hist(x, bins=None, range=None, density=False,
weights=None, cumulative=False, bottom=None, histtype='bar',
align='mid', orientation='vertical', rwidth=None, log=False,
color=None, label=None, stacked=False, *, data=None, **kwargs)

Histograms
Example:

import matplotlib.pyplot as plt


# data to display on plots
x = [1, 2, 3, 4, 5, 6, 7, 4]
# This will plot a simple histogram
plt.hist(x, bins = [1, 2, 3, 4, 5, 6, 7])
# Title to the plot
plt.title("Histogram")
# Adding the legends
plt.legend(["bar"])
plt.show()

Bin Size in Matplotlib Histogram
• The towers or bars of a histogram are called bins.
• The height of each bin shows how many values from the data fall into that range.
• With equal-width bins, width of each bin = (max value of data – min value of data) / total number of bins
• The default value of the number of bins to be created in a histogram is 10.
• However, we can change the number of bins using the parameter bins
in matplotlib.pyplot.hist().

Method 1 :
We can pass an integer in bins stating how many bins/towers to be created in the histogram
and the width of each bin is then changed accordingly.

Example 1 :
import matplotlib.pyplot as plt

height = [189, 185, 195, 149, 189, 147, 154,


174, 169, 195, 159, 192, 155, 191,
153, 157, 140, 144, 172, 157, 181,
182, 166, 167]

plt.hist(height, edgecolor="red", bins=5)


plt.show()
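
The bin-width formula can be checked on the height data from Example 1. This sketch uses numpy.histogram, which applies the same equal-width binning rule, so no plot is needed; the variable names other than height are made up.

```python
import numpy as np

height = [189, 185, 195, 149, 189, 147, 154,
          174, 169, 195, 159, 192, 155, 191,
          153, 157, 140, 144, 172, 157, 181,
          182, 166, 167]

n_bins = 5
# width = (max - min) / number of bins = (195 - 140) / 5
bin_width = (max(height) - min(height)) / n_bins
print(bin_width)          # 11.0

# counts per bin and the bin edges for the same 5 equal-width bins
counts, edges = np.histogram(height, bins=n_bins)
print(edges.tolist())     # [140.0, 151.0, 162.0, 173.0, 184.0, 195.0]
print(counts.tolist())    # [4, 6, 4, 3, 7]
```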

Bin Size in Matplotlib Histogram
Method 2 :
•We can also pass a sequence of int or float values in the parameter bins.
•The elements of the sequence are the edges/boundaries of the bins.
•In this method, the bin width may vary for each bin.
•Suppose a sequence [1,2,3,4,5] is assigned to bins. The number of bins made will be
4, i.e. the first bin will be [1,2) (including 1, but excluding 2), the second bin will be [2,3) (including 2,
but excluding 3), and the third bin will be [3,4) (including 3, but excluding 4).
•However, in the last bin [4,5] both 4 and 5 are included.
•Hence, all the bins are half-open [a, b) but the last bin is closed [a, b].
•In this case, the width of each bin is equal, because the edges are evenly spaced.
•If the difference between consecutive elements of the sequence assigned to bins is not equal, then the
width of each bin is different; hence the bin width depends on the sequence.
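
The half-open and closed bin rule can be verified with numpy.histogram, which follows the same edge convention. The data below is made up so that every edge value occurs exactly once.

```python
import numpy as np

data = [1, 2, 3, 4, 5]

# edges [1, 2, 3, 4, 5] -> bins [1,2), [2,3), [3,4) and the closed last bin [4,5]
equal_counts, _ = np.histogram(data, bins=[1, 2, 3, 4, 5])
print(equal_counts.tolist())    # [1, 1, 1, 2]  (4 and 5 both land in the last bin)

# unequal edges [1, 2, 5] -> bins [1,2) of width 1 and [2,5] of width 3
unequal_counts, _ = np.histogram(data, bins=[1, 2, 5])
print(unequal_counts.tolist())  # [1, 4]
```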

Bin Size in Matplotlib Histogram
Method 2 :

Example : Equal bin width

import matplotlib.pyplot as plt

marks = [1, 2, 3, 2, 1, 2, 3, 2,
         1, 4, 5, 4, 3, 2, 5, 4,
         5, 4, 5, 3, 2, 1, 5]

plt.hist(marks, bins=[1, 2, 3, 4, 5], edgecolor="black")
plt.show()

Create a cumulative histogram in Matplotlib
Cumulative frequency: Cumulative frequency analysis is the analysis of the
frequency of occurrence of values. The cumulative frequency is the total of a frequency
and all the frequencies before it in a frequency distribution.

Example : If X contains the frequencies [1,2,3,4,5], then the cumulative frequency for X is [1,3,6,10,15].
Explanation: [1,1+2,1+2+3,1+2+3+4,1+2+3+4+5]

In Python, we can generate a histogram with DataFrame.hist() (pandas), and compute the
cumulative frequencies for a cumulative histogram with scipy.stats.cumfreq().
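
The running total in the explanation above is a cumulative sum. As a minimal sketch, numpy.cumsum reproduces it; NumPy is used here only for illustration and is not the method named in the slide.

```python
import numpy as np

frequencies = np.array([1, 2, 3, 4, 5])

# each entry is the sum of that frequency and all frequencies before it
cumulative = np.cumsum(frequencies)
print(cumulative.tolist())   # [1, 3, 6, 10, 15]
```

The last cumulative value always equals the total number of observations, which is a quick sanity check on a cumulative histogram.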

Create a cumulative histogram in Matplotlib
Example 1:

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# 1000 samples from a standard normal distribution (fixed seed for repeatability)
rng = np.random.RandomState(seed=12345)
samples = stats.norm.rvs(size=1000, random_state=rng)
# cumulative frequencies of the samples in 25 bins
res = stats.cumfreq(samples, numbins=25)
# x positions of the bars, starting at the lower limit of the first bin
x = res.lowerlimit + np.linspace(0, res.binsize*res.cumcount.size,
                                 res.cumcount.size)
fig = plt.figure(figsize=(10, 4))
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(1, 2, 2)
ax1.hist(samples, bins=25, color="green")
ax1.set_title('Histogram')
# bar width matches the bin size so that neighbouring bars do not overlap
ax2.bar(x, res.cumcount, width=res.binsize, color="blue")
ax2.set_title('Cumulative histogram')
ax2.set_xlim([x.min(), x.max()])
plt.show()

Creating Different Types of Plots
4. Scatter Plots
•Scatter plots are used to observe the relationship between variables; they use dots to
represent the values of those variables.
•The scatter() method in the matplotlib library is used to draw a scatter plot.

Syntax: matplotlib.pyplot.scatter(x_axis_data, y_axis_data, s=None,


c=None, marker=None, cmap=None, vmin=None, vmax=None, alpha=None,
linewidths=None, edgecolors=None)

Example:
import matplotlib.pyplot as plt
# data to display on plots
x = [3, 1, 3, 12, 2, 4, 4]
y = [3, 2, 1, 4, 5, 6, 7]
# This will plot a simple scatter chart
plt.scatter(x, y)
# Adding legend to the plot (a list with one label per plotted series)
plt.legend(["A"])
# Title to the plot
plt.title("Scatter chart")
plt.show()

Creating Different Types of Plots
5. Pie Charts
•A Pie Chart is a circular statistical plot that can display only one series of data.
•The whole area of the chart represents the total (100%) of the given data.
•The area of each slice of the pie represents that part's percentage of the data.
•The slices of pie are called wedges.
•The area of a wedge is determined by the length of its arc.
•It can be created using the pie() method.
Syntax: matplotlib.pyplot.pie(data, explode=None, labels=None, colors=None,
autopct=None, shadow=False)
Example:
import matplotlib.pyplot as plt
# data to display on plots
x = [1, 2, 3, 4]
# this will explode the 1st wedge
# i.e. will separate the 1st wedge
# from the chart
e =(0.1, 0, 0, 0)
# This will plot a simple pie chart
plt.pie(x, explode = e)
# Title to the plot
plt.title("Pie chart")
plt.show()
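
The labels and autopct parameters from the syntax line are not used in the example above. The sketch below adds them to the same data; the wedge names are hypothetical, and the Agg backend line is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove in Jupyter/desktop use
import matplotlib.pyplot as plt

sizes = [1, 2, 3, 4]
names = ["A", "B", "C", "D"]   # hypothetical wedge names
explode = (0.1, 0, 0, 0)       # separate the 1st wedge from the chart

# autopct formats each wedge's share of the total as a percentage
wedges, texts, autotexts = plt.pie(sizes, explode=explode, labels=names,
                                   autopct="%1.1f%%", shadow=False)
plt.title("Pie chart with labels")
plt.show()
```

Since the values sum to 10, the wedges are labelled 10.0%, 20.0%, 30.0% and 40.0%, matching the idea that each wedge's area is its share of the whole.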

Creating Different Types of Plots
6. 3D Plots

Example 1:

import matplotlib.pyplot as plt

# Creating the figure object
fig = plt.figure()
# keeping the projection = '3d'
# creates the 3D plot
ax = plt.axes(projection='3d')
plt.show()

Example 2:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
z = [1, 8, 27, 64, 125]
# Creating the figure object
fig = plt.figure()
# keeping the projection = '3d'
# creates the 3D plot
ax = plt.axes(projection='3d')
# drawing a line through the points in 3D space
ax.plot3D(z, y, x)
plt.show()

Creating Different Types of Plots
7. Box and Whisker Plot

•A Box Plot, also known as a Whisker plot, is created to display a summary of a set of data values through
properties like the minimum, first quartile, median, third quartile and maximum.
•In the box plot, a box is drawn from the first quartile to the third quartile, and a line
goes through the box at the median.
•Here the x-axis denotes the data set being plotted while the y-axis shows the distribution of its values.
•The matplotlib.pyplot module of the matplotlib library provides the boxplot() function, with which we can
create box plots.

Syntax: matplotlib.pyplot.boxplot(data, notch=None, vert=None, patch_artist=None, widths=None)

import matplotlib.pyplot as plt


import numpy as np
np.random.seed(10)
data = np.random.normal(100, 20, 200)
fig = plt.figure(figsize =(10, 7))
plt.boxplot(data)
plt.show()
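
The five summary values that a box plot displays can be computed directly. This is a minimal sketch with made-up data, using numpy.percentile with its default linear interpolation; note that Matplotlib's whiskers by default stop at 1.5 times the interquartile range, so they reach the true minimum and maximum only when there are no outliers.

```python
import numpy as np

# made-up data, chosen so that the quartiles fall on whole numbers
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# minimum, first quartile, median, third quartile, maximum
minimum, q1, median, q3, maximum = np.percentile(data, [0, 25, 50, 75, 100])
print(minimum, q1, median, q3, maximum)   # 1.0 3.0 5.0 7.0 9.0

# the box spans q1..q3; its height is the interquartile range
iqr = q3 - q1
print(iqr)                                # 4.0
```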

Customizing BoxPlots
•The matplotlib.pyplot.boxplot() function provides many customization options for the box plot.
•The notch = True attribute creates the notch format for the box plot; patch_artist = True fills the boxes with
color, so we can set different colors for different boxes.
•The vert = 0 attribute creates a horizontal box plot. labels takes a sequence with one entry per data set.

# Import libraries
import matplotlib.pyplot as plt
import numpy as np

# Create four reproducible data sets with different means and spreads
np.random.seed(10)
data_1 = np.random.normal(100, 10, 200)
data_2 = np.random.normal(90, 20, 200)
data_3 = np.random.normal(80, 30, 200)
data_4 = np.random.normal(70, 40, 200)
data = [data_1, data_2, data_3, data_4]

fig = plt.figure(figsize =(10, 7))

# Create an axes instance that fills the figure
ax = fig.add_axes([0, 0, 1, 1])

# Create one box per data set
bp = ax.boxplot(data)
plt.show()
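
The customization arguments described on this slide can be sketched as follows. This is an illustrative example, not part of the original slides: the colors and tick labels are arbitrary choices, and the labels are set through the axes rather than through the labels argument so that the sketch does not depend on a particular Matplotlib version.

```python
import matplotlib.pyplot as plt
import numpy as np

# Four reproducible data sets: (mean, std) = (100, 10), (90, 20), (80, 30), (70, 40)
np.random.seed(10)
data = [np.random.normal(100 - 10 * i, 10 * (i + 1), 200) for i in range(4)]

fig, ax = plt.subplots(figsize=(10, 7))

# notch=True draws notched boxes; patch_artist=True makes the boxes fillable
bp = ax.boxplot(data, notch=True, patch_artist=True)

# Give each box its own fill color (illustrative color choices)
colors = ["tab:blue", "tab:orange", "tab:green", "tab:red"]
for box, color in zip(bp["boxes"], colors):
    box.set_facecolor(color)

# Boxes are placed at x positions 1..4 by default; label them
ax.set_xticks([1, 2, 3, 4])
ax.set_xticklabels(["data_1", "data_2", "data_3", "data_4"])
```

Call plt.show() afterwards to display the figure.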

Working with Images
• The image module of the matplotlib library is used for working with images in Python.
• imread() reads an image from a file into an array.
• imshow() from matplotlib.pyplot displays the image.

# importing required libraries
import matplotlib.pyplot as plt
import matplotlib.image as img

# reading the image
testImage = img.imread('flower.png')

# displaying the image
plt.imshow(testImage)
plt.show()
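
The example above needs a file named flower.png in the working directory. The sketch below avoids that requirement by writing a small synthetic image to a temporary file first and then reading it back. The gradient image and the file name gradient.png are assumptions made only for illustration.

```python
import os
import tempfile

import matplotlib.image as img
import matplotlib.pyplot as plt
import numpy as np

# Build a small synthetic RGB image: a left-to-right red gradient
height, width = 32, 64
pixels = np.zeros((height, width, 3))
pixels[:, :, 0] = np.linspace(0.0, 1.0, width)

# Save it as a PNG in a temporary directory (hypothetical file name)
path = os.path.join(tempfile.mkdtemp(), "gradient.png")
plt.imsave(path, pixels)

# imread() returns the image as a NumPy array (rows, columns, channels)
loaded = img.imread(path)
print(loaded.shape, loaded.dtype)

# imshow() draws the array as an image
plt.imshow(loaded)
plt.axis("off")
```

Because imread() returns an ordinary array, the image can be sliced or modified with NumPy before it is displayed.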

Python Seaborn
• Seaborn is a library mostly used for statistical plotting in Python.
• It is built on top of Matplotlib and provides beautiful default styles and color palettes to
make statistical plots more attractive.

Installation: pip install seaborn

Note: Seaborn has the following dependencies:

Python 3 (Python 2.7 was supported only by older releases)
numpy
scipy
pandas
matplotlib

Python Seaborn
# importing packages
import seaborn as sns

# loading dataset
data = sns.load_dataset("iris")

# draw lineplot
sns.lineplot(x="sepal_length",
y="sepal_width", data=data)

Using Seaborn with Matplotlib
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

# loading dataset
data = sns.load_dataset("iris")

# draw lineplot
sns.lineplot(x="sepal_length",
y="sepal_width", data=data)

# setting the title using Matplotlib
plt.title('Title using Matplotlib Function')

plt.show()

Using Seaborn with Matplotlib
Example : Setting the xlim and ylim

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

# loading dataset
data = sns.load_dataset("iris")

# draw lineplot
sns.lineplot(x="sepal_length",
y="sepal_width", data=data)

# setting the lower x limit of the plot to 5
plt.xlim(5)

plt.show()

Customizing Seaborn Plots
Changing Figure Aesthetic: the set_style() method sets the aesthetic style of the plot. It
affects things like the color of the axes, whether the grid is shown, and other
aesthetic elements. There are five themes available in Seaborn: darkgrid, whitegrid, dark, white and ticks.
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

# loading dataset
data = sns.load_dataset("iris")

# changing the theme to dark; the style is set before
# drawing so that it applies to the new plot
sns.set_style("dark")

# draw lineplot
sns.lineplot(x="sepal_length",
y="sepal_width", data=data)
plt.show()
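
To compare the five themes side by side, each style can be applied temporarily with axes_style() used as a context manager. This is a minimal sketch; the plotted values are arbitrary, and each axes is created inside the with block because a style only affects axes created while it is active.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# The five built-in Seaborn styles
styles = ["darkgrid", "whitegrid", "dark", "white", "ticks"]

fig = plt.figure(figsize=(15, 3))
for i, style in enumerate(styles, start=1):
    # Apply the style only while this axes is being created
    with sns.axes_style(style):
        ax = fig.add_subplot(1, len(styles), i)
    ax.plot([0, 1, 2], [0, 1, 4])  # arbitrary illustrative data
    ax.set_title(style)
```

Call plt.show() afterwards to display the figure. Unlike set_style(), the context manager leaves the global defaults unchanged once the block ends.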

Color Palette
• The color_palette() method returns a list of colors that can be used in plots.
• The palplot() method plots the colors of a palette as a horizontal array.
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

# current color palette
palette = sns.color_palette()

# plots the color palette as a
# horizontal array
sns.palplot(palette)

plt.show()

Diverging Color Palette
• This type of color palette uses two different colors that extend in opposite
directions from a common midpoint.
• For example, over a range of -10 to 10, values from -10 to 0 take shades of one color and
values from 0 to 10 take shades of another.
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

# diverging color palette with 11 colors
palette = sns.color_palette('PiYG', 11)

# plots the diverging color palette
sns.palplot(palette)

plt.show()

Sequential Color Palette
A sequential palette is used where the data ranges from a lower value to a
higher value. The built-in sequential palettes are named with the plural of the color,
so add the character ‘s’ to the color name passed to color_palette(), for example 'Greens'.

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

# sequential color palette with 11 colors
palette = sns.color_palette('Greens', 11)

# plots the sequential color palette
sns.palplot(palette)

plt.show()
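
A palette returned by color_palette() is a list of RGB tuples, so it can be inspected without plotting anything. The sketch below builds the sequential and diverging palettes used on these slides and prints a few of their colors; the variable names are illustrative.

```python
import seaborn as sns

# 11-color sequential and diverging palettes
sequential = sns.color_palette("Greens", 11)
diverging = sns.color_palette("PiYG", 11)

# Each entry is an (r, g, b) tuple with components between 0 and 1
print(len(sequential))
print(sequential[0], sequential[-1])

# as_hex() gives the same colors as hexadecimal color strings
print(diverging.as_hex()[:3])
```

In the sequential palette the first color is the lightest and the last is the darkest, while the diverging palette moves from one hue to the other through a light midpoint.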

Setting the default Color Palette
The set_palette() method sets the default color palette for all subsequent plots.
color_palette() and set_palette() accept the same arguments.
set_palette() changes the default matplotlib parameters.
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

# loading dataset
data = sns.load_dataset("iris")

def plot():
    sns.lineplot(x="sepal_length", y="sepal_width", data=data)

# setting the default color palette
sns.set_palette('vlag')
plt.subplot(211)

# plotting with the color palette
# as vlag
plot()

# setting another default color palette
sns.set_palette('Accent')
plt.subplot(212)
plot()

plt.show()
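
When the default palette should change only for part of a script, color_palette() can also be used in a with statement. The following is a small sketch of that idea; it only inspects the current default colors and does not draw anything.

```python
import seaborn as sns

# Default color cycle before any change
before = sns.color_palette()

# Inside the block, 'Accent' is the default palette
with sns.color_palette("Accent"):
    inside = sns.color_palette()

# After the block, the previous default is restored
after = sns.color_palette()

print(len(before), len(inside), len(after))
```

Calling color_palette() with no arguments returns the colors currently used by default, which is a convenient way to confirm which palette is active.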

Multiple plots with Seaborn
Method 1: Using FacetGrid() method
Syntax: seaborn.FacetGrid(data, **kwargs)

Example:

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

# loading dataset
data = sns.load_dataset("iris")

# create one subplot per species
plot = sns.FacetGrid(data, col="species")

# draw sepal_width as a line in each subplot
plot.map(plt.plot, "sepal_width")

plt.show()

Multiple plots with Seaborn
Method 2: Using PairGrid() method
Syntax: seaborn.PairGrid(data, **kwargs)

Example:

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

# loading dataset
data = sns.load_dataset("flights")

plot = sns.PairGrid(data)
plot.map(plt.plot)

plt.show()
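
A PairGrid can use different plot types on and off the diagonal. The following sketch uses randomly generated columns named a, b and c (hypothetical data, chosen only so the example is self-contained) and draws histograms on the diagonal and scatter plots everywhere else.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Three illustrative numeric columns of random values
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "a": rng.normal(size=50),
    "b": rng.normal(size=50),
    "c": rng.normal(size=50),
})

# One row and one column of subplots per numeric variable
grid = sns.PairGrid(data)

# Histograms on the diagonal, scatter plots off the diagonal
grid.map_diag(plt.hist)
grid.map_offdiag(plt.scatter)
```

Call plt.show() afterwards to display the figure. With three numeric columns the grid contains 3 x 3 subplots.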

Creating different plots with Seaborn
1. Relational Plots:
• Relational plots are used for visualizing the statistical relationship between the
data points.
• Visualization is necessary because it allows people to see trends and
patterns in the data.
• The process of understanding how the variables in a dataset relate to each other
is termed statistical analysis.

2. Categorical Plots:
•Categorical plots are used to visualize the relationship between two variables when
one of the main variables is categorical, rather than both being numerical.
•A categorical variable is one that takes on a fixed and limited
number of possible values.
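
The difference between the two plot families can be sketched with one example of each. This is an illustrative example on a small hand-made table, not part of the original slides: scatterplot() stands in for the relational plots and boxplot() for the categorical plots.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small illustrative table: one categorical and two numerical columns
data = pd.DataFrame({
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
    "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.2, 3.1, 3.3, 2.7, 3.0],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Relational plot: relationship between two numerical variables
sns.scatterplot(x="sepal_length", y="sepal_width", data=data, ax=ax1)

# Categorical plot: a numerical variable summarized per category
sns.boxplot(x="species", y="sepal_width", data=data, ax=ax2)
```

Call plt.show() afterwards to display the figure.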

Thank You..
