Uploaded by Nikhita Nair
Copyright © All Rights Reserved

📊

Data Visualization using Python


Data Visualization
Data Visualization is all about viewing or visualizing data in the form of
graphical plots, charts, figures, and animations.

Data Visualization is an effective way of communicating information to others.

Data Visualization is one of the steps in Data Science.

Effective Data Visualization


An effective visual can be created from data when you realize that you are actually
telling a story.

Follow these steps to create effective visualizations:

Don't worry too much about the tool used to create the visuals.

Define what to communicate before you look at data.

Identify the right chart or plot, which suits your story.

Create the visual and verify if it is aligned with your story.

Data Visualization with Python


There are many Python libraries used for Data Visualization.

A few popular libraries are:

matplotlib : It is the most widely used Python Data Visualization library.

seaborn : It is used for generating informative statistical graphics. It is built on top of matplotlib.

bokeh : It is used for generating interactive plots, which can be output as JSON or HTML objects, or as interactive web applications.

Matplotlib
matplotlib is the first data visualization library in Python and is widely used.
In this course you will learn:

Using the matplotlib library to create basic plots such as Line plot, Scatter plot, etc.

Creating multiple plots in a single figure.

Customizing plots using various styles.

Installing Matplotlib
matplotlib is a third-party library and is not part of the Python standard library.

You can install it using the pip utility, as shown below.

pip install matplotlib

matplotlib is available directly in distributions such as Anaconda and WinPython.

Loading matplotlib
matplotlib is loaded using import, as shown below.

import matplotlib

You can find the version of matplotlib with the below command.

print(matplotlib.__version__)

If matplotlib is already installed and you want to upgrade it, run the following command at the command prompt.

pip install --upgrade matplotlib

About Matplotlib
In matplotlib, everything is organized in a hierarchy.

At the top level is the matplotlib.pyplot module.

pyplot is used only for a few activities, such as figure creation.

Through the created figures, one or more axes/subplot objects are created.

The axes objects are further used for doing many plotting actions.

In the next topic, you will understand the anatomy of a figure.

Parts of a Matplotlib Figure


In this topic, you will be introduced to the essential parts of a Matplotlib Figure.

Figure : Whole area chosen for plotting.

Axes : Area where data is plotted.

Axis : Number-line like objects, which define graph limits.

Artist : Every element on the figure is an artist.
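The hierarchy and the parts listed above can be checked in code. The following minimal sketch assumes matplotlib 3.x and selects the non-interactive Agg backend only so that it runs without a display; it confirms that the Axes belongs to the Figure, that a 2D Axes holds two Axis objects, and that all of them are Artists.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
from matplotlib.artist import Artist
from matplotlib.axis import Axis

fig = plt.figure()            # pyplot creates the Figure
ax = fig.add_subplot(111)     # the Figure creates the Axes

# The Axes knows its parent Figure, and the Figure keeps a list of its Axes.
print(ax.figure is fig)       # True
print(fig.axes == [ax])       # True

# A 2D Axes holds two Axis objects (the number-line like objects).
print(isinstance(ax.xaxis, Axis), isinstance(ax.yaxis, Axis))

# Figure, Axes and Axis are all Artists.
print(all(isinstance(obj, Artist) for obj in (fig, ax, ax.xaxis, ax.yaxis)))
```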

Sample Matplotlib Figure



A sample matplotlib figure with labelled parts is shown above.

Figure
Figure refers to the whole area or page on which everything is drawn.

It includes Axes, Axis, and other Artist elements.

Creating a Figure
A figure is created using the figure function of the pyplot module, as shown below.

import matplotlib.pyplot as plt
fig = plt.figure()

Executing the above code doesn't display any figure.

You should explicitly tell pyplot to display it.

NOTE : The code snippets shown in this course assume that you have imported pyplot as plt.

Viewing a Figure
The show function can be used to view the created figure, as shown below.

fig = plt.figure()
plt.show()

Output

<matplotlib.figure.Figure at 0x185417f0>

The output simply shows the figure object (newer versions of matplotlib print it as something like <Figure size 640x480 with 0 Axes>).

You will be able to view a picture only when a figure contains at least one Axes element.

Adjusting Figure Size

The default width and height of a figure come from the figure.figsize rc setting: 6.4 and 4.8 inches respectively in matplotlib 2.0 and later (some older Jupyter setups used 6 and 4 inches).

You can change the size of a figure using the figsize argument.

For example, the expression fig = plt.figure(figsize=(8,6)) generates a figure 8 inches wide and 6 inches high.
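A quick way to confirm a figure's size is to read it back. This sketch (assuming matplotlib 3.x, run with the Agg backend so no window is needed) prints the default from the figure.figsize rc setting, checks the size requested with figsize, and then resizes the figure with set_size_inches.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# The default size comes from the figure.figsize rc setting.
print(plt.rcParams["figure.figsize"])

fig = plt.figure(figsize=(8, 6))  # width = 8 inches, height = 6 inches
width, height = fig.get_size_inches()
print(width, height)  # 8.0 6.0

# The size can also be changed after the figure has been created.
fig.set_size_inches(10, 4)
print(fig.get_size_inches())
```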

Axes
An Axes is the region of the figure, available for plotting data.

An Axes object is associated with only one Figure.

A Figure can contain one or more Axes elements.

An Axes contains two Axis objects in case of 2D plots and three Axis objects
in case of 3D plots.
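To see the Axis objects of 2D and 3D Axes side by side, the sketch below adds one of each to a figure. It assumes matplotlib 3.x with the bundled mplot3d toolkit; the Axes3D import is only needed on older versions, where it registers the '3d' projection.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers '3d' on older versions)

fig = plt.figure(figsize=(8, 4))
ax2d = fig.add_subplot(1, 2, 1)                   # ordinary 2D Axes
ax3d = fig.add_subplot(1, 2, 2, projection="3d")  # 3D Axes

print(len(fig.axes))                       # 2: one Figure holding two Axes
print(ax2d.xaxis, ax2d.yaxis)              # the two Axis objects of the 2D Axes
print(ax3d.xaxis, ax3d.yaxis, ax3d.zaxis)  # the 3D Axes adds a third Axis, zaxis
```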

Creating an Axes

An Axes can be added to a figure using the add_subplot method.

Syntax

add_subplot(nrows, ncols, index)

When all of these argument values are less than 10, they can be combined and passed as a single three-digit number.

Hence, add_subplot(1, 1, 1) and add_subplot(111) are the same.
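One way to check that the two forms are equivalent is to compare the region each Axes occupies. The sketch below (assuming matplotlib 3.x and the Agg backend) compares the positions returned by get_position for add_subplot(1, 1, 1) and add_subplot(111), and also for the first cell of a 2 x 2 grid.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig_a = plt.figure()
ax_a = fig_a.add_subplot(1, 1, 1)   # three separate arguments
fig_b = plt.figure()
ax_b = fig_b.add_subplot(111)       # the same values packed into one number

# Both Axes occupy the same region, in figure-relative coordinates.
same_single = ax_a.get_position().bounds == ax_b.get_position().bounds
print(same_single)  # True

# The shorthand works for other grids too: 221 means (2, 2, 1).
fig_c = plt.figure()
ax_c = fig_c.add_subplot(221)
fig_d = plt.figure()
ax_d = fig_d.add_subplot(2, 2, 1)
same_grid = ax_c.get_position().bounds == ax_d.get_position().bounds
print(same_grid)  # True
```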

The code below generates a figure with one Axes, ax.

fig = plt.figure()
ax = fig.add_subplot(111)
plt.show()

Setting Title and Axis Labels

The set method can be called on the created axes, ax, to set various parameters such as xlabel, ylabel, and title.


fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.set(title='My First Plot',
xlabel='X-Axis', ylabel='Y-Axis',
xlim=(0, 5), ylim=(0,10))
plt.show()

Setting an attribute can also be done with functions of the form set_<parameter_name>, as shown in the code below.

fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.set_title("My First Plot")
ax.set_xlabel("X-Axis");
ax.set_ylabel('Y-Axis')
ax.set_xlim([0,5]);
ax.set_ylim([0,10])
plt.show()
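Both styles produce the same result, which can be confirmed with the matching get_<parameter_name> methods. A sketch, assuming matplotlib 3.x and the Agg backend:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 6))
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)

# Style 1: several properties at once with set().
ax1.set(title="My First Plot", xlabel="X-Axis", ylabel="Y-Axis",
        xlim=(0, 5), ylim=(0, 10))

# Style 2: one set_<parameter_name> call per property.
ax2.set_title("My First Plot")
ax2.set_xlabel("X-Axis")
ax2.set_ylabel("Y-Axis")
ax2.set_xlim(0, 5)
ax2.set_ylim(0, 10)

# Both Axes end up in the same state; the getters read it back.
for ax in (ax1, ax2):
    print(ax.get_title(), ax.get_xlabel(), ax.get_ylabel(),
          ax.get_xlim(), ax.get_ylim())
```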

Plotting Data

plot is one of the functions used for plotting data points.

The plot function is called on the created axes object, ax, as shown in the code snippet below.

fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.set(title='My First Plot',
xlabel='X-Axis', ylabel='Y-Axis',
xlim=(0, 5), ylim=(0,10))
x = [1, 2, 3, 4]; y = [2, 4, 6, 8]
ax.plot(x, y)
plt.show()

Plotting data or setting attributes can also be done by calling functions like plot and title directly on plt.

This plots the data on the currently active axes.

However, explicit is better than implicit, so prefer the former style of plotting.

fig = plt.figure(figsize=(8,6))
x = [1, 2, 3, 4]; y = [2, 4, 6, 8]
plt.plot(x, y)
plt.title('My First Plot')
plt.xlabel('X-Axis'); plt.ylabel('Y-Axis')
plt.xlim(0,5); plt.ylim(0,10)
plt.show()
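The difference between the two styles is which Axes receives the data. In this sketch (assuming matplotlib 3.x and the Agg backend), plt.gca() returns the current Axes, which is the most recently added one, so plt.plot draws there, while ax_left.plot targets its Axes explicitly.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]; y = [2, 4, 6, 8]

fig = plt.figure(figsize=(8, 6))
ax_left = fig.add_subplot(121)
ax_right = fig.add_subplot(122)   # the most recently added Axes becomes "current"

# Implicit: plt.plot draws on whatever plt.gca() returns, i.e. ax_right here.
print(plt.gca() is ax_right)  # True
plt.plot(x, y)

# Explicit: the target Axes is named, so there is no ambiguity.
ax_left.plot(x, y)

print(len(ax_left.lines), len(ax_right.lines))  # 1 1
```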

Adding a Legend

The legend function is called on the axes object ax to produce a legend.

The legend uses the label provided to a line drawn using plot, as shown in the code below.

fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.set(title='My First Plot',
xlabel='X-Axis', ylabel='Y-Axis',
xlim=(0, 5), ylim=(0,10))
x = [1, 2, 3, 4]; y = [2, 4, 6, 8]
ax.plot(x, y, label='linear-growth')
ax.legend()
plt.show()
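A legend lists only the lines that were given a label. The sketch below (assuming matplotlib 3.x; the second and third series are hypothetical data added for illustration) shows that an unlabelled line is left out, and reads back the legend entries.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)
x = [1, 2, 3, 4]
ax.plot(x, [2, 4, 6, 8], label="linear-growth")
ax.plot(x, [1, 4, 9, 16], label="quadratic-growth")  # hypothetical second series
ax.plot(x, [3, 3, 3, 3])  # no label, so it is left out of the legend

legend = ax.legend(loc="upper left")  # loc is optional; 'best' is the default
entries = [t.get_text() for t in legend.get_texts()]
print(entries)  # ['linear-growth', 'quadratic-growth']
```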

Types of Plots
Data can be presented using many different types of plots.

In this topic, you will learn how to draw the following plots using matplotlib.

Line plot

Scatter plot

Bar plot

Pie plot

Histogram

Box plot

Line Plot
Line Plot is used to visualize a trend in data.

Line Plot is also used to compare two variables.

Line Plots are simple and effective in communicating.

plot function is used for drawing Line plots.

Syntax

plot(x, y)
# 'x' , 'y' : Data values representing two variables.

Line Plot Example

The above example plots the average daily temperature in January 2018.

The temperature was recorded on every Monday and Friday. The plot shows an increasing trend in temperature.

fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.set(title='Avg. Daily Temperature in Jan 2018',
xlabel='Day', ylabel='Temperature (in deg)',
xlim=(0, 30), ylim=(25, 35))
days = [1, 5, 8, 12, 15, 19, 22, 26, 29]
temp = [29.3, 30.1, 30.4, 31.5, 32.3, 32.6, 31.8, 32.4, 32.7]
ax.plot(days, temp)
plt.show()

Common Parameters of 'plot' Function

color : Sets the color of the line.

linestyle : Sets the line style, e.g., solid, dashed, etc.



linewidth : Sets the thickness of a line.

marker : Chooses a marker for data points, e.g., circle, triangle, etc.

markersize : Sets the size of the chosen marker.

label : Names the line, which will come in legend.
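All of these parameters can be passed in a single call, and since ax.plot returns the Line2D objects it draws, the values can be read back. A minimal sketch, assuming matplotlib 3.x and reusing the temperature data from the example above:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

days = [1, 5, 8, 12, 15, 19, 22, 26, 29]
temp = [29.3, 30.1, 30.4, 31.5, 32.3, 32.6, 31.8, 32.4, 32.7]

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)
# ax.plot returns a list of Line2D objects, one per line drawn.
(line,) = ax.plot(days, temp, color="green", linestyle="--", linewidth=3,
                  marker="o", markersize=8, label="Location 1")
ax.legend()

# Each parameter can be read back from the Line2D object.
print(line.get_color(), line.get_linestyle(), line.get_linewidth())  # green -- 3.0
print(line.get_marker(), line.get_markersize(), line.get_label())    # o 8.0 Location 1
```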

Setting 'plot' Parameters

To customize a line, pass the required parameters as arguments to the plot function.

A green dashed line having width 3 can be generated using the following expression.

ax.plot(days, temp, color='green', linestyle='--', linewidth=3)

Marking Data Points

Data points are made visible using marker argument.

The expression below plots a green line with data points marked as circles.

ax.plot(days, temp, color='green', marker='o')

Plotting Multiple Lines

Using plot function multiple times is one of the ways to draw multiple lines.

Two lines representing temperatures at two different locations are plotted using the code below.

fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.set(title='Avg. Daily Temperature of Jan 2018',
xlabel='Day', ylabel='Temperature (in deg)',
xlim=(0, 30), ylim=(25, 35))
days = [1, 5, 8, 12, 15, 19, 22, 26, 29]
location1_temp = [29.3, 30.1, 30.4, 31.5, 32.3, 32.6, 31.8, 32.4, 32.7]
location2_temp = [26.4, 26.8, 26.1, 26.4, 27.5, 27.3, 26.9, 26.8, 27.0]
ax.plot(days, location1_temp, color='green', marker='o', linewidth=3)
ax.plot(days, location2_temp, color='red', marker='o', linewidth=3)
plt.show()

Scatter Plot
Scatter plot is very similar to Line Plot.

Scatter Plot is used for showing how one variable is related to another.

Scatter Plot consists of data points. If the data points lie close to a straight line, then the two variables are highly correlated.
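The strength of a linear relationship can also be quantified. As an illustration, the sketch below computes the Pearson correlation coefficient of the day and temperature values used in this course with NumPy's corrcoef; a value close to +1 matches the rising, nearly straight spread of points, while a value near 0 would indicate no linear relationship.

```python
import numpy as np

days = [1, 5, 8, 12, 15, 19, 22, 26, 29]
temp = [29.3, 30.1, 30.4, 31.5, 32.3, 32.6, 31.8, 32.4, 32.7]

# corrcoef returns the 2 x 2 correlation matrix; the off-diagonal entry is r.
r = np.corrcoef(days, temp)[0, 1]
print(round(r, 2))
```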

scatter function is used for drawing scatter plots.

Syntax

scatter(x, y)
# 'x', 'y' : Data values representing two variables.

1) Plotting a scatter plot:

plt.scatter(x, y, s=size, c=color, marker="*", alpha=0.3) is used for plotting a scatter plot in a figure. s sets the marker size. c sets the color: color names and hex values can be passed, or a sequence of numbers (for example, the x or y values) can be passed to color each point through a colormap. marker selects the marker shape; circles are drawn by default, but "*", "^", etc. can be passed. alpha sets the opacity of the markers (0.3 here).

plt.sca(ax) sets the current Axes to ax and the current Figure to the parent of ax.

2) Changing x and y limits:

set_xlim(lower, upper): sets the starting and ending limits of the x-axis.
set_ylim(lower, upper): sets the starting and ending limits of the y-axis.

3) Adding title and axis labels:

set_title("label", loc="center", fontdict={"fontsize": 20, "color": "red"}) sets the title of the plot; loc can be "left", "center", or "right".
set_xlabel("label", labelpad=..., fontdict=...) sets the label of the x-axis.
set_ylabel("label") sets the label of the y-axis.

4) Adding an equation in titles:

set_title(r"$\frac{1}{2}$") renders the fraction 1/2 in the title using math text formatting.

5) Saving the figure:

savefig(fname, orientation="landscape" or "portrait", dpi=...) saves the plot figure to a file (orientation only affects PostScript output).
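The notes above can be combined into one short script. This is a sketch assuming matplotlib 3.x: it makes ax1 current with plt.sca so that plt.scatter draws on it, sets limits, a left-aligned title and labels, puts a math-text title on ax2, and saves the figure to an in-memory PNG (io.BytesIO is used here only to avoid writing a file).

```python
import io
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 4))
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)   # ax2 is now the current Axes

# 1) Make ax1 current, then scatter with size, color, marker and opacity.
plt.sca(ax1)
plt.scatter([1, 2, 3], [3, 1, 2], s=100, c="red", marker="*", alpha=0.3)

# 2) Axis limits.
ax1.set_xlim(0, 4)
ax1.set_ylim(0, 4)

# 3) A left-aligned title and axis labels.
ax1.set_title("Left title", loc="left", fontdict={"fontsize": 20, "color": "red"})
ax1.set_xlabel("X", labelpad=10)
ax1.set_ylabel("Y")

# 4) Math text in a title, rendered by matplotlib's built-in mathtext.
ax2.set_title(r"$\frac{1}{2}$")

# 5) Save to an in-memory PNG instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=80)
```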

Scatter Plot with Scatter Function

A scatter plot only marks the data points with the chosen marker; the points are not joined by lines.

The example below displays the average temperature of a day, for every Monday and Friday of Jan 2018.

fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.set(title='Avg. Daily Temperature of Jan 2018',
xlabel='Day', ylabel='Temperature (in deg)',
xlim=(0, 30), ylim=(25, 35))
days = [1, 5, 8, 12, 15, 19, 22, 26, 29]
temp = [29.3, 30.1, 30.4, 31.5, 32.3, 32.6, 31.8, 32.4, 32.7]
ax.scatter(days, temp)
plt.show()

Common Parameters of 'scatter'


c : Sets color of markers.
s : Sets size of markers.
marker : Selects a marker, e.g., circle, triangle, etc.
edgecolor : Sets the color of lines on edges of markers.
Setting 'scatter' Parameters

Parameters c and s can take a list of values.

In recent matplotlib versions, each list must contain either a single value, which is applied to every data point, or one value per data point; older versions repeated a list that was shorter than the number of data points.

The example below plots green circles of size 60 with black edges.

ax.scatter(days, temp, marker='o', c=['green'], s=[60], edgecolor='black')
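Instead of a single value, c and s can also receive one value per data point. In the sketch below (assuming matplotlib 3.x), the marker sizes are a hypothetical function of temperature chosen for illustration, and numeric c values are mapped to colors through the 'viridis' colormap, with a colorbar as the key.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

days = [1, 5, 8, 12, 15, 19, 22, 26, 29]
temp = [29.3, 30.1, 30.4, 31.5, 32.3, 32.6, 31.8, 32.4, 32.7]

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)

# Hypothetical sizing rule: one size per point, larger markers on hotter days.
sizes = [20 + 40 * (t - 29) for t in temp]
# Numeric c values are mapped to colors through a colormap.
points = ax.scatter(days, temp, c=temp, s=sizes, cmap="viridis",
                    marker="o", edgecolor="black")
fig.colorbar(points, ax=ax, label="Temperature")

print(len(points.get_offsets()))  # 9 points
```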

Scatter Plot Using 'plot'

The plot function can also create a scatter plot when linestyle is set to 'none' and a marker is chosen, as shown in the code below.

fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.set(title='Avg. Daily Temperature of Jan 2018',
xlabel='Day', ylabel='Temperature (in deg)',
xlim=(0, 30), ylim=(25, 35))
days = [1, 5, 8, 12, 15, 19, 22, 26, 29]
temp = [29.3, 30.1, 30.4, 31.5, 32.3, 32.6, 31.8, 32.4, 32.7]
ax.plot(days, temp, marker='o', linestyle='none')
plt.show()

Bar Plot
Bar Plot is commonly used for comparing categories.

It is also used to compare categories over a short period of time.

bar and barh are used for plotting vertical and horizontal bar plots respectively.

Syntax

bar(x,height)
# 'x' : x coordinates of bars.
# 'height' : List of heights of each bar.

barh(y, width)
# 'y' : y coordinates of bars
# 'width' : List of widths.

Bar Plot Using 'bar'

The example below plots the average sales of a company in the first three quarters of 2017.

The code also sets the ticks on the X-Axis and labels them.

fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.set(title='Avg. Quarterly Sales',
xlabel='Quarter', ylabel='Sales (in millions)')
quarters = [1, 2, 3]
sales_2017 = [25782, 35783, 36133]
ax.bar(quarters, sales_2017)
ax.set_xticks(quarters)
ax.set_xticklabels(['Q1-2017', 'Q2-2017', 'Q3-2017'])
plt.show()

Common Parameters of 'bar'

color : Sets the color of bars.

edgecolor : Sets the color of the border line of bars.

width : Sets the width of bars.

align : Aligns the bars with respect to the x-coordinates ('center', the default, or 'edge').

label : Sets a label for the bars, which appears in the legend.

Setting Parameters of 'bar'


The width of bars can be adjusted with the width parameter, their color with color, and their edge color with edgecolor.

Red bars with black edges can be drawn using the expression below.

ax.bar(quarters, sales_2017, color='red', width=0.6, edgecolor='black')
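bar returns a container with one Rectangle per bar, which makes the effect of width and align easy to inspect. A sketch, assuming matplotlib 3.x: with the default align='center' the bar at x=1 starts at 1 - 0.6/2 = 0.7, while align='edge' starts it at x=1.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

quarters = [1, 2, 3]
sales_2017 = [25782, 35783, 36133]

fig = plt.figure(figsize=(10, 4))
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)

# ax.bar returns a BarContainer holding one Rectangle per bar.
centered = ax1.bar(quarters, sales_2017, color="red", width=0.6, edgecolor="black")
# align='edge' puts the left edge of each bar at its x value instead of its center.
edged = ax2.bar(quarters, sales_2017, color="red", width=0.6, edgecolor="black",
                align="edge")

first_c, first_e = centered[0], edged[0]
print(first_c.get_height(), first_c.get_width())
print(first_c.get_x(), first_e.get_x())
```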

Plotting Multiple Groups

Grouped vertical bar plots are used for comparing more than one category at a time.

The example below compares a company's sales in the first three quarters of 2016 and 2017.

fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.set(title='Avg. Quarterly Sales',
xlabel='Quarter', ylabel='Sales (in millions)')
quarters = [1, 2, 3]
x1_index = [0.8, 1.8, 2.8]; x2_index = [1.2, 2.2, 3.2]
sales_2016 = [28831, 30762, 32178]; sales_2017 = [25782, 35783, 36133]
ax.bar(x1_index, sales_2016, color='yellow', width=0.4, edgecolor='black', label='2016')
ax.bar(x2_index, sales_2017, color='red', width=0.4, edgecolor='black', label='2017')
ax.set_xticks(quarters)
ax.set_xticklabels(['Q1', 'Q2', 'Q3'])
ax.legend()
plt.show()
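The hard-coded positions 0.8, 1.8, 2.8 and 1.2, 2.2, 3.2 are simply each quarter shifted by half a bar width (0.4 / 2 = 0.2). The sketch below derives them with NumPy instead, which keeps the two bars of each group touching if width changes; it assumes matplotlib 3.x.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

quarters = np.array([1, 2, 3])
sales_2016 = [28831, 30762, 32178]
sales_2017 = [25782, 35783, 36133]
width = 0.4  # width of each bar; one group spans 2 * width around its tick

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)
# Shift each series half a bar to the left/right of the tick.
bars_2016 = ax.bar(quarters - width / 2, sales_2016, width=width, color="yellow",
                   edgecolor="black", label="2016")
bars_2017 = ax.bar(quarters + width / 2, sales_2017, width=width, color="red",
                   edgecolor="black", label="2017")
ax.set_xticks(quarters)
ax.set_xticklabels(["Q1", "Q2", "Q3"])
ax.legend()
```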

Barplot Using 'barh'

barh draws the bars horizontally, as shown in the above image.

The height parameter is used to adjust the thickness (height) of each bar.

Horizontal bar plots are used when comparing values of one category at a time.

fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.set(title='Avg. Quarterly Sales',
xlabel='Sales (in millions)', ylabel='Quarter')
quarters = [1, 2, 3]
sales_2017 = [25782, 35783, 36133]
ax.barh(quarters, sales_2017, height=0.6, color='red')
ax.set_yticks(quarters)
ax.set_yticklabels(['Q1-2017', 'Q2-2017', 'Q3-2017'])
plt.show()

Pie Plot
Pie plot is effective in showing the proportion of categories.

It is best suited for comparing a small number of categories.

In general, Pie Plot is used to highlight the proportion of one category or a group of categories.

Syntax

pie(x)
# 'x' : sizes of portions, passed either as fractions or as numbers.

Pie Plot Using 'pie'

The above pie chart displays a company's sales in the first three quarters of 2017.

fig = plt.figure(figsize=(6,6))
ax = fig.add_subplot(111)
ax.set(title='Avg. Quarterly Sales')
sales_2017 = [25782, 35783, 36133]
ax.pie(sales_2017)
plt.show()

Common Parameters of 'pie'


colors : Sets the colors of portions.

labels : Sets the labels of portions.

startangle : Sets the start angle at which portion drawing starts.

autopct : Sets the format used to display each portion's percentage of the whole.

Setting Parameters of 'pie'

Labels and percentages of portions are drawn with the code snippet below.

fig = plt.figure(figsize=(6,6))
ax = fig.add_subplot(111)
ax.set(title='Avg. Quarterly Sales')
sales_2017 = [25782, 35783, 36133]
quarters = ['Q1-2017', 'Q2-2017', 'Q3-2017']
ax.pie(sales_2017, labels=quarters, startangle=90, autopct='%1.1f%%')
plt.show()
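The percentages that autopct prints are each value divided by the total. The sketch below (assuming matplotlib 3.x) computes them by hand for the 2017 sales and compares them with the text matplotlib draws; with autopct set, pie returns the wedges, the label texts, and the percentage texts.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

sales_2017 = [25782, 35783, 36133]
quarters = ["Q1-2017", "Q2-2017", "Q3-2017"]

# Each share is the value divided by the total, formatted like autopct.
total = sum(sales_2017)
by_hand = ["%1.1f%%" % (100 * s / total) for s in sales_2017]
print(by_hand)  # ['26.4%', '36.6%', '37.0%']

fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(111)
wedges, texts, autotexts = ax.pie(sales_2017, labels=quarters, startangle=90,
                                  autopct="%1.1f%%")
drawn = [t.get_text() for t in autotexts]
print(drawn)  # the same three strings
```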

Histogram
Histogram is used to visualize the spread of data of a distribution.

hist function is used to plot a histogram.

Syntax

hist(x)
# 'x' : Data values of a single variable.

Histogram Using 'hist'

The example below simulates 1000 percentage values from a normal distribution with mean 60 and standard deviation 10.

Then the histogram of percentage values is plotted.

import numpy as np
np.random.seed(100)
x = 60 + 10*np.random.randn(1000)
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.set(title="Distribution of Student's Percentage",
ylabel='Count', xlabel='Percentage')
ax.hist(x)
plt.show()
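hist also returns the numbers behind the bars. In this sketch (assuming matplotlib 3.x, where 10 bins are used unless the hist.bins rc setting is changed), the 1000 simulated values are split into 10 bins with 11 edges, and the counts add up to 1000.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

np.random.seed(100)
x = 60 + 10 * np.random.randn(1000)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)
# hist returns the bar heights, the bin edges and the drawn patches.
counts, edges, patches = ax.hist(x)

print(len(counts), len(edges))  # n bins need n + 1 edges
print(counts.sum())             # every value falls into some bin
```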

Common Parameters of 'hist'


color : Sets the color of bars.

bins : Sets the number of bins to be used.

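density : Set to True to normalize the histogram, so that the bars show a probability density (the total area of the bars is 1) rather than counts. Older matplotlib versions called this parameter normed, which has since been removed.

Setting Parameters of 'hist'

You can also create more bins and display the bins as a normalized density instead of raw counts, as shown in the following example.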

import numpy as np
np.random.seed(100)
x = 60 + 10*np.random.randn(1000)
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.set(title="Distribution of Student's Percentage",
ylabel='Density', xlabel='Percentage')
ax.hist(x, color='blue', bins=30, density=True)
plt.show()
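With density=True the bar heights are scaled so that the total area of the bars is 1, so the heights themselves add up to 1 only when the bin width is 1. If each bar should instead show the fraction of values in its bin, one common alternative, shown in this sketch (assuming matplotlib 3.x), is to weight every value by 1/n.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

np.random.seed(100)
x = 60 + 10 * np.random.randn(1000)

fig = plt.figure(figsize=(10, 4))
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)

# density=True: heights are scaled so that the total AREA of the bars is 1.
dens, edges, _ = ax1.hist(x, bins=30, color="blue", density=True)
area = np.sum(dens * np.diff(edges))
print(area)  # close to 1.0

# Fractions instead: weight every value by 1/n, so the HEIGHTS add up to 1.
frac, _, _ = ax2.hist(x, bins=30, color="blue", weights=np.ones_like(x) / len(x))
print(frac.sum())  # close to 1.0
```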

Box Plots
Box plots are also used to visualize the spread of data.

Box plots are used to compare distributions.

Box plots can also be used to detect outliers.



Syntax

boxplot(x)
# 'x' : list of values or list of list of values.

Boxplot Using 'boxplot'

The above image displays a box plot of percentages obtained by 1000 students of a class.

import numpy as np
np.random.seed(100)
x = 50 + 10*np.random.randn(1000)
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.set(title="Box plot of Student's Percentage",
xlabel='Class', ylabel='Percentage')
ax.boxplot(x)
plt.show()
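A default box plot is built from the quartiles of the data: the box spans the first to the third quartile, the line inside marks the median, and points more than 1.5 times the interquartile range beyond the box are drawn individually as outliers ("fliers"). The sketch below computes these values with NumPy and compares them with what boxplot returns; it assumes matplotlib 3.x with the default whis=1.5.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

np.random.seed(100)
x = 50 + 10 * np.random.randn(1000)

# The quantities a default box plot is built from.
q1, median, q3 = np.percentile(x, [25, 50, 75])
iqr = q3 - q1                                    # height of the box
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = x[(x < low_fence) | (x > high_fence)]  # drawn individually as fliers

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)
parts = ax.boxplot(x)  # a dict of the drawn lines: 'boxes', 'medians', 'fliers', ...

print(parts["medians"][0].get_ydata()[0], median)          # same value
print(len(parts["fliers"][0].get_ydata()), len(outliers))  # same count
```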

Common Parameters of 'boxplot'


labels : Sets the labels for box plots (renamed tick_labels in matplotlib 3.9, where labels is deprecated).

notch : Set to True to draw notches around the median.

bootstrap : Number of times the median is resampled (bootstrapped) to compute the confidence interval shown by the notches.

vert : Set to False to plot box plots horizontally.

Setting Parameters of 'boxplot'

The box plot of student percentages can be redrawn by setting notch, bootstrap, and labels using the expression below.

ax.boxplot(x, labels=['A'], notch=True, bootstrap=10000)



Plotting Multiple Boxplots

A list of data arrays can be passed as an argument to plot multiple box plots, as shown in the code snippet below.

import numpy as np
np.random.seed(100)
x = 50 + 10*np.random.randn(1000)
y = 70 + 25*np.random.randn(1000)
z = 30 + 5*np.random.randn(1000)
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.set(title="Box plot of Student's Percentage",
xlabel='Class', ylabel='Percentage')
ax.boxplot([x, y, z], labels=['A', 'B', 'C'], notch=True, bootstrap=10000)
plt.show()

Plotting Boxplots Horizontally

Box plots are plotted horizontally by setting vert to False, as shown in the code snippet below (x, y, z, and ax come from the previous example).

ax.set(title="Box plot of Student's Percentage",
xlabel='Percentage', ylabel='Class')
ax.boxplot([x, y, z], labels=['A', 'B', 'C'], vert=False, notch=True, bootstrap=10000)

Matplotlib Styles
matplotlib.pyplot comes with many styles. The display of a figure changes based on the chosen style.

You can view the styles available in pyplot by running the following commands.

import matplotlib.pyplot as plt
print(plt.style.available)

Output

['seaborn-darkgrid', 'fivethirtyeight', ...]

In matplotlib 3.6 and later, the seaborn styles are listed with a v0_8 prefix, for example 'seaborn-v0_8-darkgrid'.

Using a Style
A specific style can be invoked with either of the two expressions shown
below.

plt.style.use('ggplot')

or

plt.style.context('ggplot')

The first expression changes the style globally for all subsequent plots. The latter is meant to be used with the with keyword, which applies the style only inside the with block; this is the recommended approach.

The above shown image uses the ggplot style.

with plt.style.context('ggplot'):
    fig = plt.figure(figsize=(8,6))
    ax = fig.add_subplot(111)
    ax.set(title='Avg. Daily Temperature of Jan 2018',
           xlabel='Day', ylabel='Temperature (in deg)',
           xlim=(0, 30), ylim=(25, 35))
    days = [1, 5, 8, 12, 15, 19, 22, 26, 29]
    temp = [29.3, 30.1, 30.4, 31.5, 32.3, 32.6, 31.8, 32.4, 32.7]
    ax.plot(days, temp, color='green', linestyle='--', linewidth=3)
    plt.show()
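The scoping behaviour of plt.style.context can be observed through plt.rcParams. In this sketch (assuming matplotlib 3.x and its bundled ggplot style), the axes.facecolor setting changes inside the with block and returns to its previous value afterwards.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

print("ggplot" in plt.style.available)  # True

before = plt.rcParams["axes.facecolor"]
with plt.style.context("ggplot"):
    inside = plt.rcParams["axes.facecolor"]  # ggplot's grey background
    fig = plt.figure(figsize=(8, 6))
    ax = fig.add_subplot(111)
    ax.plot([1, 2, 3], [2, 4, 6])
after = plt.rcParams["axes.facecolor"]

# The setting changes inside the block and is restored after it.
print(before, inside, after)
```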

Composing Styles

Multiple style sheets can be used together in matplotlib.

This provides the flexibility to compose two style sheets, such as one for customizing colors and another for customizing element sizes.

with plt.style.context(['dark_background', 'seaborn-poster']):  # 'seaborn-v0_8-poster' in matplotlib 3.6+
    ....
    ....

Creating a Custom Style


A style sheet is a text file with the extension .mplstyle.

All custom style sheets are placed in a folder, stylelib, present in the config directory of matplotlib.

Use the expression below to find the config directory.

import matplotlib
print(matplotlib.get_configdir())

Now, create a file mystyle.mplstyle with the contents shown below and save it in the folder <matplotlib_configdir>/stylelib/.

axes.titlesize : 24
axes.labelsize : 20
lines.linewidth : 8
lines.markersize : 10
xtick.labelsize : 16
ytick.labelsize : 16

Reload the style library with the following expression.

matplotlib.style.reload_library()

After reloading the style library, a custom style can be used just like the built-in styles.

The code snippet below uses mystyle along with dark_background.

with plt.style.context(['dark_background', 'mystyle']):
    ....
    ....
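plt.style.use and plt.style.context also accept a path to a .mplstyle file, which is handy for trying a style before copying it into stylelib. The sketch below (assuming matplotlib 3.x) writes the mystyle contents shown above to a temporary folder, an assumption made so that it does not touch the real config directory, and combines it with dark_background.

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

style_text = """\
axes.titlesize : 24
axes.labelsize : 20
lines.linewidth : 8
lines.markersize : 10
xtick.labelsize : 16
ytick.labelsize : 16
"""

# Write the style sheet to a temporary folder instead of <configdir>/stylelib.
style_dir = tempfile.mkdtemp()
style_path = os.path.join(style_dir, "mystyle.mplstyle")
with open(style_path, "w") as f:
    f.write(style_text)

# A path to a .mplstyle file is accepted wherever a style name is.
with plt.style.context(["dark_background", style_path]):
    fig = plt.figure(figsize=(8, 6))
    ax = fig.add_subplot(111)
    (line,) = ax.plot([1, 2, 3], [3, 1, 2], marker="o")
    print(plt.rcParams["lines.linewidth"])  # 8.0

print(line.get_linewidth())  # the line keeps the width it was created with
```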

matplotlibrc file
matplotlib uses all the settings specified in the matplotlibrc file.

These settings are known as rc settings or rc parameters.

For customization, rc settings can be altered in the file or interactively.

The location of the active matplotlibrc file used by matplotlib can be found with the expression below.

import matplotlib
print(matplotlib.matplotlib_fname())

Matplotlib rcParams
All rc settings present in the matplotlibrc file are stored in a dictionary-like object named matplotlib.rcParams.

Any setting can be changed by editing the values of this dictionary.

For example, if you want to change linewidth and color, the following expressions can be used.

import matplotlib as mpl
mpl.rcParams['lines.linewidth'] = 2
mpl.rcParams['lines.color'] = 'r'
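The sketch below shows that an rcParams change affects lines created afterwards, and uses matplotlib.rc_context for a temporary change that is undone when the with block ends. It assumes matplotlib 3.x. Note that ax.plot takes line colors from the axes.prop_cycle setting, so changing lines.color alone may not recolor lines drawn with plot.

```python
import matplotlib as mpl
mpl.use("Agg")
import matplotlib.pyplot as plt

# Global change: affects every line created afterwards in this session.
mpl.rcParams["lines.linewidth"] = 2
fig = plt.figure()
ax = fig.add_subplot(111)
(line,) = ax.plot([1, 2, 3], [1, 4, 9])
print(line.get_linewidth())  # 2.0

# Temporary change: rc_context restores the previous values when the block ends.
with mpl.rc_context({"lines.linewidth": 5}):
    (thick,) = ax.plot([1, 2, 3], [2, 5, 10])
print(thick.get_linewidth(), mpl.rcParams["lines.linewidth"])  # 5.0 2.0

mpl.rcdefaults()  # restore matplotlib's built-in defaults
```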


Creating Subplots
So far, you have seen how to create a single plot in a figure.

In this topic, you will see how to create multiple plots in a single figure.

subplot is one of the functions used to create subplots.

Syntax

subplot(nrows, ncols, index)
# 'index' is the position in a virtual grid with 'nrows' rows and 'ncols' columns.
# 'index' number varies from 1 to `nrows*ncols`.

subplot creates the Axes object at index position and returns it.

Example of Using 'subplot'

fig = plt.figure(figsize=(10,8))
axes1 = plt.subplot(2, 2, 1, title='Plot1')
axes2 = plt.subplot(2, 2, 2, title='Plot2')
axes3 = plt.subplot(2, 2, 3, title='Plot3')
axes4 = plt.subplot(2, 2, 4, title='Plot4')
plt.show()

The above code creates a figure with four subplots arranged in two rows and two columns.

The third argument, index, varies from 1 to 4, and the respective subplots are drawn in row-major order.
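Row-major order can be confirmed from the subplot positions. A sketch, assuming matplotlib 3.x: subplot 2 lies to the right of subplot 1 on the same row, and subplot 3 starts the second row below subplot 1.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 8))
axes = [plt.subplot(2, 2, i, title="Plot%d" % i) for i in range(1, 5)]

# Positions are in figure-relative coordinates (x0 = left edge, y0 = bottom edge).
pos = [ax.get_position() for ax in axes]
print(pos[1].x0 > pos[0].x0, abs(pos[1].y0 - pos[0].y0) < 1e-9)  # True True
print(abs(pos[2].x0 - pos[0].x0) < 1e-9, pos[2].y0 < pos[0].y0)  # True True
print(len(fig.axes))  # 4
```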

Example 2 of 'subplot'

Now let's try to create a figure with three subplots, where the first subplot
spans all columns of first row.

fig = plt.figure(figsize=(10,8))
axes1 = plt.subplot(2, 2, (1,2), title='Plot1')
axes1.set_xticks([]); axes1.set_yticks([])
axes2 = plt.subplot(2, 2, 3, title='Plot2')
axes2.set_xticks([]); axes2.set_yticks([])
axes3 = plt.subplot(2, 2, 4, title='Plot3')
axes3.set_xticks([]); axes3.set_yticks([])
plt.show()
plt.show()

The above code also removes all ticks of x and y axes.

Subplots Using 'GridSpec'


The GridSpec class of matplotlib.gridspec can also be used to create subplots.

Initially, a grid with a given number of rows and columns is set up.

Later, while creating a subplot, the rows and columns of the grid spanned by the subplot are passed as input to the subplot function.

Example of Using Gridspec


The example below recreates the previous figure using GridSpec.

A GridSpec object, gd, is created with two rows and two columns.

Then a selected grid portion is passed as an argument to subplot.

import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(10,8))
gd = gridspec.GridSpec(2,2)
axes1 = plt.subplot(gd[0,:], title='Plot1')
axes1.set_xticks([]); axes1.set_yticks([])
axes2 = plt.subplot(gd[1,0], title='Plot2')
axes2.set_xticks([]); axes2.set_yticks([])
axes3 = plt.subplot(gd[1,-1], title='Plot3')
axes3.set_xticks([]); axes3.set_yticks([])
plt.show()
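The span of an Axes created from a GridSpec slice can be checked from its position. In this sketch (assuming matplotlib 3.x, with the Figure passed to GridSpec and the Axes created through fig.add_subplot rather than plt.subplot), the top Axes starts at the left edge of the left column and ends at the right edge of the right column.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 8))
gd = gridspec.GridSpec(2, 2, figure=fig)

top = fig.add_subplot(gd[0, :], title="Plot1")       # row 0, all columns
bottom_left = fig.add_subplot(gd[1, 0], title="Plot2")
bottom_right = fig.add_subplot(gd[1, -1], title="Plot3")

# The spanning Axes covers both columns of the grid.
t, bl, br = (ax.get_position() for ax in (top, bottom_left, bottom_right))
print(abs(t.x0 - bl.x0) < 1e-9, abs(t.x1 - br.x1) < 1e-9)  # True True
```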

Creating a Complex Layout



The below example creates a complex layout.

It generates four plots as shown in the above figure.

fig = plt.figure(figsize=(12,10))
gd = gridspec.GridSpec(3,3)
axes1 = plt.subplot(gd[0,:],title='Plot1')
axes1.set_xticks([]); axes1.set_yticks([])
axes2 = plt.subplot(gd[1,:-1], title='Plot2')
axes2.set_xticks([]); axes2.set_yticks([])
axes3 = plt.subplot(gd[1:, 2], title='Plot3')
axes3.set_xticks([]); axes3.set_yticks([])
axes4 = plt.subplot(gd[2, :-1], title='Plot4')
axes4.set_xticks([]); axes4.set_yticks([])
plt.show()

Adding Text
Text can be added to any part of the figure using the text function.

Syntax



text(x, y, s)
# 'x', 'y' represent x and y coordinates of a position.
# 's' is the text or string to be written

Example of Adding Text

The example below labels the data points using text.

fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.set(title='Writing Text',
xlabel='X-Axis', ylabel='Y-Axis',
xlim=(0, 5), ylim=(0, 9))
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
ax.scatter(x, y, c=['green'], s=[60], edgecolor='black')
for i in range(len(x)):
    # Label each point with its own coordinates, placed slightly up and to the left
    str_temp = '({}, {})'.format(x[i], y[i])
    ax.text(x[i] - 0.4, y[i] + 0.4, str_temp, fontsize=16)
plt.show()
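By default, the x and y passed to text are data coordinates. The sketch below (assuming matplotlib 3.x) labels each point with its own coordinates, and also passes transform=ax.transAxes to place a hypothetical caption, "y = 2x", in Axes-fraction coordinates, where (0, 0) is the bottom-left and (1, 1) the top-right corner of the Axes.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)
ax.set(xlim=(0, 5), ylim=(0, 9))
ax.scatter(x, y)
for xi, yi in zip(x, y):
    # Data coordinates; offset slightly so the label does not cover the point.
    ax.text(xi - 0.4, yi + 0.4, "({}, {})".format(xi, yi), fontsize=16)

# Axes-fraction coordinates: independent of the data limits.
ax.text(0.05, 0.95, "y = 2x", transform=ax.transAxes, va="top")

labels = [t.get_text() for t in ax.texts]
print(labels)
```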

Matplotlib Backend
matplotlib can generate plots in different output formats.

In general, backend refers to everything that occurs from the time of executing a plotting code to generating the figure.

Backends are of two types.

interactive backends : Graphical user interfaces like PyGTK, wxPython, Tkinter, qt4, etc.

non-interactive backends : Image files such as PNG, JPEG, etc.

Choosing a Backend
The default backend chosen by matplotlib is available as the backend setting in the matplotlibrc file.

If you want to alter the backend for many figures, change the value of the backend setting.

The line below, placed in matplotlibrc, chooses the WXAgg backend.

backend : WXAgg

You can also call the use function if you want to go with a specific backend.

import matplotlib
matplotlib.use('WXAgg')
# Call use() before importing pyplot
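WXAgg needs wxPython to be installed, so the sketch below uses the non-interactive Agg backend instead, which ships with matplotlib. It selects the backend before importing pyplot and then checks which backend and canvas are in use; the exact capitalization returned by get_backend can differ between versions, so the check lowercases it.

```python
import matplotlib

# Select a backend before pyplot is imported (the safe order on every version).
matplotlib.use("Agg")  # non-interactive: renders to image files only
import matplotlib.pyplot as plt

backend_name = matplotlib.get_backend()
print(backend_name)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot([1, 2, 3], [1, 4, 9])
canvas_name = type(fig.canvas).__name__
print(canvas_name)  # FigureCanvasAgg
```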


Saving Figures
Once a backend is chosen, a matplotlib figure can be saved in any format
supported by it.

The example below saves the plot in PNG format.

fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.set(title='My First Plot',
xlabel='X-Axis', ylabel='Y-Axis',
xlim=(0, 5), ylim=(0,10))
x = [1, 2, 3, 4]; y = [2, 4, 6, 8]
ax.plot(x, y)
plt.savefig('myplot.png')
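savefig infers the output format from the file extension. The sketch below (assuming matplotlib 3.x with its built-in PNG, PDF and SVG support) writes the same figure in three formats into a fresh temporary folder, so nothing is written to the working directory, and uses the dpi and bbox_inches options; bbox_inches='tight' trims extra white space around the figure.

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)
ax.set(title="My First Plot", xlabel="X-Axis", ylabel="Y-Axis",
       xlim=(0, 5), ylim=(0, 10))
ax.plot([1, 2, 3, 4], [2, 4, 6, 8])

out_dir = tempfile.mkdtemp()  # fresh temporary folder for the output files
# The output format is inferred from each file extension.
for name in ("myplot.png", "myplot.pdf", "myplot.svg"):
    fig.savefig(os.path.join(out_dir, name), dpi=150, bbox_inches="tight")

saved = sorted(os.listdir(out_dir))
print(saved)  # ['myplot.pdf', 'myplot.png', 'myplot.svg']
```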



Common Pitfalls in Data Visualization
Common pitfalls to be avoided for better Data Visualization are:

Creating unlabelled plots.

Using 3-dimensional charts. Avoid 3-D plots unless they add value over 2-D charts.

Using pie plots whose portions do not sum up to a meaningful whole.

Showing too many portions in a single pie chart.

Starting the value axis of a bar chart at a value other than zero.

Failing to normalize the data.

Adding extra labels and fancy images.

Best Practices of Data Visualization


A few of the best practices of Data Visualization are:

Display the data points on the plot, whenever required.

Whenever correlation is plotted, clarify that you have not established any causal link between the variables.

Prefer labeling data objects directly inside the plot, rather than using legends.

Create a visualization that stands by itself. Avoid adding extra text to explain the visualization.

import numpy as np
import matplotlib.pyplot as plt

def test_hist_of_a_sample_normal_distribution():
    fig = plt.figure(figsize=(8,6))
    ax = fig.add_subplot(111)
    np.random.seed(100)
    x1 = 25 + 3.0*np.random.randn(1000)   # mean 25, standard deviation 3
    ax.set(title='Histogram of a Single Dataset', xlabel='X1', ylabel='Bin Count')
    ax.hist(x1, bins=30)
    plt.savefig('histogram_normal.png')
    plt.show()

test_hist_of_a_sample_normal_distribution()

def test_boxplot_of_four_sample_normal_distribution():
    fig = plt.figure(figsize=(8,6))
    ax = fig.add_subplot(111)
    np.random.seed(100)
    # Four normal samples with different means and standard deviations
    x1 = 25 + 3.0*np.random.randn(1000)
    x2 = 35 + 5.0*np.random.randn(1000)
    x3 = 55 + 10.0*np.random.randn(1000)
    x4 = 45 + 3.0*np.random.randn(1000)
    x = [x1, x2, x3, x4]
    labels = ['X1', 'X2', 'X3', 'X4']
    ax.set(title='Box plot of Multiple Datasets', xlabel='Dataset', ylabel='Value')
    # patch_artist=True draws filled boxes; sym='+' marks outliers with '+'
    ax.boxplot(x, labels=labels, notch=True, patch_artist=True, sym='+')
    plt.savefig('box_distribution.png')
    plt.show()

test_boxplot_of_four_sample_normal_distribution()
