Visualizing Using Matplotlib

Data Exploration & Visualization - Unit 2

UNIT – 2 VISUALIZING USING MATPLOTLIB

What is Visualization?
Data visualization is the graphical or pictorial representation of data using graphs, charts, and
similar devices. It helps communicate information effectively to the intended users. The purpose
of plotting data is to visualize variation or to show relationships between variables.
E.g., traffic symbols, ultrasound reports, the speedometer of a vehicle, an atlas of maps.
What is Matplotlib?
Matplotlib is a powerful plotting library in Python used for creating static, animated, and
interactive visualizations. Matplotlib’s primary purpose is to provide users with the tools and
functionality to represent data graphically, making it easier to analyze and understand. It was
originally developed by John D. Hunter in 2003 and is now maintained by a large community
of developers.
Key Features of Matplotlib:
1. Versatility: Matplotlib can generate a wide range of plots, including line plots, scatter
plots, bar plots, histograms, pie charts, and more.
2. Customization: It offers extensive customization options to control every aspect of the
plot, such as line styles, colors, markers, labels, and annotations.
3. Integration with NumPy: Matplotlib integrates seamlessly with NumPy, making it
easy to plot data arrays directly.
4. Publication Quality: Matplotlib produces high-quality plots suitable for publication
with fine-grained control over the plot aesthetics.
5. Extensible: Matplotlib is highly extensible, with a large ecosystem of add-on toolkits
and extensions like Seaborn, Pandas plotting functions, and Basemap for geographical
plotting.
6. Cross-Platform: It is platform-independent and can run on various operating systems,
including Windows, macOS, and Linux.
7. Interactive Plots: Matplotlib supports interactive plotting through the use of widgets
and event handling, enabling users to explore data dynamically.
Difference between MATLAB and Matplotlib:
1. MATLAB: Uses a proprietary scripting language.
   Matplotlib: Is a Python library that uses Python's syntax.
2. MATLAB: Offers a wide range of functionalities and toolboxes to handle mathematical and
   scientific computations, including signal processing, image processing, and control system
   design.
   Matplotlib: Focuses on creating high-quality data visualizations, making it a suitable
   library for data exploration and presentation tasks.
3. MATLAB: Requires additional toolboxes or manual effort to integrate with external libraries
   and to achieve similar functionality.
   Matplotlib: Integrates seamlessly with other Python libraries such as NumPy and Pandas,
   which allows for efficient data handling and manipulation.
4. MATLAB: Is commercial software that requires a paid license to access its full capabilities.
   Matplotlib: Is an open-source library that is freely available and can be easily installed
   using Python's package manager.
5. MATLAB: Provides an integrated development environment (IDE) that offers a user-friendly
   interface, debugging, and direct execution of code blocks.
   Matplotlib: Being a Python library, primarily relies on third-party IDEs or notebooks
   (e.g., Jupyter) for interactive coding and execution.

Mr. K. Sathish AP/CSE, ESCET

Applications of Matplotlib:
1. Scientific Research: For plotting experimental results and visualizations that describe
the data more effectively.
2. Finance: For creating financial charts to analyze market trends and movements.
3. Data Analysis: For exploratory data analysis in fields such as data science and machine
learning.
4. Education: For teaching complex concepts in mathematics, physics, and statistics
through visual aids.
5. Engineering: For visualizing engineering simulations and results.

Disadvantages of Matplotlib:
1. Steep Learning Curve
2. Verbose Syntax
3. Default Aesthetics
4. Limited Interactivity
5. Limited 3D Plotting Capabilities
6. Performance Issues with Large Datasets
7. Documentation and Error Messages
8. Dependency on External Libraries


9. Limited Native Support for Statistical Plotting
10. Less Modern Features
Installation of Matplotlib:
To use Pyplot, we must first install the Matplotlib package. To do so, run the following
command:
pip install matplotlib
Pyplot Library:
Pyplot, a submodule of Matplotlib, is a collection of functions that help in creating a variety
of charts. Each pyplot function makes some change to a figure: e.g., it creates a figure, creates
a plotting area in a figure, plots some lines in a plotting area, or decorates the plot with
labels.
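The description above can be sketched in a few lines. This is only an illustrative sketch: the data values and label text are made up, and the Agg backend is selected purely so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend (assumption: no display available)
import matplotlib.pyplot as plt

fig = plt.figure()                       # creates a figure and makes it current
ax = plt.subplot(1, 1, 1)                # creates a plotting area in that figure
lines = plt.plot([1, 2, 3], [2, 4, 1])   # plots a line in the current plotting area
plt.xlabel("x value")                    # decorates the plot with labels
plt.ylabel("y value")
plt.title("Each pyplot call changes the current figure")

# pyplot tracks the "current" figure and plotting area behind the scenes
current_figure = plt.gcf()
current_axes = plt.gca()
```

To see the result in a window, drop the `matplotlib.use("Agg")` line and finish with `plt.show()`.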
Different Types of Plots in Matplotlib
Matplotlib offers a wide range of plot types to suit various data visualization needs. Here are
some of the most commonly used types of plots in Matplotlib:
• Line Graph
• Stem Plot
• Bar Chart
• Histogram
• Scatter Plot
• Stack Plot
• Box Plot
• Pie Chart
• Error Plot
• Contour Plot
• 3D Plots
Plot Function:
Most of the Matplotlib utilities lie under the pyplot submodule, which is usually imported under
the plt alias:
import matplotlib.pyplot as plt
Now the Pyplot package can be referred to as plt.
Syntax:
matplotlib.pyplot.plot(*args, scalex=True, scaley=True, data=None, **kwargs)
Parameters:
plot() accepts arguments that set the line style, color, and markers, while companion pyplot
functions set the axes scales and labels. The most commonly used calls are listed below:
• plt.plot(x, y): plots y against x using the default line style and color.
• plt.axis([xmin, xmax, ymin, ymax]): sets the x-axis range to xmin..xmax and the y-axis range
to ymin..ymax.
• plt.plot(x, y, color='green', marker='o', linestyle='dashed', linewidth=2, markersize=12):
marks the x and y coordinates with circular markers of size 12 and joins them with a green
dashed line of width 2.
• plt.xlabel('X-axis'): names the x-axis.
• plt.ylabel('Y-axis'): names the y-axis.
• plt.plot(x, y, label='Sample line'): attaches the label "Sample line" to the plotted line; the
label is displayed in the legend once plt.legend() is called.
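The calls listed above can be combined as in the following sketch. The x and y values are made up for illustration, and the Agg backend is chosen only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend (assumption: no display available)
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]      # illustrative data, not taken from the notes
y = [2, 4, 6, 8, 10]

fig, ax = plt.subplots()
# green dashed line of width 2 with circular markers of size 12
(line,) = plt.plot(x, y, color="green", marker="o", linestyle="dashed",
                   linewidth=2, markersize=12, label="Sample line")
plt.axis([0, 6, 0, 12])  # x-axis from 0 to 6, y-axis from 0 to 12
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()             # shows the label "Sample line" in a legend
```

Note that the label only becomes visible because `plt.legend()` is called afterwards.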
What is a Matplotlib Figure?
In Matplotlib, a figure is the top-level container that holds all the elements of a plot. It
represents the entire window or page where the plot is drawn.

Basic Components or Parts of Matplotlib Figure


The parts of a Matplotlib figure include (as shown in the figure above):
1. Figures in Matplotlib: The Figure object is the top-level container for all elements of
the plot. It serves as the canvas on which the plot is drawn. You can think of it as the
blank sheet of paper on which you’ll create your visualization.
2. Axes in Matplotlib: Axes are the rectangular areas within the figure where data is
plotted. Each figure can contain one or more axes, arranged in rows and columns if
necessary. Axes provide the coordinate system and are where most of the plotting
occurs.
3. Axis in Matplotlib: Axis objects represent the x-axis and y-axis of the plot. They
define the data limits, tick locations, tick labels, and axis labels. Each axis has a scale
and a locator that determine how the tick marks are spaced.
4. Marker in Matplotlib: Markers are symbols used to denote individual data points on
a plot. They can be shapes such as circles, squares, triangles, or custom symbols.
Markers are often used in scatter plots to visually distinguish between different data
points.
5. Adding lines to Figures: Lines connect data points on a plot and are commonly used
in line plots, scatter plots with connected points, and other types of plots. They represent
the relationship or trend between data points and can be styled with different colors,
widths, and styles to convey additional information.
6. Matplotlib Title: The title is a text element that provides a descriptive title for the plot.
It typically appears at the top of the figure and provides context or information about
the data being visualized.
7. Axis Labels in Matplotlib: Labels are text elements that provide descriptions for the
x-axis and y-axis. They help identify the data being plotted and provide units or other
relevant information.
8. Ticks: Tick marks are small marks along the axis that indicate specific data points or
intervals. They help users interpret the scale of the plot and locate specific data values.
9. Tick Labels: Tick labels are text elements that provide labels for the tick marks. They
usually display the data values corresponding to each tick mark and can be customized
to show specific formatting or units.
10. Matplotlib Legend: Legends provide a key to the symbols or colors used in the plot to
represent different data series or categories. They help users interpret the plot and
understand the meaning of each element.
11. Matplotlib Grid Lines: Grid lines are horizontal and vertical lines that extend across
the plot, corresponding to specific data intervals or divisions. They provide a visual
guide to the data and help users identify patterns or trends.
12. Spines of Matplotlib Figures: Spines are the lines that form the borders of the plot
area. They separate the plot from the surrounding whitespace and can be customized to
change the appearance of the plot borders.
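The parts listed above can each be reached from code. The sketch below builds one small plot and touches every part in turn; the data and text are invented for illustration, and the Agg backend is used only so that no display is needed.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend (assumption: no display available)
import matplotlib.pyplot as plt

fig, ax = plt.subplots()                        # Figure and one Axes
ax.plot([0, 1, 2, 3], [0, 1, 4, 9],
        marker="o", label="y = x squared")      # line with markers
ax.set_title("Parts of a figure")               # title
ax.set_xlabel("x")                              # axis labels
ax.set_ylabel("y")
ax.set_xticks([0, 1, 2, 3])                     # tick positions on the x-axis
ax.set_xticklabels(["zero", "one", "two", "three"])  # tick labels
ax.legend()                                     # legend
ax.grid(True)                                   # grid lines
ax.spines["top"].set_visible(False)             # spines: hide top and right borders
ax.spines["right"].set_visible(False)

x_axis, y_axis = ax.xaxis, ax.yaxis             # the two Axis objects of this Axes
```

The distinction between Axes (the plotting area, `ax`) and Axis (`ax.xaxis`, `ax.yaxis`) is easy to miss, which is why both appear here.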

SIMPLE LINE PLOTS:


Line plots are a very important plot type because they are useful for displaying time series
data. It is often important to visualize how KPIs change over time in order to understand
patterns in the data that can be acted on.
Example 1: Plotting a Simple Line Plot in Matplotlib
In this example, we use Matplotlib to visualize the marks of 20 students in a class. Each
student’s name is paired with a randomly generated mark, and a dashed magenta line
graph shows these marks.
import matplotlib.pyplot as plt
import random

students = ["Jane","Joe","Beck","Tom",
            "Sam","Eva","Samuel","Jack",
            "Dana","Ester","Carla","Steve",
            "Fallon","Liam","Culhane","Candance",
            "Ana","Mari","Steffi","Adam"]

# One random mark between 0 and 100 (inclusive) per student
marks = []
for i in range(len(students)):
    marks.append(random.randint(0, 100))

plt.xlabel("Students")
plt.ylabel("Marks")
plt.title("CLASS RECORDS")
plt.plot(students, marks, 'm--')   # 'm--' means a magenta dashed line
plt.show()

Output:

Customization of Line Plots:


Line Styles in Matplotlib
Below are the available line styles and marker styles present in Matplotlib.

Character    Definition
-            Solid line
--           Dashed line
-.           Dash-dot line
:            Dotted line
.            Point marker
o            Circle marker
,            Pixel marker
v            triangle_down marker
^            triangle_up marker
<            triangle_left marker
>            triangle_right marker
1            tri_down marker
2            tri_up marker
3            tri_left marker
4            tri_right marker
s            square marker
p            pentagon marker
*            star marker
h            hexagon1 marker
H            hexagon2 marker
+            Plus marker
x            X marker
D            Diamond marker
d            thin_diamond marker
|            vline marker
_            hline marker

Color codes:
The table below lists the single-character color codes that can be used to change the color of
plotted data. We can pass either a character code or a color name as the value of the color
parameter of plot().


Codes    Description
b        blue
g        green
r        red
c        cyan
m        magenta
y        yellow
k        black
w        white
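The characters from the line style, marker, and color tables can be combined into a single format string passed to plot(). The following is a small sketch with made-up data; the Agg backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend (assumption: no display available)
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

x = [0, 1, 2, 3, 4]    # illustrative data

fig, ax = plt.subplots()
(line_a,) = ax.plot(x, [v for v in x], "r^:")       # red, triangle_up markers, dotted
(line_b,) = ax.plot(x, [v + 2 for v in x], "bs-.")  # blue, square markers, dash-dot
(line_c,) = ax.plot(x, [v + 4 for v in x], "k--")   # black, dashed, no markers
```

A format string may leave out any of its three parts; `"k--"` above gives a color and a line style but no marker.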

Example 2: Adding Markers to Line Plots


A marker is any symbol that represents a data value in a line chart or a scatter plot.
In this example, we visualize the marks of 20 students using a line plot. Each student’s name
is plotted against their corresponding mark. The line is solid green, and data points are marked
with red circles. This example shows how to change the line styles and markers in Matplotlib.
import matplotlib.pyplot as plt
import random

students = ["Jane","Joe","Beck","Tom","Sam",
            "Eva","Samuel","Jack","Dana","Ester",
            "Carla","Steve","Fallon","Liam","Culhane",
            "Candance","Ana","Mari","Steffi","Adam"]

# One random mark between 0 and 100 (inclusive) per student
marks = []
for i in range(len(students)):
    marks.append(random.randint(0, 100))

plt.xlabel("Students")
plt.ylabel("Marks")
plt.title("CLASS RECORDS")
# Solid green line with red-filled circular markers of size 12
plt.plot(students, marks, color = 'green',
         linestyle = 'solid', marker = 'o',
         markerfacecolor = 'red', markersize = 12)
plt.show()

Output:

Example 3: Adding Grid Lines in Line Plots


Grid lines are straight horizontal and vertical lines drawn across the plot area at the tick
positions; they give the plot structure and make values easier to read.
The grid() function in the Pyplot module of the Matplotlib library is used to configure the grid
lines.
Syntax: matplotlib.pyplot.grid(True, color="grey", linewidth=1.4, axis="y", linestyle="-.")
grid() sets the visibility of the grid lines through a boolean value (True/False). The which
parameter selects whether grid lines are drawn at the major ticks, the minor ticks, or both,
and the axis parameter restricts them to "x", "y", or "both". The color, linewidth, and
linestyle can be changed through additional parameters.


In this example, we use Matplotlib to visualize the marks of 20 students in a class. The marks
are displayed using a dashed magenta line graph. Grid lines are added to provide better
readability and reference across the plot.
import matplotlib.pyplot as plt
import random

students = ["Jane", "Joe", "Beck", "Tom",
            "Sam", "Eva", "Samuel", "Jack",
            "Dana", "Ester", "Carla", "Steve",
            "Fallon", "Liam", "Culhane", "Candance",
            "Ana", "Mari", "Steffi", "Adam"]

# One random mark between 0 and 100 (inclusive) per student
marks = []
for i in range(len(students)):
    marks.append(random.randint(0, 100))

plt.xlabel("Students")
plt.ylabel("Marks")
plt.title("CLASS RECORDS")
plt.plot(students, marks, 'm--')

# Adding grid lines
plt.grid(True, which='both', linestyle='--', linewidth=0.5)

plt.show()

Output:

Example 4: Plot Multiple lines in Matplotlib


import matplotlib.pyplot as plt

# Example data
x = range(10)


y1 = [xi**2 for xi in x]
y2 = [xi**1.5 for xi in x]

# Plotting multiple lines


plt.plot(x, y1, label='x squared')
plt.plot(x, y2, label='x to the power of 1.5')
plt.legend()
plt.title('Multi-Line Graph')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
Output:

Customizing the line plot – Labeling the plots:


Titles and axis labels are the simplest labels. The following functions can be used to
quickly set them.
import matplotlib.pyplot as plt
month = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
msaving = [1000, 500, 700, 550, 600, 800]
plt.plot(month, msaving)
plt.xlabel("Month name")
plt.ylabel("Monthly Saving")
plt.title("Month Name vs Monthly Saving")
plt.grid(True)
plt.show()


Line Width and Line Style:


The linewidth and linestyle properties can be used to change the width and style of the
line in a line chart. Line width is specified in points; the default is 1.5 points in current
versions of Matplotlib. The line style is set with the linestyle parameter, which can be given
in the following ways.
plt.plot(x, color="red", linewidth=3, linestyle='solid')
or
plt.plot(x, color="red", linewidth=3, linestyle=':')
The line style and color codes can also be combined in a single format string, which must be
passed as a positional argument before any keyword arguments to the plt.plot() function.
plt.plot(x, ':r', linewidth=2) #dotted red
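The three ways of setting width and style described above can be compared side by side. This is a sketch with invented data; the Agg backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend (assumption: no display available)
import matplotlib.pyplot as plt

y = [1, 3, 2, 4]       # illustrative data

fig, ax = plt.subplots()
(default_line,) = ax.plot(y)                                  # default width and style
(thick_line,) = ax.plot([v + 1 for v in y], color="red",
                        linewidth=3, linestyle=":")           # keyword arguments
(short_form,) = ax.plot([v + 2 for v in y], ":r", linewidth=2)  # format string first

# The default width comes from Matplotlib's configuration
default_width = matplotlib.rcParams["lines.linewidth"]
```

The format string `":r"` has to come before `linewidth=2` because Python does not allow a positional argument after a keyword argument.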
Adjusting the Plot: Axes Limits
Axes limits in Matplotlib define the range of values displayed along the x-axis and y-axis in a
plot. They determine the span of data visible within the plot area specifying the minimum and
maximum values shown on each axis. Matplotlib automatically calculates and sets the axes
limits based on the range of data plotted by default.
X-axis and Y-axis Limits − Axes limits can be set independently for the x-axis and y-axis.
These limits dictate the range of values displayed along each axis.
The plt.xlim() function in Matplotlib is used to set the limits for the x-axis in a plot. It allows
us to specify the range of values that will be displayed along the x-axis.
Syntax: plt.xlim(left, right)
Where,
• left − The minimum value to be displayed on the x-axis.
• right − The maximum value to be displayed on the x-axis.
The plt.ylim() function in Matplotlib is used to set the limits for the y-axis in a plot. It allows
us to specify the range of values that will be displayed along the y-axis.
Syntax: plt.ylim(bottom, top)
Where,
• bottom − The minimum value to be displayed on the y-axis.
• top − The maximum value to be displayed on the y-axis.
Customization − Limits can be set manually using plt.xlim() for the x-axis and plt.ylim() for
the y-axis, or using the ax.set_xlim() and ax.set_ylim() methods when working with an Axes
object ax.
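A short sketch of both approaches follows. The data and the chosen limits are illustrative only, and the Agg backend is used so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend (assumption: no display available)
import matplotlib.pyplot as plt

x = list(range(11))            # 0, 1, ..., 10
y = [v ** 2 for v in x]        # 0, 1, 4, ..., 100

fig, ax = plt.subplots()
ax.plot(x, y)
auto_xlim = ax.get_xlim()      # limits Matplotlib chose automatically from the data

# pyplot interface: show only x from 2 to 8 and y from 0 to 70
plt.xlim(2, 8)
plt.ylim(0, 70)

# Equivalent calls on the Axes object
ax.set_xlim(2, 8)
ax.set_ylim(0, 70)
```

Data outside the limits is not deleted; it is simply not visible in the plot area.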
Plot Multiple Plots:
With the subplot() function you can draw multiple plots in one figure. The subplot() function
takes three arguments that describe the layout of the figure.

The layout is organized in rows and columns, which are given by the first and second
arguments.
The third argument is the index of the current plot, counted from 1.
plt.subplot(1, 2, 1)
#the figure has 1 row, 2 columns, and this plot is the first plot.
plt.subplot(1, 2, 2)
#the figure has 1 row, 2 columns, and this plot is the second plot.
Example:
import matplotlib.pyplot as plt
import numpy as np

#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(1, 2, 1)
plt.plot(x,y)


#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(1, 2, 2)
plt.plot(x,y)

plt.show()
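As a further sketch of the same function, a 2 x 2 layout shows how the index argument counts across each row before moving to the next. The plotted values are invented for illustration, and the Agg backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend (assumption: no display available)
import matplotlib.pyplot as plt

fig = plt.figure()
panels = []
for index in range(1, 5):
    # 2 rows, 2 columns; index 1 is top-left, 2 is top-right, 3 is bottom-left
    ax = plt.subplot(2, 2, index)
    ax.plot([0, 1, 2], [0, index, 2 * index])
    ax.set_title(f"subplot(2, 2, {index})")
    panels.append(ax)

plt.tight_layout()   # keeps titles and tick labels from overlapping
```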

SIMPLE SCATTER PLOTS:


Scatter plots serve as a visual tool to explore and analyze the relationships between
variables, using dots to depict the connection between them. In a scatter plot, instead of
points being joined by line segments, each point is represented individually with a dot,
circle, or other shape. These plots are instrumental in illustrating the interdependencies among
variables and how changes in one variable can affect another.
Scatter plots can be constructed in the following 2 situations:
• When one continuous variable is dependent on another variable that is under the
control of the observer.
• When both continuous variables are independent.
Some examples for which scatter plots are suitable are as follows:
• Studies have successfully established that the number of hours of sleep required by a
person depends on the age of the person.
• The average income for adults is based on the number of years of education.


Syntax: matplotlib.pyplot.scatter(x_axis_data, y_axis_data, s=None, c=None, marker=None,
cmap=None, vmin=None, vmax=None, alpha=None, linewidths=None, edgecolors=None)
Parameters:
• x_axis_data: An array containing data for the x-axis.
• y_axis_data: An array containing data for the y-axis.
• s: Marker size, which can be a scalar or an array of size equal to the size of x or y.
• c: Color or sequence of colors for the markers.
• marker: Marker style.
• cmap: Colormap name.
• linewidths: Width of the marker border.
• edgecolors: Marker border color.
• alpha: Blending value, ranging between 0 (transparent) and 1 (opaque).
Except for x_axis_data and y_axis_data, all other parameters are optional, with their default
values set to None.
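The optional parameters above can be tried together as in the following sketch. All values are invented for illustration, and the Agg backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend (assumption: no display available)
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]                  # illustrative data
y = [5, 3, 4, 1, 2]
sizes = [40, 80, 120, 160, 200]      # one marker size per point

fig, ax = plt.subplots()
points = ax.scatter(x, y,
                    s=sizes,             # marker sizes
                    c="orange",          # one color for all markers
                    marker="s",          # square markers
                    alpha=0.6,           # partly transparent
                    linewidths=1.5,      # width of the marker border
                    edgecolors="black")  # color of the marker border
```

scatter() returns a single collection object (`points` here) that holds all the markers, which is why per-point properties such as the sizes can be read back from it.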
Example - 1:
import matplotlib.pyplot as plt
x =[5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y =[99, 86, 87, 88, 100, 86, 103, 87, 94, 78, 77, 85, 86]
plt.scatter(x, y, c ="blue")
# To show the plot
plt.show()
Output:


Example – 2: Drawing two plots on same figure


import matplotlib.pyplot as plt
import numpy as np

#day one, the age and speed of 13 cars:


x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y)

#day two, the age and speed of 15 cars:


x = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12])
y = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85])
plt.scatter(x, y)
plt.show()
Output:


plot() Vs scatter()
Matplotlib’s plt.plot() is a general-purpose plotting function that can create a variety of
line and marker plots.
The primary difference between plt.scatter and plt.plot is that plt.scatter can be used to
create scatter plots where the properties of each individual point (size, face color, edge
color, etc.) can be individually controlled or mapped to data.
plt.plot can be noticeably more efficient than plt.scatter. The reason is
that plt.scatter has the capability to render a different size and/or color for each point, so the
renderer must do the extra work of constructing each point individually.
In plt.plot, on the other hand, the points are always essentially clones of each other, so
the work of determining the appearance of the points is done only once for the entire set of
data.
For large datasets, this difference can lead to vastly different performance, and for this
reason plt.plot should be preferred over plt.scatter for large datasets.
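The difference can be seen by drawing the same points both ways. The sketch below uses randomly generated data (an assumption made purely for illustration) and the Agg backend so that it runs without a display; it does not measure timing.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)       # seeded so the sketch is repeatable
x = rng.normal(size=10_000)
y = rng.normal(size=10_000)

fig, (ax_plot, ax_scatter) = plt.subplots(1, 2)

# plt.plot style: markers only ("o" with no line); every marker looks the same
(line,) = ax_plot.plot(x, y, "o", markersize=2)

# plt.scatter style: size and color of each point are mapped to data
sizes = 1 + 20 * np.abs(y)
points = ax_scatter.scatter(x, y, s=sizes, c=x, cmap="viridis")
```

plot() produces one line object whose markers share a single appearance, while scatter() produces a collection carrying 10,000 individual sizes and colors.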

Customizing Scatter Plot:


Color:
We can set a specific color for each dot by passing an array of colors as the value of
the c argument. The array must have one color per point.
Example:
Set your own color of the markers:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array(["red","green","blue","yellow","pink","black","orange","purple","beige",
                   "brown","gray","cyan","magenta"])
plt.scatter(x, y, c=colors)
plt.show()


Size:
You can change the size of the dots with the s argument. Just like colors, make sure the array
for sizes has the same length as the arrays for the x- and y-axis.
Example
Set your own size for the markers:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
sizes = np.array([20,50,100,200,500,1000,60,90,10,300,600,800,75])
plt.scatter(x, y, s=sizes)
plt.show()

COLOR EACH DOT
• To set a specific color for each dot, pass an array of colors as the value of the c
argument.
• Use the c argument rather than the color argument for this; c is the parameter intended
for per-point colors, and it is the one that also works with colormaps.
Example
Set your own color of the markers (x and y are the arrays from the previous examples):
colors = np.array(["red","green","blue","yellow","pink","black","orange","purple","beige",
                   "brown","gray","cyan","magenta"])
plt.scatter(x, y, c=colors)
plt.show()

COLORMAP
The Matplotlib module has a number of available colormaps. A colormap is like a list of colors,
where each color corresponds to a value; in the example below the values range from 0 to 100.

• The colormap used here is called 'viridis'. It ranges from a purple color at the lowest
value (0 in this example) up to a yellow color at the highest value (100 in this example).
• You can specify the colormap with the keyword argument cmap with the value of the
colormap, in this case 'viridis', which is one of the built-in colormaps available in
Matplotlib.
• In addition, you have to create an array with values (here from 0 to 100), one value for
each point in the scatter plot:
Example:
Create a color array, and specify a colormap in the scatter plot:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array([0, 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90, 100])
plt.scatter(x, y, c=colors, cmap='viridis')
plt.colorbar()
plt.show()

ALPHA
You can adjust the transparency of the dots with the alpha argument, a value between 0
(fully transparent) and 1 (fully opaque). As in the size example, make sure the array for sizes
has the same length as the arrays for the x- and y-axis:
Example
Set your own transparency for the markers:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
sizes = np.array([20,50,100,200,500,1000,60,90,10,300,600,800,75])
plt.scatter(x, y, s=sizes, alpha=0.5)
plt.show()

COMBINE COLOR SIZE AND ALPHA


You can combine a colormap with different sizes of the dots. This is best visualized if the dots
are transparent:
Example
Create random arrays with 100 values for x-points, y-points, colors and sizes:
import matplotlib.pyplot as plt
import numpy as np
x = np.random.randint(100, size=(100))
y = np.random.randint(100, size=(100))
colors = np.random.randint(100, size=(100))
sizes = 10 * np.random.randint(100, size=(100))
plt.scatter(x, y, c=colors, s=sizes, alpha=0.5, cmap='nipy_spectral')
plt.colorbar()
plt.show()
