
Data Visualization - Matplotlib PDF

This document discusses different types of data visualizations that can be created using the matplotlib library in Python. It provides code examples for creating line charts, multiple plots on the same canvas using subplots, stack plots, pie charts, histograms, scatter plots, and box plots. Matplotlib is introduced as the most popular Python plotting library, with pyplot making plotting easy by controlling font properties, line styles, and formatting axes. Examples of basic line charts are provided along with adding titles, labels, and multiple lines or data points to the same plot.

Uploaded by

pradeep don
Copyright
© All Rights Reserved

Data Science by Nireekshan

2. Data Visualization with matplotlib

matplotlib

• matplotlib is one of the most widely used Python plotting libraries.
• It has a module called pyplot that makes plotting easy by providing functions for controlling
font properties, line styles, axis formatting, etc.
• matplotlib is well suited to creating graphs such as line charts, bar charts, histograms and
many more.
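
As a rough illustration of the kind of control pyplot gives, the sketch below sets a line style, the title font and the axis limits. The data and the particular style values are made up for this example.

```python
import matplotlib.pyplot as plt

# Hypothetical data, chosen only for illustration
x = list(range(10))
y = [v * 2 for v in x]

# Line style: dashed green line, 2 points wide
(line,) = plt.plot(x, y, linestyle="--", linewidth=2, color="green")

# Font properties: larger, bold title
title = plt.title("Styled line", fontsize=14, fontweight="bold")

# Axis formatting: fix the visible range of each axis
plt.xlim(0, 9)
plt.ylim(0, 20)

plt.show()
```

Each call changes only one aspect of the figure, so the calls can be mixed and matched freely.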

1|Page nireekshan.ds@gmail.com

1. Line charts introduction

• A line chart (or line graph) is a type of chart that displays information as a series of data
points connected by straight line segments.
• A line chart is often used to visualize a trend in data over intervals of time.

Technical info

• plt.plot() - creates a line plot.
• plt.xlabel() and plt.ylabel() - label the x axis and the y axis.
• plt.legend() - adds a legend that identifies each plotted series.
• plt.title() - sets the title of the plot.
• plt.show() - displays the plot.
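
The functions listed above can be combined in one short program. The following is a minimal sketch; the two series and the label texts are invented for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 10)

# label= gives each line the name that legend() will display
plt.plot(x, x + 1, label="y = x + 1")
plt.plot(x, x + 2, label="y = x + 2")

plt.xlabel("X values")
plt.ylabel("Y values")
plt.title("Two labelled lines")
legend = plt.legend()   # builds the legend from the label= values above

plt.show()
```

Without the label= arguments, legend() has nothing to display, so the labels and the legend call belong together.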

Program create a simple line chart


Name demo1.py

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 10)   # x values 0 to 9
y = x + 1              # each y value is the x value plus one

plt.plot(x, y)         # draw the line
plt.show()             # display the figure

Output


Program create a line chart with two lines


Name demo2.py

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 10)
y = x + 2
z = x + 3

plt.title("A Graph")

# Two plot() calls before show() draw both lines on the same axes
plt.plot(x, y)
plt.plot(x, z)
plt.show()

Output


Giving a title to the chart

• We can add a title to the chart with plt.title(); the axes can be labelled as well.

Program create a simple line chart and give it a title


Name demo3.py

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 10)
y = x + 2

plt.title("A Graph")

plt.plot(x, y)
plt.show()

Output


Labelling the axes

• We can label the x axis and the y axis by using plt.xlabel() and plt.ylabel().

Program create a simple line chart with a title and axis labels
Name demo4.py

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 10)
y = x + 2

plt.title("A Graph")

plt.xlabel("X values")
plt.ylabel("Y values")

plt.plot(x, y)
plt.show()

Output


2. Multiple Plots

• With matplotlib, you can create more than one plot on the same canvas (figure).
• You do so with the subplot() function, which defines the grid of plots and the position of
the current plot within that grid.

Program create multiple plots


Name demo5.py

import matplotlib.pyplot as plt

x = range(0, 20)
y = range(0, 40, 2)

plt.subplot(2, 1, 1)

plt.plot(x, y)

plt.ylabel('Value')
plt.title('First chart')
plt.grid(True)

plt.subplot(2, 1, 2)

plt.plot(x, y)

plt.xlabel('Item (s)')
plt.ylabel('Value')
plt.title('Second chart')
plt.grid(True)

plt.show()

Output

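• The subplot() command takes three arguments: the number of rows, the number of columns, and
the index of the current plot, counted from 1 and running row by row from the top left.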
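
To make the index argument concrete, the sketch below converts a subplot index into a grid position. The helper subplot_position is hypothetical and written only for this illustration; it is not part of matplotlib.

```python
def subplot_position(nrows, ncols, index):
    """Return the zero-based (row, column) of a 1-based subplot index.

    Hypothetical helper for illustration only; it is not part of matplotlib.
    """
    if not 1 <= index <= nrows * ncols:
        raise ValueError("index must be between 1 and nrows * ncols")
    # Indices run left to right along a row, then continue on the next row
    return divmod(index - 1, ncols)


# The two calls used in demo5.py: a grid of 2 rows and 1 column
print(subplot_position(2, 1, 1))   # top plot
print(subplot_position(2, 1, 2))   # bottom plot

# In a grid of 2 rows and 3 columns, index 4 starts the second row
print(subplot_position(2, 3, 4))
```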


3. Stack Plot

• A stack plot is an extension of the line chart or bar chart: it breaks the data down by
category and stacks the categories on top of one another, so that both the total and the
contribution of each category can be compared.
• Suppose you need to compare the yearly sales of three different products over the last
8 years.
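
To see what stacking means numerically, the following sketch computes where the upper edge of each layer ends up. It uses only the standard library, and the sales figures are made up for illustration.

```python
from itertools import accumulate

# Hypothetical yearly sales for three products (illustrative numbers only)
series = {
    "mobiles": [800, 1000, 1700],
    "tvs":     [1000, 1400, 1900],
    "fridges": [1200, 1700, 2100],
}

# For each year, accumulate the values layer by layer.
# Each running total is the height of that layer's upper edge.
layer_tops = [list(accumulate(values)) for values in zip(*series.values())]

for year_index, tops in enumerate(layer_tops):
    print(year_index, dict(zip(series, tops)))
```

The last running total in each year is the combined sales of all three products, which is the overall height of the stack.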

Program Creating stack plot


Name demo6.py

import matplotlib.pyplot as plt

year = [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018]

sales1_mobiles = [800,1000,1700,1500,2300,1800,2400,2900]
sales2_tvs = [1000,1400,1900,1600,2500,2000,2600,3200]
sales3_freeze = [1200,1700,2100,1900,2600,2200,2800,3500]

# Pass the labels to stackplot() directly so that legend() can pick them up
plt.stackplot(year, sales1_mobiles, sales2_tvs, sales3_freeze,
              colors=['y', 'r', 'b'],
              labels=['sales1_mobiles', 'sales2_tvs', 'sales3_freeze'])

plt.legend(loc='upper left')

plt.title('Sales Information')
plt.xlabel('year')
plt.ylabel('sales')

plt.show()

Output


4. Pie Chart

• A pie chart is a circular plot divided into slices that display numerical proportions.
• Every slice in the pie chart shows the proportion of one element to the whole.
• The larger the value of a category, the larger the slice it occupies.
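
The size of each slice follows directly from its share of the total. The sketch below works this out for the values 62, 48 and 36, the same numbers used in demo7.py.

```python
points = [62, 48, 36]   # same values as in demo7.py
total = sum(points)

# Share of each slice in the whole, as a percentage
shares = [100 * p / total for p in points]

# Angle of each slice, out of the 360 degrees of the full circle
angles = [360 * p / total for p in points]

for p, share, angle in zip(points, shares, angles):
    print(f"{p}: {share:.1f}% of the pie, {angle:.1f} degrees")
```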

Program Creating pie chart


Name demo7.py

import matplotlib.pyplot as plt

students = ('Nireekshan', 'Abhi', 'Srinu')
points = [62, 48, 36]

plt.pie(points, labels=students)

plt.axis('equal')
plt.show()

Output


Attributes

• To create a pie chart, we call the pie() function of the pyplot module.
• The first parameter to the function is the list of numbers, one for every category.
   o labels attribute: a sequence of category names, one per slice.
   o colors attribute: the color to use for every category.
   o shadow attribute: set to True to draw a shadow beneath the pie.
   o explode attribute: how far to offset each slice from the centre, so that the slices are
     pulled apart.
   o autopct attribute: a format string used to print the percentage on each slice, for
     example '%1.1f%%'.

Program Creating pie chart


Name demo8.py

import matplotlib.pyplot as plt

students = ('Nireekshan', 'Abhi', 'Srinu')
points = [60, 48, 36]
colors = ['y', 'r', 'b']

# explode offsets every slice slightly; autopct prints each percentage
plt.pie(points, labels=students, colors=colors, shadow=True,
        explode=(0.05, 0.05, 0.05), autopct='%1.1f%%')

plt.axis('equal')
plt.show()

Output


5. Histogram

• A histogram is a graphical representation of the distribution of numerical data.
• It takes one numerical variable as input.
• The range of the variable is cut into several bins, and the number of observations in each
bin is represented by the height of its bar.
• It is a good tool when you need to see how many observations fall into each range of values.
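
The binning step can be sketched in plain Python. This is a simplified illustration with equal-width bins and invented data, not the exact routine a plotting library uses.

```python
def bin_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins.

    Simplified illustration: assumes the values are not all identical,
    and puts the maximum value into the last bin.
    """
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Distance of v from the minimum, measured in bin widths
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts


quantities = [1, 2, 2, 3, 5, 6, 8, 9, 9, 10]   # hypothetical data
print(bin_counts(quantities, 3))
```

Each count becomes the height of one bar, so the counts always add up to the number of observations.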

Program Creating histogram


Name demo9.py

import pandas as pd
import matplotlib.pyplot as plt

# Read the sales data and plot the distribution of the 'Quantity' column
sales_data = pd.read_csv("sales8.csv")
sales_data['Quantity'].hist()

plt.show()

Output


6. Scatter Plot

• A scatter plot shows many data points plotted on a pair of axes.
• Each point represents the values of two variables for one observation.
• One of the variables is plotted on the vertical axis while the other one is plotted on the
horizontal axis.
• To create a scatter plot, we can call the scatter() function of the pyplot module, or the
plot.scatter() method of a pandas DataFrame as the program below does.
• Either way, two numeric sequences of equal length are needed: one for the x values and one
for the y values.
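
For comparison with the pandas-based program, here is a minimal sketch that calls the pyplot scatter() function directly. The ages and fares are invented for the example and are not taken from any dataset.

```python
import matplotlib.pyplot as plt

# Hypothetical ages and fares, made up for illustration
ages = [22, 38, 26, 35, 54, 2]
fares = [7.25, 71.28, 7.92, 53.10, 51.86, 21.08]

# scatter() takes the x values first and the y values second
points = plt.scatter(ages, fares)

plt.xlabel("Age")
plt.ylabel("Fare")
plt.title("Fare against age")

plt.show()
```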

Program Creating Scatter plot


Name demo10.py

import pandas as pd
import matplotlib.pyplot as plt

titanic_data = pd.read_csv("titanic_train.csv")
titanic_data.plot.scatter(x='Age', y='Fare', figsize=(8,6))

plt.show()

Output


7. Box Plots

• Box plots summarize how the data in a dataset is distributed.
• The three quartiles divide the sorted dataset into four equal parts.
• The graph shows the maximum, minimum, median, first quartile and third quartile of the
dataset.
• It is also good for comparing how data is distributed across datasets by creating box plots
for each dataset.

Use Box plots

• Use a box plot when you need an overall statistical summary of the data distribution.
• It is a good tool for detecting outliers in a dataset.
• To create a box plot, we call the boxplot() function of pyplot.
• The function takes the data as its parameter: a single sequence of numbers, or a list of
sequences to draw one box per dataset.

Program Creating box plot


Name demo11.py

import matplotlib.pyplot as plt

student1 = [8,10,17,15,23,18,24,29]
student2 = [10,14,19,16,25,20,26,32]
student3 = [12,17,21,19,26,22,28,35]

data=[student1, student2, student3]

plt.boxplot(data)

plt.show()

Output

• The line dividing the box into two shows the median of the data.
• The end of the box represents the upper quartile (75%) while the start of the box
represents the lower quartile (25%).
• The part between the upper quartile and the lower quartile is known as the Inter Quartile
Range (IQR); it contains the middle 50% of the data.
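
The quantities a box plot displays can be computed by hand. The sketch below does so for the student1 data from demo11.py, using Python's statistics module rather than matplotlib's own calculation, so the figures may differ slightly from what a plotting library draws. The 1.5 × IQR rule for flagging outliers is a common convention and is assumed here.

```python
from statistics import quantiles

student1 = [8, 10, 17, 15, 23, 18, 24, 29]   # data from demo11.py

# method="inclusive" interpolates linearly between the sorted data points
q1, median, q3 = quantiles(student1, n=4, method="inclusive")
iqr = q3 - q1

# Points further than 1.5 * IQR from the box are commonly flagged as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in student1 if v < lower_fence or v > upper_fence]

print("Q1:", q1, "median:", median, "Q3:", q3, "IQR:", iqr)
print("outliers:", outliers)
```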


8. Bar Chart

• This type of chart is used for comparing values across many groups (categories).
• It is often confused with the histogram, but a histogram shows the distribution of a single
numerical variable, whereas a bar chart shows one value per group.

When to use?

• It is only good for comparing numerical values.
• Use a bar plot when you need to make a comparison between multiple groups.
• To create a bar plot, we use the bar() function of the Matplotlib library.
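
Because a bar chart compares groups, the x values may be group names rather than numbers. The following is a small sketch with invented product groups and sales figures.

```python
import matplotlib.pyplot as plt

# Hypothetical sales per product group (illustrative numbers only)
groups = ["Mobiles", "TVs", "Fridges"]
sales = [120, 95, 60]

# One bar per group; the group names become the x-axis tick labels
bars = plt.bar(groups, sales)

plt.xlabel("Product group")
plt.ylabel("Units sold")
plt.title("Sales by product group")

plt.show()
```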

Program Creating bar chart


Name demo12.py

import matplotlib.pyplot as plt

months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
sales = [23, 45, 56, 78, 213, 45, 78, 89, 99, 100, 101, 130]

plt.bar(months, sales)

plt.xlabel('Month')
plt.ylabel('Product Sales')

plt.title('A Bar Graph')

plt.show()

Output


9. Bubble Chart

• This type of chart shows the data in the form of a cluster of circles.
• The data for a bubble chart should provide the x and y coordinates, the bubble size and
the bubble color.
• The colors can be supplied as a sequence of numbers, which matplotlib maps to colors.
• To create a bubble chart, we use the scatter() function provided in the Matplotlib library.
Here is an example:

Program Creating bubble chart


Name demo13.py

import numpy as np
import matplotlib.pyplot as plt

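x = np.random.rand(30)        # x coordinates
y = np.random.rand(30)        # y coordinates
z = np.random.rand(30)        # relative bubble sizes

colors = np.random.rand(30)   # one number per bubble, mapped to a color

# s sets the marker area, so z is scaled up to make the bubbles visible
plt.scatter(x, y, s=z * 1000, c=colors)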

plt.show()

Output


10. Heat Maps

• A heat map represents every value to be plotted as a shade of color.
• Which shades stand for higher values depends on the colormap in use: in some colormaps the
darker shades indicate the higher values, in others the lighter shades do.
• Values that differ greatly are drawn in clearly different colors.
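
Before a value can be given a shade, it has to be placed on a common scale. The sketch below assumes a simple linear rescaling in which the smallest value maps to 0 and the largest to 1; the grid of numbers is invented, and the rescaled values would then be looked up in a colormap.

```python
def normalize(grid):
    """Linearly rescale every value in a 2-D list to the range 0 to 1.

    Assumes the grid contains at least two different values.
    """
    flat = [v for row in grid for v in row]
    low, high = min(flat), max(flat)
    return [[(v - low) / (high - low) for v in row] for row in grid]


# Hypothetical grid of 2 rows and 3 columns
grid = [[1, 3, 5],
        [2, 4, 9]]

for row in normalize(grid):
    print([round(v, 2) for v in row])
```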

Program Creating heat maps


Name demo14.py

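from pandas import DataFrame
import matplotlib.pyplot as plt

# Each row is a list (not a set), so that order and repeated values are preserved
data = [[3, 4, 6, 1], [6, 5, 4, 2], [7, 3, 5, 2], [2, 7, 5, 3], [1, 8, 1, 4]]
Index = ['I1', 'I2', 'I3', 'I4', 'I5']
Cols = ['Col1', 'Col2', 'Col3', 'Col4']

df = DataFrame(data, index=Index, columns=Cols)

plt.pcolor(df)
plt.colorbar()   # show which color corresponds to which value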

plt.show()

Output

• We have created a two-dimensional plot in which each cell corresponds to one column and one
index (row) of the DataFrame, and the color of the cell encodes its value.
