Data Visualization - Matplotlib PDF
matplotlib
1|Page nireekshan.ds@gmail.com
Data Science by Nireekshan
1. Line Chart
A line chart, or line graph, is a type of chart that displays information as a series of
data points connected by straight line segments.
A line chart is often used to visualize a trend in data over intervals of time.
Technical info
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0, 10)  # the integers 0 to 9
y = x + 1
plt.plot(x, y)        # draw y against x as a connected line
plt.show()
Output
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0, 10)
y = x + 2
z = x + 3
plt.title("A Graph")
plt.plot(x, y)
plt.plot(x, z)
plt.show()
Output
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0, 10)
y = x + 2
plt.title("A Graph")
plt.plot(x, y)
plt.show()
Output
Program: create a simple line chart with a title and axis labels
Name: demo4.py
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0, 10)
y = x + 2
plt.title("A Graph")
plt.xlabel("X values")
plt.ylabel("Y values")
plt.plot(x, y)
plt.show()
Output
2. Multiple Plots
With matplotlib, you can create more than one plot on the same canvas (figure).
You do so with the subplot() function, which takes the number of rows, the number of
columns and the index (starting at 1) of the plot to make active.
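import matplotlib.pyplot as plt
x = range(0, 20)
y = range(0, 40, 2)
# 2 rows, 1 column: make the first (top) plot active
plt.subplot(2, 1, 1)
plt.plot(x, y)
plt.ylabel('Value')
plt.title('First chart')
plt.grid(True)
# 2 rows, 1 column: make the second (bottom) plot active
plt.subplot(2, 1, 2)
plt.plot(x, y)
plt.xlabel('Item (s)')
plt.ylabel('Value')
plt.title('Second chart')
plt.grid(True)
plt.tight_layout()  # keep the titles and labels from overlapping
plt.show()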
Output
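The same two-chart layout can also be sketched with the subplots() function, which creates the figure and all of its axes in one call. This is only an alternative style, not the method used above; sharex=True and tight_layout() are optional extras assumed here for tidiness.

```python
import matplotlib.pyplot as plt

x = range(0, 20)
y = range(0, 40, 2)

# One figure holding a 2-row, 1-column grid of axes that share the x axis.
fig, axes = plt.subplots(2, 1, sharex=True)

axes[0].plot(x, y)
axes[0].set_ylabel('Value')
axes[0].set_title('First chart')
axes[0].grid(True)

axes[1].plot(x, y)
axes[1].set_xlabel('Item (s)')
axes[1].set_ylabel('Value')
axes[1].set_title('Second chart')
axes[1].grid(True)

fig.tight_layout()  # keep the titles and labels from overlapping
plt.show()
```

Each entry of axes is one plot, so every setting is applied to a named plot rather than to whichever plot is currently active.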
3. Stack Plot
A stack plot is an extension of the line chart or bar chart that stacks the data from
several categories on top of one another, so that each category's contribution and the
overall total can be compared.
Suppose you need to compare the sales of three products over the last 8 years.
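import matplotlib.pyplot as plt
year = range(1, 9)  # assumed x values: the 8 years, numbered 1 to 8
sales1_mobiles = [800, 1000, 1700, 1500, 2300, 1800, 2400, 2900]
sales2_tvs = [1000, 1400, 1900, 1600, 2500, 2000, 2600, 3200]
sales3_freeze = [1200, 1700, 2100, 1900, 2600, 2200, 2800, 3500]
plt.stackplot(year, sales1_mobiles, sales2_tvs, sales3_freeze, labels=['mobiles', 'tvs', 'freeze'])
plt.legend(loc='upper left')
plt.title('Sales Information')
plt.xlabel('year')
plt.ylabel('sales')
plt.show()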
Output
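To see what "stacking" means, here is a rough sketch that builds a comparable picture by hand: each band sits on top of the running total of the bands below it. The years array and the short variable names are assumptions made for illustration; the sales figures are the ones listed above.

```python
import numpy as np
import matplotlib.pyplot as plt

years = np.arange(1, 9)  # assumed x values: the 8 years, numbered 1 to 8
mobiles = np.array([800, 1000, 1700, 1500, 2300, 1800, 2400, 2900])
tvs = np.array([1000, 1400, 1900, 1600, 2500, 2000, 2600, 3200])
freeze = np.array([1200, 1700, 2100, 1900, 2600, 2200, 2800, 3500])

# Running totals down the categories: row i is the upper edge of band i.
tops = np.cumsum([mobiles, tvs, freeze], axis=0)
# The lower edge of each band is the upper edge of the band beneath it.
bottoms = np.vstack([np.zeros_like(mobiles), tops[:-1]])

for label, lower, upper in zip(['mobiles', 'tvs', 'freeze'], bottoms, tops):
    plt.fill_between(years, lower, upper, label=label)
plt.legend(loc='upper left')
plt.show()
```

The top edge of the last band is the total sales for each year, which is why a stack plot shows both the parts and the whole.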
4. Pie Chart
A pie chart is a circular plot divided into slices that display numerical proportions.
Every slice in the pie chart shows the proportion of one element to the whole.
The larger a category, the larger the portion of the pie chart it occupies.
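The proportion behind each slice is simple arithmetic: the value divided by the total, with the slice angle being that share of 360 degrees. A small sketch, using made-up category totals (the names and numbers are hypothetical):

```python
# Hypothetical category totals, used only for illustration.
values = {'mobiles': 40, 'tvs': 35, 'fridges': 25}

total = sum(values.values())

# Share of the whole for each category, and the matching slice angle.
shares = {name: value / total for name, value in values.items()}
angles = {name: 360 * share for name, share in shares.items()}

for name in values:
    print(f"{name}: {shares[name]:.0%} of the pie, {angles[name]:.0f} degrees")
```

The pie function performs the same normalization itself, so it can be given the raw values directly.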
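import matplotlib.pyplot as plt
plt.pie([40, 35, 25])  # illustrative values
plt.axis('equal')      # equal aspect ratio, so the pie is drawn as a circle
plt.show()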
Output
Attributes
To create a pie chart, we call the pie function of the pyplot module.
The first parameter to the function is the list of numbers for every category.
o labels attribute:
A list of category names, one per slice, is passed as the argument
to the labels attribute.
o colors attribute:
To provide the color for every category.
o shadow attribute:
To create shadows around the various categories in the pie chart.
o explode attribute:
To offset each slice from the center of the pie, separating it from the others.
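import matplotlib.pyplot as plt
plt.pie([40, 35, 25], labels=['A', 'B', 'C'], colors=['r', 'g', 'b'], shadow=True, explode=(0.1, 0.1, 0.1))  # illustrative values
plt.axis('equal')
plt.show()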
Output
5. Histogram
import pandas as pd
import matplotlib.pyplot as plt
sales_data = pd.read_csv("sales8.csv")
sales_data['Quantity'].hist()  # histogram of the numeric 'Quantity' column
plt.show()
Output
6. Scatter Plot
import pandas as pd
import matplotlib.pyplot as plt
titanic_data = pd.read_csv("titanic_train.csv")
titanic_data.plot.scatter(x='Age', y='Fare', figsize=(8,6))
plt.show()
Output
7. Box Plots
Use a box plot when you need an overall statistical summary of the data
distribution.
It is a good tool for detecting outliers in a dataset.
To create a box plot, we call the boxplot function of pyplot.
The function takes the data as its parameter: a single sequence of values, or a list of
sequences to draw one box per sequence.
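import matplotlib.pyplot as plt
student1 = [8, 10, 17, 15, 23, 18, 24, 29]
student2 = [10, 14, 19, 16, 25, 20, 26, 32]
student3 = [12, 17, 21, 19, 26, 22, 28, 35]
data = [student1, student2, student3]  # one box per student
plt.boxplot(data)
plt.show()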
Output
The line dividing the box into two shows the median of the data.
The end of the box represents the upper quartile (75%) while the start of the box
represents the lower quartile (25%).
The part between the upper quartile and the lower quartile is known as the interquartile
range (IQR) and contains the middle 50% of the data.
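The numbers behind one box can be checked by hand. The sketch below computes the quartiles and the IQR for student1 with NumPy, and then applies the usual 1.5 x IQR rule (the default whisker setting of boxplot) to look for outliers. The variable names are chosen here for illustration.

```python
import numpy as np

student1 = [8, 10, 17, 15, 23, 18, 24, 29]

# Lower quartile, median and upper quartile.
q1, median, q3 = np.percentile(student1, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

values = np.array(student1)
outliers = values[(values < lower_fence) | (values > upper_fence)]

print("Q1:", q1, "median:", median, "Q3:", q3, "IQR:", iqr)
print("outliers:", outliers)
```

For this data no point falls outside the fences, so the box plot shows no outlier markers for student1.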
8. Bar Chart
This type of chart is used for comparing values across groups (categories).
It is often confused with the histogram, but note that a histogram only accepts
numerical data, which it groups into bins, whereas a bar chart draws one bar per category.
When to use?
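import matplotlib.pyplot as plt
months = ['Jan', 'Feb', 'Mar', 'Apr']  # illustrative values
sales = [120, 150, 90, 180]            # illustrative values
plt.bar(months, sales)
plt.xlabel('Month')
plt.ylabel('Product Sales')
plt.show()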
Output
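To make the difference from a histogram concrete, here is a small side-by-side sketch. All of the data is made up for illustration: the bar chart gets one value per named category, while the histogram gets raw numbers and counts how many fall into each bin.

```python
import matplotlib.pyplot as plt

# Hypothetical data, used only for illustration.
months = ['Jan', 'Feb', 'Mar']                 # categories
sales = [120, 150, 90]                         # one value per category
quantities = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]    # raw numerical observations

fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# Bar chart: one bar per category, bar height is the given value.
ax_bar.bar(months, sales)
ax_bar.set_title('Bar chart')

# Histogram: the numbers are grouped into 4 equal-width bins and counted.
counts, edges, _ = ax_hist.hist(quantities, bins=4)
ax_hist.set_title('Histogram')

fig.tight_layout()
plt.show()
```

Here counts holds the number of observations in each bin and edges holds the bin boundaries, so the bar heights of a histogram are computed from the data rather than supplied.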
9. Bubble Chart
This type of chart shows the data as a cluster of circles.
The data for a bubble chart should include the x and y coordinates, the bubble size and
the color of each bubble.
The colors can be given as numbers, which Matplotlib maps to colors through a colormap.
To create a bubble chart, we use the scatter function provided in the Matplotlib library.
Here is an example:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.rand(30)
y = np.random.rand(30)
z = np.random.rand(30)
colors = np.random.rand(30)
plt.scatter(x, y, s=z * 1000, c=colors)  # s sets the bubble area, c the bubble color
plt.show()
Output
10. Heat Map
A heat map represents every value to be plotted as a shade of color.
Which shades stand for the higher values depends on the colormap: with a sequential
colormap such as 'Blues', the darker shades indicate the higher values.
A value that differs sharply from the rest shows up as a strongly contrasting color.
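import matplotlib.pyplot as plt
import pandas as pd
# Rows are lists, not sets: a set would drop duplicates and lose the order
data = [[3, 4, 6, 1], [6, 5, 4, 2], [7, 3, 5, 2], [2, 7, 5, 3], [1, 8, 1, 4]]
index = ['I1', 'I2', 'I3', 'I4', 'I5']
columns = ['C1', 'C2', 'C3', 'C4']  # column names are illustrative
df = pd.DataFrame(data, index=index, columns=columns)
plt.pcolor(df)
plt.show()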
Output
We have created a two-dimensional plot of values that are mapped to the columns and
indices of the chart.
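A heat map is easier to read when the rows and columns are labelled and a color scale is shown. The sketch below is one possible way to do that; the column names C1 to C4 and the 'Blues' colormap are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = [[3, 4, 6, 1], [6, 5, 4, 2], [7, 3, 5, 2], [2, 7, 5, 3], [1, 8, 1, 4]]
index = ['I1', 'I2', 'I3', 'I4', 'I5']
columns = ['C1', 'C2', 'C3', 'C4']  # assumed column names
df = pd.DataFrame(data, index=index, columns=columns)

fig, ax = plt.subplots()
# With 'Blues', darker cells correspond to higher values.
mesh = ax.pcolor(df.to_numpy(), cmap='Blues')

# Each cell is one unit wide, so its center sits at i + 0.5.
ax.set_xticks([i + 0.5 for i in range(len(df.columns))])
ax.set_xticklabels(df.columns)
ax.set_yticks([i + 0.5 for i in range(len(df.index))])
ax.set_yticklabels(df.index)

fig.colorbar(mesh, ax=ax)  # legend that maps shades back to values
plt.show()
```

The color bar makes it possible to read approximate values from the shades instead of only comparing cells with each other.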