Data Visualization
•Data Science Pipeline : “a set of actions which changes the raw (and confusing)
data from various sources (surveys, feedback, list of purchases, votes, etc.), to
an understandable format so that we can store it and use it for analysis.”
1. Better understanding
2. A better approach
3. Easy sharing of data
4. Sales analysis
5. Finding relations between events
6. Exploring opportunities and trends
•Tableau
•Looker
•Zoho Analytics
•Sisense
•IBM Cognos Analytics
•Qlik Sense
•Domo
•Microsoft Power BI
•Klipfolio
•SAP Analytics Cloud
• Matplotlib is one of the most popular Python packages used for data visualization.
• It is a cross-platform library for making 2D plots from data in arrays.
• To get started, make the necessary imports and prepare some data. The plot() function
draws the graph, and the show() function displays it.
Example:
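import matplotlib.pyplot as plt
# year contains the x-axis values; e_india and e_bangladesh are the
# y-axis values for plotting (placeholder numbers for illustration, not real data)
year = [1980, 1990, 2000, 2010]
e_india = [100, 200, 300, 400]
e_bangladesh = [10, 20, 40, 80]
plt.plot(year, e_india, label='India')
plt.plot(year, e_bangladesh, label='Bangladesh')
plt.xlabel('Years')
plt.ylabel('Power consumption in kWh')
plt.legend()
plt.show()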
•Figure:
• The top-level container for all the plots: the overall window or page on which
everything is drawn.
• A box-like container that can hold one or more axes.
•Axes:
• most basic and flexible component for creating sub-plots.
• A given figure may contain many axes but a given axes can only be in one figure.
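The containment relationship between a figure and its axes can be checked directly. The following is a minimal sketch rather than part of the original slides; the variable names ax_left and ax_right are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt

# One figure (the top-level container) holding two axes
fig = plt.figure()
ax_left = fig.add_subplot(1, 2, 1)   # 1 row, 2 columns, first cell
ax_right = fig.add_subplot(1, 2, 2)  # 1 row, 2 columns, second cell

# The figure keeps a list of every axes it contains ...
print(len(fig.axes))          # 2
# ... and each axes points back to the single figure it belongs to
print(ax_left.figure is fig)  # True
```

Call plt.show() afterwards to display the (empty) axes.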
The subplot() method adds another plot to the current figure at the specified grid position.
Example:
import matplotlib.pyplot as plt
# data to display on plots
x = [3, 1, 3]
y = [3, 2, 1]
z = [1, 3, 1]
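plt.figure()
# 121 means a grid of 1 row and 2 columns; this is plot 1
plt.subplot(121)
plt.plot(x, y)
# plot 2 of the same 1 x 2 grid
plt.subplot(122)
plt.plot(z, y)
plt.show()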
Example:
import matplotlib.pyplot as plt
# Creating the figure and subplots
# according to the arguments passed (1 row, 2 columns)
fig, axes = plt.subplots(1, 2)
# plotting the data in the 1st subplot
axes[0].plot([1, 2, 3, 4], [1, 2, 3, 4])
# plotting a second line in the 1st subplot only
axes[0].plot([1, 2, 3, 4], [4, 3, 2, 1])
# plotting the data in the 2nd subplot only
axes[1].plot([1, 2, 3, 4], [1, 1, 1, 1])
plt.show()
Example:
import matplotlib.pyplot as plt
# data to display on plots
x = [3, 1, 3]
y = [3, 2, 1]
z = [1, 3, 1]
# adding the subplots: a 7 x 1 grid, each subplot spanning 2 rows
axes1 = plt.subplot2grid((7, 1), (0, 0), rowspan=2, colspan=1)
axes2 = plt.subplot2grid((7, 1), (2, 0), rowspan=2, colspan=1)
axes3 = plt.subplot2grid((7, 1), (4, 0), rowspan=2, colspan=1)
# plotting the data
axes1.plot(x, y)
axes2.plot(x, z)
axes3.plot(z, y)
plt.show()
• Creating the legend: a legend can be created using the legend() method.
• The loc parameter of legend() specifies the location of the legend.
• The default value is loc="best", which places the legend where it overlaps the plotted data least.
• The strings 'upper left', 'upper right', 'lower left', 'lower right' place the legend at the
corresponding corner of the axes/figure.
• The bbox_to_anchor=(x, y) parameter of legend() specifies the coordinates at which the legend
is anchored, and the ncol parameter sets the number of columns in the legend. The default
value of ncol is 1.
Example:
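A minimal sketch of the legend options described above, assuming two illustrative lines; the data values and the labels "rising" and "falling" are made up for this example.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]  # illustrative values only

fig, ax = plt.subplots()
# label= supplies the text shown for each line in the legend
ax.plot(x, [1, 2, 3, 4], label="rising")
ax.plot(x, [4, 3, 2, 1], label="falling")

# Anchor the legend's upper-left corner at the top-right corner of the axes
# (axes coordinates 1.0, 1.0) and lay the entries out in two columns
legend = ax.legend(loc="upper left", bbox_to_anchor=(1.0, 1.0), ncol=2)

labels = [text.get_text() for text in legend.get_texts()]
print(labels)  # ['rising', 'falling']
```

When bbox_to_anchor is given, loc names the corner of the legend box that is pinned to that point. Call plt.show() afterwards to display the figure.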
Method 1:
We can pass an integer to the bins parameter of hist() to state how many bins (bars) the
histogram should have; the width of each bin is then adjusted accordingly.
Example 1 :
import matplotlib.pyplot as plt
marks = [1, 2, 3, 2, 1, 2, 3, 2,
         1, 4, 5, 4, 3, 2, 5, 4,
         5, 4, 5, 3, 2, 1, 5]
# bins is given here as a list of bin edges rather than an integer
plt.hist(marks, bins=[1, 2, 3, 4, 5], edgecolor="black")
plt.show()
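To make the difference between an integer and a list for bins concrete, the sketch below draws the same marks both ways and keeps the counts that hist() returns. The side-by-side layout and the names ax_int and ax_seq are assumptions made for illustration.

```python
import matplotlib.pyplot as plt

marks = [1, 2, 3, 2, 1, 2, 3, 2,
         1, 4, 5, 4, 3, 2, 5, 4,
         5, 4, 5, 3, 2, 1, 5]

fig, (ax_int, ax_seq) = plt.subplots(1, 2)

# Integer: five equal-width bins spanning the data range (1 to 5)
counts_int, edges_int, _ = ax_int.hist(marks, bins=5, edgecolor="black")

# Sequence: the list gives the bin edges, so five edges make four bins
counts_seq, edges_seq, _ = ax_seq.hist(marks, bins=[1, 2, 3, 4, 5],
                                       edgecolor="black")

print(counts_int.tolist())
print(counts_seq.tolist())
```

With bins=5 each mark value gets its own bar (counts 4, 6, 4, 4, 5). With the list of edges every bin is half-open except the last, so the final bin covers 4 to 5 inclusive and counts both the 4s and the 5s (counts 4, 6, 4, 9).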
Example:
import matplotlib.pyplot as plt
# data to display on plots
x = [3, 1, 3, 12, 2, 4, 4]
y = [3, 2, 1, 4, 5, 6, 7]
# This will plot a simple scatter chart; label names the series in the legend
plt.scatter(x, y, label="A")
# Adding legend to the plot
plt.legend()
# Title to the plot
plt.title("Scatter chart")
plt.show()
•A box plot, also known as a whisker plot, displays a summary of a set of data values through its
minimum, first quartile, median, third quartile and maximum.
•In the box plot, a box is drawn from the first quartile to the third quartile, and a line goes through
the box at the median (a vertical line when the box is drawn horizontally).
•In a horizontal box plot the x-axis carries the data values, and the positions of the box and whiskers along it
show how the values are distributed. Matplotlib's boxplot() draws the box vertically by default, so the data
values are then on the y-axis.
•The matplotlib.pyplot module of the matplotlib library provides the boxplot() function for creating
box plots.
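# Import libraries
import matplotlib.pyplot as plt
import numpy as np
# Create a reproducible sample data set
# (assumed here: 200 values drawn from a normal distribution)
np.random.seed(10)
data = np.random.normal(100, 20, 200)
fig = plt.figure(figsize=(10, 7))
# Create an axes instance that fills the whole figure
ax = fig.add_axes([0, 0, 1, 1])
# Create the box plot
bp = ax.boxplot(data)
plt.show()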
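The five values that a box plot summarizes can also be computed directly with NumPy. This is a minimal sketch on a small made-up data set, separate from the random data used above.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])  # illustrative data

# First quartile, median and third quartile (linear interpolation by default)
q1, median, q3 = np.percentile(values, [25, 50, 75])

summary = {
    "minimum": float(values.min()),
    "first quartile": float(q1),
    "median": float(median),
    "third quartile": float(q3),
    "maximum": float(values.max()),
}
print(summary)
```

For this data the box would run from 3 to 7 with the median line at 5. Note that Matplotlib's boxplot() by default extends the whiskers only to the furthest data points within 1.5 times the interquartile range of the box, so the whisker ends match the minimum and maximum only when there are no outliers.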
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
# loading dataset
data = sns.load_dataset("iris")
# draw lineplot
sns.lineplot(x="sepal_length",
y="sepal_width", data=data)
plt.show()
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
# loading dataset
data = sns.load_dataset("iris")
# helper that draws the line plot on the current axes
def plot():
    sns.lineplot(x="sepal_length", y="sepal_width", data=data)
plot()
plt.show()
Example:
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
# loading dataset
data = sns.load_dataset("flights")
plot = sns.PairGrid(data)
plot.map(plt.plot)
plt.show()
2. Categorical Plots:
•Relational plots are used where we have to visualize the relationship between two
numerical variables.
•A more specialized approach, the categorical plot, can be used if one of the main variables
is categorical, meaning that it takes on a fixed and limited
number of possible values.
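As a minimal sketch of a categorical plot, the example below draws one box per category with seaborn's boxplot(). The small data frame, its column names group and value, and its numbers are hypothetical and chosen only for illustration; boxplot() is one of several categorical plot functions in seaborn.

```python
import pandas as pd
import seaborn as sns

# Small hand-made data set (hypothetical values): one categorical column
# and one numerical column
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "value": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0, 5.0, 5.5, 6.0],
})

# Categorical variable on the x-axis, numerical variable on the y-axis;
# seaborn draws one box for each distinct value of "group"
ax = sns.boxplot(x="group", y="value", data=df)
```

The call returns the Matplotlib axes it drew on, so the usual plt.show() from matplotlib.pyplot displays the result.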