DSBDA Lab Assignment No 10
Assignment No: 10
Objective of the Assignment: Students should be able to perform data visualization
operations using Python on any open-source dataset.
Prerequisites:
1. Basics of Python programming
2. Seaborn library and the concept of data visualization
3. Types of variables
Theory:
Histograms:
A histogram represents numerical data that has been grouped into ranges (bins). It gives a
graphical view of how numerical data is distributed. It is a type of bar plot in which the
X-axis shows the bin ranges and the Y-axis shows the frequency, i.e. how many values fall into each bin.
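As a small worked example of bins and frequencies, NumPy's np.histogram counts how many values fall into each bin without drawing anything. The seven values below are made up for illustration.

```python
import numpy as np

# Made-up sample values
data = [1, 2, 2, 3, 3, 3, 4]

# Split the range [min, max] = [1, 4] into 3 equal-width bins and count values per bin
counts, edges = np.histogram(data, bins=3)

print(edges)   # bin boundaries (X-axis): [1. 2. 3. 4.]
print(counts)  # frequencies (Y-axis):    [1 2 4]
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both; that is why the value 4 is counted in the [3, 4] bin together with the three 3s.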
The following table describes some of the optional parameters accepted by the matplotlib.pyplot.hist() function:
Parameter   Description
histtype    Optional. Type of histogram to draw: 'bar', 'barstacked', 'step' or 'stepfilled'. Default is 'bar'.
align       Optional. Controls how the bars are aligned with the bin edges: 'left', 'mid' or 'right'. Default is 'mid'.
rwidth      Optional. Width of each bar relative to the bin width (ignored when histtype is 'step' or 'stepfilled').
label       Optional. String, or sequence of strings, used as legend labels when several datasets are plotted.
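The sketch below passes all four parameters from the table to plt.hist() in a single call. The two samples are randomly generated (hypothetical data), and the bin count, bar width and labels are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Two hypothetical samples (random numbers, fixed seed for repeatability)
rng = np.random.default_rng(0)
sample_a = rng.normal(loc=5.0, scale=0.5, size=100)
sample_b = rng.normal(loc=6.0, scale=0.7, size=100)

counts, edges, patches = plt.hist(
    [sample_a, sample_b],             # several datasets in one call
    bins=10,                          # 10 equal-width bins spanning both samples
    histtype="barstacked",            # stack the datasets within each bin
    align="mid",                      # bars centred between bin edges (the default)
    rwidth=0.8,                       # each bar uses 80% of the bin width
    label=["sample A", "sample B"],   # one legend label per dataset
)
plt.xlabel("Value (bin ranges)")
plt.ylabel("Frequency")
plt.legend()
plt.show()
```

With histtype='barstacked', the counts returned for the second dataset are cumulative (they include the first dataset), so the last array sums to the total number of values plotted.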
Algorithm:
1. Import the required libraries.
import os                        # to change the working directory
import numpy as np
import pandas as pd              # data frames
import matplotlib.pyplot as plt  # plotting
import seaborn as sns            # statistical plots built on matplotlib
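2. Create a data frame from the downloaded Iris.csv dataset.
os.chdir(r"D:\Pandas")        # folder where Iris.csv was saved; the raw string keeps the backslash literal
df = pd.read_csv("Iris.csv")  # load the CSV file into a DataFrame
df                            # in a Jupyter notebook, this displays the data frame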
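Changing the working directory with os.chdir works, but passing the full path straight to pd.read_csv is usually more portable. The sketch below assumes the file was saved in D:\Pandas (adjust the path for your machine) and that it uses the column names referenced later in this assignment, plus an Id column. If the file is not found, it falls back to three illustrative rows in that assumed layout so the sketch runs anywhere.

```python
from io import StringIO
from pathlib import Path

import pandas as pd

# Assumed location of the downloaded file; change it to match your machine.
csv_path = Path(r"D:\Pandas") / "Iris.csv"

if csv_path.exists():
    df = pd.read_csv(csv_path)  # read from the full path, no os.chdir needed
else:
    # Fallback: three illustrative rows in the assumed column layout
    sample = StringIO(
        "Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species\n"
        "1,5.1,3.5,1.4,0.2,Iris-setosa\n"
        "2,7.0,3.2,4.7,1.4,Iris-versicolor\n"
        "3,6.3,3.3,6.0,2.5,Iris-virginica\n"
    )
    df = pd.read_csv(sample)

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # column types: helps tell numeric features from categorical ones
```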
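3. Apply data preprocessing techniques, for example checking for missing values and viewing summary statistics.
df.isnull().sum()   # number of missing values in each column
df.describe()       # count, mean, std, min, quartiles and max of each numeric column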
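The checks in step 3 only report problems. A minimal sketch of acting on them is shown below, assuming mean imputation for missing numeric values and removal of exact duplicate rows; these are common choices, not the only ones. The toy data frame uses Iris-style column names with made-up values, including one deliberate missing value and one deliberate duplicate row.

```python
import numpy as np
import pandas as pd

# Toy data with Iris-style column names; values are made up
df = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, np.nan, 6.3, 6.3],
    "PetalLengthCm": [1.4, 1.4, 4.7, 6.0, 6.0],
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
                "Iris-virginica", "Iris-virginica"],
})

print(df.isnull().sum())   # SepalLengthCm has 1 missing value

# Fill missing numeric values with the mean of their column
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

# Drop exact duplicate rows and renumber the index
df = df.drop_duplicates().reset_index(drop=True)

print(df.describe())
```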
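4. Plot a box plot for each feature in the dataset, then observe and detect the outliers.
sns.set(style="whitegrid", palette="GnBu_d", rc={'figure.figsize': (11.7, 8.27)})  # plot style, colour palette and figure size
sns.boxplot(x='Species', y='SepalLengthCm', data=df)  # sepal length for each species
plt.title('Distribution of sepal length')
plt.show()
# Repeat the boxplot call with y='SepalWidthCm', 'PetalLengthCm' and 'PetalWidthCm' for the other features.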
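To go from observing outliers to detecting them, the points that a box plot draws beyond its whiskers can also be listed numerically. Seaborn's box plot whiskers reach, by default, at most 1.5 × IQR beyond the quartiles, so the sketch below flags values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] within each species. The helper names iqr_outliers and report_outliers are hypothetical, and the quartiles pandas computes here should closely match, but are not guaranteed to be identical to, the ones drawn in the plot.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

FEATURES = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]

def iqr_outliers(s, k=1.5):
    """Return the values of s lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return s[(s < q1 - k * iqr) | (s > q3 + k * iqr)]

def report_outliers(df, features=FEATURES, by="Species"):
    """Draw one box plot per feature and return outlier row indices per feature."""
    found = {}
    for col in features:
        plt.figure()                          # fresh figure for each feature
        sns.boxplot(x=by, y=col, data=df)     # one box per species
        plt.title(f"Distribution of {col}")
        plt.show()
        # Apply the IQR rule within each species, as the grouped box plot does
        idx = []
        for _, group in df.groupby(by):
            idx.extend(iqr_outliers(group[col]).index.tolist())
        found[col] = sorted(idx)
    return found

# Made-up values: 4.4 lies above Q3 + 1.5*IQR, so it is flagged
print(iqr_outliers(pd.Series([3.0, 3.1, 3.2, 3.3, 3.4, 4.4])))
```

With the data frame from step 2, outliers = report_outliers(df) draws the four box plots and returns, for each feature, the row indices that fall outside the whiskers.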
5. Plot a histogram for each feature in the dataset.
df.hist()   # one histogram per numeric column
plt.show()
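A slightly more controlled version of step 5 is sketched below. It assumes your copy of Iris.csv may contain an Id column, which df.hist() would otherwise plot as if it were a feature; the helper name plot_feature_histograms and the bins, figsize and edgecolor values are illustrative choices. Non-numeric columns such as Species are skipped by DataFrame.hist automatically.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_feature_histograms(df, bins=10):
    """Draw one histogram per numeric feature and return the array of Axes."""
    # Drop an 'Id' column if present: it is a row number, not a measurement
    features = df.drop(columns=["Id"], errors="ignore")
    # DataFrame.hist skips non-numeric columns such as Species;
    # edgecolor is forwarded to matplotlib's hist() to outline the bars
    axes = features.hist(bins=bins, figsize=(10, 8), edgecolor="black")
    plt.tight_layout()
    plt.show()
    return axes
```

Calling plot_feature_histograms(df) with the data frame from step 2 gives one titled panel per measurement column.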
Viva Questions
1. For the Iris dataset, list the features and their types.
2. Write code to create a histogram for each feature of the Iris dataset.
3. Write code to create a box plot for each feature of the Iris dataset.
4. Identify the outliers from the box plots drawn for the Iris dataset.