DSBDA Lab Assignment No 10

Group A

Assignment No: 10

Title of the Assignment: Data Visualization III


Download the Iris flower dataset or any other dataset into a DataFrame (e.g.,
https://archive.ics.uci.edu/ml/datasets/Iris). Scan the dataset and report the following:
1. List down the features and their types (e.g., numeric, nominal) available in the dataset.
2. Create a histogram for each feature in the dataset to illustrate the feature distributions.
3. Create a box plot for each feature in the dataset.
4. Compare distributions and identify outliers.
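
As a starting point for task 1, the sketch below shows one way to list each column with a coarse type label. It assumes the Kaggle-style column names (Id, SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm, Species); only SepalLengthCm and Species appear in this assignment, so the rest are assumptions, and the three rows are illustrative stand-ins for the real file. The helper name feature_types is hypothetical.

```python
import pandas as pd

# Illustrative rows only; in the assignment, df comes from pd.read_csv("Iris.csv").
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "SepalLengthCm": [5.1, 7.0, 6.3],
    "SepalWidthCm": [3.5, 3.2, 3.3],
    "PetalLengthCm": [1.4, 4.7, 6.0],
    "PetalWidthCm": [0.2, 1.4, 2.5],
    "Species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})


def feature_types(frame):
    """Label every column as 'numeric' or 'nominal' based on its dtype."""
    return {
        col: "numeric" if pd.api.types.is_numeric_dtype(frame[col]) else "nominal"
        for col in frame.columns
    }


types = feature_types(df)
for name, kind in types.items():
    print(f"{name}: {kind} ({df[name].dtype})")
```

Note that Id is reported as numeric because of its dtype, although it is a row identifier rather than a measured feature.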

Objective of the Assignment: Students should be able to perform data visualization
operations using Python on any open-source dataset.

Prerequisite:
1. Basics of Python programming
2. Seaborn library and the concept of data visualization
3. Types of variables

Theory:
Histograms:
A histogram represents data that has been grouped into ranges (bins). It is an accurate
method for graphically representing the distribution of numerical data. It is a type of bar plot in which
the X-axis shows the bin ranges and the Y-axis shows the frequency of values in each bin.
The following table shows the parameters accepted by the matplotlib.pyplot.hist() function:

Parameter   Description
---------   -----------
x           array or sequence of arrays
bins        optional; an integer, a sequence, or a string
density     optional; boolean value
range       optional; lower and upper range of the bins
histtype    optional; type of histogram to draw [bar, barstacked, step, stepfilled], default is "bar"
align       optional; controls how the histogram is plotted [left, right, mid]
weights     optional; array of weights with the same dimensions as x
bottom      location of the baseline of each bin
rwidth      optional; relative width of the bars with respect to the bin width
color       optional; color or sequence of color specs
label       optional; string or sequence of strings to match multiple datasets
log         optional; sets the histogram axis to a log scale
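
To make the table concrete, here is a minimal sketch that passes several of these parameters to plt.hist(). The values list is made up for illustration (it is not taken from the Iris file), and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Made-up measurements used purely for illustration.
values = [4.9, 5.0, 5.1, 5.4, 5.8, 6.0, 6.1, 6.3, 6.7, 7.0, 7.2, 7.9]

counts, edges, patches = plt.hist(
    values,
    bins=4,             # four equal-width bins
    range=(4.0, 8.0),   # bin edges fall at 4, 5, 6, 7, 8
    density=False,      # plot raw counts rather than a normalized density
    histtype="bar",
    align="mid",
    rwidth=0.9,         # bars fill 90% of each bin width
    color="steelblue",
    label="sample lengths",
    log=False,
)
plt.xlabel("value")
plt.ylabel("frequency")
plt.legend()

print(counts)  # counts per bin: 1, 4, 4, 3
print(edges)   # bin edges: 4, 5, 6, 7, 8
```

Each bin is closed on the left and open on the right, except the last bin, which also includes its right edge.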

Algorithm:
1. Import required libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pylab
import seaborn as sns
import os
2. Create the data frame for the downloaded Iris.csv dataset.
os.chdir(r"D:\Pandas")
df = pd.read_csv("Iris.csv")
df
3. Apply data preprocessing techniques.
df.isnull().sum()
df.describe()
4. Plot the box plot for each feature in the dataset, then observe and detect the outliers.
sns.set(style="whitegrid", palette="GnBu_d", rc={'figure.figsize': (11.7, 8.27)})
sns.boxplot(x='Species', y='SepalLengthCm', data=df)
plt.title('Distribution of sepal length')
plt.show()
5. Plot the histogram for each feature in the dataset.
df.hist()
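
The steps above plot a box plot for one feature only and a histogram that would also include any identifier column. The sketch below shows one possible way to cover every feature at once. It assumes the Kaggle-style column names (only SepalLengthCm and Species are confirmed by this assignment) and uses randomly generated placeholder data instead of the real Iris.csv.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data with assumed column names; replace with pd.read_csv("Iris.csv").
features = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(5.0, 1.0, size=(30, 4)), columns=features)
df.insert(0, "Id", range(1, 31))
df["Species"] = ["Iris-setosa"] * 15 + ["Iris-versicolor"] * 15

# Keep only the measured features: Id is an identifier and Species is nominal.
numeric = df.drop(columns=["Id", "Species"])

# One histogram per feature, drawn on a grid of subplots.
hist_axes = numeric.hist(bins=10, figsize=(8, 6))

# One box per feature on a shared axis, convenient for comparing spreads.
box_ax = numeric.plot(kind="box", figsize=(8, 4), title="Box plot of each feature")

# Call plt.show() here in an interactive session.
```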

Viva Questions
1. For the iris dataset, list down the features and their types.
2. Write code to create a histogram for each feature (Iris dataset).
3. Write code to create a box plot for each feature (Iris dataset).
4. Identify the outliers from the box plot drawn for the Iris dataset.
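
For question 4, it can help to reproduce numerically the rule that box plots use by default: points farther than 1.5 times the interquartile range (IQR) from the quartiles are drawn as outliers. The sketch below is one way to apply that rule with pandas. The helper name iqr_outliers and the sample values are hypothetical and are not taken from the Iris file.

```python
import pandas as pd


def iqr_outliers(values, whis=1.5):
    """Return the entries lying beyond whis * IQR from the quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - whis * iqr
    upper = q3 + whis * iqr
    return values[(values < lower) | (values > upper)]


# Made-up measurements for illustration only.
sample = pd.Series([2.0, 2.9, 3.0, 3.0, 3.1, 3.2, 3.4, 4.4], name="SepalWidthCm")
outliers = iqr_outliers(sample)
print(outliers.tolist())  # [2.0, 4.4]
```

Applied to a real data frame, the same function could be called once per numeric column, for example inside a loop over the feature names.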