Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
100% found this document useful (1 vote)
132 views

3) Code For ID3 Algorithm Implementation

The document loads and analyzes Iris flower data using Python libraries like Pandas and Seaborn. It uploads an Iris CSV file, loads the data into a Pandas dataframe, then performs various visualizations and analyses. These include scatter plots of features colored by species, box plots, density plots, and pair plots to understand relationships between features and species. It also fits a decision tree classifier to the data and plots the tree.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
100% found this document useful (1 vote)
132 views

3) Code For ID3 Algorithm Implementation

The document loads and analyzes Iris flower data using Python libraries like Pandas and Seaborn. It uploads an Iris CSV file, loads the data into a Pandas dataframe, then performs various visualizations and analyses. These include scatter plots of features colored by species, box plots, density plots, and pair plots to understand relationships between features and species. It also fits a decision tree classifier to the data and plots the tree.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 8

 

import pandas as pd

from google.colab import files
uploaded = files.upload()

Choose Files No file chosen


Upload widget is only available when the cell has been
executed in the
current browser session. Please rerun this cell to enable.
Saving Iris.csv to Iris (1).csv

import io
Iris = pd.read_csv(io.BytesIO(uploaded['Iris.csv']))
Iris

Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species

0 1 5.1 3.5 1.4 0.2 Iris-setosa

1 2 4.9 3.0 1.4 0.2 Iris-setosa

2 3 4.7 3.2 1.3 0.2 Iris-setosa

3 4 4.6 3.1 1.5 0.2 Iris-setosa

4 5 5.0 3.6 1.4 0.2 Iris-setosa

... ... ... ... ... ... ...

145 146 6.7 3.0 5.2 2.3 Iris-virginica

146 147 6.3 2.5 5.0 1.9 Iris-virginica

147 148 6.5 3.0 5.2 2.0 Iris-virginica

148 149 6.2 3.4 5.4 2.3 Iris-virginica

149 150 5.9 3.0 5.1 1.8 Iris-virginica

150 rows × 6 columns

import pandas as pd
 
# We'll also import seaborn, a Python graphing library
import warnings # current version of seaborn generates a bunch of warnings that we'll igno
warnings.filterwarnings("ignore")
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="white", color_codes=True)

Iris.head()
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species

0 1 5.1 3.5 1.4 0.2 Iris-setosa

1 2 4.9 3.0 1.4 0.2 Iris-setosa

2 3 4.7 3.2 1.3 0.2 Iris-setosa


Iris["Species"].value_counts()
3 4 4.6 3.1 1.5 0.2 Iris-setosa

Iris-setosa
4 5 50

5.0 3.6 1.4 0.2 Iris-setosa


Iris-versicolor 50

Iris-virginica 50

Name: Species, dtype: int64

# The first way we can plot things is using the .plot extension from Pandas dataframes
# We'll use this to make a scatterplot of the Iris features.
Iris.plot(kind="scatter", x="SepalLengthCm", y="SepalWidthCm")

*c* argument looks like a single numeric RGB or RGBA sequence, which should be avoide
<matplotlib.axes._subplots.AxesSubplot at 0x7fb55c046750>

# We can also use the seaborn library to make a similar plot
# A seaborn jointplot shows bivariate scatterplots and univariate histograms in the same f
sns.jointplot(x="SepalLengthCm", y="SepalWidthCm", data=Iris, size=10)
<seaborn.axisgrid.JointGrid at 0x7fb55bf27150>

# One piece of information missing in the plots above is what species each plant is
# We'll use seaborn's FacetGrid to color the scatterplot by species
sns.FacetGrid(Iris, hue="Species", size=5) \
   .map(plt.scatter, "SepalLengthCm", "SepalWidthCm") \
   .add_legend()

<seaborn.axisgrid.FacetGrid at 0x7fb55bd81710>

# We can look at an individual feature in Seaborn through a boxplot
sns.boxplot(x="Species", y="PetalLengthCm", data=Iris)
<matplotlib.axes._subplots.AxesSubplot at 0x7fb55bc8ced0>

# A final seaborn plot useful for looking at univariate relations is the kdeplot,
# which creates and visualizes a kernel density estimate of the underlying feature
sns.FacetGrid(Iris, hue="Species", size=6) \
   .map(sns.kdeplot, "SepalLengthCm") \
   .add_legend()

<seaborn.axisgrid.FacetGrid at 0x7fb5657b0350>

# Another useful seaborn plot is the pairplot, which shows the bivariate relation
# between each pair of features

# From the pairplot, we'll see that the Iris-setosa species is separataed from the other
# two across all feature combinations
sns.pairplot(Iris.drop("Id", axis=1), hue="Species", size=3)
<seaborn.axisgrid.PairGrid at 0x7fb56579ae50>

from sklearn.datasets import load_iris
from sklearn import tree
iris = load_iris()
X, y = iris.data, iris.target
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, y)
clf

DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',

max_depth=None, max_features=None, max_leaf_nodes=None,

min_impurity_decrease=0.0, min_impurity_split=None,

min_samples_leaf=1, min_samples_split=2,

min_weight_fraction_leaf=0.0, presort='deprecated',

random_state=None, splitter='best')

tree.plot_tree(clf) 

[Text(167.4, 199.32, 'X[2] <= 2.45\ngini = 0.667\nsamples = 150\nvalue = [50, 50, 50]
Text(141.64615384615385, 163.07999999999998, 'gini = 0.0\nsamples = 50\nvalue = [50,
Text(193.15384615384616, 163.07999999999998, 'X[3] <= 1.75\ngini = 0.5\nsamples = 10
Text(103.01538461538462, 126.83999999999999, 'X[2] <= 4.95\ngini = 0.168\nsamples =
Text(51.50769230769231, 90.6, 'X[3] <= 1.65\ngini = 0.041\nsamples = 48\nvalue = [0,
Text(25.753846153846155, 54.359999999999985, 'gini = 0.0\nsamples = 47\nvalue = [0,
Text(77.26153846153846, 54.359999999999985, 'gini = 0.0\nsamples = 1\nvalue = [0, 0,
Text(154.52307692307693, 90.6, 'X[3] <= 1.55\ngini = 0.444\nsamples = 6\nvalue = [0,
Text(128.76923076923077, 54.359999999999985, 'gini = 0.0\nsamples = 3\nvalue = [0, 0
Text(180.27692307692308, 54.359999999999985, 'X[0] <= 6.95\ngini = 0.444\nsamples =
Text(154.52307692307693, 18.119999999999976, 'gini = 0.0\nsamples = 2\nvalue = [0, 2
Text(206.03076923076924, 18.119999999999976, 'gini = 0.0\nsamples = 1\nvalue = [0, 0
Text(283.2923076923077, 126.83999999999999, 'X[2] <= 4.85\ngini = 0.043\nsamples = 4
Text(257.53846153846155, 90.6, 'X[1] <= 3.1\ngini = 0.444\nsamples = 3\nvalue = [0,
Text(231.7846153846154, 54.359999999999985, 'gini = 0.0\nsamples = 2\nvalue = [0, 0,
Text(283.2923076923077, 54.359999999999985, 'gini = 0.0\nsamples = 1\nvalue = [0, 1,
Text(309.04615384615386, 90.6, 'gini = 0.0\nsamples = 43\nvalue = [0, 0, 43]')]

import graphviz 

dot_data = tree.export_graphviz(clf, out_file=None) 

graph = graphviz.Source(dot_data)

graph.render("iris") 

'iris.pdf'

dot_data = tree.export_graphviz(clf, out_file=None, 
feature names=iris feature names
...                      feature_names=iris.feature_names,  
...                      class_names=iris.target_names,  
...                      filled=True, rounded=True,  
...                      special_characters=True)  

graph = graphviz.Source(dot_data)

graph 
petal length (c
gini = 0
samples
value = [50
class = s

True

gini = 0.0
samples = 50
value = [50, 0, 0]
class = setosa

petal length (cm) ≤ 4


gini = 0.168
samples = 54
value = [0, 49, 5
class = versicolo

You might also like