7113728, 1106 AM
Iris Flower Project Task 1 - Jupyter Notebook
Project Report On Iris Flower Classification
Using Machine Learning
Submitted By, Mr. Omkar Balwant Jadhav
Introduction
The purpose of this project is to classify the flowers in the Iris dataset. The Iris dataset
is widely used in machine learning and consists of measurements of four features
(sepal length, sepal width, petal length, and petal width) for three species of Iris
flowers (Setosa, Versicolor, and Virginica).
Iris Flower Classification
Defining the problem statement
Collecting the data
Filtering Data
Exploratory data analysis
Feature engineering
Data Visualization
Machine Learning Stages
1. Defining the problem statement
In this project, we study the Iris flower data, which is stored in tabular format, using
libraries such as numpy, pandas, and matplotlib together with different machine learning algorithms.
We examine the columns of the table and look for correlations between pairs of them.
We identify and analyze the key factors, such as petal length, that help classify the
iris data by species.
2. Collecting Data
In [2]: import pandas as pd
        iris_data = pd.read_csv("C:\\Users\\Onkar\\Desktop\\Iris.csv")
        iris_data
Out[2]:
      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm         Species
0      1            5.1           3.5            1.4           0.2     Iris-setosa
1      2            4.9           3.0            1.4           0.2     Iris-setosa
2      3            4.7           3.2            1.3           0.2     Iris-setosa
3      4            4.6           3.1            1.5           0.2     Iris-setosa
4      5            5.0           3.6            1.4           0.2     Iris-setosa
...
145  146            6.7           3.0            5.2           2.3  Iris-virginica
146  147            6.3           2.5            5.0           1.9  Iris-virginica
147  148            6.5           3.0            5.2           2.0  Iris-virginica
148  149            6.2           3.4            5.4           2.3  Iris-virginica
149  150            5.9           3.0            5.1           1.8  Iris-virginica
150 rows × 6 columns
3. Filtering Data
In [3]: iris_data.shape
Out[3]: (150, 6)
In [4]: iris_data["Species"].value_counts()
Out[4]: Iris-setosa        50
        Iris-versicolor    50
        Iris-virginica     50
        Name: Species, dtype: int64
4. Exploratory data analysis
Exploratory data analysis (EDA) is the critical process of performing initial investigations on
data to discover patterns, spot anomalies, test hypotheses, and check assumptions
with the help of summary statistics and graphical representations.
It is good practice to understand the data first and to gather as many insights from it as possible.
EDA is all about making sense of the data in hand.
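As an illustration of the summary statistics mentioned above, the following is a minimal sketch rather than part of the original notebook. It assumes scikit-learn's bundled copy of the Iris data (`load_iris`) as a stand-in for Iris.csv, so the column names differ from the ones in the CSV file, and the variable names `species_means` and `correlations` are chosen here for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data stands in for Iris.csv.
iris = load_iris(as_frame=True)
df = iris.frame.copy()

# Map the integer target codes (0, 1, 2) to species names for readable grouping.
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))
df = df.drop(columns="target")

# Mean of each measurement within each species.
species_means = df.groupby("species").mean()
print(species_means)

# Pairwise correlation between the four numeric measurements.
correlations = df.drop(columns="species").corr()
print(correlations.round(2))
```

Grouping by species shows at a glance which measurements differ most between classes, and the correlation matrix shows which measurements move together.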
In [5]: iris_data.info()
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype
 0   Id             150 non-null    int64
 1   SepalLengthCm  150 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB
In [6]: iris_data.describe()
Out[6]:
               Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
count  150.000000     150.000000    150.000000     150.000000    150.000000
mean    75.500000       5.843333      3.054000       3.758667      1.198667
std     43.445368       0.828066      0.433594       1.764420      0.763161
min      1.000000       4.300000      2.000000       1.000000      0.100000
25%     38.250000       5.100000      2.800000       1.600000      0.300000
50%     75.500000       5.800000      3.000000       4.350000      1.300000
75%    112.750000       6.400000      3.300000       5.100000      1.800000
max    150.000000       7.900000      4.400000       6.900000      2.500000
In [8]: iris_data.isnull().sum()
Out[8]: Id               0
        SepalLengthCm    0
        SepalWidthCm     0
        PetalLengthCm    0
        PetalWidthCm     0
        Species          0
        dtype: int64
5. Feature Engineering
What is a feature, and why does it need engineering? All machine learning
algorithms use input data to create outputs. This input data comprises features, which are
usually in the form of structured columns. Algorithms require features with specific
characteristics to work properly, and this is where the need for feature engineering arises.
I think feature engineering efforts mainly have two goals:
6. Data Visualization
Out[12]: [Figure: strip plot grouped by species; x-axis: Species (Iris-setosa, Iris-versicolor, Iris-virginica)]
In [15]: import seaborn as sns
         sns.stripplot(data=iris_data, x='Species', y='PetalWidthCm')
Out[15]: [Figure: strip plot of PetalWidthCm by species; x-axis: Species (Iris-setosa, Iris-versicolor, Iris-virginica), y-axis: PetalWidthCm (0.0 to 2.5)]
7. Machine Learning Stages
Stage 1: Importing Libraries and Loading the Dataset
In [17]: from sklearn.model_selection import train_test_split

         # Load the dataset
         data = pd.read_csv("C:\\Users\\Onkar\\Desktop\\Iris.csv")

         # Split the dataset into features (X) and labels (y).
         # Id is a row index, not a measurement, so it is dropped along with the label;
         # this also keeps X consistent with the four-column new_data used in Stage 8.
         X = data.drop(['Id', 'Species'], axis=1)
         y = data['Species']

         # Split the dataset into training and testing sets
         X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)  # any further arguments are truncated in the source
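The split above can be made repeatable and class-balanced. The following is a minimal sketch, not the notebook's own cell: it assumes the bundled `load_iris` data in place of Iris.csv, and the `stratify=y` and `random_state=0` arguments are illustrative choices, not values taken from the source.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: the bundled Iris data stands in for Iris.csv.
X, y = load_iris(return_X_y=True, as_frame=True)

# stratify=y keeps the 50/50/50 class balance in both subsets;
# random_state=0 is an arbitrary seed chosen only for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

print(X_train.shape, X_test.shape)
print(y_test.value_counts().sort_index())
```

With 150 rows and `test_size=0.2`, the test set holds 30 rows, so stratifying gives each species 10 test rows.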
Stage 2: Data Preprocessing
In [18]: from sklearn.preprocessing import StandardScaler

         # Scale the features using StandardScaler
         scaler = StandardScaler()
         X_train_scaled = scaler.fit_transform(X_train)
         X_test_scaled = scaler.transform(X_test)
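To make the scaling step concrete, here is a small sketch on hypothetical toy values (not the Iris data). It shows that `fit_transform` learns the mean and standard deviation from the training rows only, and that `transform` then applies z = (x - mean) / std to the test rows using those training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training and test matrices (hypothetical values, two features).
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean and std from the training rows
X_test_scaled = scaler.transform(X_test)        # reuses the training statistics

# The same result computed by hand, using the population standard deviation.
manual = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)

print(X_test_scaled)
print(np.allclose(X_test_scaled, manual))
```

Fitting the scaler on the training rows only keeps information about the test rows out of the preprocessing step.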
Stage 3: Model Training
In [19]: from sklearn.svm import SVC

         # Create an SVM classifier
         classifier = SVC()

         # Train the classifier on the scaled training data
         classifier.fit(X_train_scaled, y_train)
Out[19]: SVC()
Stage 4: Model Evaluation
In [20]: from sklearn.metrics import accuracy_score

         # Predict the labels for the test set
         y_pred = classifier.predict(X_test_scaled)

         # Calculate the accuracy of the classifier
         accuracy = accuracy_score(y_test, y_pred)
         print("Accuracy:", accuracy)
Accuracy: 1.0
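An accuracy of 1.0 on a 30-row test set is a coarse estimate, since a single misclassified flower would change it by more than three percentage points. One common cross-check is k-fold cross-validation. The sketch below is an addition, not part of the original notebook; it assumes the bundled `load_iris` data and wraps the scaler and classifier in a pipeline so that scaling is refit inside each fold.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: the bundled Iris data stands in for Iris.csv.
X, y = load_iris(return_X_y=True)

# The pipeline refits the scaler on the training part of every fold.
model = make_pipeline(StandardScaler(), SVC())

# Five-fold cross-validation: five accuracy scores, one per held-out fold.
scores = cross_val_score(model, X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

The spread of the fold scores gives a rough sense of how much the accuracy depends on which rows end up in the test set.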
Stage 5: Hyperparameter Tuning
In [21]: from sklearn.model_selection import GridSearchCV

         # Define the hyperparameters to tune
         param_grid = {'C': [0.1, 1, 10, 100],
                       'gamma': [0.1, 1, 10, 100],
                       'kernel': ['linear']}  # the remaining kernel values are truncated in the source

         # Create a GridSearchCV object and fit it to the training data
         grid_search = GridSearchCV(classifier, param_grid, cv=5)
         grid_search.fit(X_train_scaled, y_train)

         # Get the best hyperparameters and retrain the classifier
         best_params = grid_search.best_params_
         classifier = SVC(**best_params)
         classifier.fit(X_train_scaled, y_train)
Out[21]: SVC(C=1, gamma=0.1, kernel='linear')
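A variant of the search above puts the scaler inside a `Pipeline`, so that each cross-validation fold is scaled using only its own training part. This is a sketch under stated assumptions, not the notebook's method: it uses the bundled `load_iris` data, an arbitrary `random_state=0`, and a hypothetical kernel list of `"linear"` and `"rbf"`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: the bundled Iris data stands in for Iris.csv.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Step names ("scale", "svc") become prefixes of the grid keys.
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# Hypothetical grid: the kernel values are an assumption for illustration.
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.1, 1, 10, 100],
    "svc__kernel": ["linear", "rbf"],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
print("Test accuracy:", search.score(X_test, y_test))
```

`GridSearchCV` refits the best pipeline on the full training set by default, so `search` can be used directly for prediction without a separate retraining step.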
Stage 6: Feature Selection (Optional)
In [22]: from sklearn.feature_selection import SelectKBest, f_classif

         # Perform feature selection using the ANOVA F-value
         selector = SelectKBest(f_classif, k=3)
         X_train_selected = selector.fit_transform(X_train_scaled, y_train)
         X_test_selected = selector.transform(X_test_scaled)
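The selector does not report by itself which three features it kept. A minimal sketch of how to inspect that, assuming the bundled `load_iris` data (whose column names differ from those in Iris.csv) and fitting on all rows for simplicity:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Assumption: the bundled Iris data stands in for Iris.csv.
X, y = load_iris(return_X_y=True, as_frame=True)

selector = SelectKBest(f_classif, k=3)
selector.fit(X, y)

# ANOVA F-value per feature; a larger value means stronger separation between species.
f_scores = dict(zip(X.columns, selector.scores_))
for name, score in f_scores.items():
    print(f"{name}: {score:.1f}")

# get_support() returns a boolean mask over the input columns.
kept = list(X.columns[selector.get_support()])
print("Kept features:", kept)
```

The ANOVA F-value of a feature is unchanged by standardization, so the ranking is the same whether the selector sees raw or scaled measurements.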
Stage 7: Model Training with Selected Features (Optional)
In [24]: # Retrain the classifier on the selected features
         classifier.fit(X_train_selected, y_train)

         # Predict the labels for the test set with selected features
         y_pred_selected = classifier.predict(X_test_selected)

         # Calculate the accuracy with selected features
         accuracy_selected = accuracy_score(y_test, y_pred_selected)
         print("Accuracy with selected features:", accuracy_selected)
Accuracy with selected features: 1.0
Stage 8: Final Predictions
In [28]: # Scale the entire dataset
         X_scaled = scaler.transform(X)

         # Retrain the classifier on the entire dataset
         classifier.fit(X_scaled, y)

         # Make predictions on new data
         new_data = pd.DataFrame({'SepalLengthCm': [5.2, 6.1, 4.9],
                                  'SepalWidthCm': [3.1, 2.8, 3.5],
                                  'PetalLengthCm': [1.7, 4.7, 1.5],
                                  'PetalWidthCm': [0.5, 1.6, 0.4]})
         new_data_scaled = scaler.transform(new_data)
         predictions = classifier.predict(new_data_scaled)
         print("Predictions:", predictions)
Predictions: ['Iris-setosa' 'Iris-versicolor' 'Iris-setosa']
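In the cell above, the classifier is refit on all rows while the scaler still carries the statistics of the training split. One way to keep the two in step is to refit them together as a pipeline. The sketch below is an alternative, not the notebook's method: it assumes the bundled `load_iris` data and column names, and it hard-codes the `kernel="linear"`, `C=1` values reported by the grid search.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: the bundled Iris data stands in for Iris.csv.
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

# Refit the scaler and the classifier together on all 150 rows.
final_model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1))
final_model.fit(X, y)

# The same three measurements as in the cell above, under load_iris column names.
new_data = pd.DataFrame(
    [[5.2, 3.1, 1.7, 0.5], [6.1, 2.8, 4.7, 1.6], [4.9, 3.5, 1.5, 0.4]],
    columns=X.columns,
)

# predict() returns integer class codes; map them back to species names.
codes = final_model.predict(new_data)
predictions = [str(iris.target_names[c]) for c in codes]
print("Predictions:", predictions)
```

Because the pipeline applies scaling internally, new measurements can be passed to `predict` in their original units.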
Conclusion
In this project, we classified the flowers in the Iris dataset using machine
learning techniques. The trained SVM classifier reached an accuracy of 1.0 on the 30-row test set,
which indicates that it predicts the species of Iris flowers well from their measurements. The project
demonstrates the steps involved in solving a classification problem and provides insights into
feature analysis, data preprocessing, model training, and evaluation.