Project-1 (Data Preprocessing)

Uploaded by

Arijeet ros

GIET UNIVERSITY, GUNUPUR

SCHOOL OF ENGINEERING AND TECHNOLOGY

DEPARTMENT OF CSE (AIML)

Step 1: Import Python Libraries:-

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression

from sklearn.model_selection import train_test_split

Step 2: Read the Dataset:-

df= pd.read_csv('kerala.csv')
df.head(5)

Step 3: Explore the Dataset:-

1)df.info()

2)df.shape

3)df.describe()

4)df.corr(numeric_only=True)  # skip text columns such as FLOODS (still YES/NO at this point)

Replace:- To train this model, the target output must be numeric (0 and 1). So we replace the values in the FLOODS column (YES, NO) with (1, 0) respectively.
df['FLOODS'] = df['FLOODS'].map({'YES': 1, 'NO': 0})  # assign back rather than using inplace on a single column
df.head(5)

null values:- To find the percentage of null values in each column of the dataset

df.isnull().mean().sort_values(ascending=False) * 100
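
The null-value check above can be tried on a small frame first. The sketch below uses a hypothetical toy frame (the values are made up and are not taken from kerala.csv) to show how the expression turns missing cells into a percentage per column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are made up for illustration only.
toy = pd.DataFrame({
    "JUN": [600.0, np.nan, 550.0, 700.0],
    "JUL": [800.0, 750.0, np.nan, np.nan],
    "FLOODS": [1, 0, 0, 1],
})

# isnull() marks missing cells as True; the column-wise mean of those
# booleans is the fraction missing, and * 100 converts it to a percentage.
null_pct = toy.isnull().mean().sort_values(ascending=False) * 100
print(null_pct)
```

Columns with the largest share of missing values appear first, which makes it easy to decide which ones need imputation or removal.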

corr:- To identify the correlation between the columns using a heat map

NAME OF THE STUDENT: Arijeet Mishra ROLL NO – 21CSEAIML008


corr = df.corr(numeric_only=True)  # correlation matrix of the numeric columns

sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns)
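
Besides the heat map, it can help to read the correlation of every column with the target directly. The following is a minimal sketch on a hypothetical toy frame (column names and values are assumptions chosen so the result is easy to check by eye).

```python
import pandas as pd

# Hypothetical toy data; names and values are made up for illustration.
toy = pd.DataFrame({
    "rain_up": [1, 2, 3, 4],
    "rain_down": [4, 3, 2, 1],
    "target": [2, 4, 6, 8],
})

corr_toy = toy.corr()

# Take the target column of the matrix, drop the target's correlation
# with itself, and sort so the strongest positive relation comes first.
target_corr = corr_toy["target"].drop("target").sort_values(ascending=False)
print(target_corr)
```

On the real data the same pattern would be applied to the FLOODS column of the correlation matrix.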

Step 3: Feature Selection:-

Start by importing SelectKBest and the chi2 score function:

from sklearn.feature_selection import SelectKBest

from sklearn.feature_selection import chi2

After, define X & Y:-

X = df.iloc[:, 1:14]  # all features

Y = df.iloc[:, -1]  # target output (floods)

Select the top 3 features:-

best_features= SelectKBest(score_func=chi2, k=3)

fit= best_features.fit(X,Y)

Now we create data frames for the features and the score of each
feature:

df_scores= pd.DataFrame(fit.scores_)
df_columns= pd.DataFrame(X.columns)

Finally, we’ll combine all the features and their corresponding scores in
one data frame:

features_scores= pd.concat([df_columns, df_scores], axis=1)

features_scores.columns= ['Features', 'Score']
features_scores.sort_values(by='Score', ascending=False)  # highest-scoring features first
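
The feature-selection steps above can be sketched end to end on a small example. The data below is hypothetical (three made-up non-negative columns and a binary target), chosen so that the ranking is easy to verify; note that chi2 requires non-negative feature values.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical toy features; "strong" follows the target closely,
# "medium" only partly, and "weak" not at all.
X_toy = pd.DataFrame({
    "strong": [1, 1, 1, 1, 10, 10, 10, 10],
    "medium": [2, 3, 2, 3, 4, 5, 4, 5],
    "weak": [5, 6, 5, 6, 5, 6, 5, 6],
})
y_toy = pd.Series([0, 0, 0, 0, 1, 1, 1, 1])

# Score every feature against the target and keep the best two.
selector = SelectKBest(score_func=chi2, k=2).fit(X_toy, y_toy)

# One row per feature, sorted so the highest score is on top.
ranking = (
    pd.DataFrame({"Features": X_toy.columns, "Score": selector.scores_})
    .sort_values(by="Score", ascending=False)
    .reset_index(drop=True)
)
print(ranking)

# get_support() returns a boolean mask over the original columns.
selected = list(X_toy.columns[selector.get_support()])
print(selected)
```

Reading the selected names from get_support() avoids copying column names by hand from the score table.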

Step 4: Build the Model:-

X = df[['SEP', 'JUN', 'JUL']]  # the top 3 features
Y = df['FLOODS']  # the target output, as a 1-D Series

Splitting the dataset into train and test:-

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.4, random_state=100)
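
To see what the split produces, here is a minimal sketch on hypothetical toy arrays. The stratify argument is an addition that the original call does not use; it is shown as one option for keeping the class balance the same in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples, 2 features, balanced binary target.
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# test_size=0.4 holds out 40% of the rows; random_state fixes the shuffle.
# stratify=y_toy (an assumption, not in the original) preserves class ratios.
Xtr, Xte, ytr, yte = train_test_split(
    X_toy, y_toy, test_size=0.4, random_state=100, stratify=y_toy
)

print(Xtr.shape, Xte.shape)
print("positives in test set:", yte.sum())
```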


Create a logistic regression model:-

logreg= LogisticRegression()
logreg.fit(X_train,y_train)

We predict whether a flood occurs using the logistic regression model we created:-
y_pred = logreg.predict(X_test)  # predicted class labels (0 or 1)
print(X_test)  # test dataset
print(y_pred)  # predicted values
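
Note that predict returns class labels, while predict_proba returns the estimated probability of each class. The sketch below illustrates the difference on hypothetical one-feature toy data; the names and values are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: small values belong to class 0, large to class 1.
X_toy = np.array([[0.0], [1.0], [2.0], [3.0], [7.0], [8.0], [9.0], [10.0]])
y_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(X_toy, y_toy)

X_new = np.array([[1.5], [4.0], [9.5]])
labels = model.predict(X_new)              # hard 0/1 decisions
proba = model.predict_proba(X_new)[:, 1]   # estimated P(class 1)

print(labels)
print(proba)
```

The probabilities are what the ROC curve needs later, since it sweeps over classification thresholds.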

Step 5: Evaluate the Model’s Performance:-

• 5.1:- Mean Absolute Error (MAE):- MAE is the average of the absolute differences between the actual and predicted values, so it measures the typical size of a prediction error over the whole test set.

from sklearn.metrics import mean_absolute_error

print("MAE", mean_absolute_error(y_test, y_pred))
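
Because the targets and predictions here are 0/1 class labels, every absolute error is either 0 or 1, so MAE works out to the share of misclassified samples. A minimal sketch with hypothetical label lists:

```python
from sklearn.metrics import accuracy_score, mean_absolute_error

# Hypothetical labels; two of the five predictions are wrong.
y_true_toy = [1, 0, 1, 1, 0]
y_pred_toy = [1, 0, 0, 1, 1]

mae = mean_absolute_error(y_true_toy, y_pred_toy)
acc = accuracy_score(y_true_toy, y_pred_toy)

print("MAE:", mae)
print("1 - accuracy:", 1 - acc)
```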

• 5.2:- Mean Squared Error (MSE)

MSE is a variation on mean absolute error: it averages the squared differences between the actual and predicted values, so larger errors are penalized more heavily.

from sklearn.metrics import mean_squared_error

print("MSE", mean_squared_error(y_test, y_pred))

• 5.3:- Root Mean Squared Error (RMSE)

As the name implies, RMSE is the square root of the mean squared error, which puts the error back in the same units as the target.
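
The report gives no code for RMSE, so the following is one possible sketch: take the square root of the MSE with NumPy. The label lists are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical labels; two of the five predictions are wrong.
y_true_toy = [1, 0, 1, 1, 0]
y_pred_toy = [1, 0, 0, 1, 1]

mse = mean_squared_error(y_true_toy, y_pred_toy)
rmse = np.sqrt(mse)  # RMSE is the square root of MSE

print("MSE:", mse)
print("RMSE:", rmse)
```

In the report's own variables this would be np.sqrt(mean_squared_error(y_test, y_pred)).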


• 5.4:- R Squared (R2)

The R2 score, also called the coefficient of determination, is a performance measure for regression-based machine learning models. Simply put, it measures how close the target data points are to the fitted line. MAE and MSE depend on the scale of the target, but the R2 score does not. R2 compares the model against a baseline that always predicts the mean of the target, a comparison that none of the other metrics give.

from sklearn.metrics import r2_score

r2 = r2_score(y_test, y_pred)

print(r2)
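
The baseline comparison can be made explicit by computing R2 by hand as 1 - SS_res / SS_tot, where SS_tot is the squared error of always predicting the mean. This is a sketch on hypothetical labels; a negative result means the predictions do worse than that mean baseline.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical labels; two of the five predictions are wrong.
y_true_toy = np.array([1, 0, 1, 1, 0])
y_pred_toy = np.array([1, 0, 0, 1, 1])

# Squared error of the model's predictions.
ss_res = np.sum((y_true_toy - y_pred_toy) ** 2)
# Squared error of the baseline that always predicts the mean of y_true.
ss_tot = np.sum((y_true_toy - y_true_toy.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot

print("manual R2:", r2_manual)
print("sklearn R2:", r2_score(y_true_toy, y_pred_toy))
```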

Classification Report:-
A classification report evaluates the performance of a classification model using the following 5 criteria:

• Accuracy is the fraction of predictions that are correct. The higher it is, the better.
• Recall is the ratio of true positives to the sum of true positives and false negatives, i.e. the share of actual positives that the model finds.
• Precision is the ratio of true positives to the sum of both true and false positives.
• F-score combines precision and recall into one metric (their harmonic mean). The closer its value is to 1, the better.
• Support is the number of actual occurrences of each class in the data being evaluated.
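
The criteria above can all be derived from the four counts of a confusion matrix. The sketch below uses hypothetical labels and checks the hand-computed values against scikit-learn's own functions.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical labels: 4 actual positives and 4 actual negatives.
y_true_toy = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_toy = [1, 0, 0, 1, 1, 0, 1, 0]

# For binary labels, ravel() yields the counts in this order.
tn, fp, fn, tp = confusion_matrix(y_true_toy, y_pred_toy).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
support_pos = tp + fn  # number of actual positives

print(accuracy, precision, recall, f1, support_pos)
```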


from sklearn import metrics

from sklearn.metrics import classification_report
print('Accuracy: ', metrics.accuracy_score(y_test, y_pred))
print('Recall: ', metrics.recall_score(y_test, y_pred, zero_division=1))
print('Precision:', metrics.precision_score(y_test, y_pred, zero_division=1))
print('CL Report:\n', classification_report(y_test, y_pred, zero_division=1))

ROC Curve:-

The receiver operating characteristic (ROC) curve shows the trade-off between sensitivity and specificity of the logistic regression model by plotting the true positive rate against the false positive rate at different classification thresholds.

From the ROC curve, we can calculate the area under the curve (AUC), whose value ranges from 0 to 1. The closer it is to 1, the better the model separates the two classes; a value of 0.5 corresponds to random guessing.

• To determine the ROC curve, first get the predicted probability of the positive class:-

y_pred_proba = logreg.predict_proba(X_test)[:, 1]

• Then, calculate the true positive and false positive rates:-

false_positive_rate, true_positive_rate, _ = metrics.roc_curve(y_test, y_pred_proba)

• Next, calculate the AUC to see the model's performance:-

auc= metrics.roc_auc_score(y_test, y_pred_proba)

• Finally, plot the ROC curve:-

plt.plot(false_positive_rate, true_positive_rate, label="AUC=" + str(auc))
plt.title('ROC Curve')
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.legend(loc=4)
plt.show()
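
One way to build intuition for AUC is that it equals the fraction of (positive, negative) pairs in which the positive sample gets the higher score. The sketch below checks this on four hypothetical samples with no tied scores (ties would count as half a win).

```python
from sklearn import metrics

# Hypothetical labels and predicted probabilities of the positive class.
y_true_toy = [0, 0, 1, 1]
scores_toy = [0.1, 0.4, 0.35, 0.8]

fpr, tpr, thresholds = metrics.roc_curve(y_true_toy, scores_toy)
auc_toy = metrics.roc_auc_score(y_true_toy, scores_toy)

# Pairwise check: count how often a positive outranks a negative.
pos = [s for s, y in zip(scores_toy, y_true_toy) if y == 1]
neg = [s for s, y in zip(scores_toy, y_true_toy) if y == 0]
wins = sum(p > n for p in pos for n in neg)
pairwise = wins / (len(pos) * len(neg))

print("AUC:", auc_toy)
print("pairwise fraction:", pairwise)
```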

