Project-1 (Data Preprocessing)
2) df.shape
3) df.describe()
4) df.corr()
Replace:- To train the model, the target output must take the values 0 and 1.
So we replace the values YES and NO in the FLOODS column with 1 and 0
respectively.
df['FLOODS'] = df['FLOODS'].replace({'YES': 1, 'NO': 0})
df.head(5)
df.isnull().mean().sort_values(ascending=False) * 100
corr:- To identify the correlation between the features using a heat map
corr = df.corr()
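The heat map itself could be drawn roughly as follows. This is a minimal sketch using matplotlib; the column names and the synthetic data are hypothetical stand-ins for the flood data set, and the output file name is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the flood data set (not the original data)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "JUN": rng.uniform(200, 900, 50),
    "JUL": rng.uniform(200, 900, 50),
})
df["ANNUAL"] = df["JUN"] + df["JUL"] + rng.uniform(500, 1500, 50)
df["FLOODS"] = (df["ANNUAL"] > df["ANNUAL"].median()).astype(int)

# Pairwise Pearson correlation between all columns
corr = df.corr()

# Draw the correlation matrix as a colour-coded grid
fig, ax = plt.subplots(figsize=(5, 4))
image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
fig.colorbar(image, ax=ax, label="Pearson correlation")
fig.tight_layout()
fig.savefig("correlation_heatmap.png")
```

Fixing the colour scale to the range -1 to 1 keeps the colours comparable between different data sets.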
Now we create data frames for the features and the score of each
feature:
df_scores = pd.DataFrame(fit.scores_)
df_columns = pd.DataFrame(X.columns)
Finally, we’ll combine all the features and their corresponding scores in
one data frame:
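The combining step can be sketched as below. The text does not show how `fit` was produced, so this example assumes it came from scikit-learn's `SelectKBest` with the `chi2` score function; the data, the column names and the variable `feature_scores` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical stand-in for the feature matrix X and target y
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "JAN": rng.uniform(0, 50, 100),
    "JUN": rng.uniform(200, 900, 100),
    "JUL": rng.uniform(200, 900, 100),
})
y = (X["JUN"] + X["JUL"] > 1100).astype(int)

# Assumption: `fit` is a fitted SelectKBest selector (chi2 needs non-negative features)
fit = SelectKBest(score_func=chi2, k="all").fit(X, y)

df_scores = pd.DataFrame(fit.scores_)
df_columns = pd.DataFrame(X.columns)

# Place feature names and scores side by side, then rank by score
feature_scores = pd.concat([df_columns, df_scores], axis=1)
feature_scores.columns = ["Feature", "Score"]
feature_scores = feature_scores.sort_values("Score", ascending=False).reset_index(drop=True)
print(feature_scores)
```

Sorting in descending order puts the features most strongly associated with the target at the top of the table.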
logreg = LogisticRegression()
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)
print("MAE", mean_absolute_error(y_test, y_pred))
print("MSE", mean_squared_error(y_test, y_pred))
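For reference, an end-to-end version of this step might look like the sketch below. The data is synthetic and the split parameters are assumptions, not the values used in the project. The example also shows that for 0/1 labels every error is exactly 1, so MAE and MSE coincide and both equal the misclassification rate (1 - accuracy).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical data: two features and a binary target
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Assumed split: 80 % training, 20 % test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

logreg = LogisticRegression()
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
print("MAE", mae)
print("MSE", mse)
print("1 - accuracy", 1 - acc)  # identical to MAE and MSE for 0/1 labels
```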
GIET UNIVERSITY, GUNUPUR
SCHOOL OF ENGINEERING AND TECHNOLOGY
DEPARTMENT OF CSE (AIML)
• R Squared (R2)
r2 = r2_score(y_test, y_pred)
print(r2)
Classification Report:-
A classification report summarizes the performance of a machine
learning model according to the following 5 criteria:
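As an illustration, scikit-learn's `classification_report` prints precision, recall, F1-score and support for each class, together with the overall accuracy. Whether these are exactly the criteria meant above is an assumption; the labels below are made up for the example.

```python
from sklearn.metrics import classification_report

# Hypothetical true and predicted labels (1 = flood, 0 = no flood)
y_test = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# Human-readable table
print(classification_report(y_test, y_pred, target_names=["NO", "YES"]))

# The same numbers as a nested dictionary, convenient for further processing
report = classification_report(
    y_test, y_pred, target_names=["NO", "YES"], output_dict=True
)
print("Accuracy:", report["accuracy"])
print("Precision (YES):", report["YES"]["precision"])
```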
ROC Curve:-
From the ROC curve, we can calculate the area under the curve (AUC), whose
value ranges from 0 to 1. The closer the AUC is to 1, the better the model
separates the two classes; a value of 0.5 corresponds to random guessing.
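A minimal sketch of computing the ROC curve and its AUC with scikit-learn is shown below. The data is synthetic and the names `y_score`, `auc_value` and `auc_from_curve` are chosen for illustration only. Note that the ROC curve is built from predicted probabilities, not from hard 0/1 predictions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Hypothetical data: two features and a binary target
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.7, size=300) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

logreg = LogisticRegression().fit(X_train, y_train)

# Probability of the positive class (column 1 of predict_proba)
y_score = logreg.predict_proba(X_test)[:, 1]

# False positive rate and true positive rate at each decision threshold
fpr, tpr, thresholds = roc_curve(y_test, y_score)

# AUC computed directly, and again by integrating the curve
auc_value = roc_auc_score(y_test, y_score)
auc_from_curve = auc(fpr, tpr)
print("AUC", auc_value)
```

Plotting `tpr` against `fpr` gives the ROC curve itself.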