vertopal.com_Untitled57

The document outlines a machine learning workflow using various regression models including Recursive Least Squares, Decision Tree, Random Forest, XGBoost, and an Artificial Neural Network (ANN) to predict a target variable from a dataset. It includes data preprocessing steps such as scaling and train-test splitting, model training with hyperparameter tuning using GridSearchCV, and evaluation of model performance using metrics like MSE, MAE, and R² score. Finally, the results are saved to an Excel file for comparison.

Uploaded by

Akash Layek

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor
from sklearn.preprocessing import StandardScaler

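# For ANN
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Deprecated wrapper (it triggers the DeprecationWarning shown in the cell output);
# the warning recommends SciKeras as the replacement.
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor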

# Load dataset
df = pd.read_csv("Imbalanced_PV_Fault_Dataset.csv")

X = df.drop(columns=["Feature_0", "Fault_Flag"])
y = df["Feature_0"]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# ----------- Recursive Least Squares -----------

class RecursiveLeastSquares:
    """Linear model (no intercept) fitted one sample at a time."""

    def __init__(self, num_features, lambda_factor=0.99, delta=1.0):
        self.num_features = num_features
        self.lambda_factor = lambda_factor        # forgetting factor
        self.P = np.eye(num_features) * delta     # initial inverse-covariance estimate
        self.theta = np.zeros(num_features)       # coefficient vector

    def update(self, X, y):
        # Treat the sample as a column vector of shape (num_features, 1)
        X = np.array(X).reshape(-1, 1)
        y = np.array(y)
        # Gain vector
        K = self.P @ X / (self.lambda_factor + X.T @ self.P @ X)
        # Correct the coefficients by the prediction error on this sample
        self.theta += (y - X.T @ self.theta) * K.flatten()
        # Update the inverse-covariance estimate
        self.P = (self.P - K @ X.T @ self.P) / self.lambda_factor

# Note: RLS is fitted on the unscaled features
rls = RecursiveLeastSquares(num_features=X.shape[1])
for i in range(len(X_train)):
    rls.update(X_train.iloc[i].values, y_train.iloc[i])
y_rls_pred = [np.dot(rls.theta, x) for x in X_test.values]

# ----------- GridSearchCV Models -----------

# Decision Tree
dtr = DecisionTreeRegressor(random_state=42)
dtr_param = {'max_depth': [3, 5, 10, None]}
dtr_grid = GridSearchCV(dtr, dtr_param, cv=5)
dtr_grid.fit(X_train, y_train)
dtr_pred = dtr_grid.best_estimator_.predict(X_test)

# Random Forest
rfr = RandomForestRegressor(random_state=42)
rfr_param = {'n_estimators': [50, 100], 'max_depth': [5, 10, None]}
rfr_grid = GridSearchCV(rfr, rfr_param, cv=5)
rfr_grid.fit(X_train, y_train)
rfr_pred = rfr_grid.best_estimator_.predict(X_test)

# XGBoost
xgb = XGBRegressor(random_state=42, verbosity=0)
xgb_param = {'n_estimators': [50, 100], 'max_depth': [3, 5, 10]}
xgb_grid = GridSearchCV(xgb, xgb_param, cv=5)
xgb_grid.fit(X_train, y_train)
xgb_pred = xgb_grid.best_estimator_.predict(X_test)

# ----------- ANN Model -----------

def build_ann():
    model = Sequential()
    model.add(Dense(64, input_dim=X_train.shape[1], activation='relu'))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(1))  # Output layer
    model.compile(optimizer='adam', loss='mse')
    return model

ann = KerasRegressor(build_fn=build_ann, epochs=100, batch_size=16, verbose=0)
ann.fit(X_train_scaled, y_train)
ann_pred = ann.predict(X_test_scaled)

/var/folders/tt/9tcd3n611x1_n7ww1jb91r500000gn/T/ipykernel_27794/2130103562.py:1: DeprecationWarning: KerasRegressor is deprecated, use Sci-Keras (https://github.com/adriangb/scikeras) instead. See https://www.adriangb.com/scikeras/stable/migration.html for help migrating.
  ann = KerasRegressor(build_fn=build_ann, epochs=100, batch_size=16, verbose=0)
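
As a sanity check on the recursive least-squares update used above, the following sketch runs the same update rule on synthetic, noise-free data and compares the result with a batch least-squares fit. The names `rls_fit`, `X_demo`, `true_theta`, and `y_demo` are hypothetical, and the settings `lambda_factor=1.0` and `delta=1e6` are assumptions chosen for a clean comparison; they differ from the notebook's defaults of 0.99 and 1.0.

```python
import numpy as np

def rls_fit(X, y, lambda_factor=1.0, delta=1e6):
    """Apply the recursive least-squares update to every row of X in order."""
    n_features = X.shape[1]
    P = np.eye(n_features) * delta      # large delta = weak initial regularisation
    theta = np.zeros(n_features)
    for x_row, target in zip(X, y):
        x = x_row.reshape(-1, 1)        # column vector, shape (n_features, 1)
        gain = P @ x / (lambda_factor + x.T @ P @ x)
        error = target - x_row @ theta  # prediction error on this sample
        theta = theta + error * gain.ravel()
        P = (P - gain @ x.T @ P) / lambda_factor
    return theta

# Hypothetical synthetic data: the target is an exact linear function of the features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
true_theta = np.array([1.5, -2.0, 0.5])
y_demo = X_demo @ true_theta

theta_rls = rls_fit(X_demo, y_demo)
theta_ols, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

print("RLS coefficients:", theta_rls)
print("OLS coefficients:", theta_ols)
```

When the target is an exact linear combination of the inputs, both estimates should agree closely with the generating coefficients. This is one possible reading of a near-zero RLS error on a dataset: the target column may be (almost) linearly determined by the remaining features.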

# ----------- Metrics -----------

metrics = {
    "Model": ["RLS", "Decision Tree", "Random Forest", "XGBoost", "ANN"],
    "MSE": [
        mean_squared_error(y_test, y_rls_pred),
        mean_squared_error(y_test, dtr_pred),
        mean_squared_error(y_test, rfr_pred),
        mean_squared_error(y_test, xgb_pred),
        mean_squared_error(y_test, ann_pred),
    ],
    "MAE": [
        mean_absolute_error(y_test, y_rls_pred),
        mean_absolute_error(y_test, dtr_pred),
        mean_absolute_error(y_test, rfr_pred),
        mean_absolute_error(y_test, xgb_pred),
        mean_absolute_error(y_test, ann_pred),
    ],
    "R² Score": [
        r2_score(y_test, y_rls_pred),
        r2_score(y_test, dtr_pred),
        r2_score(y_test, rfr_pred),
        r2_score(y_test, xgb_pred),
        r2_score(y_test, ann_pred),
    ],
}

comparison_df = pd.DataFrame(metrics)
print(comparison_df)

# Optional: print best parameters
print("\nBest Parameters:")
print("DTR:", dtr_grid.best_params_)
print("RFR:", rfr_grid.best_params_)
print("XGB:", xgb_grid.best_params_)

           Model           MSE           MAE  R² Score
0            RLS  4.525742e-26  1.691223e-13  1.000000
1  Decision Tree  1.820721e-01  3.104887e-01  0.921079
2  Random Forest  6.800423e-02  1.663037e-01  0.970523
3        XGBoost  4.184689e-02  1.506359e-01  0.981861
4            ANN  2.230879e-04  1.038437e-02  0.999903

Best Parameters:
DTR: {'max_depth': None}
RFR: {'max_depth': None, 'n_estimators': 100}
XGB: {'max_depth': 3, 'n_estimators': 100}
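
To make the three columns of the comparison table concrete, here is a minimal sketch that computes MSE, MAE, and R² by hand with NumPy. The helper name `regression_metrics` and the four-point demo arrays are hypothetical; the notebook itself uses the scikit-learn functions `mean_squared_error`, `mean_absolute_error`, and `r2_score`.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, MAE and R² for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)                    # mean squared error
    mae = np.mean(np.abs(residuals))                 # mean absolute error
    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "MAE": mae, "R2": r2}

# Hypothetical toy data: only the last prediction is off, by 1.0
demo = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(demo)
```

Because R² equals 1 minus MSE divided by the variance of the test targets, the MSE and R² columns carry the same ranking information. Dividing MSE by (1 - R²) for the Decision Tree, Random Forest, and XGBoost rows gives roughly 2.31 in each case, which would be the variance of `y_test` implied by the reported figures.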

import pandas as pd

# Assuming `comparison_df` already exists from previous code
# and contains columns: "Model", "MSE", "MAE", "R² Score"

# Define output file name
output_file = "comparison_report.xlsx"

# Save to Excel
comparison_df.to_excel(output_file, index=False)

print(f"Comparison report saved as '{output_file}' in your current working directory.")

Comparison report saved as 'comparison_report.xlsx' in your current working directory.
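
Writing `.xlsx` files with `to_excel` relies on an Excel writer package such as openpyxl being installed. Where that is not available, a plain CSV file is one possible fallback. The sketch below uses only the standard library; the helper `save_report_csv`, the column key `R2`, and the temporary output location are assumptions made for illustration, while the numbers are copied from the comparison table printed earlier.

```python
import csv
import tempfile
from pathlib import Path

# Values copied from the printed comparison table
rows = [
    {"Model": "RLS", "MSE": 4.525742e-26, "MAE": 1.691223e-13, "R2": 1.000000},
    {"Model": "Decision Tree", "MSE": 1.820721e-01, "MAE": 3.104887e-01, "R2": 0.921079},
    {"Model": "Random Forest", "MSE": 6.800423e-02, "MAE": 1.663037e-01, "R2": 0.970523},
    {"Model": "XGBoost", "MSE": 4.184689e-02, "MAE": 1.506359e-01, "R2": 0.981861},
    {"Model": "ANN", "MSE": 2.230879e-04, "MAE": 1.038437e-02, "R2": 0.999903},
]

def save_report_csv(rows, path):
    """Write a list of row dictionaries to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path

# Hypothetical output location: the system temporary directory
out_path = Path(tempfile.gettempdir()) / "comparison_report.csv"
save_report_csv(rows, out_path)

# Read the file back and pick the row with the lowest MSE
with open(out_path, newline="", encoding="utf-8") as handle:
    loaded = list(csv.DictReader(handle))
best = min(loaded, key=lambda row: float(row["MSE"]))
print("Lowest MSE:", best["Model"])
```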
