ML Assignment 6

The document discusses the challenges of using Random Forest for classification or regression tasks with a dataset containing both numerical and categorical features. It proposes strategies for encoding categorical variables using One-Hot Encoding and handling missing data through imputation methods to enhance model performance and interpretability. The implementation includes loading the dataset, preprocessing, training the model, and evaluating its accuracy and mean squared error.

Uploaded by

anuj rawat


abeluwlee

January 3, 2025

Given a dataset Customer.csv with a mix of numerical and categorical features, discuss the challenges and considerations in using Random Forest for classification or regression tasks.
Propose strategies for encoding categorical variables and handling missing data to improve model performance and interpretability.
Metadata is also provided; please read it before starting the implementation.
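One way to approach the encoding and missing-data part of the task is to put imputation and one-hot encoding inside a scikit-learn `Pipeline`, so that the imputer and encoder are fitted on the training split only. The sketch below is an illustration under assumptions, not the notebook's method: the `toy` frame and its values are made up, and only the column names `Channel`, `Fresh`, `Milk` and `Region` are borrowed from the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for the customers data (values are invented for illustration).
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "Channel": rng.integers(1, 3, size=n),     # categorical code: 1 or 2
    "Fresh": rng.normal(12000, 3000, size=n),  # numerical feature
    "Milk": rng.normal(6000, 2000, size=n),    # numerical feature
    "Region": rng.integers(1, 4, size=n),      # target: 1, 2 or 3
})
toy.loc[::10, "Milk"] = np.nan                 # inject some missing values

X = toy.drop(columns="Region")
y = toy["Region"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Numerical columns: mean imputation. Categorical column: mode imputation,
# then one-hot encoding that tolerates categories unseen during fitting.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="mean"), ["Fresh", "Milk"]),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), ["Channel"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("forest", RandomForestClassifier(n_estimators=50, random_state=42)),
])
model.fit(X_train, y_train)          # imputer and encoder see only training rows
predictions = model.predict(X_test)
```

Keeping the preprocessing inside the pipeline means the same fitted transformations are reused at prediction time, and the test split does not influence the imputed values.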
[5]: import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, mean_squared_error

[6]: # Load the dataset
url = "https://itv-contentbucket.s3.ap-south-1.amazonaws.com/Exams/ML/EDA/customers.csv"
data = pd.read_csv(url)

# Display the first few rows
print(data.head())

   Channel  Region  Fresh  Milk  Grocery  Frozen  Detergents_Paper  \
0        2       3  12669  9656     7561     214              2674
1        2       3   7057  9810     9568    1762              3293
2        2       3   6353  8808     7684    2405              3516
3        1       3  13265  1196     4221    6404               507
4        2       3  22615  5410     7198    3915              1777

   Delicatessen
0          1338
1          1776
2          7844
3          1788
4          5185

[11]: data.columns

[11]: Index(['Channel', 'Region', 'Fresh', 'Milk', 'Grocery', 'Frozen',
             'Detergents_Paper', 'Delicatessen'],
            dtype='object')

[19]: # Imputation for missing numerical values
numerical_features = ['Fresh', 'Milk', 'Grocery', 'Frozen', 'Detergents_Paper', 'Delicatessen']
numerical_imputer = SimpleImputer(strategy='mean')
data[numerical_features] = numerical_imputer.fit_transform(data[numerical_features])

# Imputation for missing categorical values
# Note: the data.columns output in cell [11] lists 'Channel' and 'Region'.
# The names 'Channel_2.0' and 'Region_2.0' used here presumably come from an
# earlier encoding step that is not shown in this export; on a freshly loaded
# frame these columns would not exist.
categorical_features = ['Channel_2.0', 'Region_2.0']
categorical_imputer = SimpleImputer(strategy='most_frequent')
data[categorical_features] = pd.DataFrame(
    categorical_imputer.fit_transform(data[categorical_features]),
    columns=categorical_features)

[20]: # One-Hot Encoding
one_hot_encoder = OneHotEncoder(sparse_output=False, drop='first')
encoded_data = pd.DataFrame(
    one_hot_encoder.fit_transform(data[categorical_features]),
    columns=one_hot_encoder.get_feature_names_out(categorical_features))

# Add the one-hot encoded variables back to the dataset and drop original categorical columns
data = data.drop(categorical_features, axis=1)
data = pd.concat([data, encoded_data], axis=1)

[22]: # Confirm the column names
print(data.columns)

# Select the Region indicator column 'Region_3.0' as the target variable
target_variable = 'Region_3.0'
X = data.drop(target_variable, axis=1)
y = data[target_variable]

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Index(['Fresh', 'Milk', 'Grocery', 'Frozen', 'Detergents_Paper',
       'Delicatessen', 'Region_3.0', 'Channel_2.0_1.0', 'Region_2.0_1.0'],
      dtype='object')

[23]: # Initialize and train the model
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train, y_train)

# Make predictions
y_pred = rf_classifier.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

Accuracy: 0.90
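An accuracy figure is easier to interpret next to a trivial baseline, since a classifier that always predicts the most common class can already score well when the classes are imbalanced. The helper below is a minimal sketch of that comparison; `majority_baseline_accuracy` and the demo labels are hypothetical and are not taken from the customers data.

```python
from collections import Counter

def majority_baseline_accuracy(y_train, y_test):
    """Accuracy obtained by always predicting the most common training label."""
    majority_label, _ = Counter(y_train).most_common(1)[0]
    hits = sum(1 for label in y_test if label == majority_label)
    return hits / len(y_test)

# Hypothetical 0/1 labels for illustration only.
y_train_demo = [1, 1, 1, 0, 1, 0, 1, 1]
y_test_demo = [1, 0, 1, 1]
baseline = majority_baseline_accuracy(y_train_demo, y_test_demo)
print(f"Majority-class baseline accuracy: {baseline:.2f}")
```

Calling the same helper on the notebook's `y_train` and `y_test` would show how much of the reported accuracy goes beyond what the class balance alone provides.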

[24]: # Initialize and train the regressor (same features and 0/1 target as above)
rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)
rf_regressor.fit(X_train, y_train)

# Make predictions
y_pred = rf_regressor.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")

Mean Squared Error: 0.11
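Because the target here is a 0/1 indicator column, the regressor's predictions can be read as scores between 0 and 1, and the mean squared error is the average squared gap between each score and its label (for a binary target this quantity is also known as the Brier score). The sketch below works through that arithmetic by hand; the function name and the demo values are illustrative assumptions, not output from the notebook.

```python
def mean_squared_error_manual(y_true, y_score):
    """Average of squared differences between labels and predicted scores."""
    squared_gaps = [(t - s) ** 2 for t, s in zip(y_true, y_score)]
    return sum(squared_gaps) / len(squared_gaps)

# Hypothetical labels and regressor scores for illustration only.
y_true_demo = [1, 0, 1, 1]
y_score_demo = [0.9, 0.2, 0.6, 1.0]

mse_demo = mean_squared_error_manual(y_true_demo, y_score_demo)

# Thresholding the scores at 0.5 turns them back into class predictions.
y_class_demo = [1 if s >= 0.5 else 0 for s in y_score_demo]
print(f"MSE: {mse_demo:.4f}, thresholded predictions: {y_class_demo}")
```

A low MSE on such a target therefore says that the scores sit close to the labels, which is related to, but not the same as, the accuracy reported by the classifier.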
