Simple Linear Regression - Assign2

A food delivery service recorded delivery time and time taken for orders to be sorted to improve services. A simple linear regression model was built with delivery time as the target variable. Log, exponential, and polynomial transformations were applied and RMSE and correlation values were recorded for each. The log transformation model had the lowest RMSE and was selected as best. The final model was fitted on training and test split data, and the final RMSE value was reported.

Uploaded by

Sravani Adapa

Simple Linear Regression With scikit-learn

There are five basic steps when you’re implementing linear regression:

1. Import the packages and classes you need.
2. Provide data to work with and, if needed, apply appropriate transformations.
3. Create a regression model and fit it with existing data.
4. Check the results of model fitting to know whether the model is satisfactory.
5. Apply the model for predictions.

These steps are more or less general for most of the regression approaches and implementations.

Problem Statement:

A food delivery service recorded the delivery time and the time taken by the restaurants to sort the deliveries, in order to improve its delivery services.
Approach: build a simple linear regression model with ‘Delivery.Time’ as the target variable. Apply the necessary transformations and record the RMSE and correlation coefficient values for each transformation model.

Step 1: Import packages and classes

The first step is to import the package numpy and the class LinearRegression from sklearn.linear_model:

import numpy as np
from sklearn.linear_model import LinearRegression

Now you have all the functionality you need to implement linear regression.

The fundamental data type of NumPy is the array type called numpy.ndarray. The rest of this article
uses the term array to refer to instances of the type numpy.ndarray.

The class sklearn.linear_model.LinearRegression will be used to perform linear and polynomial
regression and make predictions accordingly.

Step 2: Provide data

The second step is to define the data to work with: the inputs (regressors, 𝑥) and the output (response, 𝑦).
The file calories_consumed.csv is imported.

Exploratory data analysis is then performed on the data.
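The exploratory checks are not shown in the text, so the following is only a minimal sketch of what they might look like. The original data file is not reproduced here; the arrays sort_time and delivery_time are synthetic stand-ins generated for illustration, and their names, sizes, and values are assumptions.

```python
import numpy as np

# Synthetic stand-in data (an assumption; NOT the assignment's data set)
rng = np.random.default_rng(0)
sort_time = rng.uniform(2.0, 10.0, size=21)                                    # input x
delivery_time = 1.0 + 7.0 * np.log(sort_time) + rng.normal(0.0, 1.0, size=21)  # output y

# Basic descriptive statistics for each variable
for name, values in (("sort_time", sort_time), ("delivery_time", delivery_time)):
    print(f"{name}: mean={values.mean():.2f}, median={np.median(values):.2f}, "
          f"std={values.std(ddof=1):.2f}, min={values.min():.2f}, max={values.max():.2f}")

# Missing-value count and the Pearson correlation between input and output
missing = int(np.isnan(sort_time).sum() + np.isnan(delivery_time).sum())
r = np.corrcoef(sort_time, delivery_time)[0, 1]
print(f"missing values: {missing}, correlation r = {r:.3f}")
```

With real data you would typically also look at histograms, box plots, and a scatter plot of 𝑥 against 𝑦 before fitting anything.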

Step 3: Create a model and fit it

The next step is to create a linear regression model and fit it using the existing data.

Let’s create an instance of the class LinearRegression, which will represent the regression model:

Simple linear regression

model = LinearRegression()

This statement creates the variable model as an instance of LinearRegression. You can provide several
optional parameters to LinearRegression, such as fit_intercept.
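As a rough illustration of creating, fitting, and using the model, here is a small sketch with scikit-learn. The numbers in x and y are made up for the example and are not taken from the assignment's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up example values (an assumption, for illustration only)
x = np.array([2.0, 3.0, 4.0, 6.0, 7.0, 9.0, 10.0]).reshape(-1, 1)  # 2-D: (n_samples, 1)
y = np.array([8.0, 11.0, 13.5, 15.0, 18.0, 19.0, 21.0])

model = LinearRegression()   # fit_intercept=True by default
model.fit(x, y)              # estimates the intercept and slope by least squares

print("intercept:", model.intercept_)
print("slope:", model.coef_[0])
print("R^2:", model.score(x, y))

y_pred = model.predict(x)    # predictions for the inputs used in fitting
```

Note that scikit-learn expects the input as a two-dimensional array, which is why x is reshaped to a single column.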

statsmodels.formula.api is imported (as smf) to build an ordinary least squares (OLS) model on the data:

model1=smf.ols('calories ~ weight',data=cal_data).fit()

The regression line is plotted after obtaining the predicted values.

After the scatter plot is drawn, the root mean squared error (RMSE) is calculated.
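The steps above (fit with OLS, predict, compute the RMSE) can be sketched as follows. This is only an illustration under assumptions: the data frame df is synthetic, and the column names sort_time and delivery_time are hypothetical and differ from the names used in the formula above. Plotting is left out to keep the sketch short.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data with hypothetical column names (assumptions)
rng = np.random.default_rng(0)
sort_time = rng.uniform(2.0, 10.0, size=21)
df = pd.DataFrame({
    "sort_time": sort_time,
    "delivery_time": 1.0 + 7.0 * np.log(sort_time) + rng.normal(0.0, 1.0, size=21),
})

# Fit y ~ x by ordinary least squares and predict on the same data
model1 = smf.ols("delivery_time ~ sort_time", data=df).fit()
pred1 = model1.predict(df)

# RMSE: square root of the mean squared residual
residuals = df["delivery_time"] - pred1
rmse1 = float(np.sqrt(np.mean(residuals ** 2)))

print(model1.params)
print(f"RMSE = {rmse1:.3f}")
```

Calling model1.summary() prints the full regression table, including the coefficients, R-squared, and p-values.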

To reduce the errors and obtain a better-fitting line, transformations are applied to the data.

Log transformation

In the log transformation, the transformation is applied to the x data:

#x=log(sort_time),y=time

A scatter plot is drawn.

The correlation coefficient between the transformed input and the output is then obtained.

model2 is built on the transformed data.

The new regression line is plotted.

The new RMSE is calculated.
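A minimal sketch of the log-transformation model, assuming synthetic data in place of the real file. It uses scikit-learn's LinearRegression rather than the statsmodels formula interface, which is a substitution made here for brevity; the variable names follow the comment above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (an assumption; not the assignment's data)
rng = np.random.default_rng(0)
sort_time = rng.uniform(2.0, 10.0, size=21)
time = 1.0 + 7.0 * np.log(sort_time) + rng.normal(0.0, 1.0, size=21)

# x = log(sort_time), y = time
log_x = np.log(sort_time)
r_log = np.corrcoef(log_x, time)[0, 1]   # correlation of transformed input with output

model2 = LinearRegression().fit(log_x.reshape(-1, 1), time)
pred2 = model2.predict(log_x.reshape(-1, 1))
rmse2 = float(np.sqrt(np.mean((time - pred2) ** 2)))

print(f"r = {r_log:.3f}, RMSE = {rmse2:.3f}")
```

Because only the input is transformed, the predictions are already in the original units of y, so this RMSE can be compared directly with that of the untransformed model.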

Exponential transformation

In the exponential transformation, the transformation is applied to the y data:

#x=(sort_time),y=log(time)

A scatter plot is drawn.

The correlation coefficient between the input and the transformed output is then obtained.

model3 is built on the transformed data.

The new regression line is plotted.

The new RMSE is calculated.
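A sketch of the exponential model under the same assumptions (synthetic data, scikit-learn instead of statsmodels). One detail the text does not spell out, and which is assumed here: the model predicts log(time), so the predictions are passed through exp before the RMSE is computed, which keeps the error in the original units.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (an assumption; not the assignment's data)
rng = np.random.default_rng(0)
sort_time = rng.uniform(2.0, 10.0, size=21)
time = 1.0 + 7.0 * np.log(sort_time) + rng.normal(0.0, 1.0, size=21)

# x = sort_time, y = log(time)
X = sort_time.reshape(-1, 1)
log_y = np.log(time)
r_exp = np.corrcoef(sort_time, log_y)[0, 1]

model3 = LinearRegression().fit(X, log_y)

# Back-transform predictions from the log scale to the original scale
pred3 = np.exp(model3.predict(X))
rmse3 = float(np.sqrt(np.mean((time - pred3) ** 2)))

print(f"r = {r_exp:.3f}, RMSE = {rmse3:.3f}")
```

An RMSE computed on the log scale would not be comparable with the RMSE values of models that predict time directly.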

Polynomial transformation

x=sort_time, x^2=sort_time*sort_time, y=log(time)

PolynomialFeatures is imported from sklearn.preprocessing to build the polynomial regression.

The new regression line is plotted.

The RMSE is obtained from this regression model.
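The polynomial model can be sketched as below, again on synthetic data. The degree of 2 follows the x^2 term mentioned above; the back-transformation with exp before computing the RMSE is an assumption made so that the error is in the original units.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (an assumption; not the assignment's data)
rng = np.random.default_rng(0)
sort_time = rng.uniform(2.0, 10.0, size=21)
time = 1.0 + 7.0 * np.log(sort_time) + rng.normal(0.0, 1.0, size=21)

# Build the feature columns [x, x^2]; the intercept is handled by LinearRegression
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(sort_time.reshape(-1, 1))

# y = log(time), fitted on x and x^2
model4 = LinearRegression().fit(X_poly, np.log(time))

# Back-transform to the original scale before computing the RMSE
pred4 = np.exp(model4.predict(X_poly))
rmse4 = float(np.sqrt(np.mean((time - pred4) ** 2)))

print(f"RMSE = {rmse4:.3f}")
```

The model is still linear in its coefficients; only the inputs are expanded, which is why LinearRegression can fit it.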

The best model is chosen by comparing the RMSE values of the transformations above.

The models are tabulated with their respective RMSE values.

From these observations, the log model is taken as the best.
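The comparison itself is simple to automate. The sketch below uses a hypothetical helper rmse and made-up predictions from three placeholder models (model_a, model_b, model_c); none of the numbers are results from the assignment.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up observed values and predictions (illustrative only)
y_true = [10.0, 12.0, 15.0, 20.0]
predictions = {
    "model_a": [11.0, 13.0, 14.0, 18.0],
    "model_b": [10.5, 12.5, 15.0, 19.5],
    "model_c": [9.0, 12.0, 16.0, 22.0],
}

# Tabulate the RMSE of each model, smallest first
scores = {name: rmse(y_true, pred) for name, pred in predictions.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:<10} RMSE = {score:.3f}")

# The model with the lowest RMSE is selected
best = min(scores, key=scores.get)
print("best model:", best)
```

For the comparison to be fair, every RMSE should be computed on the same scale of y and on the same observations.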

Step 4: Get results

Once you have your model fitted, you can get the results to check whether the model works
satisfactorily and interpret it.

The summary of the final model is printed.

The final model is fitted on the train and test split data, and its predictions are examined.

The final RMSE value is then reported.
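A minimal sketch of the final step, assuming the log model is the one carried forward and using synthetic data. The 80/20 split and the random_state value are assumptions; the text does not state which split was used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; not the assignment's data)
rng = np.random.default_rng(0)
sort_time = rng.uniform(2.0, 10.0, size=21)
time = 1.0 + 7.0 * np.log(sort_time) + rng.normal(0.0, 1.0, size=21)

# Log model: x = log(sort_time), y = time
X = np.log(sort_time).reshape(-1, 1)

# Hold out part of the data for testing (split proportions are assumed)
X_train, X_test, y_train, y_test = train_test_split(
    X, time, test_size=0.2, random_state=0
)

final_model = LinearRegression().fit(X_train, y_train)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

train_rmse = rmse(y_train, final_model.predict(X_train))
test_rmse = rmse(y_test, final_model.predict(X_test))

print(f"train RMSE = {train_rmse:.3f}, test RMSE = {test_rmse:.3f}")
```

A test RMSE much larger than the train RMSE would suggest that the model does not generalize well to unseen data.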
