
Assignment B 1 LinearRegression

Uploaded by

Mahesh Kadam

B.E. (COMP) Sinhgad Institute of Technology, Lonavala LP_III

Name of the Student: __________________________________ Roll No: ___


CLASS: - B. E. [COMP] Division: A, B, C Course: LP-III
Machine Learning
Assignment No. 01
UBER RIDE FARE PREDICTION
Marks: /10

Date of Performance: / /2023


2024 Sign with Date:

Title : Uber ride fare prediction using regression algorithms

Objectives:
• To analyse Uber ride dataset to predict the fare of a ride.
• To compare performance of different regressors.

Outcomes:
• Predict the fare of an Uber ride.

PEOs, POs, PSOs and COs satisfied


PEOs: I, III POs: 1, 2, 3, 4, 5 PSOs: 1, 2 COs: 1

Problem Statement:
Predict the price of the Uber ride from a given pickup point to the agreed drop-off location.
Perform the following tasks:
1. Pre-process the dataset.
2. Identify outliers.
3. Check the correlation.
4. Implement linear regression and random forest regression models.
5. Evaluate the models and compare their respective scores like R2, RMSE, etc.

Dataset link: https://www.kaggle.com/datasets/yasserh/uber-fares-dataset
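
Tasks 1 to 3 can be sketched as follows. This is only a minimal illustration, not the prescribed solution. The column names (fare_amount, pickup_longitude and so on) are assumed to match the Kaggle dataset and should be checked against the downloaded file, the small DataFrame is made-up sample data standing in for the CSV, and the helpers haversine_km and iqr_mask are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd

# Made-up sample rows standing in for the Kaggle CSV; the column names are
# assumed to match the dataset and should be verified against the real file.
df = pd.DataFrame({
    "fare_amount": [7.5, 7.7, 12.9, 5.3, 16.0, 250.0, np.nan],
    "pickup_longitude": [-73.99, -73.99, -74.00, -73.97, -73.92, -73.98, -73.98],
    "pickup_latitude": [40.73, 40.72, 40.74, 40.79, 40.74, 40.75, 40.75],
    "dropoff_longitude": [-73.99, -73.99, -73.96, -73.96, -73.97, -73.97, -73.97],
    "dropoff_latitude": [40.72, 40.75, 40.77, 40.80, 40.76, 40.76, 40.76],
    "passenger_count": [1, 1, 1, 3, 5, 1, 2],
})


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres (hypothetical helper)."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))


def iqr_mask(series, k=1.5):
    """True for values inside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.between(q1 - k * iqr, q3 + k * iqr)


# 1. Pre-process: drop rows with missing values and derive a distance feature.
clean = df.dropna().copy()
clean["distance_km"] = haversine_km(
    clean["pickup_longitude"], clean["pickup_latitude"],
    clean["dropoff_longitude"], clean["dropoff_latitude"],
)

# 2. Identify outliers in the fare with the interquartile-range rule.
inliers = iqr_mask(clean["fare_amount"])
outliers = clean[~inliers]
clean = clean[inliers]

# 3. Check the correlation of every column with the fare.
print(clean.corr()["fare_amount"].sort_values(ascending=False))
```

The interquartile-range rule is one common choice for flagging outliers; a box plot or domain limits (for example, non-positive fares) could be used instead.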

Theory:

Linear Regression
In statistics, linear regression is a linear approach to modeling the relationship between a
scalar response (or dependent variable) and one or more explanatory variables (or independent
variables). The case of one explanatory variable is called simple linear regression. For more
than one explanatory variable, the process is called multiple linear regression.
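
The difference between simple and multiple linear regression can be illustrated with a short sketch. The data below is synthetic, generated from coefficients chosen for this example (intercept 3, slopes 2 and -1), so the fitted values can be compared against known ones.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): y = 3 + 2*x1 - 1*x2, no noise
X = rng.uniform(0, 10, size=(50, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]

# Simple linear regression: a single explanatory variable (first column only)
simple = LinearRegression().fit(X[:, [0]], y)

# Multiple linear regression: both explanatory variables
multiple = LinearRegression().fit(X, y)

print(simple.coef_.shape)                   # one slope
print(multiple.coef_, multiple.intercept_)  # two slopes and the intercept
```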

Linear Regression Equation:


Linear regression is a way to model the relationship between two variables. You might also
recognize the equation as the slope formula. The equation has the form Y = a + bX, where Y
is the dependent variable (the variable plotted on the Y axis), X is the independent
variable (the variable plotted on the X axis), b is the slope of the line and a is the y-intercept.
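
As a small worked example of the equation Y = a + bX, the least-squares estimates of a and b can be computed directly with NumPy. The five points are made up for illustration and lie exactly on the line Y = 1 + 2X.

```python
import numpy as np

# Illustrative points that lie exactly on Y = 1 + 2X
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

# Least-squares estimates:
#   b = sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2)
#   a = mean(Y) - b * mean(X)
b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()

print(a, b)
```

For these points the estimates recover the intercept a = 1 and the slope b = 2.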

Simple Linear Regression


Simple or single-variate linear regression is the simplest case of linear regression with a single
independent variable, 𝐱 = 𝑥.
The following figure illustrates simple linear regression:

Simple Linear Regression With scikit-learn


1. Import the packages and classes you need.
2. Provide data to work with and, if necessary, apply appropriate transformations.
3. Create a regression model and fit it with existing data.
4. Check the results of model fitting to know whether the model is satisfactory.
5. Apply the model for predictions.
These steps are more or less general for most of the regression approaches and
implementations.

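Linear Regression – Implementation using scikit-learn

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Illustrative data (assumed for this example): X is a 1-D feature array, Y the target
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
m = len(X)

# scikit-learn expects a 2-D feature matrix of shape (m, 1), not a rank-1 array
X = X.reshape((m, 1))
# Creating Model
reg = LinearRegression()
# Fitting training data
reg = reg.fit(X, Y)
# Y Prediction
Y_pred = reg.predict(X)

# Calculating RMSE and R2 Score
rmse = np.sqrt(mean_squared_error(Y, Y_pred))
r2 = reg.score(X, Y)
print(rmse)
print(r2)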


Decision Tree
A decision tree builds regression or classification models in the form of a tree structure. It breaks
down a dataset into smaller and smaller subsets while, at the same time, an associated decision
tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes.
A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy),
each representing values for the attribute tested. A leaf node (e.g., Hours Played) represents a
decision on the numerical target. The topmost decision node in a tree, which corresponds to the
best predictor, is called the root node. Decision trees can handle both categorical and numerical data.
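
The notions of decision node, leaf node and root node can be made concrete by printing a small fitted tree. The sketch below uses made-up data with one numeric feature (named "x" here purely for display) and scikit-learn's export_text function to render the tree as text.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Made-up data: one numeric feature and a numeric target with two clear groups
X = np.array([[1], [2], [3], [10], [11], [12]])
y = np.array([5.0, 5.0, 5.0, 20.0, 20.0, 20.0])

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# Text rendering of the fitted tree: each decision node tests the feature
# against a threshold, and each leaf holds the predicted numeric value.
rules = export_text(tree, feature_names=["x"])
print(rules)

# The root node is node 0; its threshold separates the two groups.
print(tree.tree_.threshold[0], tree.get_n_leaves())
```

Because the two groups are perfectly separated, a single split at the root is enough and both children are leaves.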

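Decision Tree Regression – Implementation using scikit-learn

# import the regressor
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# illustrative data (assumed for this example): one feature and a numeric target
X = np.array([[1000], [2000], [3000], [4000], [5000]])
y = np.array([10, 20, 30, 40, 50])

# create a regressor object
regressor = DecisionTreeRegressor(random_state=0)

# fit the regressor with X and y data
regressor.fit(X, y)

# predicting a new value; predict() expects a 2-D array of shape (n_samples, n_features)
# test the output by changing values, like 3750
y_pred = regressor.predict(np.array([[3750]]))

# print the predicted price
print("Predicted price: %d" % y_pred[0])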

Random Forest Regression


Random Forest Regression is a supervised learning algorithm that uses the ensemble
learning method for regression. Ensemble learning is a technique that combines
predictions from multiple machine learning models to make a more accurate prediction than
a single model.

The diagram above shows the structure of a Random Forest. Note that the trees run
in parallel with no interaction amongst them. A Random Forest operates by constructing several
decision trees during training time and, for regression, outputting the mean of the predictions of
the individual trees.
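
The averaging behaviour described above can be checked with a short sketch. The data is synthetic and the forest is kept small (10 trees) only to keep the example quick; estimators_ is the fitted forest's list of individual trees.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): a noisy linear relationship
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 * X[:, 0] + rng.normal(0, 1, size=100)

forest = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

X_new = np.array([[2.5], [6.5]])

# Predictions of each individual tree, shape (n_trees, n_samples)
per_tree = np.array([t.predict(X_new) for t in forest.estimators_])

# The forest's prediction is the mean of the individual trees' predictions
print(per_tree.mean(axis=0))
print(forest.predict(X_new))
```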

Random Forest Regression – Implementation using scikit-learn

# import the regressor
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# illustrative data (assumed for this example): one feature and a numeric target
x = np.arange(1, 11).reshape(-1, 1)
y = np.array([45, 50, 60, 80, 110, 150, 200, 300, 500, 1000])

# create regressor object
regressor = RandomForestRegressor(n_estimators=100, random_state=0)

# fit the regressor with x and y data
regressor.fit(x, y)

# predicting a new value; test the output by changing values
Y_pred = regressor.predict(np.array([6.5]).reshape(1, 1))

# print the predicted price
print(Y_pred)


Conclusion:
Thus, we implemented and compared different regressors using the Python scikit-learn
library.
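
One possible way to carry out such a comparison is sketched below. It is not the assignment's reference solution: the data is a synthetic stand-in (a single "distance" feature and a fare that grows with it), and the 80/20 split and the model settings are assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the prepared data: fare grows roughly linearly with distance
distance = rng.uniform(0.5, 20, size=(500, 1))
fare = 2.5 + 1.8 * distance[:, 0] + rng.normal(0, 1.0, size=500)

# Hold out 20% of the rows so both models are scored on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    distance, fare, test_size=0.2, random_state=0
)

models = {
    "Linear regression": LinearRegression(),
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    scores[name] = {"R2": r2_score(y_test, pred), "RMSE": rmse}
    print(f"{name}: R2 = {scores[name]['R2']:.3f}, RMSE = {rmse:.3f}")
```

A higher R2 and a lower RMSE on the held-out rows indicate the better fit; on real fare data the ranking of the two models may differ from this synthetic case.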

A. Write short answers to the following questions:

1. What is linear regression?
2. What is pruning in a decision tree?
3. What is ensemble learning?
4. What are entropy and information gain in the decision tree algorithm?
5. What is a random forest? How does it work?

5 | Department of Computer Engineering, SIT, Lonavala
