Assignment B 1 LinearRegression
Objectives:
• To analyse the Uber ride dataset to predict the fare of a ride.
• To compare the performance of different regressors.
Outcomes:
• Predict the fare of an Uber ride.
Problem Statement:
Predict the price of the Uber ride from a given pickup point to the agreed drop-off location.
Perform the following tasks:
1. Pre-process the dataset.
2. Identify outliers.
3. Check the correlation.
4. Implement linear regression and random forest regression models.
5. Evaluate the models and compare their respective scores, such as R2 and RMSE.
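Tasks 1 to 3 can be sketched in plain Python as below. This is a minimal sketch under stated assumptions: each ride is assumed to provide pickup and drop-off latitude/longitude plus a fare (the actual column layout of the Uber dataset is not given here), the haversine great-circle distance is one common choice of derived feature, and the 1.5 × IQR rule is one common outlier criterion, not necessarily the prescribed one. The helper names and the toy rides are hypothetical, invented only for illustration.

```python
import math
import statistics


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    r = 6371.0  # mean Earth radius in km (assumed constant)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def iqr_outlier_mask(values, k=1.5):
    """Flag each value lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy rides: (pickup_lat, pickup_lon, dropoff_lat, dropoff_lon, fare)
rides = [
    (40.7614, -73.9776, 40.7489, -73.9680, 7.5),
    (40.7306, -73.9866, 40.7580, -73.9855, 9.0),
    (40.7128, -74.0060, 40.7769, -73.9740, 16.5),
    (40.7527, -73.9772, 40.6413, -73.7781, 52.0),
    (40.7484, -73.9857, 40.7484, -73.9857, 250.0),  # zero distance, suspicious fare
]

# 1. Pre-process: derive a distance feature from the coordinates
distances = [haversine_km(a, b, c, d) for a, b, c, d, _ in rides]
fares = [r[4] for r in rides]

# 2. Identify outliers in the fare and drop them
outliers = iqr_outlier_mask(fares)
clean = [(d, f) for d, f, bad in zip(distances, fares, outliers) if not bad]

# 3. Check the correlation between distance and fare on the cleaned rows
print("outlier flags:", outliers)
print("corr(distance, fare):", pearson([d for d, _ in clean], [f for _, f in clean]))
```

With real data the same steps would typically be done on a pandas DataFrame; the plain-Python version only keeps the logic visible.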
Theory:
Linear Regression
In statistics, linear regression is a linear approach to modeling the relationship between a
scalar response (or dependent variable) and one or more explanatory variables (or independent
variables). The case of one explanatory variable is called simple linear regression. For more
than one explanatory variable, the process is called multiple linear regression.
For simple linear regression the model is the straight line Y = a + bX, where Y is the dependent variable (the variable plotted on the Y axis), X is the independent
variable (plotted on the X axis), b is the slope of the line and a is the y-intercept.
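For the simple case, the least-squares estimates have a closed form: b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. The sketch below computes them in plain Python; the function name fit_simple_linear and the toy distance/fare values are hypothetical. scikit-learn's LinearRegression solves the same ordinary least-squares problem for any number of features.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least-squares fit of Y = a + bX; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # slope = covariance(X, Y) / variance(X); the 1/n factors cancel
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx  # the fitted line passes through (mean of X, mean of Y)
    return a, b


# Hypothetical toy data: ride distance in km vs. fare
distance = [1.0, 2.0, 3.0, 4.0, 5.0]
fare = [5.0, 7.5, 10.0, 12.5, 15.0]
a, b = fit_simple_linear(distance, fare)
print(f"fare = {a:.2f} + {b:.2f} * distance")  # fare = 2.50 + 2.50 * distance
```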
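# Calculating the R2 score (X = feature matrix, Y = target fares, prepared earlier)
from sklearn.linear_model import LinearRegression
reg = LinearRegression().fit(X, Y)
r2 = reg.score(X, Y)  # coefficient of determination R^2 of reg.predict(X) against Y
print(r2)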
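For scikit-learn regressors, score(X, Y) returns the coefficient of determination R². R² and RMSE, the scores named in task 5, can also be computed directly from true and predicted values, as in this plain-Python sketch (the function names r2 and rmse are hypothetical; sklearn.metrics provides r2_score and mean_squared_error for the same purpose).

```python
import math


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (e.g. fare)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


y_true = [1.0, 2.0, 3.0]
y_pred = [2.0, 2.0, 2.0]  # a model that always predicts the mean
print(r2(y_true, y_pred))    # 0.0
print(rmse(y_true, y_pred))  # about 0.8165
```

Higher R² is better (1.0 is a perfect fit, 0.0 is no better than predicting the mean), while lower RMSE is better.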
Decision Tree
A decision tree builds regression or classification models in the form of a tree structure. It breaks
down a dataset into smaller and smaller subsets while, at the same time, an associated decision
tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes.
A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy),
each representing values for the attribute tested. A leaf node (e.g., Hours Played) represents a
decision on the numerical target. The topmost decision node, which corresponds to the best
predictor, is called the root node. Decision trees can handle both categorical and numerical data.
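For a numerical target, a regression tree (as in CART) typically chooses each split to minimise the summed squared error of the two child nodes, and each leaf predicts the mean target of its samples. The sketch below finds the best single split on one numeric feature, which yields a one-level tree (a "stump"); a full tree would repeat this recursively on each subset. The function names and toy data are hypothetical, and only one feature is handled for brevity.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, cost) of the split x <= threshold minimising child SSE."""
    pairs = sorted(zip(xs, ys))
    best = (None, sse(ys))  # no split: the whole node stays one leaf
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal feature values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best[1]:
            best = (threshold, cost)
    return best


def fit_stump(xs, ys):
    """One-level regression tree: each leaf predicts the mean of its samples."""
    threshold, _ = best_split(xs, ys)
    if threshold is None:
        mean = sum(ys) / len(ys)
        return lambda x: mean
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    left_mean = sum(left) / len(left)
    right_mean = sum(right) / len(right)
    return lambda x: left_mean if x <= threshold else right_mean


xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]
print(best_split(xs, ys))  # (6.5, 0.0)
stump = fit_stump(xs, ys)
print(stump(2), stump(11))  # 5.0 20.0
```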
Random Forest
The diagram above shows the structure of a Random Forest. The trees are built independently
(in parallel), with no interaction among them. A Random Forest constructs several decision trees
during training, each on a random bootstrap sample of the training data, and outputs the mean of
the individual trees' predictions for regression (or the majority-vote class for classification).
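The averaging step for regression can be sketched as follows: each tree is trained on a bootstrap sample (rows drawn with replacement), and the forest predicts the mean of the trees' outputs. This is a simplified, hypothetical illustration that uses one-level trees on a single feature, so it omits the random feature subsampling that Random Forests typically add; in practice scikit-learn's RandomForestRegressor (whose n_estimators parameter sets the number of trees) would be used instead.

```python
import random


def fit_stump(xs, ys):
    """One-level regression tree minimising the children's squared error."""
    def sse(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best_t, best_cost = None, sse(ys)
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold midway between distinct values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_t, best_cost = t, cost
    if best_t is None:
        mean = sum(ys) / len(ys)
        return lambda x: mean
    left = [y for x, y in zip(xs, ys) if x <= best_t]
    right = [y for x, y in zip(xs, ys) if x > best_t]
    lm, rm = sum(left) / len(left), sum(right) / len(right)
    return lambda x: lm if x <= best_t else rm


def fit_forest(xs, ys, n_trees=50, seed=0):
    """Bagging: fit each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    # the forest's prediction is the mean of the individual trees' predictions
    return lambda x: sum(tree(x) for tree in trees) / len(trees)


xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]
forest = fit_forest(xs, ys)
print(forest(2), forest(11))
```

Averaging many trees fitted on different bootstrap samples reduces the variance of a single tree's prediction, which is the main motivation for the ensemble.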
Conclusion:
Thus, we implemented and compared different regressors (linear regression and random forest
regression) using Python's scikit-learn library.