Implementation of Linear Regression With Python
Implementation of Linear Regression With Python
Objectives:
• To understand the basics of linear regression
• To implement the linear regression in real life problem solving
Theory:
Linear regression is a supervised learning approach that uses one or more input features to predict
a continuous value (or target). It is assumed that the input characteristics and the target variable
have a linear relationship. Linear regression seeks the best-fitting line through the data points that
minimizes the sum of the squared residuals. A simple linear regression model's equation is as
follows:
y = mx + b
where y denotes the goal variable, x denotes the input feature, m denotes the line's slope, and b
denotes the y-intercept. The goal of linear regression is to find the best-fit line that represents the
relationship between the variables. The best-fit line is the one that minimizes the sum of the
squared differences between the predicted values and the actual values of the dependent variable.
To obtain the best fitting line, we must first determine the m and b values that minimize the sum
of the squared residuals between the predicted and actual values. This can be accomplished by
employing a cost function, which computes the difference between expected and actual values.
The mean squared error (MSE) is the most often used cost function for linear regression, and it is
calculated as:
The formula for MSE is:
MSE = 1/n * Σ (yi - ŷi)2
where n is the number of observations, yi is the actual value of the dependent variable for the ith
observation, and ŷi is the predicted value of the dependent variable for the ith observation. By
minimizing the MSE, we can ensure that the line fits the data as closely as possible, which is the
main goal of linear regression.
Linear regression has numerous real-world applications. Here are a couple of such examples:
▪ Sales Forecasting: Using past sales data, marketing expenditure, and other relevant
variables, linear regression can be used to forecast future sales. This enables businesses to
make more informed decisions about production, inventory, and marketing.
▪ Stock Price Prediction: Linear regression can be used to forecast future stock values based
on historical trends and pertinent financial data. This can assist investors in making more
informed investing selections.
▪ Medical Research: Linear regression can be used to explore the link between variables in
medical research, such as the relationship between age and blood pressure or the
relationship between body mass index (BMI) and the chance of getting particular diseases.
▪ Real Estate: Linear regression can be used to forecast the price of a house based on
characteristics such as location, square footage, number of bedrooms and bathrooms, and
other pertinent parameters.
▪ Sports analytics: Based on factors like team performance, player statistics, and weather
conditions, linear regression can be used to evaluate sports data and forecast the results of
games or tournaments.
Overall, linear regression is a powerful tool that may be used to solve a wide variety of real-world
problems in a variety of sectors.
Problem generation:
Effect of deforestation on global warming with linear regression (without using any library)
Code:
import numpy as np
# Define the input data (deforestation) and target values (global warming)
# Percentage of deforestation
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)
y = np.array([0.5, 0.7, 0.8, 1.0, 1.3, 1.4, 1.6, 1.7, 1.8, 2.0]).reshape(-1, 1)
class LinearRegression:
self.learning_rate = learning_rate
self.iterations = iterations
self.y = y
self.costs = []
for i in range(self.iterations):
self.costs.append(cost)
def plot_cost(self):
plt.plot(self.costs)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.show()
lr = LinearRegression()
lr.fit(X, y)
y_pred = lr.predict(X_new)
print(y_pred)
# Plot the cost function over the course of training
lr.plot_cost()
plt.scatter(X, y)
plt.xlabel('Percentage of Deforestation')
plt.show()
Output:
[[2.22333566]
[2.39638715]
[2.56943863]
[2.74249012]
[2.91554161]]
Discussion:
This code defines a linear regression class with methods for fitting the model, making predictions,
and plotting the cost function and regression line. It uses the input data X (percentage of
deforestation) and target values y (global warming in degrees Celsius) to train the linear regression
model with gradient descent. The fit method takes in the input data X and target values y and
initializes the necessary variables for training the model. It then iteratively updates the model's
parameters (theta) by calculating the gradient of the cost function with respect to the parameters
and using gradient descent to update the parameters. The prediction method takes in new input
data X and uses the trained model parameters to make predictions on the target values. The
plot_cost method plots the cost function over the course of training, which shows how the cost
decreases as the model gets better at making predictions. The main part of the code creates an
instance of the ‘LinearRegression’ class and trains it on the input data X and target values y. It then
uses the trained model to make predictions on new input data X_new and plots the cost function
and regression line on the original data. Overall, this code demonstrates how to implement a simple
linear regression model with gradient descent in Python without using any external libraries.
Conclusion:
The relationship between deforestation vs. global warming might be non-linear sometimes with
respect to different natural calamities also. But here the standard linear relationship was portrayed
to show the application of linear regression in real-life problem-solving.