Linear Regression: Machine Learning
Introduction
For this stage of development, we are implementing linear regression and
verifying that it works on real data.
We are performing this work in MATLAB for easy analysis of the data; once
the code is structured and tested, we will reimplement it in Python for deployment.
For this programming part, we implement linear regression with one variable.
The file rev3data1.txt contains the dataset for our linear regression problem.
The first column is the population of a city and the second column
is the profit of a food truck in that city. A negative value for profit
indicates a loss.
data = load('rev3data1.txt');   % read comma separated data
X = data(:, 1); y = data(:, 2);
m = length(y);                  % number of training examples
Next, the script calls the plotData function to create a scatter plot of
the data. Our job is to complete plotData.m to draw the plot; modify the
file and fill in the plotting code.
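A minimal sketch of what plotData.m might contain (assuming the population
values are passed in as x and the profits as y) is:

function plotData(x, y)
% plotData plots the data points x and y into a new figure window.
figure;                                   % open a new figure window
plot(x, y, 'rx', 'MarkerSize', 10);       % plot the data as red crosses
ylabel('Profit in $10,000s');             % label the y-axis
xlabel('Population of City in 10,000s');  % label the x-axis
end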
[Figure 1: Scatter plot of the training data. x-axis: Population of City in 10,000s; y-axis: Profit in $10,000s.]
$h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1$
Recall that the parameters of our model are the θj values. These are
the values we will adjust to minimize the cost J(θ). One way to do this is to
use the batch gradient descent algorithm. In batch gradient descent, each
iteration performs the update
$$\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \qquad \text{(simultaneously update } \theta_j \text{ for all } j\text{).}$$
With each step of gradient descent, our parameters θj come closer to the
optimal values that achieve the lowest cost J(θ).
1.2.2 Implementation
In rev3.m, we have already set up the data for linear regression. In the
following lines, we add another dimension to our data to accommodate the
θ0 intercept term. We also initialize the parameters θ to 0 and the
learning rate alpha to 0.01.
iterations = 1500;
alpha = 0.01;
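The lines that augment X with the intercept column and zero-initialize θ
are not reproduced above; a minimal sketch of what rev3.m presumably does
at this point, reusing data and m from earlier, is:

X = [ones(m, 1), data(:, 1)];   % add a column of ones to X for the theta_0 intercept term
theta = zeros(2, 1);            % initialize the fitting parameters to 0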
1.2.4 Gradient descent
Next, we will implement gradient descent in the file gradientDescent.m.
The loop structure has been written for us, and we only need to supply
the updates to θ within each iteration.
As we program, we should make sure we understand what we are trying to
optimize and what is being updated. Keep in mind that the cost J(θ) is
parameterized by the vector θ, not X and y. That is, we minimize the value of J(θ)
by changing the values of the vector θ, not by changing X or y. Refer to the
equations in this handout and to the video lectures if anything is uncertain.
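As a reference point, one vectorized way to write the update inside the loop
of gradientDescent.m is sketched below; the names num_iters and J_history are
assumed here (they follow a common starter-code convention and may differ in
our files), and X is assumed to already carry the column of ones:

for iter = 1:num_iters
    % simultaneous update of all theta_j:
    % theta := theta - (alpha/m) * X' * (X*theta - y)
    theta = theta - (alpha / m) * (X' * (X * theta - y));

    % record the cost on this iteration so we can check that it decreases
    J_history(iter) = computeCost(X, y, theta);
end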
A good way to verify that gradient descent is working correctly is to look
at the value of J(θ) and check that it is decreasing with each step. The
starter code for gradientDescent.m calls computeCost on every iteration
and prints the cost. Assuming we have implemented gradient descent and
computeCost correctly, our value of J(θ) should never increase, and it should
converge to a steady value by the end of the algorithm.
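computeCost.m itself is not reproduced in this handout; assuming the usual
squared-error cost $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$,
a minimal vectorized sketch would be:

function J = computeCost(X, y, theta)
% computeCost computes the squared-error cost of using theta as the
% parameters for linear regression on the data X, y.
m = length(y);              % number of training examples
errors = X * theta - y;     % h_theta(x^(i)) - y^(i) for every example
J = (1 / (2 * m)) * sum(errors .^ 2);
end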
After we are finished, rev3.m will use our final parameters to plot the
linear fit. The result should look something like Figure 2.
Our final values for θ will also be used to make predictions on profits in
areas of 35,000 and 70,000 people. Note the way that the following lines in
rev3.m use matrix multiplication, rather than explicit summation or
looping, to calculate the predictions. This is an example of code
vectorization in Octave/MATLAB.
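Those lines are not reproduced here; a sketch of what such vectorized
predictions look like (the names predict1 and predict2 are illustrative)
is:

% Predict profit for populations of 35,000 and 70,000.
% Population is expressed in units of 10,000s, so 35,000 people -> 3.5.
predict1 = [1, 3.5] * theta;
fprintf('For population = 35,000, we predict a profit of %f\n', predict1 * 10000);
predict2 = [1, 7] * theta;
fprintf('For population = 70,000, we predict a profit of %f\n', predict2 * 10000);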
2.1 Debugging
Here are some things to keep in mind as we implement gradient descent:
• Octave/MATLAB array indices start from one, not zero. If we're storing
θ0 and θ1 in a vector called theta, the values will be theta(1) and
theta(2).
• If we are seeing many errors at runtime, inspect our matrix operations
to make sure that we're adding and multiplying matrices of compatible
dimensions. Printing the dimensions of variables with the size
command will help us debug (see the short example after this list).
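For example, a quick sanity check of the dimensions used in the update,
based on the variables defined above, might be:

size(X)               % should be m x 2 once the column of ones is added
size(theta)           % should be 2 x 1
size(X * theta - y)   % should be m x 1, the same shape as y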
[Figure 2: Training data with the fitted line. x-axis: Population of City in 10,000s; y-axis: Profit in $10,000s; legend: Training data, Linear regression.]
% initialize J_vals to a matrix of 0's
J_vals = zeros(length(theta0_vals), length(theta1_vals));

% Fill out J_vals
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        t = [theta0_vals(i); theta1_vals(j)];
        J_vals(i, j) = computeCost(X, y, t);
    end
end
After these lines are executed, we will have a 2-D array of J(θ) values.
The script rev3.m will then use these values to produce surface and
contour plots of J(θ) using the surf and contour commands. The plots
should look something like Figure 3:
[Figure 3: Surface and contour plots of the cost function J(θ) over θ0 and θ1.]
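For reference, a sketch of how rev3.m might produce these plots is shown
below; the grids theta0_vals and theta1_vals are assumed to have been
defined before the double loop above, for example over the ranges shown in
Figure 3:

theta0_vals = linspace(-10, 10, 100);   % assumed grid for theta_0
theta1_vals = linspace(-1, 4, 100);     % assumed grid for theta_1

% (J_vals is then filled in by the double loop shown above)

% Surface plot of the cost
figure;
surf(theta0_vals, theta1_vals, J_vals');   % transpose so axes match the grids
xlabel('\theta_0'); ylabel('\theta_1');

% Contour plot of the cost, with logarithmically spaced levels
figure;
contour(theta0_vals, theta1_vals, J_vals', logspace(-2, 3, 20));
xlabel('\theta_0'); ylabel('\theta_1');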
The purpose of these graphs is to show how J(θ) varies with
changes in θ0 and θ1. The cost function J(θ) is bowl-shaped and has a global
minimum. (This is easier to see in the contour plot than in the 3D surface
plot.) This minimum is the optimal point for θ0 and θ1, and each step of
gradient descent moves closer to this point.