Assignment No. 2
Linear Regression and Optimization Functions
1 Introduction
In this exercise, you will implement linear regression and see it work on data. Before starting on this programming exercise, we strongly recommend reviewing the concepts of linear regression, gradient descent and feature normalization. To get started, you will need to download the starter code and unzip its contents into a folder. The files included are:
• ex2P1.ipynb: Google Colab notebook (gradient descent) script that steps you through the exercise
• ex2P2.ipynb: Google Colab notebook (optimization function) script that steps you through the exercise
• ex2data1.txt: text file containing the dataset for linear regression with one variable
Throughout the exercise, you will be using the scripts ex2P1.ipynb and ex2P2.ipynb. These scripts set up the dataset for the problems and make calls to the functions that you will write. You do not need to modify either of them; you are only required to complete the functions, following the instructions in this assignment.
Where to get help?
We also strongly encourage using the online forums and the WhatsApp group to discuss exercises with other students. However, do not look at any source code written by others or share your source code with others.
For the first part of this exercise you will be using the ex2P1.ipynb script. You need to upload ex2P1.ipynb and ex2data1.txt to your Google Drive and open ex2P1.ipynb in Google Colab.
Output
Matrix a :
[[1. 0. 0. 0. 0.]
[0. 1. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 1. 0.]
[0. 0. 0. 0. 1.]]
2 Linear Regression in One Variable
Next, the script calls the plotdata(X, Y) function to create a scatter plot of the data. Your job is to complete plotdata(X, Y) to draw the plot. When you run plotdata, your result should look like Figure 1 below, with the same red "x" markers and axis labels.
2.2 Gradient Descent
In this part, you will fit the linear regression parameters θ to the dataset using gradient descent. The hypothesis h_θ(x) is given by the linear model

\[ h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1 \qquad (2) \]
Recall that the parameters of your model are the θj values. These are the values you will adjust to minimize the cost J(θ). One way to do this is to use the batch gradient descent algorithm. In batch gradient descent, each iteration performs the update
\[ \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr) x_j^{(i)} \qquad (3) \]

(simultaneously update θj for all j)
With each step of gradient descent, your parameters θj come closer to the optimal values that achieve the lowest cost J(θ). Another approach is to use a predefined function that optimizes (minimizes) the error function.
2.2.2 Implementation
In ex2P1, we have already set up the data for linear regression. In the following lines, we add
another dimension to our data to accommodate the θ0 intercept term. Do NOT execute
this cell more than once.
m = Y.size                              # number of training examples
X = np.stack([np.ones(m), X], axis=1)   # add a column of ones to X, converting it to shape (97, 2)
print(X.shape)
Output
(97, 2)
Complete the computeCost function so that it computes and returns the cost J(θ) for the current values of θ:

def computeCost(X, y, theta):
    ########### YOUR COST FUNCTION CODE HERE ###########
    ##################################################
    return J
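For reference, here is a minimal sketch of one way a squared-error cost could be computed in a vectorized fashion. It assumes the standard linear-regression cost whose gradient appears in equation (3); the name computeCostSketch is used here so it is not confused with the computeCost function you are asked to complete.

import numpy as np

def computeCostSketch(X, y, theta):
    # Illustrative only: half the mean of the squared prediction errors.
    m = y.size
    errors = X.dot(theta) - y          # h_theta(x^(i)) - y^(i) for every example
    return (1.0 / (2 * m)) * np.sum(errors ** 2)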
A good way to verify that gradient descent is working correctly is to look at the value of J(θ) and check that it is decreasing with each step. The starter code for the function gradientDescent calls computeCost on every iteration and saves the cost to a Python list. Assuming you have implemented gradient descent and computeCost correctly, your value of J(θ) should converge to a steady value by the end of the algorithm, and you should see the training data plotted together with the fitted line (Figure 2).
def gradientDescent(X, y, theta, alpha, num_iters):
    m = y.shape[0]
    theta = theta.copy()
    J_history = []

    for i in range(num_iters):
        ########### YOUR GRADIENT DESCENT "theta" HERE ###########

        #########################################################
        J_history.append(computeCost(X, y, theta))
    return theta, J_history
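For reference, one possible vectorized form of the update in equation (3), written as it might appear inside the loop above. This is a sketch that assumes X already contains the column of ones; it is not presented as the required solution.

error = X.dot(theta) - y                       # h_theta(x^(i)) - y^(i) for all m examples at once
theta = theta - (alpha / m) * X.T.dot(error)   # simultaneous update of every theta_j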
2.3 Visualization
To understand the cost function J(θ) better, you will now plot the cost over a 2-dimensional grid of θ0 and θ1 values. You will not need to write any new code for this part, but you should understand how the code you have already written creates these images.
In the next cell, the code is set up to calculate J(θ) over a grid of values using the computeCost function that you wrote. After executing the cell, you will have a 2D array of J(θ) values. Those values are then used to produce surface and contour plots of J(θ) using the matplotlib plot_surface and contour functions. The plots should look something like Figure 3.
The purpose of these graphs is to show you how J(θ) varies with changes in θ0 and θ1. The cost function J(θ) is bowl-shaped and has a global minimum. (This is easier to see in the contour plot than in the 3D surface plot.) This minimum is the optimal point for θ0 and θ1, and each step of gradient descent moves closer to it.
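As a rough illustration of what that cell does, the grid of costs can be built with two loops over candidate θ0 and θ1 values and then handed to plot_surface and contour. The variable names and value ranges below are assumptions made for this sketch and are not necessarily those used in ex2P1; X and Y are assumed to be the arrays prepared earlier.

import numpy as np
import matplotlib.pyplot as plt

# Grid of candidate parameter values (ranges chosen for illustration only)
theta0_vals = np.linspace(-10, 10, 100)
theta1_vals = np.linspace(-1, 4, 100)
J_vals = np.zeros((theta0_vals.size, theta1_vals.size))

for i, t0 in enumerate(theta0_vals):
    for j, t1 in enumerate(theta1_vals):
        J_vals[i, j] = computeCost(X, Y, np.array([t0, t1]))

# Surface plot of J(theta)
fig = plt.figure(figsize=(10, 4))
ax = fig.add_subplot(121, projection='3d')
T0, T1 = np.meshgrid(theta0_vals, theta1_vals, indexing='ij')
ax.plot_surface(T0, T1, J_vals, cmap='viridis')
ax.set_xlabel('theta0')
ax.set_ylabel('theta1')

# Contour plot of J(theta) with logarithmically spaced levels
ax2 = fig.add_subplot(122)
ax2.contour(theta0_vals, theta1_vals, J_vals.T, levels=np.logspace(-2, 3, 20))
ax2.set_xlabel('theta0')
ax2.set_ylabel('theta1')
plt.show()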
2.4 Feature Normalization
Your task is to complete the code in featureNormalize(X) so that it:
• subtracts the mean value of each feature from the dataset, and
• after subtracting the mean, additionally scales (divides) the feature values by their respective "standard deviations."
The standard deviation is a way of measuring how much variation there is in the range of values of a particular feature; it is an alternative to taking the range of values (max − min). In NumPy, you can use the std function to compute the standard deviation.
def featureNormalize(X):
    ########### YOUR FEATURE NORMALIZATION CODE HERE ###########
    ######################################
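A minimal sketch of one way the body could be written, following the two steps above. The three-value return (normalized data, means, standard deviations) is an assumption of this sketch and may differ from what the starter code expects.

import numpy as np

def featureNormalizeSketch(X):
    # Subtract each feature's mean, then divide by its standard deviation.
    mu = np.mean(X, axis=0)
    sigma = np.std(X, axis=0)
    X_norm = (X - mu) / sigma
    return X_norm, mu, sigma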
2.5 Learning rate
The following cell runs gradient descent with several candidate learning rates, records the cost history for each run, and plots the convergence curves:

alpha = [0.3, 0.1, 0.03, 0.01, 0.001]    # candidate learning rates
iterations = 500
costs = []

for i in range(5):
    theta = np.zeros(2)                  # initialize fitting parameters
    theta, J_history = gradientDescent(X, Y, theta, alpha[i], iterations)
    costs.append(J_history)

# Plot the convergence graph for each learning rate
for i in range(5):
    plt.plot(costs[i], label='alpha = {}'.format(alpha[i]))
plt.xlabel('Number of iterations')
plt.ylabel('Cost J')
plt.legend()
Use your implementation of the gradientDescent function and run gradient descent for about 500 iterations at the chosen learning rate. The function should also return the history of J(θ) values in a vector J. After the last iteration, plot the J values against the number of iterations. If you picked a learning rate within a good range, your plot should look similar to Figure 4 below.
If your graph looks very different, especially if your value of J(θ) increases or even blows
up, adjust your learning rate and try again. We recommend trying values of the learning
rate α (alpha) on a log-scale, at multiplicative steps of about 3 times the previous value
(0.3, 0.1, 0.03, 0.01 and 0.001). You may also want to adjust the number of iterations
you are running if that will help you see the overall trend in the curve.
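A minimal sketch of such a convergence plot, assuming gradientDescent returns the list of costs as described above; the learning rate 0.01 is only an example value.

import numpy as np
import matplotlib.pyplot as plt

theta, J_history = gradientDescent(X, Y, np.zeros(2), 0.01, 500)
plt.plot(np.arange(1, len(J_history) + 1), J_history, '-b')
plt.xlabel('Number of iterations')
plt.ylabel('Cost J')
plt.show()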
3 Part 2: Optimization Functions
For this part you will use the ex2P2.ipynb script, which relies on a predefined optimization function instead of your own gradient descent loop. First, complete the cost function; its closing lines save each computed cost to the list lr2 and return J:

    ################################################
    lr2.append(J)
    return J
Now you need to fill in the code for the gradf function for fmin_cg, which requires only the gradient of the cost function:

\[ \frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr) x_j^{(i)} \qquad (5) \]

(computed simultaneously for all j)
    ###################################################
    # lr2.append(J)
    return theta
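For orientation, here is a minimal sketch of how scipy.optimize.fmin_cg is typically called with a cost function and its gradient. The function names cost and gradf and their (theta, X, y) signatures are assumptions made for this sketch and may differ from the notebook's starter code.

import numpy as np
from scipy.optimize import fmin_cg

def cost(theta, X, y):
    # Squared-error cost J(theta), as in Part 1
    m = y.size
    return (1.0 / (2 * m)) * np.sum((X.dot(theta) - y) ** 2)

def gradf(theta, X, y):
    # Gradient of J(theta) from equation (5): (1/m) * X^T (X theta - y)
    m = y.size
    return (1.0 / m) * X.T.dot(X.dot(theta) - y)

initial_theta = np.zeros(X.shape[1])
theta_opt = fmin_cg(cost, initial_theta, fprime=gradf, args=(X, Y), maxiter=500)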
The final result should resemble Figure 5 with the given labels; since both approaches minimize the same cost function, the results will be the same:
Output: the resulting fit is shown in Figure 5.

3.2 Fitting Second order polynomial
For the second-order polynomial fit, first complete the vectorized cost function computeCostVectorize:

def computeCostVectorize(X, y, theta):
    ########### YOUR VECTORIZED COST FUNCTION CODE HERE ###########
    ############################################################
    return J
Next, complete the vectorized parameter update inside the gradient descent loop:

    for i in range(num_iters):
        ########### YOUR Vectorized GRADIENT DESCENT "theta" HERE ###########
        # vectorized implementation (without looping through the training data)

        ######################################################
        J_history.append(computeCostVectorize(X, y, theta))
    return theta, J_history
Make sure your code supports any number of features and is well-vectorized. The resulting plot should look like Figure 7.
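For comparison, a fully vectorized cost and update step that works for any number of features can be written in matrix form. This is a sketch only, and the function names are illustrative.

import numpy as np

def costVectorizedSketch(X, y, theta):
    # J(theta) = (1 / 2m) * ||X theta - y||^2, valid for any number of columns in X
    m = y.size
    residual = X.dot(theta) - y
    return (1.0 / (2 * m)) * residual.dot(residual)

def gradientStepSketch(X, y, theta, alpha):
    # One gradient descent step: theta <- theta - (alpha / m) * X^T (X theta - y)
    m = y.size
    return theta - (alpha / m) * X.T.dot(X.dot(theta) - y)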
Similarly, you can fit a third-order polynomial through the data (Figure 8).
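As a rough sketch of how the design matrix for a third-order fit can be built before reusing the vectorized routines above: the variable x is assumed to be the original one-dimensional feature vector, and normalizing the higher-order columns first (as in Section 2.4) is usually a good idea.

import numpy as np

# Columns [1, x, x^2, x^3] for a third-order polynomial fit (illustrative)
X_poly = np.stack([np.ones_like(x), x, x ** 2, x ** 3], axis=1)

theta = np.zeros(X_poly.shape[1])              # initialize fitting parameters
theta, J_history = gradientDescent(X_poly, Y, theta, 0.01, 500)   # or your vectorized variant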