Assignment 1
Instructions:
1. Your answers to the questions below, including plots and mathematical work, should be submitted as a single
PDF file (named studentrollnum_asgn1.pdf).
2. The codes should be submitted following the naming convention mentioned in each question. In addition,
please submit only your python code (with .py extension); in particular, do not submit in jupyter notebook
format.
3. Zip all the files and name it as studentrollnum_asgn1.zip (use only zip; no other formats are permitted)
4. Usage of machine learning libraries is not permitted. The assignment can be completed using (base) Python, numpy, and matplotlib. Pandas may be used for loading the CSV dataset.
5. Honor code: The assignments should be submitted individually. Students are encouraged to discuss the assignment. However, copying is not permitted. Students found copying will be subject to disciplinary action. In addition, during the course viva, if you are unable to explain your assignment code, you will forfeit your credit for the assignment.
Exercise 1: Linear Regression (with squared loss). This exercise uses the Boston house-prices dataset¹. One option to load the Boston dataset is by calling sklearn.datasets.load_boston. The aim of the exercise is to build a linear regression model for predicting the median home value (medv).
(a) One feature and one label: Derive an analytical expression for the coefficients of the linear regression of the form y = wx + b. Using the derived analytical expression, write a python program to compute the linear regression coefficients between the variables lstat (feature) and medv (label) in the Boston dataset. The python file should be named studentrollnum_exp1_a.py. For computing the inverse, you may use numpy.linalg.inv().
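The closed-form solution can be sketched as follows. This is illustrated on synthetic data (the actual program must use the lstat and medv columns of the Boston dataset); stacking a column of ones lets the intercept b be computed jointly with w via θ = (XᵀX)⁻¹Xᵀy, using numpy.linalg.inv() as the question suggests.

```python
import numpy as np

# Synthetic stand-in for lstat (x) and medv (y): y = 3x + 2 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + 2.0 + rng.normal(0, 0.1, size=50)

# Design matrix [x, 1]; theta = (X^T X)^{-1} X^T y gives (w, b) at once.
X = np.column_stack([x, np.ones_like(x)])
theta = np.linalg.inv(X.T @ X) @ (X.T @ y)
w, b = theta
print(w, b)  # close to the true values 3.0 and 2.0
```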
(b) Multiple features: Derive an analytical expression for the gradient of the training error for the linear regression of the form y = wᵀx + b. Using gradient descent (or stochastic gradient descent), write a python program to compute the linear regression coefficients between all 13 features and medv (label) in the Boston dataset. The python file should be named studentrollnum_exp1_b.py.
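A gradient-descent sketch of this part, on synthetic data with 13 features standing in for the Boston dataset: the gradient of the mean squared error is (2/n) Xᵀr for w and (2/n) Σ rᵢ for b, where r = Xw + b − y is the residual vector.

```python
import numpy as np

# Synthetic 13-feature regression problem (noise-free for clarity).
rng = np.random.default_rng(1)
n, d = 200, 13
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5

w = np.zeros(d)
b = 0.0
lr = 0.05  # step size
for _ in range(2000):
    r = X @ w + b - y              # residuals
    w -= lr * (2.0 / n) * (X.T @ r)
    b -= lr * (2.0 / n) * r.sum()

print(np.max(np.abs(w - w_true)), abs(b - 0.5))  # both near zero
```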
Exercise 2: Ridge Regression. This exercise again uses the Boston house-prices dataset. Ridge regression is linear regression with an ℓ₂ penalty. The training error for ridge regression is given by

J(w, b) = (1/n) Σᵢ (wᵀxᵢ + b − yᵢ)² + λwᵀw,

where the sum runs over the n training examples.
(a) Compute the gradient of J(w, b), and write down the gradient descent algorithm for computing optimal w
and b.
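For reference, differentiating J term by term gives gradients of the standard form (a sketch of the derivation asked for in part (a)):

```latex
\nabla_w J = \frac{2}{n}\sum_{i=1}^{n}\bigl(w^{\top}x_i + b - y_i\bigr)x_i + 2\lambda w,
\qquad
\frac{\partial J}{\partial b} = \frac{2}{n}\sum_{i=1}^{n}\bigl(w^{\top}x_i + b - y_i\bigr),
```

with gradient-descent updates w ← w − η ∇w J and b ← b − η ∂J/∂b for a step size η. Note that the penalty contributes only to the w update; b is not regularized.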
(b) Fix a value of λ. Split the dataset into a training set and a test set, and write a python program to compute the regression coefficients using gradient descent. At the end, your program should print the training error and the test error. The python file should be named studentrollnum_exp2_b.py.
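A minimal sketch of this part, again on synthetic data standing in for the Boston dataset; the split here is a simple 80/20 cut, and λ is fixed at an assumed value of 0.01.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 5))
y = X @ rng.normal(size=5) + 1.0 + 0.1 * rng.normal(size=150)

# 80/20 train/test split
X_tr, X_te = X[:120], X[120:]
y_tr, y_te = y[:120], y[120:]

lam, lr, n = 0.01, 0.05, len(y_tr)
w, b = np.zeros(5), 0.0
for _ in range(3000):
    r = X_tr @ w + b - y_tr
    # ridge gradient: data term plus 2*lam*w (the penalty does not touch b)
    w -= lr * ((2.0 / n) * (X_tr.T @ r) + 2.0 * lam * w)
    b -= lr * (2.0 / n) * r.sum()

train_err = np.mean((X_tr @ w + b - y_tr) ** 2)
test_err = np.mean((X_te @ w + b - y_te) ** 2)
print(train_err, test_err)
```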
(c) Choose a reasonable step size. Plot the training loss and the test loss as a function of λ. The goal is to find the λ that minimizes the test loss. It is hard to predict what λ will be, so you should start your search very broadly, looking over several orders of magnitude, for example λ ∈ {10⁻⁷, 10⁻⁵, 10⁻³, 10⁻¹, 1, 10, 100}. Once you find a range that works better, keep zooming in. Include this plot in your report. The python file should be named studentrollnum_exp2_c.py.
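The sweep can be sketched as below, on synthetic data standing in for Boston. One practical point: for large λ the objective becomes stiffer, so the step size here is shrunk with λ to keep gradient descent stable. The recorded losses would then be plotted against λ with matplotlib (e.g. plt.semilogx).

```python
import numpy as np

def ridge_gd(X, y, lam, steps=3000):
    """Ridge regression by gradient descent; lr shrinks for large lam."""
    n, d = X.shape
    lr = 1.0 / (2.0 * lam + 4.0)   # smaller steps when the penalty is large
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        r = X @ w + b - y
        w -= lr * ((2.0 / n) * (X.T @ r) + 2.0 * lam * w)
        b -= lr * (2.0 / n) * r.sum()
    return w, b

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=150)
X_tr, X_te, y_tr, y_te = X[:120], X[120:], y[:120], y[120:]

# coarse grid over several orders of magnitude, as suggested above
lambdas = [1e-7, 1e-5, 1e-3, 1e-1, 1.0, 10.0, 100.0]
test_losses = []
for lam in lambdas:
    w, b = ridge_gd(X_tr, y_tr, lam)
    test_losses.append(np.mean((X_te @ w + b - y_te) ** 2))
print(test_losses)  # very large lam over-shrinks w and hurts the test loss
```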
(d) For the optimum value of λ, compare the coefficients obtained from ridge regression with those obtained from linear regression. What is your inference?
¹ Please check details of the dataset at http://lib.stat.cmu.edu/datasets/boston
Exercise 3: Logistic Regression. The data set to be used is Smarket. This data set consists of percentage returns for the S&P 500 stock index over 1,250 days, from the beginning of 2001 until the end of 2005. For each date, the percentage returns for each of the five previous trading days are recorded as Lag1 through Lag5. Also recorded are Volume (the number of shares traded on the previous day, in billions), Today (the percentage return on the date in question), and Direction (whether the market was Up or Down on this date).
(a) Write a python program to fit a logistic regression model in order to predict Direction using Lag1 through
Lag5 and Volume. The python file should be named studentrollnum_exp3_a.py.
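A gradient-descent sketch of logistic regression; synthetic six-feature data stands in for the Lag1..Lag5 and Volume columns, and a 0/1 label stands in for Direction. The cross-entropy gradient with respect to w is (1/n) Xᵀ(σ(Xw + b) − y).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic binary-classification data with 6 features.
rng = np.random.default_rng(4)
n, d = 400, 6
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -1.0, 0.5, 0.0, 0.0, 2.0])
y = (sigmoid(X @ w_true) > rng.uniform(size=n)).astype(float)

w, b, lr = np.zeros(d), 0.0, 0.5
for _ in range(2000):
    p = sigmoid(X @ w + b)            # predicted probabilities
    w -= lr * (X.T @ (p - y)) / n     # cross-entropy gradient for w
    b -= lr * (p - y).mean()          # ... and for b

acc = np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1))
print(acc)  # training accuracy well above chance
```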
(b) Confusion Matrix: A common performance measure for classification problems is the confusion matrix. Read more about the confusion matrix and other classification performance measures at the following wikipedia link: https://en.wikipedia.org/wiki/Confusion_matrix. Write a python program to compute the confusion matrix of the trained logistic regression model. The python file should be named studentrollnum_exp3_b.py.
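Computing the matrix itself is a simple counting loop. The sketch below uses the convention on the linked page (rows = actual class, columns = predicted class) with toy labels in place of the model's Direction predictions.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=2):
    """Count entry [t, p] for every (true t, predicted p) pair."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Toy labels standing in for actual and predicted Direction (0=Down, 1=Up).
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
cm = confusion_matrix(y_true, y_pred)
print(cm)  # [[3 1] [1 3]]

# Derived measures: accuracy is the trace over the total count.
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # 0.75
```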
Exercise 4: Neural Network. In this exercise, we will build a two-layer neural network model (one hidden layer) for binary classification with nonlinear class boundaries. A dataset for this can be generated using sklearn as sklearn.datasets.make_moons(100, noise=0.1). In the dataset, the input features are the x1 and x2 coordinates, and the labels (y) are binary (either 0 or 1).
Given a feature vector (the x1 and x2 coordinates), the neural network model should produce 2 outputs, corresponding to the probabilities of belonging to class 0 or class 1. The equations are as given below:

h = σ(V φ),
score = W h + b,

where φ = [1, x1, x2], V ∈ R^{L×3}, W ∈ R^{2×L}, b ∈ R^2, and score ∈ R^2. Here, L denotes the number of latent features (neurons), and σ is the activation function. You can choose either the tanh or the ReLU activation function. The score is converted to the output using the following equation:
ŷ = softmax(score),

where the softmax function²,³ takes as input a vector z of real numbers and normalizes it into a probability distribution; it is given by

softmax(z)ᵢ = exp(zᵢ) / Σⱼ exp(zⱼ).
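As footnote 3 warns, a naive implementation of this formula overflows for large scores. Subtracting max(z) before exponentiating leaves the result unchanged (the common factor cancels in the ratio) but keeps every exponent non-positive:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: shift by max(z) before exponentiating."""
    shifted = z - np.max(z)
    e = np.exp(shifted)
    return e / e.sum()

print(softmax(np.array([1000.0, 1001.0, 1002.0])))  # finite, sums to 1
```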
(a) Derive the gradient of the loss using the backpropagation algorithm.
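If the cross-entropy loss is used (a natural pairing with softmax, though other differentiable losses work too), the backpropagated gradients for one example take the form:

```latex
\frac{\partial J}{\partial \mathrm{score}} = \hat{y} - y, \qquad
\frac{\partial J}{\partial W} = (\hat{y} - y)\,h^{\top}, \qquad
\frac{\partial J}{\partial b} = \hat{y} - y, \qquad
\frac{\partial J}{\partial V} = \Bigl[\bigl(W^{\top}(\hat{y} - y)\bigr) \odot \sigma'(V\phi)\Bigr]\,\phi^{\top},
```

where y is the one-hot label vector and ⊙ denotes the elementwise product; for σ = tanh, σ'(a) = 1 − tanh²(a).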
(b) Write a python program to train the neural network using the gradient descent algorithm. The code should be named studentrollnum_exp4_b.py.
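A minimal training sketch for this network, with tanh activation and cross-entropy loss. To keep the snippet self-contained (no sklearn), an XOR-style quadrant dataset stands in for make_moons; the layer sizes follow the equations above, with φ = [1, x1, x2] supplying the hidden-layer bias.

```python
import numpy as np

def softmax_rows(S):
    """Row-wise stable softmax for an n x 2 score matrix."""
    e = np.exp(S - S.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Nonlinearly separable toy data: label 1 iff x1 and x2 share a sign.
rng = np.random.default_rng(5)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

L, n = 8, len(y)                       # L hidden neurons
Phi = np.column_stack([np.ones(n), X]) # phi = [1, x1, x2] per row
V = 0.5 * rng.normal(size=(L, 3))
W = 0.5 * rng.normal(size=(2, L))
b = np.zeros(2)
Y = np.eye(2)[y]                       # one-hot labels

lr = 1.0
for _ in range(5000):
    H = np.tanh(Phi @ V.T)             # n x L hidden activations
    P = softmax_rows(H @ W.T + b)      # n x 2 predicted probabilities
    G = (P - Y) / n                    # grad of mean cross-entropy wrt scores
    dW = G.T @ H
    db = G.sum(axis=0)
    dH = G @ W                         # backpropagate through W
    dV = ((1.0 - H ** 2) * dH).T @ Phi # tanh'(a) = 1 - tanh(a)^2
    W -= lr * dW
    b -= lr * db
    V -= lr * dV

acc = np.mean(P.argmax(axis=1) == y)
print(acc)
```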
(c) Split the dataset into train/test sets. Plot the training error and the test error as a function of the number of neurons (L). The code should be named studentrollnum_exp4_c.py.
² https://en.wikipedia.org/wiki/Softmax_function
³ You should implement the softmax function carefully to avoid overflow; for details see https://www.tutorialexample.com/implement-softmax-function-without-underflow-and-overflow-deep-learning-tutorial/
⁴ https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html