Homework 1
Due Date: October 25, 2020 (11:59pm)
Instructions:
• Only electronic submissions will be accepted. Your main PDF writeup must be typeset in LaTeX (please
also refer to the “Additional Instructions” below).
• The PDF writeup containing your solution has to be submitted via Gradescope https://www.gradescope.com/.
• We have created your Gradescope account (you should have received the notification). Please use your
IITK CC ID (not any other email ID) to login. Use the “Forgot Password” option to set your password.
Additional Instructions
• We have provided a LaTeX template file hw1sol.tex to help typeset your PDF writeup. There is also
a style file ml.sty that contains shortcuts for many useful LaTeX commands, e.g., boldfaced/calligraphic
fonts for letters, various mathematical/Greek symbols, etc. Use of these shortcuts is recommended (but
not necessary).
• Your answer to every question should begin on a new page. The provided template is designed to do this
automatically. However, if it fails to do so, use the \clearpage command in LaTeX before starting the
answer to a new question to enforce this.
• While submitting your assignment on the Gradescope website, you will have to specify which page(s)
contain the answer to question 1, which page(s) contain the answer to question 2, etc. To do this properly,
first ensure that the answer to each question starts on a different page.
• Be careful to flush all your floats (figures, tables) corresponding to question n before starting the answer
to question n + 1; otherwise, while grading, we might miss important parts of your answers.
• Your solutions must appear in proper order in the PDF file, i.e., the solution to question n must be complete
in the PDF file (including all plots, tables, proofs, etc.) before you present the solution to question n + 1.
Problem 1 (15 marks)
(Absolute Loss Regression with Sparsity) The absolute loss regression problem with $\ell_1$ regularization is
\[
\mathbf{w}_{\mathrm{opt}} = \arg\min_{\mathbf{w}} \sum_{n=1}^{N} \left| y_n - \mathbf{w}^\top \mathbf{x}_n \right| + \lambda \|\mathbf{w}\|_1
\]
where $\|\mathbf{w}\|_1 = \sum_{d=1}^{D} |w_d|$, $|\cdot|$ is the absolute value function, and $\lambda > 0$ is the regularization hyperparameter.
Is the above objective function convex? You don't need to prove this formally; brief reasoning based on
properties of other functions that are known to be convex/non-convex is fine.
Derive the expression for the (sub)gradient vector for this model.
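For concreteness (not part of what you need to submit), here is a minimal NumPy sketch of one valid subgradient of this objective; the function name subgradient and the convention sign(0) = 0 are our own choices, not requirements of the problem:

    import numpy as np

    def subgradient(w, X, y, lam):
        # One valid subgradient of sum_n |y_n - w^T x_n| + lam * ||w||_1.
        # X: (N, D) features, y: (N,) labels, w: (D,) weights, lam > 0.
        # np.sign returns 0 at non-differentiable points, which is an
        # admissible element of the subdifferential [-1, 1] there.
        residuals = y - X @ w                  # (N,) vector of y_n - w^T x_n
        g_loss = -X.T @ np.sign(residuals)     # subgradient of the absolute loss
        g_reg = lam * np.sign(w)               # subgradient of the l1 penalty
        return g_loss + g_reg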
Derive an alternating optimization (ALT-OPT) algorithm to learn B and S, clearly writing down the expressions
for the updates of B and S. Are both subproblems (solving for B and solving for S) equally easy/difficult in
this ALT-OPT algorithm? If yes, why? If no, why not?
Note: Since B and S are matrices, if you want, please feel free to use results for matrix derivatives (the results you
will need can be found in Sec. 2.5 of the Matrix Cookbook). However, the problem can be solved even without
matrix derivative results, with some rearrangement of terms and using vector derivatives.
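The objective defining B and S is not reproduced in this excerpt, so the following is only an illustrative sketch of the alternating-optimization pattern under an assumed squared-error factorization objective ||X − BS||_F^2; the shapes, the random initialization, and both least-squares updates below are assumptions, not the problem's actual model:

    import numpy as np

    def alt_opt(X, K, n_iters=100, seed=0):
        # ALT-OPT sketch for a *hypothetical* objective ||X - B S||_F^2,
        # with X: (N, D), B: (N, K), S: (K, D). Each step fixes one matrix
        # and solves an ordinary least-squares problem for the other.
        rng = np.random.default_rng(seed)
        N, D = X.shape
        B = rng.standard_normal((N, K))
        S = rng.standard_normal((K, D))
        for _ in range(n_iters):
            S = np.linalg.lstsq(B, X, rcond=None)[0]        # fix B, solve for S
            B = np.linalg.lstsq(S.T, X.T, rcond=None)[0].T  # fix S, solve for B
        return B, S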
where X is the N × D feature matrix and y is the N × 1 vector of labels of the N training examples. Note
that the factor of 1/2 has been used in the above expression just for convenience of the derivations required for this
problem and does not change the solution to the problem.
Derive Newton's method's update equation for each iteration. For this model, how many iterations will
Newton's method take to converge?
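The objective referred to above is not reproduced in this excerpt; as a generic sketch only, a single Newton step takes the form w ← w − H(w)^{-1} ∇L(w). The helper names grad_fn and hess_fn below are placeholders for the gradient and Hessian of whatever objective the problem specifies:

    import numpy as np

    def newton_step(w, grad_fn, hess_fn):
        # One Newton update: w_new = w - H(w)^{-1} grad(w).
        # grad_fn(w) -> (D,) gradient; hess_fn(w) -> (D, D) Hessian.
        # Solving the linear system is numerically preferable to
        # explicitly forming the Hessian inverse.
        g = grad_fn(w)
        H = hess_fn(w)
        return w - np.linalg.solve(H, g)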