Week 1
Answer: D
Solution:
The vector [2, 4, −5] contains 3 components and all of them are real numbers.
So, [2, 4, −5] ∈ R^3.
∴ Option D is correct.
2. (1 point) Which of the following may not be an appropriate choice of loss function for
regression?
A. (1/n) Σ_{i=1}^{n} (f(x_i) − y_i)^2
B. (1/n) Σ_{i=1}^{n} |f(x_i) − y_i|
C. (1/n) Σ_{i=1}^{n} 1(f(x_i) ≠ y_i)
Answer: C
Solution:
Here, option C, that is, Loss = (1/n) Σ_{i=1}^{n} 1(f(x_i) ≠ y_i), may be a good choice for
classification, but it is not a good choice for regression.
You can see that this loss function increases when the prediction is not equal to the
label. However, it does so with a fixed loss of 1. Ideally, we would want the increase in
loss to be proportionate to the amount of discrepancy between the prediction and the
label.
∴ Option C is correct.
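To see this concretely, here is a minimal Python sketch (illustrative only; the function names and example values are assumptions) comparing the two kinds of loss on a single prediction:

```python
# A minimal sketch comparing squared error and 0-1 loss on one prediction.

def squared_error(prediction, label):
    """Loss grows with the size of the discrepancy."""
    return (prediction - label) ** 2

def zero_one_loss(prediction, label):
    """Loss is a flat 1 whenever prediction != label, however far off it is."""
    return float(prediction != label)

label = 10.0
for prediction in [10.0, 10.5, 12.0, 100.0]:
    print(prediction,
          "squared error:", squared_error(prediction, label),
          "0-1 loss:", zero_one_loss(prediction, label))
# The squared error scales with the discrepancy, while the 0-1 loss jumps
# straight to 1 for any mismatch, which is why it is unsuitable for regression.
```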
A. Predicting the amount of rainfall in May 2022 in North India based on precip-
itation data of the year 2021.
B. Predicting the price of a land based on its area and distance from the market.
C. Predicting whether an email is spam or not.
D. Predicting the number of Covid cases on a given day based on previous month
data.
Answer: C
Solution:
Here, in options A, B and D, we can see that we have to predict some kind of real
number, namely, the amount of rainfall, the price of land and the number of cases. These
kinds of problems are better suited to regression. Option C, however, asks to predict
which category the datapoint (email) falls into. It is an example of binary classification.
∴ Option C is correct.
Answer: C
Solution:
A. Since 355 is odd, 355%2 = 1. So, the statement inside the indicator function is
true. That is, 1(355%2 = 1) = 1. Since this option is a true statement, it will not be
marked.
B. Since 788 is even, 788%2 = 0. So, the statement inside the indicator function is
false. That is, 1(788%2 = 1) = 0. Since this option is a true statement, it will not be
marked.
C. Since 355 is odd, 355%2 = 1. So, the statement inside the indicator function is
false. That is, 1(355%2 = 0) = 0. Since this option is a false statement, it will be
marked.
D. Since 788 is even, 788%2 = 0. So, the statement inside the indicator function is
true. That is, 1(788%2 = 0) = 1. Since this option is a true statement, it will not be
marked.
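As a quick check, here is a small Python sketch (the helper name is an illustrative assumption) that evaluates these indicator expressions directly:

```python
# Evaluate the indicator expressions from the solution.
# indicator(condition) returns 1 when the condition holds and 0 otherwise.

def indicator(condition: bool) -> int:
    return 1 if condition else 0

print(indicator(355 % 2 == 1))  # 1, since 355 is odd
print(indicator(788 % 2 == 1))  # 0, since 788 is even
print(indicator(355 % 2 == 0))  # 0, since 355 is odd
print(indicator(788 % 2 == 0))  # 1, since 788 is even
```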
5. (1 point) Which of the following is false regarding supervised and unsupervised machine
learning?
A. Unsupervised machine learning helps you to find different kinds of unknown
patterns in data.
B. Regression and classification are two types of supervised machine learning tech-
niques while clustering and density estimation are two types of unsupervised
learning.
C. In an unsupervised learning model, the data contains both input and output
variables, while in a supervised learning model, the data contains only input
data.
Answer: C
Solution:
Here, option C is a false statement. It is in fact the supervised learning model in which
the data contains both input and output variables, while in the unsupervised learning
model the data contains only input data.
∴ Option C is correct.
Answer: C
Solution:
The output of a regression model, linear regression for example, can be any real number.
It is continuous and is not restricted to any particular range.
∴ Option C is correct.
7. (1 point) (Multiple select) Which of the following is/are supervised learning task(s)?
A. Making different groups of customers based on their purchase history.
B. Predicting whether a loan client may default or not based on previous credit
history.
C. Grouping similar Wikipedia articles as per their content.
D. Estimating the revenue of a company for a given year based on number of
items sold.
Answer: B,D
Solution:
In options B and D, labelled outcomes (whether past clients defaulted, and revenue
figures) are available to learn from, so these are supervised learning tasks. Options A
and C group data points without any labels, which is clustering, an unsupervised task.
∴ Options B and D are correct.
8. (1 point) Which of the following is used for predicting a continuous target variable?
A. Classification
B. Regression
C. Density Estimation
D. Dimensionality Reduction
Answer: B
Solution:
Out of the options, the technique used for prediction of a continuous target variable is
regression.
∴ Option B is correct.
9. (1 point) Consider the following: “The ______ is used to fit the model; the ______ is used
for model selection; the ______ is used for computing the generalization error.”
Which of the following will fill the above blanks correctly?
A. Test set; Validation set; training set
B. Training set; Test set; Validation set
C. Training set; Validation set; Test set
D. Test set; Training set; Validation set
Answer: C
Solution:
The training set is used to fit our model. After that, the validation set is used to select
the best model. Then, the test set is used for computing the generalization error.
∴ Option C is correct.
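For illustration, here is a minimal Python sketch of such a split; the 60/20/20 proportions and the random data are assumptions, not part of the original question:

```python
import numpy as np

# A minimal sketch of a train/validation/test split on hypothetical data.
rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 3))   # hypothetical inputs
y = rng.normal(size=n)        # hypothetical targets

indices = rng.permutation(n)
train_idx = indices[:60]      # training set: used to fit the model
val_idx = indices[60:80]      # validation set: used for model selection
test_idx = indices[80:]       # test set: used to estimate the generalization error

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
X_test, y_test = X[test_idx], y[test_idx]
```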
Answer: C
Solution:
1. This is the negative log likelihood loss and is used for density estimation.
2. This computes the error between the reconstructed datapoint and the actual datapoint,
and is used in dimensionality reduction.
3. This is the squared error loss and it is used for regression.
4. This loss function simply checks whether the prediction and the label are equal. It is
used in classification.
∴ Option C is correct.
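For reference, here is an illustrative Python sketch of the four loss families described above, written from their standard textbook forms since the original expressions are not reproduced here:

```python
import numpy as np

def negative_log_likelihood(densities):
    """Density estimation: average negative log of the model's density at each point."""
    return -np.mean(np.log(densities))

def reconstruction_error(X, X_reconstructed):
    """Dimensionality reduction: mean squared distance between each point and its reconstruction."""
    return np.mean(np.sum((X_reconstructed - X) ** 2, axis=1))

def squared_error(predictions, labels):
    """Regression: mean squared difference between prediction and label."""
    return np.mean((predictions - labels) ** 2)

def zero_one_loss(predictions, labels):
    """Classification: fraction of points where prediction and label disagree."""
    return np.mean(predictions != labels)
```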
11. (1 point) Compute the loss when Pair 1 and Pair 2 (shown below) are used for dimen-
sionality reduction for the data given in the following Table:
x1 x2
1 0.5
2 2.3
3 3.1
4 3.9
Consider the loss function to be (1/n) Σ_{i=1}^{n} ||g(f(x^i)) − x^i||^2.
Here f(x) is the encoder function and g(x) is the decoder function.
Pair 1:
Pair 2:
We are given an encoder (f ) and a decoder (g) function. To solve this question, we
will take each datapoint xi and encode it using encoder function getting f (xi ) and then
decode it to get g(f (xi )). Then the squared error would be given as ||g(f (xi )) − xi ||2 .
We would then take the average of this error over all datapoints to get the loss.
Pair 1:
Loss = (1/4) Σ_{i=1}^{4} ||g(f(x^i)) − x^i||^2 = (1/4)(0.625 + 10.625 + 19.2245 + 30.425) = 15.224
Pair 2:
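Since the Pair 1 and Pair 2 encoder/decoder definitions are not reproduced above, here is a Python sketch that uses a hypothetical linear encoder/decoder pair purely to illustrate how this loss is computed:

```python
import numpy as np

# Data from the table (each row is one datapoint x^i = [x1, x2]).
X = np.array([[1.0, 0.5],
              [2.0, 2.3],
              [3.0, 3.1],
              [4.0, 3.9]])

# Hypothetical encoder/decoder pair, used only to illustrate the computation.
def f(x):
    return x[0]                # encode to a single number

def g(z):
    return np.array([z, z])    # decode back to two dimensions

# Loss = (1/n) * sum_i ||g(f(x^i)) - x^i||^2
reconstructions = np.array([g(f(x)) for x in X])
loss = np.mean(np.sum((reconstructions - X) ** 2, axis=1))
print(loss)
```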
12. (1 point) Consider the following 4 training examples:
x      y
−1     0.0319
0      0.8692
1      1.9566
2      3.0343
We want to learn a function f(x) = ax + b which is parameterized by (a, b). Using average
squared error as the loss function, which of the following parameters would be best to
model the given data?
A. (1, 1)
B. (1, 2)
C. (2, 1)
D. (2, 2)
Answer: A
Solution: For each of the given parameter pairs, we have a different function to estimate y.
For each function, we estimate each label y^i.
The loss is then given by (1/4) Σ_{i=1}^{4} ||f(x^i) − y^i||^2.
The predictions of each candidate function are:
x      y        x + 1   x + 2   2x + 1   2x + 2
−1     0.0319   0       1       −1       0
0      0.8692   1       2       1        2
1      1.9566   2       3       3        4
2      3.0343   3       4       5        6
A. f (x) = x + 1
B. f (x) = x + 2
C. f (x) = 2x + 1
D. f (x) = 2x + 2
Since the loss for f(x) = x + 1 is the smallest, the parameters (1, 1) are the best fit for
this model.
∴ Option A is correct.
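As a check, here is a short Python sketch that computes the average squared error for each candidate (a, b):

```python
import numpy as np

# Training examples from the question.
x = np.array([-1.0, 0.0, 1.0, 2.0])
y = np.array([0.0319, 0.8692, 1.9566, 3.0343])

# Candidate parameters (a, b) for f(x) = a*x + b.
candidates = {"A": (1, 1), "B": (1, 2), "C": (2, 1), "D": (2, 2)}

for name, (a, b) in candidates.items():
    predictions = a * x + b
    loss = np.mean((predictions - y) ** 2)
    print(name, (a, b), round(loss, 4))
# (1, 1) gives by far the smallest average squared error, so option A is the best fit.
```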
X y
[ 2] 5.8
[ 3] 8.3
[ 6] 18.3
[ 7] 21
[ 8] 22
What will be the amount of loss when the functions g = 3x1 + 1 and h = 2x1 + 2 are used
to represent the regression line? Consider the average squared error as the loss function.
g:
h:
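The g: and h: computations are not filled in above; here is a short Python sketch (an illustration, not the original worked solution) that carries them out under the stated loss:

```python
import numpy as np

# Data from the table: x1 values and targets y.
x1 = np.array([2.0, 3.0, 6.0, 7.0, 8.0])
y = np.array([5.8, 8.3, 18.3, 21.0, 22.0])

# Candidate regression lines from the question.
g = 3 * x1 + 1
h = 2 * x1 + 2

# Average squared error for each line.
loss_g = np.mean((g - y) ** 2)
loss_h = np.mean((h - y) ** 2)
print(loss_g, loss_h)
# This sketch gives roughly 2.964 for g and 11.924 for h, so g fits the data better.
```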
X y
[ 4, 2] +1
[ 8, 4] +1
[ 2, 6] -1
[ 4, 10] -1
[ 10, 2] +1
[ 12, 8] -1
What will be the average misclassification error when the functions g(X) = sign(x1 − x2 − 2)
and h(X) = sign(x1 + x2 − 10) are used to classify the data points into classes +1 or −1?
g:
h:
The average misclassification error for a function f(x) is given by (1/n) Σ_{i=1}^{n} 1(f(X^i) ≠ y^i).
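Here is a short Python sketch that evaluates this error for g and h; it treats sign(0) as +1, which is an assumption, since the convention for points exactly on the boundary is not stated:

```python
import numpy as np

# Data from the table: inputs X = [x1, x2] and labels y.
X = np.array([[4, 2], [8, 4], [2, 6], [4, 10], [10, 2], [12, 8]], dtype=float)
y = np.array([1, 1, -1, -1, 1, -1])

def sign(z):
    # Assumed convention: points exactly on the boundary (z == 0) are labelled +1.
    return np.where(z >= 0, 1, -1)

g = sign(X[:, 0] - X[:, 1] - 2)    # g(X) = sign(x1 - x2 - 2)
h = sign(X[:, 0] + X[:, 1] - 10)   # h(X) = sign(x1 + x2 - 10)

# Average misclassification error: fraction of points where prediction != label.
print(np.mean(g != y), np.mean(h != y))
```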
X
[1,2,3]
[2,3,4]
[-1,0,1]
[0,1,1]
Give the reconstruction error for this encoder-decoder pair. The reconstruction error is
the mean of the squared distance between the reconstructed input and the input.
We are given an encoder (f ) and a decoder (g) function. To solve this question, we
will take each datapoint xi and encode it using encoder function getting f (xi ) and then
decode it to get g(f (xi )). Then, the squared error would be given as ||g(f (xi )) − xi ||2 .
We would then take the average of this error over all datapoints to get the loss.
So, the loss will be 138/4 = 34.5.
∴ Answer is 34.5.