Introduction To Machine Learning

Uploaded by

gireesh

NPTEL Online Certification Courses
Indian Institute of Technology Kharagpur

Introduction to
Machine Learning
Assignment- Week 0
TYPE OF QUESTION: MCQ
Number of questions: 10 Total mark: 10 X 2 = 20

MCQ Question
_______________________________________________________________________

QUESTION 1:

Find the maxima and minima of the function $f(x) = x + \frac{1}{x}$.
A. -1,1
B. 1,-1
C. -2,2
D. 2,-2

Correct Answer: A.
Detailed Solution:
$f'(x) = 1 - \frac{1}{x^2}$, so $f'(x) = 0$ at $x = 1$ and $x = -1$.
$f''(x) = \frac{2}{x^3}$.
For $x = 1$, $f''(x) = \frac{2}{1} > 0$, so $x = 1$ is a point of minima for the function.
For $x = -1$, $f''(x) = \frac{2}{-1} < 0$, so $x = -1$ is a point of maxima for the function.
_______________________________________________________________________

QUESTION 2:
Precision is defined as the fraction of relevant instances among the retrieved instances
and Recall is defined as the fraction of relevant instances that have been retrieved over
the total amount of relevant instances. A typical Information Retrieval system retrieves a
total of 20 documents for a particular query out of which only 5 are relevant. Find the
Precision and Recall of the system. Total set of relevant documents = 10.
A. 0.5,0.25
B. 0.25, 0.5
C. 0.5,0.5
D. 0.25,0.25

Correct Answer: B.
Detailed Solution: Precision = (relevant instances among retrieved instances / total no of
all retrieved instances) = 5/20 = 0.25
Recall = (relevant instances among retrieved instances / total no of relevant instances) =
5/10 = 0.5
_______________________________________________________________________

QUESTION 3:

Entropy associated with each possible data value is the negative logarithm of the
probability mass function for the value. Example Formula is:

$H(S) = -\sum_i p_i \log_2(p_i)$

Here, $H(S)$ denotes entropy, $i$ represents a class, and $p_i$ denotes the probability of that
class.
Given a list of 20 examples including 10 positive, 5 negative and 5 neutral examples. The
entropy of the dataset with respect to this classification is:
A. 3/2
B. 2
C. 5/2
D. 3
Correct Answer: A

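Detailed Solution: The class probabilities are 10/20 = 1/2, 5/20 = 1/4 and 5/20 = 1/4, so
$H(S) = -\left(\frac{1}{2}\log_2\frac{1}{2} + \frac{1}{4}\log_2\frac{1}{4} + \frac{1}{4}\log_2\frac{1}{4}\right) = \frac{3}{2}$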
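The entropy formula above can be checked numerically. The following is a minimal sketch, assuming the dataset is summarised by its raw class counts; the helper name `entropy` is illustrative and not part of the assignment.

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    # Classes with zero count contribute nothing (p * log2(p) -> 0 as p -> 0).
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# 10 positive, 5 negative and 5 neutral examples
print(entropy([10, 5, 5]))  # 1.5
```

The same helper applies to any class split; for a 12/18 binary split it returns roughly 0.97.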

_____________________________________________________________________

QUESTION 4:

Find the limit $\lim_{x \to 2} \frac{\sqrt{7+x} - 3}{x - 2}$
(Hint: Use L'Hôpital's rule)

A. 1/3
B. 1/6
C. 2/3
D. 5/6

Correct Answer: B
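Detailed Solution: Use L'Hôpital's rule. Differentiating the numerator and the denominator
gives $\lim_{x \to 2} \frac{1}{2\sqrt{7+x}} = \frac{1}{2 \cdot 3} = \frac{1}{6}$.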

_____________________________________________________________________
QUESTION 5:

5 runners run a race. How many different ways can the top 3 finishers be selected, if we do not
care about the specific order of these top 3?

A. 5
B. 10
C. 20
D. 30

Correct Answer: B.

Detailed Solution: Top 3 without order can be selected in 5C3 = 10 ways.


_____________________________________________________________________

QUESTION 6:
A busy student must complete 3 problem sets before doing laundry. Each problem set requires
1 day with probability 2/3 and 2 days with probability 1/3. Let B be the number of days a busy
student delays laundry. What is E[B]?

[Here, E[B] denotes the expected value of the random variable B]

Example: If the first problem set requires 1 day and the second and third problem sets each
requires 2 days, then the student delays for B = 5 days.

A. 2
B. 3
C. 4
D. 5

Correct Answer: C

Detailed Solution:

E[B] = 3*(2/3)*(2/3)*(2/3) + 4*3*(2/3)*(2/3)*(1/3) + 5*3*(2/3)*(1/3)*(1/3) + 6*(1/3)*(1/3)*(1/3) = 4
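The sum above enumerates the possible totals of 3, 4, 5 and 6 days. As a cross-check, here is a small sketch that enumerates every outcome directly with exact fractions; the names `DAY_PROBS` and `expected_delay` are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Each problem set takes 1 day with probability 2/3 or 2 days with probability 1/3.
DAY_PROBS = {1: Fraction(2, 3), 2: Fraction(1, 3)}

def expected_delay(num_sets=3):
    """Enumerate every outcome and sum (total days) * (probability of the outcome)."""
    expectation = Fraction(0)
    for days in product(DAY_PROBS, repeat=num_sets):
        prob = Fraction(1)
        for d in days:
            prob *= DAY_PROBS[d]
        expectation += sum(days) * prob
    return expectation

print(expected_delay())  # 4
```

By linearity of expectation the same value follows from 3 × (1 × 2/3 + 2 × 1/3) = 3 × 4/3 = 4.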

_______________________________________________________________________
QUESTION 7 :

In a class, there are 15 students who like chocolate. 13 students like vanilla. 10 students
like neither. If there are 35 students in the class, how many students like chocolate and
vanilla?

A. 2
B. 12
C. 3
D. 20

Correct Answer: C.
Detailed Solution:
X: set of students who like chocolate
Y: set of students who like vanilla

|𝑋 ⋃ 𝑌| = |𝑋| + |𝑌| − |𝑋 ⋂ 𝑌|

From the given data, |𝑋| = 15, |𝑌| = 13, |𝑋 ⋃ 𝑌| = 35 − 10 = 25.

|𝑋 ⋂ 𝑌| = 15 + 13 − 25 = 3.

_______________________________________________________________________

QUESTION 8:

Suppose there is a sentence "let's play or not play". The bag-of-words representation
vector of the sentence is the count of each word in the sentence, which corresponds to:

let's play or not


[1 2 1 1].

The point in the space is s = (1,2,1,1).

Now suppose we have a query vector related to 'play', q1 = (0,1,0,0), and a query
vector related to 'let's', q2 = (1,0,0,0). Find the query nearest to the sentence vector s.
Hint: Use the cosine similarity of the two vectors to measure closeness.

A. q1
B. q2
Correct Answer: A
Detailed Solution: Compute the cosine similarity of q1 and s, then the cosine
similarity of q2 and s. Here cos(q1, s) = 2/√7 ≈ 0.76 and cos(q2, s) = 1/√7 ≈ 0.38, so q1,
the query with the higher cosine similarity with s, is the nearest query to the sentence.
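The comparison can be reproduced with a few lines of code. This is a minimal sketch using only the standard library; the function name `cosine_similarity` is illustrative.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

s = (1, 2, 1, 1)   # bag-of-words vector of the sentence
q1 = (0, 1, 0, 0)  # query about the word 'play'
q2 = (1, 0, 0, 0)  # query about the first word of the sentence

print(round(cosine_similarity(q1, s), 3))  # 0.756
print(round(cosine_similarity(q2, s), 3))  # 0.378
```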
_______________________________________________________________________

QUESTION 9:

Let u be an n×1 vector such that $u^T u = 1$. Let I be the n×n identity matrix. The n×n
matrix A is given by $A = I - k u u^T$, where k is a real constant. u itself is an eigenvector
of A, with eigenvalue −1. What is the value of k?

A. -2
B. -1
C. 2
D. 0
Correct Answer: C

Detailed Solution:

$(I - k u u^T)u = -1 \cdot u$

$u - k u (u^T u) = -u$

$2u = ku$ (note: $u^T u = 1$)

$k = 2$
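A quick numerical sanity check of this result is sketched below. It assumes an arbitrary example unit vector u = (0.6, 0.8); the helpers `build_matrix` and `matvec` are illustrative names, not part of the assignment.

```python
def build_matrix(u, k):
    """Return A = I - k * u * u^T as a list of rows."""
    n = len(u)
    return [[(1.0 if i == j else 0.0) - k * u[i] * u[j] for j in range(n)]
            for i in range(n)]

def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

u = [0.6, 0.8]               # example unit vector: 0.36 + 0.64 = 1
A = build_matrix(u, k=2)
print(matvec(A, u))          # approximately [-0.6, -0.8], i.e. -u
```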
_______________________________________________________________________

QUESTION 10:

Let $A_{m \times n}$ be a matrix of real numbers. The matrix $AA^T$ has an eigenvector x with
eigenvalue b. Then the eigenvector y of $A^T A$ which has eigenvalue b is equal to

A. $x^T A$
B. $A^T x$
C. x
D. Cannot be described in terms of x

Correct Answer: B

Detailed Solution: $AA^T x = bx$

Multiplying both sides on the left by $A^T$,

$A^T A A^T x = b A^T x$
$(A^T A)(A^T x) = b(A^T x)$

From the equation above, we observe that $A^T x$ is an eigenvector of the matrix $A^T A$
with the eigenvalue b. As y is also an eigenvector of the same matrix with the same
eigenvalue, $y = A^T x$.

_______________________________________________________________________
END
NPTEL Online Certification Courses Indian
Institute of Technology Kharagpur

Introduction to
Machine Learning
Assignment- Week 1
TYPE OF QUESTION: MCQ
Number of questions: 10 Total mark: 10 X 2 = 20

MCQ Question
_______________________________________________________________________

QUESTION 1:

Which of the following are classification tasks?

A. Detect pneumonia from chest X-ray image


B. Predict the price of a house based on floor area, number of rooms etc.
C. Predict the temperature for the next day
D. Predict the amount of rainfall

Correct Answer: A

Detailed Solution : In pneumonia detection the output takes a discrete set of class values, so
it is a classification task. In the other options the output variable is continuous, so these
are regression tasks.
_______________________________________________________________________

QUESTION 2:
Which of the following is not a type of supervised learning?
A. Classification
B. Regression
C. Clustering
D. None of the above
Correct Answer: C. Clustering
Detailed Solution : Classification and Regression are both supervised learning methods
as they need class labels or target values for training, but Clustering doesn't need target
values.
_______________________________________________________________________
QUESTION 3:

Which of the following tasks is NOT a suitable machine learning task?

A. Finding the shortest path between a pair of nodes in a graph


B. Predicting if a stock price will rise or fall
C. Predicting the price of petroleum
D. Grouping mails as spams or non-spams

Correct Answer : A. Finding the shortest path between a pair of nodes in a graph

Detailed Solution : Finding the shortest path is a graph theory task that deterministic
algorithms (such as Dijkstra's algorithm) solve without learning from data, whereas the
other options are suitable for machine learning.
_____________________________________________________________________

QUESTION 4:

Suppose I have 10,000 emails in my mailbox out of which 300 are spams. The spam detection
system detects 150 mails as spams, out of which 50 are actually spams. What is the precision
and recall of my spam detection system?

A. Precision = 33.33%, Recall = 25%


B. Precision = 25%, Recall = 33.33%
C. Precision = 33.33%, Recall = 16.66%
D. Precision = 75%, Recall = 33.33%

Correct Answer: C

Detailed Solution :
Precision = $\frac{T_p}{T_p + F_p} = \frac{50}{50 + 100} = 33.33\%$
Recall = $\frac{T_p}{T_p + F_n} = \frac{50}{50 + 250} = 16.66\%$
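The two ratios can be computed directly from the counts in the question. This is a minimal sketch; the function name `precision_recall` and the variable names are illustrative.

```python
from fractions import Fraction

def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) as exact fractions."""
    precision = Fraction(true_pos, true_pos + false_pos)
    recall = Fraction(true_pos, true_pos + false_neg)
    return precision, recall

# 150 mails flagged as spam, 50 of them really spam; 300 spam mails overall.
flagged, correct, total_spam = 150, 50, 300
p, r = precision_recall(correct, flagged - correct, total_spam - correct)
print(f"precision = {float(p):.2%}, recall = {float(r):.2%}")
```

The exact values are 1/3 and 1/6; the recall of 1/6 rounds to 16.67%, which the answer option lists truncated as 16.66%.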
_______________________________________________________________________

QUESTION 5 :

Which of the following is/are supervised learning problems?


A. Predicting disease from blood samples.

B. Grouping students in the same class based on similar features.

C. Face recognition to unlock your phone.

Correct Answer: A, C
Detailed Solution: Option B is an unsupervised learning problem.

_______________________________________________________________________

QUESTION 6:

Aliens challenge you to a complex game that no human has seen before. They give you
time to learn the game and develop strategies before the final showdown. You choose to
use machine learning because an intelligent machine is your only hope. Which machine
learning paradigm should you choose for this?
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
D. Use a random number generator and hope for the best

Correct Answer: C. Reinforcement learning


Detailed Solution: Reinforcement learning is the most suitable of these paradigms for building
agents for complex games where no expert trajectories exist. It is possible to design a reward
function/signal that depends on the outcome of the game. The objective of the player
agent is to maximize the total reward collected from the game.

_______________________________________________________________________
QUESTION 7:
How many Boolean functions are possible with $N$ features?
A. $2^{(2^N)}$
B. $2^N$
C. $N^2$
D. $4^N$

Correct Answer: A. $2^{(2^N)}$

Detailed Solution : There are $2^N$ possible combinations of N input Boolean features.
Each of these input combinations can be mapped to either True (1) or False (0), so there can be
$2^{(2^N)}$ possible truth tables for the Boolean function.
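For small N the count can be confirmed by brute force. The sketch below enumerates every truth table explicitly; `count_boolean_functions` is an illustrative name, and the enumeration is only practical for very small N.

```python
from itertools import product

def count_boolean_functions(n):
    """Count Boolean functions of n inputs by enumerating every truth table."""
    inputs = list(product([0, 1], repeat=n))             # 2**n input rows
    truth_tables = product([0, 1], repeat=len(inputs))   # one output bit per row
    return sum(1 for _ in truth_tables)

for n in range(4):
    print(n, count_boolean_functions(n), 2 ** (2 ** n))
```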
_______________________________________________________________________

QUESTION 8:

What is the use of Validation dataset in Machine Learning?

A. To train the machine learning model.


B. To evaluate the performance of the machine learning model
C. To tune the hyperparameters of the machine learning model
D. None of the above.

Correct Answer : C. To tune the hyperparameters of the machine learning model

Detailed Solution : The validation dataset is used to tune the model's hyperparameters during
training.

_______________________________________________________________________

QUESTION 9:

Regarding bias and variance, which of the following statements are true? (Here ‘high’ and
‘low’ are relative to the ideal model.)
A. Models which overfit have a high bias.
B. Models which overfit have a low bias.
C. Models which underfit have a high variance.
D. Models which underfit have a low variance.

Correct Answer : B, D

Detailed Solution : In supervised learning, underfitting happens when a model is unable


to capture the underlying pattern of the data. These models usually have high bias and
low variance. Overfitting happens when our model captures the noise along with the
underlying pattern in data. These models have low bias and high variance.
_____________________________________________________________________

QUESTION 10:
Which of the following is a categorical feature?

A. Height of a person
B. Price of petroleum
C. Mother tongue of a person
D. Amount of rainfall in a day

Correct Answer: C
Detailed Solution: Categorical variables represent types of data which may be divided
into groups. Mother tongue is a categorical feature. All other options are continuous.
______________________________________________________________________

*******END*******
NPTEL Online Certification Courses
Indian Institute of Technology Kharagpur

Introduction to
Machine Learning
Assignment- Week 2
TYPE OF QUESTION: MCQ
Number of questions: 10 Total mark: 10 X 2 = 20

MCQ Question
QUESTION 1:

In a binary classification problem, out of 30 data points 12 belong to class I and 18 belong
to class II. What is the entropy of the data set?
A. 0.97
B. 0
C. 1
D. 0.67

Answer: A. 0.97
Detailed Solution:
Entropy = - ((12/30)*log2(12/30)+(18/30)*log2(18/30)) = 0.97
__________________________________________________________________

QUESTION 2:

Which of the following properties are characteristics of decision trees?

A. Low bias
B. High variance
C. Lack of smoothness of prediction surfaces
D. None of the above

Correct Answer: A, B, C

Detailed Solution: Decision tree classifiers have low bias and high variance. As decision
trees split the input space into rectangular spaces, the predictor surface or the decision
boundary lacks smoothness.

__________________________________________________________________
QUESTION 3:
Statement: Decision Tree is an unsupervised learning algorithm.
Reason: The splitting criterion uses only the features of the data to calculate their
respective measures.

A. Statement is True. Reason is True.


B. Statement is True. Reason is False.
C. Statement is False. Reason is True.
D. Statement is False. Reason is False.

Correct Answer: D. Statement is False. Reason is False.

Detailed Solution : Decision Tree is a supervised learning algorithm and the reason is
also false.

_______________________________________________________________

QUESTION 4:
In linear regression, our hypothesis is $h_\theta(x) = \theta_0 + \theta_1 x$, and the training data is given in the
table.

x    y
10   5
3    3
6    7
8    6

If the cost function is $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x_i) - y_i)^2$, where m is the number of training data points,
what is the value of $J(\theta)$ when $\theta = (1,1)$?
A. 0
B. 5.75
C. 4.75
D. 6.75

Correct Answer: B. 5.75


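Detailed Solution : Substitute $\theta_0 = 1$ and $\theta_1 = 1$ and compute $J(\theta)$. The predictions are
11, 4, 7 and 9, so the errors are 6, 1, 0 and 3, and $J(\theta) = \frac{36 + 1 + 0 + 9}{2 \cdot 4} = 5.75$.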
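The substitution can be carried out in code. This is a minimal sketch of the cost function as defined in the question; the function name `cost` is illustrative.

```python
def cost(theta0, theta1, xs, ys):
    """Half mean squared error J(theta) = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(xs)
    squared_errors = ((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return sum(squared_errors) / (2 * m)

xs = [10, 3, 6, 8]
ys = [5, 3, 7, 6]
print(cost(1, 1, xs, ys))  # 5.75
```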
_________________________________________________________________
QUESTION 5:

What is a common indicator of overfitting in a decision tree?

A. The training accuracy is high while the test accuracy is low.


B. The tree is shallow.
C. The tree has only a few leaf nodes.
D. The tree’s depth matches the number of attributes in the dataset.

Correct Answer: A. The training accuracy is high while the test accuracy is low.
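Detailed Solution: An overfit tree fits the training data, including its noise, very closely
but generalizes poorly, so the training accuracy is high while the test accuracy is low.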

_________________________________________________________________

QUESTION 6:

What is true for Batch Gradient Descent?


A. In every iteration, model parameters are updated based on one training sample
B. In every iteration, model parameters are updated based on all training samples
C. None of the above
Correct Answer : B. In every iteration model parameters are updated based on all
training samples.
Detailed Solution : In batch gradient descent, all training samples are used in every
iteration. In stochastic gradient descent, one training sample is used to update
parameters in every iteration.
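The difference between the two update rules can be sketched for a one-feature linear regression with squared-error cost. Everything below is an illustrative assumption rather than the course's own code: the toy data, the learning rate `lr`, and the function names `batch_step` and `stochastic_step`.

```python
def batch_step(theta, xs, ys, lr):
    """One batch gradient descent update: the gradient averages over ALL samples."""
    t0, t1 = theta
    m = len(xs)
    errors = [t0 + t1 * x - y for x, y in zip(xs, ys)]
    grad0 = sum(errors) / m
    grad1 = sum(e * x for e, x in zip(errors, xs)) / m
    return (t0 - lr * grad0, t1 - lr * grad1)

def stochastic_step(theta, x, y, lr):
    """One stochastic gradient descent update: the gradient uses ONE sample."""
    t0, t1 = theta
    error = t0 + t1 * x - y
    return (t0 - lr * error, t1 - lr * error * x)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # toy data lying on y = 2x
theta = (0.0, 0.0)
for _ in range(2000):
    theta = batch_step(theta, xs, ys, lr=0.1)
print(theta)  # close to (0.0, 2.0)
```

Each call to `batch_step` touches every training sample, while `stochastic_step` needs only a single (x, y) pair, which is why stochastic updates are cheaper per iteration but noisier.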
______________________________________________________________
QUESTION 7:
Answer Questions 7-8 with the data given below:

Consider the following dataset. We want to build a decision tree classifier to detect
whether a tumor is malignant or not using several input features such as age, vaccination,
tumor size and tumor site. The target variable is “Malignant” and the other attributes are
input features.

What is the initial entropy of the dataset?


A. 0.543
B. 0.9798
C. 0.8732
D. 1
Correct Answer: B. 0.9798

Detailed Solution:

The entropy of the whole dataset is = -(5/12)log2(5/12)-(7/12)log2(7/12) = 0.9798

________________________________________________________________

QUESTION 8:

For the dataset in Question 7, what is the information gain of Vaccination (If entropy
measure is used to calculate information gain)?

A. 0.4763
B. 0.2102
C. 0.1134
D. 0.9355

Correct Answer: A. 0.4763

Information gain of Vaccination =

________________________________________________________________

QUESTION 9:
Which of the following criteria is typically used for optimizing in linear regression?
A. Maximizing the number of points touched by the line
B. Minimizing the number of points touched by the line
C. Minimizing the sum of squared distance of the line from the points
D. Minimizing the maximum squared distance of a point from a line

Correct Answer: C. Minimizing the sum of squared distance of the line from the
points
Detailed Solution: In linear regression, the objective is to minimize the sum of squared
distance of the line from the points.
________________________________________________________________
QUESTION 10:

The parameters obtained in linear regression

A. can take any value in the real space


B. are strictly integers
C. always lie in the range [0,1]
D. can take only non-zero values

Correct Answer: A. can take any value in the real space

Detailed Solution: The linear regression parameters can take any real number value.

________________________________________________________________

*****END*****
NPTEL Online Certification Courses
Indian Institute of Technology Kharagpur

Introduction to
Machine Learning
Assignment- Week 3
TYPE OF QUESTION: MCQ
Number of questions: 10 Total mark: 10 X 2 = 20

QUESTION 1:

Suppose, you have been given the following data where x1 and x2 are the 2 input
variables and Class is the dependent variable.

x1   x2   Class
-1    1   -
 0    1   +
 0    2   -
 1   -1   -
 1    0   +
 1    2   +
 2    2   -
 2    3   +

What will be the class of a new data point x1=1 and x2=1 in 5-NN (k nearest neighbour
with k=5) using euclidean distance measure?
A. + Class
B. – Class
C. Cannot be determined

Correct Answer: A. + Class


Detailed Solution : The 5 nearest points to the new point (1,1) are (0,1), (1,0) and (1,2) at
distance 1 (all + Class), and (0,2) and (2,2) at distance √2 (both – Class). The majority
class among these 5 nearest neighbours is + Class.
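The neighbour search and majority vote can be sketched as follows. This is a minimal illustration using the standard library (`math.dist` requires Python 3.8 or later); the function name `knn_predict` is illustrative.

```python
from collections import Counter
from math import dist

def knn_predict(points, labels, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    ranked = sorted(zip(points, labels), key=lambda pl: dist(pl[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

points = [(-1, 1), (0, 1), (0, 2), (1, -1), (1, 0), (1, 2), (2, 2), (2, 3)]
labels = ["-", "+", "-", "-", "+", "+", "-", "+"]
print(knn_predict(points, labels, query=(1, 1), k=5))  # +
```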
_______________________________________________________________________

QUESTION 2:

Imagine you are dealing with a 10 class classification problem. What is the maximum
number of discriminant vectors that can be produced by LDA?
A. 20
B. 14
C. 9
D. 10
Correct Answer: C. 9
Detailed Solution : LDA produces at most c − 1 discriminant vectors, c = no of classes

_______________________________________________________________________

QUESTION 3:

Fill in the blanks:


K-Nearest Neighbor is a _____ , _____ algorithm
A. Non-parametric, eager
B. Parametric, eager
C. Non-parametric, lazy
D. Parametric, lazy

Correct Answer: C. Non-parametric, lazy

Detailed Solution: KNN is non-parametric because it does not make any
assumption regarding the underlying data distribution. It is a lazy learning
technique because during training it just memorizes the data, and it
computes the distances only at test time.

_______________________________________________________________________
QUESTION 4:

Which of the following statements is True about the KNN algorithm?

A. KNN algorithm does more computation at test time than at train time.
B. KNN algorithm does less computation at test time than at train time.
C. KNN algorithm does an equal amount of computation at test time and train time.
D. None of these.

Correct Answer: A. KNN algorithm does more computation at test time than at
train time.

Detailed Solution : The training phase of the algorithm consists only of storing the feature
vectors and class labels of the training samples.
In the testing phase, a test point is classified by assigning the label which is the most
frequent among the k training samples nearest to that query point – hence higher
computation.
_______________________________________________________________________

QUESTION 5:
Which of the following necessitates feature reduction in machine learning?
1. Irrelevant and redundant features
2. Curse of dimensionality
3. Limited computational resources.

A. 1 only
B. 2 only
C. 1 and 2 only
D. 1, 2 and 3
Correct Answer: D. 1,2 and 3

Detailed Solution: Irrelevant and redundant features, the curse of dimensionality and
limited computational resources all necessitate feature reduction.


_______________________________________________________________________
QUESTION 6:

When there is noise in data, which of the following options would improve the performance
of the k-NN algorithm?

A. Increase the value of k


B. Decrease the value of k
C. Changing value of k will not change the effect of the noise
D. None of these

Correct Answer: A. Increase the value of k

Detailed Solution : Increasing the value of k reduces the effect of the noise and
improves the performance of the algorithm.
_______________________________________________________________________

QUESTION 7:
Find the value of the Pearson’s correlation coefficient of X and Y from the data in the
following table.
AGE (X)   GLUCOSE (Y)
43        99
21        65
25        79
42        75

A. 0.47
B. 0.68
C. 1
D. 0.33
Correct Answer : B. 0.68

Detailed Solution : Pearson coefficient $r = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2 \sum_i (Y_i - \bar{Y})^2}}$,
where X = [43,21,25,42], Y = [99,65,79,75], $\bar{X}$ = mean of the $X_i$ values and $\bar{Y}$ = mean of
the $Y_i$ values.
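The coefficient can be evaluated directly from the table. The sketch below follows the standard Pearson formula using only the standard library; the function name `pearson_r` is illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

age = [43, 21, 25, 42]
glucose = [99, 65, 79, 75]
print(round(pearson_r(age, glucose), 2))  # 0.68
```

With means of 32.75 and 79.5, the numerator is 332.5 and the two sums of squares are 388.75 and 611, giving r ≈ 0.682.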

_______________________________________________________________________

QUESTION 8:

Which of the following statements is/are true about PCA?

1. PCA is a supervised method


2. It identifies the directions along which the data have the largest variance
3. Maximum number of principal components <= number of features
4. All principal components are orthogonal to each other

A. Only 2
B. 1, 3 and 4
C. 1, 2 and 3
D. 2, 3 and 4

Correct Answer: D
Detailed Solution : PCA is an unsupervised learning algorithm, so 1 is wrong. Other
statements are true about PCA.
_______________________________________________________________________

QUESTION 9:
In user-based collaborative filtering based recommendation, the items are
recommended based on :
A. Similar users
B. Similar items
C. Both of the above
D. None of the above

Correct Answer: A. Similar users

Detailed Solution: In user-based collaborative filtering, items are recommended
based on similar users.

______________________________________________________________________
QUESTION 10:
Identify whether the following statement is true or false?
“Linear Discriminant Analysis (LDA) is a supervised method”

A. TRUE
B. FALSE

Correct Answer : A. TRUE

Detailed Solution : LDA is a supervised method as it makes use of the class
labels.

_______________________________________________________________________
******END****
Introduction to Machine Learning
Assignment- Week 4
TYPE OF QUESTION: MCQ
Number of questions: 10 Total mark: 10 X 2 = 20
______________________________________________________________________

QUESTION 1:
A man is known to speak the truth 2 out of 3 times. He throws a die and reports that the
number obtained is 4. Find the probability that the number obtained is actually 4 :
A. 2/3
B. 3/4
C. 5/22
D. 2/7

Correct Answer : D. 2/7


Detailed Solution : Suppose,

A : The man reports that 4 is obtained.
B : Number 4 is obtained.

$P(B|A) = \frac{P(A|B)P(B)}{P(A|B)P(B) + P(A|\bar{B})P(\bar{B})}$

Here, $P(A|B) = \frac{2}{3}$, $P(B) = \frac{1}{6}$, $P(A|\bar{B}) = \frac{1}{3}$, $P(\bar{B}) = \frac{5}{6}$.

$P(B|A) = \frac{2}{7}$
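Bayes' theorem for a hypothesis and its complement can be written as a small helper. This is a sketch with exact fractions; the function name `posterior` and its parameter names are illustrative.

```python
from fractions import Fraction

def posterior(prior, likelihood, likelihood_given_not):
    """P(H | E) from P(H), P(E | H) and P(E | not H) via Bayes' theorem."""
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

# H: the die shows 4; E: the man reports 4.
print(posterior(Fraction(1, 6), Fraction(2, 3), Fraction(1, 3)))  # 2/7
```

The helper also works with floats, so any two-hypothesis problem of this form can be checked by passing its prior and the two conditional probabilities.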

_________________________________________________________________

QUESTION 2:
Two cards are drawn at random from a deck of 52 cards without replacement. What is
the probability of drawing a 2 and an Ace in that order?
A. 4/51
B. 1/13
C. 4/256
D. 4/663

Correct Answer : D. 4/663


Detailed Solution :

A : Drawing a 2

B : Drawing an Ace from the remaining 51 cards

$P(AB) = P(A) \cdot P(B|A)$; here, $P(A) = \frac{4}{52} = \frac{1}{13}$, $P(B|A) = \frac{4}{51}$

$P(AB) = \frac{1 \cdot 4}{13 \cdot 51} = \frac{4}{663}$
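The product rule result can be confirmed by brute-force counting over all ordered two-card draws. The deck encoding below (rank strings and suit names) is an illustrative assumption.

```python
from fractions import Fraction
from itertools import permutations

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in RANKS for suit in SUITS]

# All ordered draws of two distinct cards (drawing without replacement).
draws = list(permutations(deck, 2))
favourable = sum(1 for first, second in draws
                 if first[0] == "2" and second[0] == "A")
print(Fraction(favourable, len(draws)))  # 4/663
```

There are 52 × 51 = 2652 ordered draws and 4 × 4 = 16 favourable ones, which reduces to 4/663.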

______________________________________________________________________

QUESTION 3:
Consider the following graphical model, mark which of the following pair of random
variables are independent given no evidence?

A. a,b
B. c,d
C. e,d
D. c,e
Correct Answer : A. a,b

Detailed Solution : Nodes a and b don’t have any predecessor nodes. As they don’t
have any common parent nodes, a and b are independent.

______________________________________________________________________
QUESTION 4:
Consider the following Bayesian network. The random variables given in the model are
modeled as discrete variables (Rain = R, Sprinkler = S and Wet Grass = W) and the
corresponding probability values are given below. (Note: (¬ X) represents complement
of X)

P(R) = 0.1
P(S) = 0.2
P(W | R, S) = 0.8
P(W | R, ¬ S) = 0.7
P(W | ¬ R, S) = 0.6
P(W | ¬ R, ¬ S) = 0.5

Calculate P(S | W, R).

A. 1
B. 0.5
C. 0.22
D. 0.78

Correct Answer : C. 0.22

Detailed Solution : $P(S|W,R) = \frac{P(W,S,R)}{P(W,R)} = \frac{P(W,S,R)}{P(W,S,R) + P(W,\neg S,R)}$
$P(W,S,R) = P(W|S,R) \cdot P(R) \cdot P(S) = 0.8 \cdot 0.1 \cdot 0.2 = 0.016$
$P(W,\neg S,R) = P(W|\neg S,R) \cdot P(R) \cdot P(\neg S) = 0.7 \cdot 0.1 \cdot 0.8 = 0.056$
$P(S|W,R) = \frac{0.016}{0.016 + 0.056} = 0.22$
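The same computation can be sketched in code. It assumes, as the solution's factorisation P(W|S,R) · P(R) · P(S) does, that Rain and Sprinkler are independent parents of Wet Grass; the names `P_W` and `joint_wet` are illustrative.

```python
P_R, P_S = 0.1, 0.2
# P(W = true | R, S), keyed by (rain, sprinkler)
P_W = {(True, True): 0.8, (True, False): 0.7,
       (False, True): 0.6, (False, False): 0.5}

def joint_wet(rain, sprinkler):
    """P(W = true, R = rain, S = sprinkler), with R and S independent root nodes."""
    p_r = P_R if rain else 1 - P_R
    p_s = P_S if sprinkler else 1 - P_S
    return P_W[(rain, sprinkler)] * p_r * p_s

# Condition on W = true and R = true, then normalise over S.
p = joint_wet(True, True) / (joint_wet(True, True) + joint_wet(True, False))
print(round(p, 2))  # 0.22
```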
______________________________________________________________________
QUESTION 5:
What is the naive assumption in a Naive Bayes Classifier?

A. All the classes are independent of each other


B. All the features of a class are independent of each other
C. The most probable feature for a class is the most important feature to be
considered for classification
D. All the features of a class are conditionally dependent on each other.

Correct Answer: B. All the features of a class are independent of each other

Detailed Solution: The naive Bayes assumption is that all the features are conditionally
independent of each other given the class.
______________________________________________________________________

QUESTION 6:
A drug test (random variable T) has 1% false positives (i.e., 1% of those not taking
drugs show positive in the test), and 5% false negatives (i.e., 5% of those taking drugs
test negative). Suppose that 2% of those tested are taking drugs. Determine the
probability that somebody who tests positive is actually taking drugs (random variable
D).
A. 0.66
B. 0.34
C. 0.50
D. 0.91

Correct Answer : A. 0.66


Detailed Solution :
$P(D|T) = \frac{P(T|D)P(D)}{P(T|D)P(D) + P(T|\bar{D})P(\bar{D})}$, with $P(T|D) = \frac{95}{100}$, $P(T|\bar{D}) = \frac{1}{100}$, $P(D) = \frac{2}{100}$
$P(D|T) = 0.66$
_____________________________________________________________________

QUESTION 7:
It is given that $P(A|B) = 2/3$ and $P(A|\bar{B}) = 1/4$. Compute the value of $P(B|A)$.
A. ½
B. ⅔
C. ¾
D. Not enough information.
Correct Answer : D. Not enough information.
Detailed Solution : There are 3 unknown probabilities $P(A)$, $P(B)$, $P(AB)$ which cannot
be computed from the 2 given probabilities. So, we don't have enough information to
compute $P(B|A)$.
______________________________________________________________________

QUESTION 8:
Consider the following Bayesian network, where F = having the flu and C = coughing:

Find P(C) and P(F|C).

A. 0.35, 0.23
B. 0.35,0.77
C. 0.24, 0.024
D. 0.5, 0.23

Correct Answer: A. 0.35, 0.23

Detailed Solution :

$P(C) = P(C|F) \cdot P(F) + P(C|\bar{F}) \cdot P(\bar{F})$

$P(F|C) = \frac{P(C|F) \cdot P(F)}{P(C|F) \cdot P(F) + P(C|\bar{F}) \cdot P(\bar{F})}$

______________________________________________________________________

QUESTION 9:
Bag I contains 4 white and 6 black balls while another Bag II contains 4 white and 3
black balls. One ball is drawn at random from one of the bags and it is found to be
black. Find the probability that it was drawn from Bag I.
A. 1/2
B. 2/3
C. 7/12
D. 9/23

Correct Answer : C. 7/12


Detailed Solution :

Consider the random variables:

B1: “Ball is drawn from bag I”,

B2: “Ball is drawn from bag II”,

W: “Drawn ball is white”,

B: “Drawn ball is black”

We have to find 𝑃(𝐵1|𝐵)

$P(B_1|B) = \frac{P(B|B_1) \cdot P(B_1)}{P(B|B_1) \cdot P(B_1) + P(B|B_2) \cdot P(B_2)} = \frac{(6/10)(1/2)}{(6/10)(1/2) + (3/7)(1/2)} = \frac{3/10}{3/10 + 3/14} = \frac{7}{12}$

______________________________________________________________________

QUESTION 10:
In a Bayesian network a node with only outgoing edge(s) represents

A. a variable conditionally independent of the other variables.


B. a variable dependent on its siblings.
C. a variable whose dependency is uncertain.
D. None of the above.

Correct Answer: A. a variable conditionally independent of the other variables.

Detailed Solution : As there is no incoming edge for the node, the node is not
conditionally dependent on any other node.

______________________________________________________________________

************END*******
Course -Introduction to Machine Learning
Assignment- Week 5 (Logistic Regression, SVM, Kernel Function, Kernel
SVM)
TYPE OF QUESTION: MCQ/MSQ
Number of Question: 10 Total Marks:10x2 =20
__________________________________________________________________
Question 1:
What would be the ideal complexity of the curve which can be used for separating the
two classes shown in the image below?

A) Linear
B) Quadratic
C) Cubic
D) insufficient data to draw conclusion

Correct Answer: A
Detailed Solution: The blue point in the red region is an outlier. The rest of the data is
linearly separable.
__________________________________________________________________

Question 2:

Suppose you have a dataset with n=10 features and m=1000 examples. After training a
logistic regression classifier with gradient descent, you find that it has high training error
and does not achieve the desired performance on training and validation sets. Which of
the following might be promising steps to take?
1. Use SVM with a non-linear kernel function
2. Reduce the number of training examples
3. Create or add new polynomial features

A) 1, 2
B) 1, 3
C) 1, 2, 3
D) None
Correct Answer: B
Detailed Solution: As logistic regression did not perform well, it is highly likely that the
dataset is not linearly separable. SVM with a non-linear kernel works well for
non-linearly separable datasets. Creating new polynomial features will also help in
capturing the non-linearity in the dataset.
__________________________________________________________________

Question 3:

In logistic regression, we learn the conditional distribution p(y|x), where y is the class
label and x is a data point. If h(x) is the output of the logistic regression classifier for an
input x, then p(y|x) equals:

A. $h(x)^{y} (1 - h(x))^{(1-y)}$
B. $h(x)^{y} (1 + h(x))^{(1-y)}$
C. $h(x)^{1-y} (1 - h(x))^{y}$
D. $h(x)^{y} (1 + h(x))^{(1+y)}$

Correct Answer: A
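Detailed Solution: Taking h(x) as the predicted probability that y = 1, option A gives
p(y|x) = h(x) when y = 1 and p(y|x) = 1 − h(x) when y = 0. Refer to the lecture for details.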
__________________________________________________________________

Question 4:

The output of binary class logistic regression lies in the range:


A. [-1,0]
B. [0,1]
C. [-1,-2]
D. [1,10]

Correct Answer: B
Detailed Solution: The output of binary class logistic regression lies in the range:
[0,1].

__________________________________________________________________

Question 5:

State whether True or False.


“After training an SVM, we can discard all examples which are not support
vectors and can still classify new examples.”
A) TRUE
B) FALSE

Correct Answer: A
Detailed Solution : Using only the support vector points, it is possible to classify new
examples.
__________________________________________________________________

Question 6:

Suppose you are dealing with a 3-class classification problem and you want to train an
SVM model on the data. For that you are using the One-vs-all method. How many
times do we need to train our SVM model in such a case?
A) 1
B) 2
C) 3
D) 4

Correct Answer: C
Detailed Solution: In an N-class classification problem, we have to train the SVM N
times in the one-vs-all method.
__________________________________________________________________

Question 7:

What is/are true about kernels in SVM?

1. Kernel function can map low dimensional data to high dimensional space
2. It’s a similarity function

A) 1
B) 2
C) 1 and 2
D) None of these.

Correct Answer: C
Detailed Solution: Kernels are used in SVMs to map low dimensional data into a high
dimensional feature space, so that non-linearly separable data can be classified. A kernel
also acts as a similarity function between two data points.
_________________________________________________________________

Question 8:

If g(z) is the sigmoid function, then its derivative with respect to z may be written in
terms of g(z) as

A) g(z)(g(z)-1)
B) g(z)(1+g(z))
C) -g(z)(1+g(z))
D) g(z)(1-g(z))

Correct Answer: D
Detailed Answer:
$$g'(z) = \frac{d}{dz}\left(\frac{1}{1+e^{-z}}\right) = \frac{e^{-z}}{(1+e^{-z})^{2}} = \frac{1}{1+e^{-z}}\left(1 - \frac{1}{1+e^{-z}}\right) = g(z)\,(1 - g(z))$$
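
The identity can be checked numerically. The sketch below compares g(z)(1 − g(z)) with a central finite-difference estimate of the derivative; the helper names and the step size `eps` are illustrative choices, not part of the assignment.

```python
import math


def sigmoid(z):
    """Logistic sigmoid g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative written in terms of g(z): g(z) * (1 - g(z))."""
    g = sigmoid(z)
    return g * (1.0 - g)


def numeric_grad(f, z, eps=1e-6):
    """Central finite-difference approximation of f'(z)."""
    return (f(z + eps) - f(z - eps)) / (2.0 * eps)


for z in (-2.0, 0.0, 3.0):
    # The analytic and numeric values should agree to several decimal places.
    print(z, sigmoid_grad(z), numeric_grad(sigmoid, z))
```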
__________________________________________________________________
Question 9:

Below are the labelled instances of 2 classes and hand drawn decision boundaries for
logistic regression. Which of the following figures demonstrates overfitting of the
training data?

A) A
B) B
C) C
D) None of these

Correct Answer: C
Detailed Solution: In figure C (the third plot), the decision boundary is very complex and
is unlikely to generalize to unseen data.
__________________________________________________________________
Question 10:

What do you conclude after seeing the visualization in the previous question (Question
9)?

C1. The training error in the first plot is higher as compared to the second and third
plot.
C2. The best model for this regression problem is the last (third) plot because it
has minimum training error (zero).
C3. Out of the 3 models, the second model is expected to perform best on
unseen data.
C4. All will perform similarly because we have not seen the test data.

A) C1 and C2
B) C1 and C3
C) C2 and C3
D) C4

Correct Answer: B
Detailed Solution: From the visualization, plot A has more misclassified samples than
plots B and C. So, C1 is correct. In the third plot, the training error is lowest only
because the decision boundary is very complex, so it is unlikely to generalize well to
unseen data. Therefore, C2 is wrong.
The first model is very simple and underfits the training data. The third model is very
complex and overfits the training data. The second model has a lower training error than
the first and a simpler boundary than the third, so it is likely to perform best on unseen
data. So, C3 is correct.
We can estimate the performance of the model on unseen data by observing the
nature of the decision boundary. Therefore, C4 is incorrect.

__________________________________________________________________
End
Course Name – Introduction To Machine Learning
Assignment – Week 6 (Neural Networks)
TYPE OF QUESTION: MCQ/MSQ

Number of Question: 10 Total Marks: 10x2 = 20

Question 1:

The neural network given below takes two binary-valued inputs $x_1, x_2 \in \{0, 1\}$, and the
activation function is the binary threshold function ($h(x) = 1$ if $x > 0$; $0$ otherwise). Which
of the following logical functions does it compute?

A) AND
B) OR
C) NAND
D) None of the above

Correct Answer: A
Detailed Solution: $h(x) = 1$ if $15x_1 + 10x_2 - 20 > 0$; $0$ otherwise.
If we write the truth table for $h(x)$, it will be:

x1   x2   h(x)
0    0    0
0    1    0
1    0    0
1    1    1

The truth table for $h(x)$ is the same as the truth table for the AND logical function.
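
The truth table can be reproduced with a few lines of code. This is a minimal sketch that takes the weights 15 and 10 and the bias −20 from the detailed solution (the network figure itself is not reproduced here); the function names are illustrative.

```python
def threshold(x):
    """Binary threshold activation: 1 if x > 0, otherwise 0."""
    return 1 if x > 0 else 0


def neuron(x1, x2, w1=15, w2=10, bias=-20):
    """Single threshold unit with the weights quoted in the detailed solution."""
    return threshold(w1 * x1 + w2 * x2 + bias)


# Enumerate all four binary input combinations.
truth_table = [(x1, x2, neuron(x1, x2)) for x1 in (0, 1) for x2 in (0, 1)]
for row in truth_table:
    print(row)
```

Only the input (1, 1) gives a positive weighted sum (15 + 10 − 20 = 5), which is why the unit behaves like AND.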
_____________________________________________________________________________

Question 2:

What is the sequence of the following tasks in a perceptron?

I) Initialize the weights of the perceptron randomly.


II) Go to the next batch of data set.
III) If the prediction does not match the output, change the weights.
IV) For a sample input, compute an output.

A) I, II, III, IV
B) IV, III, II, I
C) III, I, II, IV
D) I, IV, III, II

Correct Answer: D
Detailed Solution: Refer to the lecture. D is the correct sequence.
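
The sequence I, IV, III, II can be sketched as a training loop. The example below is an illustration under stated assumptions: it uses the standard perceptron update rule, a threshold at 0, a learning rate of 0.1, a fixed random seed, and the AND function as toy data. None of these choices come from the assignment.

```python
import random


def predict(weights, bias, x):
    """IV) For a sample input, compute an output with a threshold at 0."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0


def train_perceptron(samples, lr=0.1, max_epochs=1000, seed=0):
    rng = random.Random(seed)
    # I) Initialize the weights (and bias) of the perceptron randomly.
    weights = [rng.uniform(-0.5, 0.5) for _ in samples[0][0]]
    bias = rng.uniform(-0.5, 0.5)
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in samples:  # II) go to the next sample of the data set
            output = predict(weights, bias, x)  # IV) compute an output
            if output != target:  # III) prediction does not match: change weights
                error = target - output
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
                mistakes += 1
        if mistakes == 0:  # every sample is classified correctly, so stop
            break
    return weights, bias


# Toy, linearly separable data: the AND function.
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_data)
print([predict(weights, bias, x) for x, _ in and_data])
```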
_____________________________________________________________________________

Question 3:

Suppose you have inputs as x, y, and z with values -2, 5, and -4 respectively. You have a
neuron ‘q’ and neuron ‘f’ with functions:

q=x+y
f=q*z

Graphical representation of the functions is as follows:

What is the gradient of f with respect to x, y, and z?

A) (-3, 4, 4)
B) (4, 4, 3)
C) (-4, -4, 3)
D) (3, -4, -4)
Correct Answer: C
Detailed Solution: To calculate the gradient, we should find out (df/dx), (df/dy) and (df/dz).
$$\frac{df}{dx} = \frac{d}{dx}\big((x + y)z\big) = z \cdot 1 = z = -4$$
$$\frac{df}{dy} = \frac{d}{dy}\big((x + y)z\big) = z \cdot 1 = z = -4$$
$$\frac{df}{dz} = \frac{d}{dz}\big((x + y)z\big) = (x + y) = (-2 + 5) = 3$$
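
The same result follows from a manual forward and backward pass through the two neurons. This is a small sketch of the chain rule for q = x + y and f = q * z; the function name is illustrative.

```python
def forward_backward(x, y, z):
    """Forward pass through q = x + y, f = q * z, then backpropagate."""
    # Forward pass.
    q = x + y
    f = q * z
    # Backward pass (chain rule).
    df_dq = z          # f = q * z, so df/dq = z
    df_dz = q          # f = q * z, so df/dz = q
    df_dx = df_dq * 1  # q = x + y, so dq/dx = 1
    df_dy = df_dq * 1  # q = x + y, so dq/dy = 1
    return f, (df_dx, df_dy, df_dz)


f, grad = forward_backward(-2, 5, -4)
print(f, grad)  # f = -12, gradient = (-4, -4, 3)
```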
_____________________________________________________________________________

Question 4:

For a fully-connected neural network with one hidden layer, what effect should increasing the
number of hidden units have on bias and variance?
A. Decrease bias, increase variance
B. Increase bias, increase variance
C. Increase bias, decrease variance
D. No effect

Correct Answer: A
Detailed Solution: Adding more hidden units should decrease bias and increase variance. In
general, more complicated models will result in lower bias but higher variance, and adding
more hidden units certainly makes the model more complex.
_____________________________________________________________________________

Question 5:

Which of the following is true about model capacity (where model capacity means the ability
of a neural network to approximate complex functions)?

A) As the number of hidden layers increases, model capacity increases


B) As dropout ratio increases, model capacity increases
C) As learning rate increases, model capacity increases
D) None of these.

Correct Answer: A
Detailed Solution: As the number of hidden layers increases, the ability of the neural network to
model complex functions increases.
_____________________________________________________________________________

Question 6:

The back-propagation learning algorithm applied to a two layer neural network


A) always finds the globally optimal solution.
B) finds a locally optimal solution which may be globally optimal.
C) never finds the globally optimal solution.
D) finds a locally optimal solution which is never globally optimal

Correct Answer: B
Detailed Solution: The back-propagation algorithm finds a locally optimal solution, which may
also be the globally optimal solution.
_____________________________________________________________________________

Question 7:

Which of the following gives non-linearity to a neural network?

A) Gradient descent
B) Bias
C) Sigmoid Activation Function
D) None

Correct Answer: C
Detailed Solution: An activation function such as sigmoid gives non-linearity to the neural
network.
_____________________________________________________________________________

Question 8:

The network that involves backward links from outputs to the inputs and hidden layers is called

A) Self-organizing Maps
B) Perceptron
C) Recurrent Neural Networks
D) Multi-Layered Perceptron

Correct Answer: C
Detailed Solution: Recurrent Neural Networks involve backward links from outputs to the
inputs and hidden layers.
_____________________________________________________________________________

Question 9:

A Convolutional Neural Network (CNN) is a Deep Neural Network which can extract the various
abstract features of an input that are required for a given task. The following operations are
performed by a CNN on an input:
1) Max Pooling
2) Convolution Operation
3) Flatten
4) Forward propagation by Fully Connected Network

Identify the correct sequence of operations performed from the options below:

A) 4,3,2,1
B) 2,1,3,4
C) 3,1,2,4
D) 4,2,1,3

Correct Answer: B
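Detailed Solution: Follow the lecture slides. A typical CNN first applies the convolution
operation to extract feature maps, then max pooling to downsample them, then flattens the result
into a vector, and finally forward-propagates that vector through the fully connected network.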
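
The order 2, 1, 3, 4 can be traced on a tiny example. The sketch below is a dependency-free illustration, not a practical CNN: the 5x5 input, the 2x2 kernel, the pooling size, and the dense-layer weights are all made-up values, and a single output unit stands in for the fully connected network.

```python
def conv2d(image, kernel):
    """2) Convolution operation: valid cross-correlation of image with kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool(fmap, size=2):
    """1) Max pooling over non-overlapping size x size windows."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


def flatten(fmap):
    """3) Flatten a 2-D feature map into a 1-D vector."""
    return [v for row in fmap for v in row]


def dense(vector, weights, bias):
    """4) Forward propagation through one fully connected output unit."""
    return sum(v * w for v, w in zip(vector, weights)) + bias


# Hypothetical input and parameters, chosen only for illustration.
image = [[1, 2, 0, 1, 3],
         [0, 1, 2, 3, 1],
         [1, 0, 1, 2, 2],
         [2, 1, 0, 1, 0],
         [0, 3, 1, 0, 1]]
kernel = [[1, 0],
          [0, 1]]

features = conv2d(image, kernel)  # 4x4 feature map
pooled = max_pool(features)       # 2x2 after pooling
vector = flatten(pooled)          # length-4 vector
score = dense(vector, [0.5, -0.5, 0.25, 0.25], 0.0)
print(pooled, vector, score)
```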
_____________________________________________________________________________

Question 10:

In training a neural network, we notice that the loss does not decrease in the first few
epochs. What is the reason for this?
A) The learning rate is low.
B) The regularization parameter is high.
C) Stuck at a local minimum.
D) All of the above could be the reason.

Correct Answer: D
Detailed Solution: The problem can occur due to any one of the reasons above.
_____________________________________________________________________________

END
Course Name: Introduction to Machine Learning
Assignment – Week 7 (Computational Learning theory, PAC Learning, Sample
Complexity, VC Dimension, Ensemble Learning)
TYPE OF QUESTION: MCQ/MSQ

Number of Question: 10 Total Marks: 10X2 = 20


____________________________________________________________________

Question 1:

Which of the following options is/are correct regarding the benefits of an ensemble model?

1. Better performance
2. More generalized model
3. Better interpretability

A) 1 and 3
B) 2 and 3
C) 1 and 2
D) 1, 2 and 3

Correct Answer: C
Detailed Solution: 1 and 2 are the benefits of ensemble models. Option 3 is incorrect because
when we ensemble multiple models, we lose interpretability of the models.
____________________________________________________________________

Question 2:

In AdaBoost, we give more weight to points that were misclassified in previous iterations.
Now suppose we introduce a limit or cap on the weight that any point can take (for example, a
restriction that prevents any point’s weight from exceeding a value of 10). Which
among the following would be the effect of such a modification?

A) It will have no effect on the performance of the Adaboost method.


B) It makes the final classifier robust to outliers.
C) It may result in lower overall performance.
D) None of these.

Correct Answer: B, C
Detailed Solution: Outliers tend to get misclassified. As the number of iterations increases,
the weight corresponding to outlier points can become very large resulting in subsequent
classifier models trying to classify the outlier points correctly. This generally has an adverse
effect on the overall classifier. Restricting the weights is one way of mitigating this problem.
However, this can also lower the performance of the classifier.
____________________________________________________________________

Question 3:

Identify whether the following statement is true or false:


“Boosting is easy to parallelize whereas bagging is inherently a sequential process.”
A) True
B) False

Correct Answer: B) False.


Detailed Solution: Bagging is easy to parallelize whereas boosting is inherently a sequential
process.
________________________________________________________________

Question 4:

Considering the AdaBoost algorithm, which among the following statements is true?
A) In each stage, we try to train a classifier which makes accurate predictions on a subset
of the data points where the subset contains more of the data points which were
misclassified in earlier stages.
B) The weight assigned to an individual classifier depends upon the weighted sum error of
misclassified points for that classifier.
C) Both option A and B are true
D) None of them are true

Correct Answer: C
Detailed Solution: In each stage, the AdaBoost algorithm tries to train a classifier which makes
accurate predictions on a subset of the data points, where the subset contains more of the data
points which were misclassified in earlier stages. The weight assigned to an individual classifier
depends upon the weighted sum error of misclassified points for that classifier.
____________________________________________________________________

Question 5:

Which of the following is FALSE about bagging?


A) Bagging increases the variance of the classifier
B) Bagging can help make robust classifiers from unstable classifiers.
C) Majority Voting is one way of combining outputs from various classifiers which are
being bagged.

Correct Answer: A
Detailed Answer: Bagging decreases the variance of the classifier.
____________________________________________________________________
Question 6:
Suppose the VC dimension of a hypothesis space is 6. Which of the following are true?

A) At least one set of 6 points can be shattered by the hypothesis space.


B) Two sets of 6 points can be shattered by the hypothesis space.
C) All sets of 6 points can be shattered by the hypothesis space.
D) No set of 7 points can be shattered by the hypothesis space.

Correct Answer: A, D
Detailed Solution: If the VC dimension of a hypothesis space is d:
● There exists at least one set of d points that can be shattered by the hypothesis space.
● No set of (d+1) points can be shattered by the hypothesis space.
____________________________________________________________________

Question 7:

Identify whether the following statement is true or false:


“Ensembles will yield bad results when there is a significant diversity among the models.”
A) True
B) False

Correct Answer: B
Detailed Solution: An ensemble is a collection of diverse learners combined to improve the
stability and the performance of the algorithm. So, the more diverse the models are, the better
the performance of the ensemble will be.
____________________________________________________________________

Question 8:

Which of the following algorithms is not an ensemble learning algorithm?


A) Random Forest
B) Adaboost
C) Decision Trees

Correct Answer: C.
Detailed Solution: A decision tree is a single model that does not aggregate the results of
multiple trees, so it is not an ensemble algorithm.
____________________________________________________________________
Question 9:

Suppose you have run AdaBoost on a training set for three boosting iterations. The results are
classifiers h1, h2, and h3, with coefficients α1 = 0.2, α2 = −0.3, and α3 = −0.2. For a given test
input x, you find that the classifiers' outputs are h1(x) = 1, h2(x) = 1, and h3(x) = −1. What is
the class returned by the AdaBoost ensemble classifier H on test example x?

A) 1
B) -1

Correct Answer: A
Detailed Solution:
The final output is H(x) = sign((α1*h1(x))+(α2*h2(x))+(α3*h3(x)))
H(x) = sign ((0.2*1) + (−0.3*1) + (−0.2* −1)) = sign(0.1) = 1.
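
The weighted vote can be written as a short function. This is a minimal sketch using the coefficients and classifier outputs from the question; the function name is illustrative, and returning −1 when the score is exactly 0 is an arbitrary tie-breaking assumption that does not affect this example.

```python
def adaboost_predict(alphas, outputs):
    """Return sign(sum_i alpha_i * h_i(x)) for classifier outputs in {-1, +1}."""
    score = sum(a * h for a, h in zip(alphas, outputs))
    # Assumption: a score of exactly 0 is mapped to -1.
    return 1 if score > 0 else -1


alphas = [0.2, -0.3, -0.2]  # coefficients alpha1, alpha2, alpha3
outputs = [1, 1, -1]        # h1(x), h2(x), h3(x)
score = sum(a * h for a, h in zip(alphas, outputs))
print(score, adaboost_predict(alphas, outputs))
```

The score is approximately 0.1 (up to floating-point rounding), so the ensemble returns class 1.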
____________________________________________________________________

Question 10:

Generally, an ensemble method works better, if the individual base models have
____________? (Note: Individual models have accuracy greater than 50%)
A) Less correlation among predictions
B) High correlation among predictions
C) Correlation does not have an impact on the ensemble output
D) None of the above.

Correct Answer: A
Detailed Solution: A lower correlation among ensemble model members will increase the
error-correcting capability of the model. So it is preferred to use models with low correlations
when creating ensembles.

____________________________________________________________________
END
Course Name: Introduction to Machine Learning
Assignment – Week 8 (Clustering)
TYPE OF QUESTION: MCQ/MSQ

Number of Question: 10 Total Marks: 10x2 = 20


_____________________________________________________________________________
Question 1:
Do the clustering results of the K-Means algorithm depend on the initial cluster centroid choices?
A) Yes
B) No

Correct Answer: A
Detailed Solution: The K-Means clustering algorithm may converge to a local minimum, which
coincides with the global minimum in some cases but not always. Different initial centroid
choices may produce different clustering results.
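
The dependence on initialization can be seen on a small example. The sketch below is a plain implementation of the usual assign-then-update loop with squared Euclidean distance; the four points and the two sets of initial centroids are made up for illustration. One initialization converges to the left/right split, the other gets stuck in a worse top/bottom split.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, max_iters=100):
    """Alternate assignment and centroid update until assignments stop changing."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def sse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))


# Four corners of a wide rectangle (hypothetical data).
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
good = kmeans(points, [(0, 0.5), (4, 0.5)])  # converges to the left/right split
bad = kmeans(points, [(2, 0), (2, 1)])       # stays in the bottom/top split
print(good[1], sse(points, *good))
print(bad[1], sse(points, *bad))
```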
_____________________________________________________________________________

Question 2:

Which of the following can act as possible termination conditions in K-Means?


I. Assignment of observations to clusters does not change between iterations, except for
cases with a bad local minimum.
II. Centroids do not change between successive iterations.
A) I only
B) II only
C) I and II

Correct Answer: C
Detailed Solution: Both the conditions can act as possible termination conditions.
_____________________________________________________________________________

Question 3:
Assume you want to cluster 7 observations into 3 clusters using the K-Means clustering
algorithm. After the first iteration, the clusters C1, C2, C3 have the following observations:
C1: {(1,1), (4,4), (7,7)}

C2: {(0,4), (4,0)}

C3: {(5,5), (9,9)}

What will be the cluster centroids after the first iteration?

A) C1: (4,4), C2: (2,2), C3: (7,7)


B) C1: (2,2), C2: (0,0), C3: (5,5)
C) C1: (6,6), C2: (4,4), C3: (9,9)
D) None of these

Correct Answer: A
Detailed Solution:
Finding centroid for data points in cluster C1 = ((1+4+7)/3, (1+4+7)/3) = (4, 4)
Finding centroid for data points in cluster C2 = ((0+4)/2, (4+0)/2) = (2, 2)
Finding centroid for data points in cluster C3 = ((5+9)/2, (5+9)/2) = (7, 7)
Hence, C1: (4,4), C2: (2,2), C3: (7,7)
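
The centroid update is a coordinate-wise mean, which can be checked directly. This is a small sketch using the cluster memberships from the question; the function name is illustrative.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


clusters = {
    "C1": [(1, 1), (4, 4), (7, 7)],
    "C2": [(0, 4), (4, 0)],
    "C3": [(5, 5), (9, 9)],
}
centroids = {name: centroid(pts) for name, pts in clusters.items()}
print(centroids)
```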
_____________________________________________________________________________

Question 4:

In single-link clustering, the similarity of two clusters is the similarity of their most similar
members. What is the time complexity of the single-link clustering algorithm? (Note: n is
the number of data points)
A) O(n^2)
B) O(n^2 log n)
C) O(n^3 log n)
D) O(n^3)

Correct Answer: A
Detailed Solution: Refer to the lecture. Single-link clustering can be computed in O(n^2) time,
for example with the SLINK algorithm.
_____________________________________________________________________________
Question 5:

Given six points with the following attributes:

Point   x coordinate   y coordinate
p1      0.4005         0.5306
p2      0.2148         0.3854
p3      0.3457         0.3156
p4      0.2652         0.1875
p5      0.0789         0.4139
p6      0.4548         0.3022

Table 1: x-y coordinates of six points

      p1       p2       p3       p4       p5       p6
p1    0.0000   0.2357   0.2218   0.3688   0.3421   0.2347
p2    0.2357   0.0000   0.1483   0.2042   0.1388   0.2540
p3    0.2218   0.1483   0.0000   0.1513   0.2843   0.1100
p4    0.3688   0.2042   0.1513   0.0000   0.2932   0.2216
p5    0.3421   0.1388   0.2843   0.2932   0.0000   0.3921
p6    0.2347   0.2540   0.1100   0.2216   0.3921   0.0000

Table 2: Distance Matrix for six points


Which of the following clustering representations and dendrograms depicts the use of the MIN or
single-link proximity function in hierarchical clustering?

A)

B)
C)

D)
Correct Answer: A
Detailed Solution: For the single-link or MIN version of hierarchical clustering, the proximity
of two clusters is defined to be the minimum of the distances between any two points in the
different clusters. For instance, from the table, we see that the distance between points 3 and 6
is 0.11, and that is the height at which they are joined into one cluster in the dendrogram. As
another example, the distance between clusters {3, 6} and {2, 5} is given by dist({3, 6}, {2, 5})
= min(dist(3, 2), dist(6, 2), dist(3, 5), dist(6, 5)) = min(0.1483, 0.2540, 0.2843, 0.3921)
= 0.1483.
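
The single-link computation can be replayed on the distance matrix from Table 2. The sketch below is a naive illustration, not an efficient implementation: it repeatedly merges the two closest clusters under the MIN proximity and records the merge heights. The names `DIST`, `single_link`, and `agglomerate` are illustrative.

```python
from itertools import combinations

# Upper triangle of Table 2 (distances between points p1..p6).
DIST = {
    (1, 2): 0.2357, (1, 3): 0.2218, (1, 4): 0.3688, (1, 5): 0.3421, (1, 6): 0.2347,
    (2, 3): 0.1483, (2, 4): 0.2042, (2, 5): 0.1388, (2, 6): 0.2540,
    (3, 4): 0.1513, (3, 5): 0.2843, (3, 6): 0.1100,
    (4, 5): 0.2932, (4, 6): 0.2216,
    (5, 6): 0.3921,
}


def dist(i, j):
    """Distance between points i and j, looked up symmetrically."""
    return DIST[(min(i, j), max(i, j))]


def single_link(a, b):
    """MIN proximity: smallest distance between a point in a and a point in b."""
    return min(dist(i, j) for i in a for j in b)


def agglomerate(points):
    """Naive single-link clustering; returns a list of (merged cluster, height)."""
    clusters = [frozenset([p]) for p in points]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: single_link(*pair))
        merges.append((sorted(a | b), single_link(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


print(single_link({3, 6}, {2, 5}))
for cluster, height in agglomerate(range(1, 7)):
    print(cluster, height)
```

The first merge joins points 3 and 6 at height 0.11, matching the dendrogram description above.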
_____________________________________________________________________________

Question 6:

Is it possible that assignment of observations to clusters does not change between successive
iterations of K-means?
A) Yes
B) No
C) Can’t say
D) None of these

Correct Answer: A
Detailed Solution: Once K-means has reached a global or local minimum, it will not alter the
assignment of data points to clusters in successive iterations.
____________________________________________________________________

Question 7:
Which of the following is not a clustering approach?

A) Hierarchical
B) Partitioning
C) Bagging
D) Density-Based

Correct Answer: C
Detailed Solution: Bagging is not a clustering technique.
_____________________________________________________________________________

Question 8:
In which of the following cases will K-Means clustering fail to give good results?
A) Data points with outliers
B) Data points with round shapes
C) Data points with non-convex shapes
D) Data points with different densities
Correct Answer: A, C, D
Detailed Solution: K-Means clustering algorithm fails to give good results when the data contains
outliers, the density spread of data points across the data space is different and the data points
follow non-convex shapes.
_____________________________________________________________________________

Question 9:

Given, A = {0,1,2,5,6} and B = {0,2,3,4,5,7,9}, calculate Jaccard Index of these two sets.
A) 0.50
B) 0.25
C) 0.33
D) 0.41

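Correct Answer: C
Detailed Solution: Here $A \cap B = \{0, 2, 5\}$ and $A \cup B = \{0, 1, 2, 3, 4, 5, 6, 7, 9\}$, so the
Jaccard Index is $J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{3}{9} \approx 0.33$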
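
The same value can be obtained with Python's built-in set operations. This is a minimal sketch; the function name is illustrative.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


A = {0, 1, 2, 5, 6}
B = {0, 2, 3, 4, 5, 7, 9}
print(sorted(A & B), sorted(A | B), round(jaccard(A, B), 2))
```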

_____________________________________________________________________________

Question 10:

Which of the following statements is/are not true about k−means clustering?

A) It is an unsupervised learning algorithm


B) Overlapping of clusters is allowed in k−means clustering
C) It is a hard-clustering technique
D) k is a hyperparameter in k-means

Correct Answer: B

Detailed Solution: Overlapping of clusters is not allowed in k-means.

_____________________________________________________________________________

END
