ML 21-22 Sem

1 3,4
GROUP – B
(Short Answer Type Questions)
Answer any three from the following: 3×5=15
Marks CO No.
2. (a) What is the difference between a model parameter and a learning
algorithm’s hyperparameter?
Model Parameter:
Model parameters are the internal variables or coefficients of a model (for example, its weights)
that are learned from the training data during the optimization process.
These parameters directly influence the predictions made by the model and are adjusted
through the learning process to minimize the error between the predicted and actual outcomes.
Examples of model parameters include the weights in a neural network, the coefficients in a
linear regression model, or the centroids in a K-means clustering algorithm.
Model parameters are intrinsic to the model and are typically learned automatically during
training.
Learning Algorithm's Hyperparameter:
Hyperparameters are configuration settings external to the model that control the behavior and
performance of the learning algorithm.
They are not learned from the data but are instead specified before the learning process begins
and typically remain fixed throughout training.
Hyperparameters affect the learning process itself, including the optimization algorithm,
regularization techniques, and other aspects of model training.
Examples of hyperparameters include the learning rate in gradient descent optimization, the
regularization parameter in ridge regression, or the number of layers in a neural network.
Hyperparameters need to be tuned and selected carefully to optimize the performance of the
model, often through techniques like grid search, random search, or Bayesian optimization.
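The distinction can be illustrated with a minimal sketch, assuming ridge regression solved in closed form with NumPy. The function name `fit_ridge`, the synthetic data, and the value `alpha = 1.0` are illustrative assumptions, not part of the original answer.

```python
import numpy as np

def fit_ridge(X, y, alpha):
    """Return ridge weights; alpha is a hyperparameter fixed before training."""
    n_features = X.shape[1]
    # Closed-form ridge solution: (X^T X + alpha * I)^-1 X^T y
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Synthetic data (assumed for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)

alpha = 1.0                       # hyperparameter: chosen by us, not learned
weights = fit_ridge(X, y, alpha)  # model parameters: learned from the data

print("hyperparameter alpha:", alpha)
print("learned weights:", weights)
```

Changing `alpha` changes how the weights are fitted, but `alpha` itself is never updated by the training step.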
(b) What is the purpose of a validation set? 2 1,3

Validation set
The validation set is a portion of the labelled data that is held out from training and used to
tune and optimize your machine learning algorithm. It contains input features and output labels
that are not used to fit the model, so it shows how well the algorithm performs on data it has
not seen during training. The validation set is used to compare different versions of your
algorithm, such as different hyperparameters, architectures, or regularization methods, and to
select the best one based on some metric, such as accuracy, precision, or recall.
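As a rough sketch of this selection step, the snippet below holds out part of a synthetic dataset and picks the ridge regularization strength with the lowest validation error. The data, the 150/50 split, the candidate values, and the helper names `fit_ridge` and `mse` are all assumptions made for illustration.

```python
import numpy as np

def fit_ridge(X, y, alpha):
    # Closed-form ridge regression weights
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

# Synthetic data (assumed for illustration only)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=200)

# Hold out the last 50 rows as a validation set; they are never used for fitting.
X_train, X_val = X[:150], X[150:]
y_train, y_val = y[:150], y[150:]

# Fit one model per candidate hyperparameter, score each on the validation set.
candidates = [0.01, 0.1, 1.0, 10.0, 100.0]
val_errors = {a: mse(X_val, y_val, fit_ridge(X_train, y_train, a)) for a in candidates}
best_alpha = min(val_errors, key=val_errors.get)

print("validation MSE per alpha:", val_errors)
print("selected alpha:", best_alpha)
```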
3. (a) Which Linear Regression training algorithm can you use if you have a
training set with millions of features?
3 1,3
If you have a training set with millions of features, closed-form solvers for Ordinary Least
Squares (OLS), such as the Normal Equation, may not be suitable: they require inverting a matrix
whose size grows with the number of features, which is computationally expensive, and such
high-dimensional problems are also prone to overfitting. In such cases, you can use the following
linear regression training algorithms that are more suitable for high-dimensional data:
Gradient Descent:
Gradient Descent is an iterative optimization algorithm used to minimize the cost function (e.g.,
Mean Squared Error) by adjusting the model parameters iteratively.
It works well for large-scale datasets with many features because it updates the model
parameters based on the gradients of the cost function, rather than computing the inverse of a
large matrix, which can be computationally expensive.
Variants of Gradient Descent, such as Stochastic Gradient Descent (SGD), Mini-batch Gradient
Descent, and Adam, can be used to further improve efficiency and convergence speed.
Stochastic Gradient Descent (SGD):
SGD is a variant of Gradient Descent where the model parameters are updated using a single
training example (or a small subset, known as mini-batch) at a time.
It is well-suited for large-scale datasets because it requires only a small portion of the dataset to
compute each parameter update, making it computationally efficient and scalable.
SGD can handle high-dimensional data efficiently and is commonly used in machine learning
frameworks for training linear regression models with large feature sets.
Coordinate Descent:
Coordinate Descent is an optimization algorithm that updates one model parameter at a time
while holding others fixed.
It is particularly useful for sparse datasets with many zero-valued features, as the update for
one parameter only needs the training examples in which that feature is non-zero.
Coordinate Descent can be parallelized and scaled to handle large feature sets efficiently,
making it suitable for linear regression with millions of features.
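The mini-batch variant described above can be sketched as follows. This is a minimal NumPy illustration on a small synthetic problem, not a production solver; the function name `sgd_linear_regression`, the learning rate, the batch size, and the data are assumptions.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.05, epochs=50, batch_size=32, seed=0):
    """Fit weights w and intercept b by mini-batch SGD on the mean squared error."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        order = rng.permutation(n_samples)           # reshuffle every epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            error = X[idx] @ w + b - y[idx]
            # Gradient of the MSE computed on this mini-batch only
            grad_w = 2.0 * X[idx].T @ error / len(idx)
            grad_b = 2.0 * error.mean()
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Synthetic data (assumed for illustration only)
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 4))
true_w = np.array([3.0, -2.0, 0.0, 1.0])
y = X @ true_w + 0.5 + rng.normal(scale=0.1, size=1000)

w, b = sgd_linear_regression(X, y)
print("estimated weights:", w)
print("estimated intercept:", b)
```

No matrix is inverted at any point; each update touches only `batch_size` rows, which is why the approach scales to large feature sets.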
(b) Can Gradient Descent be stuck in a local minimum when training a
Logistic Regression model?
No, Gradient Descent is not susceptible to being stuck in a local minimum when training a
Logistic Regression model.
Here's why:
Convex Cost Function: In Logistic Regression, the cost function (the negative log-likelihood,
also called the cross-entropy loss) is convex. A convex function has no local minima other than
the global minimum, so Gradient Descent, being a first-order optimization algorithm, cannot get
trapped in a suboptimal one: with a suitable learning rate it approaches the global minimum
regardless of the initial parameters or optimization path.
Guaranteed Convergence: Gradient Descent, when properly implemented with a suitable
learning rate and convergence criteria, is guaranteed to converge to the global minimum of a
convex cost function. It iteratively updates the model parameters in the direction of the negative
gradient until convergence is reached.
Smoothness of the Cost Function: The cost function in Logistic Regression is smooth and
continuous, without abrupt changes or local irregularities that could lead Gradient Descent to
get stuck in local minima. This smoothness ensures that Gradient Descent can navigate the
parameter space effectively towards the global minimum.
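A small experiment can make this concrete. The sketch below runs plain gradient descent on the cross-entropy loss from two very different starting points and compares the results. The synthetic (non-separable) data, the learning rate, and the step count are assumptions chosen for illustration.

```python
import numpy as np

def loss_and_grad(w, X, y):
    z = X @ w
    # Cross-entropy written in a numerically stable form: log(1 + e^z) - y*z
    loss = np.mean(np.logaddexp(0.0, z) - y * z)
    p = 1.0 / (1.0 + np.exp(-z))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

def gradient_descent(w0, X, y, lr=0.5, steps=5000):
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(steps):
        _, grad = loss_and_grad(w, X, y)
        w -= lr * grad
    return w

# Synthetic, overlapping classes (assumed for illustration only)
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 2))
X = np.hstack([features, np.ones((200, 1))])       # last column is the bias term
prob = 1.0 / (1.0 + np.exp(-(X @ np.array([1.5, -1.0, 0.3]))))
y = (rng.uniform(size=200) < prob).astype(float)

w_a = gradient_descent(np.zeros(3), X, y)                 # start at the origin
w_b = gradient_descent(np.array([5.0, 5.0, -5.0]), X, y)  # start far away

print("from zeros:   ", w_a, loss_and_grad(w_a, X, y)[0])
print("from far away:", w_b, loss_and_grad(w_b, X, y)[0])
```

Both runs should end at essentially the same weights, which is what convexity predicts.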
2 1,3
4. (a) What is Bias and Variance in a Machine Learning Model? 3 1,3
In general, a machine learning model analyses the data, finds patterns in it and makes predictions.
While training, the model learns these patterns in the dataset and applies them to test data for
prediction. The predictions differ from the actual (expected) values, and the part of this
difference that is caused by the simplifying assumptions of the model is known as bias error, or
error due to bias. It can be defined as the inability of a machine learning algorithm, such as
Linear Regression, to capture the true relationship between the data points. Every algorithm
begins with some amount of bias, because bias arises from the assumptions in the model that
make the target function simpler to learn. A model has either:
Low Bias: A low bias model will make fewer assumptions about the form of the target function.
High Bias: A model with a high bias makes more assumptions, and the model becomes unable to
capture the important features of our dataset. A high bias model also cannot perform well on
new data.
Generally, a linear algorithm has high bias, which also makes it fast to learn. The simpler the
algorithm, the more bias it is likely to introduce, whereas a nonlinear algorithm often has low
bias.
Some examples of machine learning algorithms with low bias are Decision Trees, k-Nearest
Neighbours and Support Vector Machines. Examples of algorithms with high bias are
Linear Regression, Linear Discriminant Analysis and Logistic Regression.
VARIANCE:
Variance specifies how much the prediction would change if different training data were used.
In simple words, variance tells how much a random variable differs from its expected value.
Ideally, a model should not vary too much from one training dataset to another, which means
the algorithm should be good at capturing the underlying mapping between input and output
variables. Variance is either low or high.
Low variance means there is a small variation in the prediction of the target function with
changes in the training dataset, while high variance means a large variation in the
prediction of the target function with changes in the training dataset.
A model that shows high variance fits the training dataset very closely and performs well on it,
but does not generalize well to unseen data. As a result, such a model gives good results
on the training dataset but shows high error rates on the test dataset.
Since, with high variance, the model learns too much from the dataset, including its noise, it
leads to overfitting of the model. A model with high variance has the following problems:
A high variance model leads to overfitting.
It increases model complexity.
Usually, nonlinear algorithms, which have a lot of flexibility to fit the data, have high variance.
Bias and Variance in Machine Learning

Some examples of machine learning algorithms with low variance are Linear Regression,
Logistic Regression, and Linear Discriminant Analysis. Examples of algorithms with high
variance are Decision Trees, Support Vector Machines, and k-Nearest Neighbours.
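Bias and variance can also be estimated empirically. The sketch below refits a straight line and a degree-6 polynomial to many noisy samples of an assumed target function, sin(2πx), then measures how far the average prediction is from the truth (squared bias) and how much predictions scatter between training sets (variance). The target function, noise level, sample sizes, and polynomial degrees are illustrative assumptions, and the training inputs are kept fixed so that only the noise is resampled.

```python
import numpy as np

def true_function(x):
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_datasets=200, n_points=20, noise=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial fit of the given degree."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_points)   # fixed inputs, fresh noise each time
    x_test = np.linspace(0.05, 0.95, 50)
    predictions = np.empty((n_datasets, x_test.size))
    for i in range(n_datasets):
        y_train = true_function(x_train) + rng.normal(scale=noise, size=n_points)
        coeffs = np.polyfit(x_train, y_train, degree)
        predictions[i] = np.polyval(coeffs, x_test)
    mean_prediction = predictions.mean(axis=0)
    bias_squared = float(np.mean((mean_prediction - true_function(x_test)) ** 2))
    variance = float(np.mean(predictions.var(axis=0)))
    return bias_squared, variance

bias_lin, var_lin = bias_variance(degree=1)    # simple model
bias_poly, var_poly = bias_variance(degree=6)  # flexible model

print(f"degree 1: bias^2={bias_lin:.4f}, variance={var_lin:.4f}")
print(f"degree 6: bias^2={bias_poly:.4f}, variance={var_poly:.4f}")
```

Under these assumptions the straight line shows the larger squared bias and the degree-6 polynomial the larger variance.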
(b) What is the Trade-off Between Bias and Variance? 2 1,3
The trade-off between bias and variance is a fundamental concept in machine learning that
relates to the performance of a model. Let's break down this trade-off:
Bias:
Bias refers to the error introduced by approximating a real-world problem with a simplified
model.
A model with high bias tends to make strong assumptions about the underlying data
distribution and may not capture the true relationship between features and the target variable.
Examples of high-bias models include linear regression with too few features or decision trees
with limited depth.
Variance:
Variance refers to the model's sensitivity to fluctuations or noise in the training data.
A model with high variance is highly flexible and captures even the smallest fluctuations in the
training data, leading to a high sensitivity to noise.
Examples of high-variance models include deep neural networks with many layers or decision
trees with high depth.
Now, let's discuss the trade-off between bias and variance:
High Bias-Low Variance:
When a model has high bias and low variance, it means that it makes strong assumptions about
the data and is relatively insensitive to changes in the training set.
This can result in underfitting, where the model fails to capture the underlying patterns in the
data.
Examples include a linear regression model that fits a straight line to data that is inherently
non-linear.
Low Bias-High Variance:
Conversely, when a model has low bias and high variance, it means that it is highly flexible and
captures fine details in the training data.
However, this flexibility can lead to overfitting, where the model learns the noise in the training
data rather than the underlying patterns.
As a result, the model may perform well on the training set but generalize poorly to unseen data.
Examples include complex neural networks or decision trees with high depth that capture noise
in the training data.
The goal in machine learning is to find the right balance between bias and variance, known as
the bias-variance trade-off:
Balanced Model:
A well-balanced model has an appropriate level of complexity that keeps both bias and
variance low, so that their combined error is as small as possible.
It captures the underlying patterns in the data without being overly influenced by noise.
Achieving this balance often involves techniques such as regularization, cross-validation, or
ensemble methods.
In summary, the trade-off between bias and variance is about finding the level of model
complexity that minimizes the total error due to bias and variance, ultimately leading
to better generalization performance on unseen data.
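The trade-off is often stated through the standard decomposition of the expected squared error. The symbols below are assumptions introduced here, not notation from the original answer: the data follow y = f(x) + ε with zero-mean noise of variance σ², the fitted model is written as f̂, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Increasing model complexity typically shrinks the first term and grows the second, while the noise term cannot be reduced by any choice of model.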
5. (a) Explain the Confusion Matrix with Respect to Machine Learning
Algorithms.
A confusion matrix is a table that lays out the outcomes of a classification problem by comparing
the predicted values of a classifier with the actual values, which helps visualize its performance.
Each cell counts how many examples fall into one combination of predicted and actual class.
Figure 1: Basic layout of a Confusion Matrix

How to Create a 2x2 Confusion Matrix?
We can obtain four different combinations from the predicted and actual values of a classifier:
Figure 2: Confusion Matrix
• True Positive: The number of times our actual positive values are equal to the predicted
positive. You predicted a positive value, and it is actually positive.
• False Positive: The number of times our model wrongly predicts negative values as positives.
You predicted a positive value, and it is actually negative.
• True Negative: The number of times our actual negative values are equal to predicted negative
values. You predicted a negative value, and it is actually negative.
• False Negative: The number of times our model wrongly predicts positive values as negatives.
You predicted a negative value, and it is actually positive.
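The four cells of a 2x2 confusion matrix can be counted directly from paired actual and predicted labels. The following is a minimal sketch in plain Python; the function name `confusion_counts` and the example labels are made up for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, TN, FN for a binary classifier."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted positive, actually positive
        elif p == positive:
            fp += 1   # predicted positive, actually negative
        elif a == positive:
            fn += 1   # predicted negative, actually positive
        else:
            tn += 1   # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

# Example labels (assumed for illustration only)
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}
```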
3 1,3
(b) What is a False Positive and False Negative and How Are They
Significant?
False Positives (FP):
False positives are cases where the model incorrectly predicts the positive class when the actual
label is negative.
For example, if the model incorrectly predicts that a healthy person has a disease, it's a false
positive.
False Negatives (FN):
False negatives are cases where the model incorrectly predicts the negative class when the actual
label is positive.
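For example, if the model incorrectly predicts that a person with a disease does not have the
disease, it's a false negative.
They are significant because the two kinds of error usually carry different costs: in disease
screening, a false negative can leave an illness untreated, while a false positive leads to
unnecessary further tests. Which error matters more determines whether a model should be tuned
for precision (fewer false positives) or recall (fewer false negatives).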
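How much each error type hurts can be quantified with rates computed from the confusion-matrix counts. The sketch below uses hypothetical screening numbers (80 true positives, 50 false positives, 850 true negatives, 20 false negatives) that are invented purely for illustration; the function name `error_rates` is also an assumption.

```python
def error_rates(tp, fp, tn, fn):
    """Return metrics that are sensitive to false positives or false negatives."""
    return {
        "precision": tp / (tp + fp),            # lowered by false positives
        "recall": tp / (tp + fn),               # lowered by false negatives
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }

# Hypothetical screening results (assumed for illustration only)
rates = error_rates(tp=80, fp=50, tn=850, fn=20)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

With these numbers, one in five sick patients is missed (false negative rate 0.2), even though only a small share of healthy patients is flagged.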
2 1,3
6. (a) What is Feed Forward Neural Network? 2 1,3
A feedforward neural network is one of the simplest types of artificial neural networks devised.
In this network, the information moves in only one direction, forward, from the input nodes,
through the hidden nodes (if any), and to the output nodes. There are no cycles or loops in the
network. Feedforward neural networks were the first type of artificial neural network invented
and are simpler than recurrent neural networks, which contain feedback connections.
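A single forward pass can be sketched in a few lines of NumPy. The layer sizes (2 inputs, 3 hidden nodes, 1 output), the sigmoid activation, and all weight values below are assumptions picked for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Propagate the input forward: input layer -> hidden layer -> output layer."""
    hidden = sigmoid(W1 @ x + b1)       # hidden activations
    output = sigmoid(W2 @ hidden + b2)  # output activations
    return hidden, output

# Illustrative weights (assumed, not trained)
x = np.array([0.5, -1.0])
W1 = np.array([[0.1, 0.2],
               [0.3, 0.4],
               [-0.5, 0.6]])
b1 = np.zeros(3)
W2 = np.array([[0.7, -0.8, 0.9]])
b2 = np.zeros(1)

hidden, output = forward(x, W1, b1, W2, b2)
print("hidden:", hidden)
print("output:", output)
```

Data flows strictly from `x` to `hidden` to `output`; nothing computed later is fed back to an earlier layer.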
(b) Explain Back-propagation in detail. 3 1,3
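Backpropagation, or backward propagation of errors, is an algorithm that computes how much each
weight in a neural network contributed to the prediction error, working backwards from the output
nodes to the input nodes. After a forward pass produces a prediction, the error at the output is
measured with a loss function, and the chain rule is applied layer by layer to obtain the gradient
of the loss with respect to every weight and bias. An optimizer such as gradient descent then uses
these gradients to update the weights. It's an important mathematical tool for
improving the accuracy of predictions in data mining and machine learning.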
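The backward pass can be sketched for a tiny network and checked against a numerical gradient. This is a minimal illustration, assuming one hidden layer, sigmoid activations, and a squared-error loss; the layer sizes, random weights, and function names are not from the original answer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x):
    W1, b1, W2, b2 = params
    hidden = sigmoid(W1 @ x + b1)
    output = sigmoid(W2 @ hidden + b2)
    return hidden, output

def loss(params, x, target):
    _, output = forward(params, x)
    return 0.5 * np.sum((output - target) ** 2)

def backprop(params, x, target):
    """Return gradients of the loss w.r.t. W1, b1, W2, b2 via the chain rule."""
    W1, b1, W2, b2 = params
    hidden, output = forward(params, x)
    # Error signal at the output layer (loss derivative times sigmoid derivative)
    delta_out = (output - target) * output * (1.0 - output)
    grad_W2 = np.outer(delta_out, hidden)
    # Propagate the error signal back through W2 to the hidden layer
    delta_hidden = (W2.T @ delta_out) * hidden * (1.0 - hidden)
    grad_W1 = np.outer(delta_hidden, x)
    return grad_W1, delta_hidden, grad_W2, delta_out

# Illustrative network: 2 inputs, 3 hidden nodes, 1 output (assumed)
rng = np.random.default_rng(0)
params = [rng.normal(size=(3, 2)), np.zeros(3), rng.normal(size=(1, 3)), np.zeros(1)]
x, target = np.array([0.5, -1.0]), np.array([1.0])

grad_W1, grad_b1, grad_W2, grad_b2 = backprop(params, x, target)

# Numerical check of one entry of grad_W1 using a central difference
eps = 1e-6
W1_plus, W1_minus = params[0].copy(), params[0].copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (loss([W1_plus] + params[1:], x, target)
           - loss([W1_minus] + params[1:], x, target)) / (2 * eps)

print("backprop gradient: ", grad_W1[0, 0])
print("numerical gradient:", numeric)
```

A gradient descent step would then subtract a small multiple of each gradient from the corresponding weights or biases.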
GROUP – C
(Long Answer Type Questions)
7. (a) How would you define clustering? Can you name a few clustering
algorithms?
5 1,3,4
(b) What is the difference between anomaly detection and novelty
detection?
5 2,4
(c) Can you think of a use case where active learning would be useful?
How would you implement it?
5 2,4
8. (a) Explain Naive Bayes Classifier. 5 3,4

(b) Calculate class probabilities, conditional probabilities and make
predictions on the following data:
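Weather  Car      Class
Sunny    Working  Go-out
Rainy    Broken   Go-out
Sunny    Working  Go-out
Sunny    Working  Go-out
Sunny    Working  Go-out
Rainy    Broken   Stay-home
Rainy    Broken   Stay-home
Sunny    Working  Stay-home
Sunny    Broken   Stay-home
Rainy    Broken   Stay-home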
5 3,4
(c) Explain k-Nearest Neighbor Classifier. 5 3,4
9. (a) Explain the following Terms in brief:
(i) Confusion Matrix
(ii) Error Rate
(iii) Sensitivity
(iv) Specificity
(v) Precision
5 2,4
(b) Discuss the following Terms in brief:
(i) Accuracy
(ii) Absolute Error
(iii) Squared Error
(iv) Mean Absolute Error

(v) Relative Absolute Error
5 2,4
(c) Explain Bagging and Boosting in case of Ensemble methods. 5 2,4
10. (a) What are Artificial Neural Networks? 5 3,4
(b) Explain the architecture of Artificial Neural Networks. 5 3,4
(c) Design a Neural Network with 2 inputs in the input layer, 2 nodes in a
single hidden layer, and 2 outputs in the output layer. Calculate the
values of hidden nodes and values of output nodes.
5 3,4
11. (a) Given six data points (1,1), (2,1), (3,5), (4,3), (4,6), (6,4), apply the
Hierarchical clustering algorithm to develop the dendrogram using
these points.
5 2,4
(b) Find the number of clusters found in the above dendrogram. 5 2,4
(c) Explain Single, Complete, Average and Centroid Linkages in brief.
