Importance of Machine Learning, with examples:
Image Recognition:
Example: Object and face recognition in photos. ML models can classify images and detect objects or faces; photo services use this to tag people in pictures automatically.
Natural Language Processing (NLP):
Example: Sentiment analysis for customer feedback. ML models can analyze large
volumes of text data from customer reviews or social media posts to determine
sentiment (positive, negative, neutral). Companies like Airbnb use NLP to analyze
customer reviews and improve their services based on feedback.
Recommendation Systems:
Example: Personalized movie recommendations. Platforms like Netflix and Amazon use
ML algorithms to analyze user preferences and behavior to recommend movies or
products tailored to individual tastes. For example, Netflix's recommendation system
suggests movies based on a user's viewing history and ratings.
Predictive Analytics:
Example: Demand and sales forecasting. ML models learn from historical data to predict future outcomes, such as retailers forecasting product demand to plan inventory.
Autonomous Vehicles:
Example: Self-driving cars. ML models process sensor data from cameras, LiDAR, and radar to interpret the surrounding environment and make real-time driving decisions.
Fraud Detection:
Example: Credit card fraud detection. ML algorithms can analyze transaction data and
identify patterns indicative of fraudulent activity, such as unusual spending patterns
or suspicious transactions. Financial institutions use ML to detect and prevent
fraudulent transactions, protecting customers from identity theft and financial losses.
When there is only one independent feature, it is known as Simple Linear Regression, and when there is more than one feature, it is known as Multiple Linear Regression. Similarly, when there is only one dependent variable, it is considered Univariate Linear Regression, while when there is more than one dependent variable, it is known as Multivariate Regression. In the simple case the model fits a line y = b0 + b1*x; with multiple features it becomes y = b0 + b1*x1 + ... + bn*xn.
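A minimal sketch contrasting the two cases, assuming scikit-learn is available (any least-squares fit would do):

```python
# Simple vs. multiple linear regression, sketched with scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression

# Simple linear regression: one independent feature.
X_simple = np.array([[1], [2], [3], [4]])   # single feature
y = np.array([3, 5, 7, 9])                  # generated by y = 2x + 1
simple = LinearRegression().fit(X_simple, y)
print(simple.coef_, simple.intercept_)      # ~[2.0] 1.0

# Multiple linear regression: more than one independent feature.
X_multi = np.array([[1, 2], [2, 1], [3, 4], [4, 3]])
y2 = X_multi @ np.array([2.0, 0.5]) + 1.0   # y = 2*x1 + 0.5*x2 + 1
multi = LinearRegression().fit(X_multi, y2)
print(multi.coef_, multi.intercept_)        # ~[2.0, 0.5] 1.0
```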
We can understand entropy with a simple example: flipping a coin. When we flip a fair coin, there are two possible outcomes, each with a 50% probability, and there is no way to predict which one will occur. In such maximum-uncertainty situations, entropy is at its highest. This is the essence of entropy in machine learning: it measures the impurity or uncertainty of a set of outcomes.
Information gain is the reduction in entropy achieved by splitting the dataset on a particular attribute; splits that produce purer (lower-entropy) subsets have higher information gain.
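A minimal sketch of both quantities in plain Python (the helper names are illustrative):

```python
# Entropy and information gain for discrete labels.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: H = sum(p * log2(1/p))."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def information_gain(parent, subsets):
    """Entropy of the parent minus the weighted entropy of its subsets."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)

print(entropy(["H", "T"]))            # fair coin -> 1.0 bit (maximum uncertainty)
print(entropy(["H", "H", "H", "H"]))  # certain outcome -> 0.0 bits
# A split that separates the classes perfectly recovers all the entropy:
print(information_gain(["+", "+", "-", "-"], [["+", "+"], ["-", "-"]]))  # 1.0
```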
What are overfitting and underfitting in Machine Learning, and how can they be addressed?
Overfitting:
Overfitting occurs when a model learns the training data too well, capturing noise or random fluctuations rather than the underlying pattern.
Characteristics: It performs well on training data but poorly on unseen test data.
Causes: Using a model that is too complex for the amount of training data available.
Underfitting:
Underfitting happens when a model is too simple to capture the underlying structure of the data.
Characteristics: It performs poorly on both the training data and unseen test data.
Causes: Using a model that's too simple or has too few parameters.
Addressing Overfitting:
Cross-validation: Split the data into multiple training and validation sets to assess the
model's performance. Techniques like k-fold cross-validation help in this regard.
Early stopping: Stop training the model when performance on a validation set starts to
degrade, preventing it from overfitting to the training data.
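As a sketch of the cross-validation idea, assuming scikit-learn and a decision tree as the example model:

```python
# 5-fold cross-validation: each fold serves once as the validation set.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# An unconstrained tree can memorize its training folds (overfitting);
# limiting max_depth keeps the model simpler.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())  # similar scores across folds suggest no overfitting
```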
Addressing Underfitting:
Increasing model complexity: Use a more complex model architecture or increase the
number of parameters to better capture the underlying patterns in the data.
Collect more data: If possible, gather more data to provide the model with more
examples to learn from, particularly if the current dataset is small.
Parameters vs. Hyperparameters
Both parameters and hyperparameters are closely associated with the model training process, but they play two different roles in it.
Parameters
Parameters are variables that allow the model to learn the rules from the data. During
the training process, they are updated by the algorithm. We do not set optimal values
for the parameters. Instead, parameters learn their own values from data. However, in
some cases, we need to initialize parameters with some relevant values in the
beginning. Once the optimal values for the parameters are found, the model has finished training, and it is then suitable for making predictions on unseen data.
Hyperparameters
Hyperparameters are variables that control how the model is trained, and they therefore indirectly influence the values of the parameters. In other words, the optimal values of the parameters depend on the values of the hyperparameters we use. Unlike parameters, hyperparameters do not learn their values from data. We need to specify them manually before training the model. Once specified, the hyperparameter values remain fixed during the model training process.
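A minimal sketch of the distinction, using scikit-learn's logistic regression as an illustrative model:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Hyperparameters: chosen manually before training and fixed throughout
# (C is the inverse regularization strength, max_iter the optimizer budget).
model = LogisticRegression(C=1.0, max_iter=200)

# Parameters: the weights and intercepts learned from the data by fit().
model.fit(X, y)
print("Learned weights:", model.coef_)
print("Learned intercepts:", model.intercept_)
```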
GROUP – C
(5 marks each; CO 4)
… Distance metric?
… functions?
8. (a) Given six data points as (1,1), (2,1), (3,5), (4,3), (4,6), …, construct a dendrogram.
(b) Explain Linkages in brief.
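For question 8(a), a minimal sketch of building such a dendrogram with SciPy; the sixth data point is not listed above, so the sketch uses the five that are:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

points = np.array([(1, 1), (2, 1), (3, 5), (4, 3), (4, 6)])

# Single linkage: at each step, merge the two clusters with the smallest
# minimum pairwise (Euclidean) distance.
Z = linkage(points, method="single", metric="euclidean")

dendrogram(Z, labels=[f"P{i + 1}" for i in range(len(points))])
plt.xlabel("Data point")
plt.ylabel("Merge distance")
plt.show()
```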
Intrinsic Evaluation:
Cluster Quality Metrics: If the model's goal is clustering, metrics such as silhouette
score, Davies-Bouldin index, or Calinski-Harabasz index can be used to evaluate the
quality of the clusters produced.
Extrinsic Evaluation:
When ground-truth labels or a downstream task are available, the model's output can be compared against them, for example with the Adjusted Rand Index or Adjusted Mutual Information described below.
Visual Inspection:
Visualizing the results can provide insights into the effectiveness of the model. For
example, plotting clusters or visualizing the reduced-dimensional space can help
identify patterns or anomalies in the data.
Stability Evaluation:
Re-running the model on resampled or slightly perturbed versions of the data and checking that the results stay consistent gives another indication of a reliable model.
Silhouette Analysis:
Silhouette analysis can be used to assess the compactness and separation of clusters.
A higher silhouette score indicates better-defined clusters.
Domain-specific Metrics:
Depending on the application, domain-specific measures of usefulness can also be defined. Commonly used clustering metrics include the following:
Silhouette Score: This measures the compactness and separation of clusters. It ranges
from -1 to 1, where a higher score indicates better-defined clusters.
Davies-Bouldin Index: This evaluates the average similarity between each cluster and
its most similar cluster while also considering cluster dispersion. A lower index
indicates better clustering.
Calinski-Harabasz Index: Also known as the Variance Ratio Criterion, this measures
the ratio of between-cluster dispersion to within-cluster dispersion. A higher index
indicates better clustering.
Inertia: In the context of K-means clustering, inertia measures the sum of squared
distances of samples to their closest cluster center. Lower inertia indicates tighter
clusters.
Adjusted Rand Index (ARI) and Adjusted Mutual Information (AMI): These measures
quantify the similarity between the true cluster assignments and the clusters
produced by the model. Scores closer to 1 indicate better agreement.
Dunn Index: This measures the compactness and separation of clusters, similar to the
silhouette score, but it considers the ratio of the smallest distance between points in
different clusters to the largest intra-cluster distance. Higher values indicate better
clustering.
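A minimal sketch computing several of these metrics, assuming scikit-learn and synthetic blob data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (adjusted_rand_score, calinski_harabasz_score,
                             davies_bouldin_score, silhouette_score)

X, y_true = make_blobs(n_samples=300, centers=3, random_state=42)
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
labels = km.labels_

print("Silhouette (higher is better):     ", silhouette_score(X, labels))
print("Davies-Bouldin (lower is better):  ", davies_bouldin_score(X, labels))
print("Calinski-Harabasz (higher better): ", calinski_harabasz_score(X, labels))
print("Inertia (lower is better):         ", km.inertia_)
print("ARI vs. true labels:               ", adjusted_rand_score(y_true, labels))
```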
(c) Write short notes on precision, recall, and F1 score. (5 marks, CO 4)
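As a quick reference for the three metrics named in (c), a sketch computing them from raw counts:

```python
# Precision, recall, and F1 from true/false positive and negative counts.
def precision(tp, fp):
    return tp / (tp + fp)        # of all predicted positives, how many were right

def recall(tp, fn):
    return tp / (tp + fn)        # of all actual positives, how many were found

def f1(p, r):
    return 2 * p * r / (p + r)   # harmonic mean of precision and recall

# Example: 8 true positives, 2 false positives, 4 false negatives.
p, r = precision(8, 2), recall(8, 4)
print(p, r, f1(p, r))  # 0.8, ~0.667, ~0.727
```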
Applications of Artificial Neural Networks (ANNs):
Image Recognition and Computer Vision: ANNs are widely used for tasks such as
object detection, facial recognition, and image classification. Convolutional Neural
Networks (CNNs), a type of ANN specialized for processing visual data, have achieved
remarkable performance in these tasks.
Natural Language Processing (NLP): ANNs are applied to tasks like sentiment analysis,
text classification, machine translation, and speech recognition. Recurrent Neural
Networks (RNNs) and their variants, such as Long Short-Term Memory (LSTM)
networks, are commonly used for sequential data processing in NLP.
Financial Forecasting and Trading: ANNs are employed for predicting stock prices,
financial trends, and market behavior. They analyze historical data, market indicators,
and news sentiment to make predictions and inform investment decisions.
Healthcare and Medicine: ANNs are used for medical image analysis, disease
diagnosis, drug discovery, and personalized medicine. They analyze medical images
like MRI scans or X-rays, identify anomalies, and assist radiologists in diagnosis.
ANNs are also applied to genomic data analysis for understanding genetic
predispositions to diseases.
Autonomous Vehicles: ANNs play a crucial role in autonomous driving systems. They
process sensor data from cameras, LiDAR, and radar, interpret the surrounding
environment, and make real-time decisions for navigation and control.
Fraud Detection and Cybersecurity: ANNs are utilized for detecting fraudulent
transactions, cybersecurity threats, and network intrusions. They analyze patterns in
data to identify suspicious behavior and prevent fraudulent activities.
Comparison of Biological Neural Networks (BNNs) and Computer Networks (CNs):
Processing Mechanism:
BNNs process information through electrochemical signals, whereas CNs process digital data signals.
BNNs exhibit complex emergent behavior and learning capabilities, while CNs operate
based on predefined algorithms and protocols.
Learning and Adaptation:
BNNs can learn and adapt by adjusting synaptic weights based on experience and
feedback, enabling capabilities such as learning, memory, and pattern recognition.
CNs do not possess inherent learning capabilities but can be optimized and adapted
through configuration and management.
Purpose:
BNNs exist to support cognition, perception, and behavior in living organisms, whereas CNs are engineered to transmit and exchange data between computing devices.
The Perceptron model is one of the simplest types of artificial neural networks. It is a supervised learning algorithm for binary classification. It can be viewed as a single-layer neural network with four main components: input values, weights and a bias, a net (weighted) sum, and an activation function.
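A minimal NumPy sketch of those four components plus the classic perceptron learning rule (the function name and toy task are illustrative):

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    w = np.zeros(X.shape[1])  # weights
    b = 0.0                   # bias
    for _ in range(epochs):
        for xi, target in zip(X, y):
            net = xi @ w + b                  # net (weighted) sum
            pred = 1 if net >= 0 else 0       # step activation function
            w += lr * (target - pred) * xi    # update weights on mistakes
            b += lr * (target - pred)
    return w, b

# Toy usage: learn the logical AND function (linearly separable).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([1 if xi @ w + b >= 0 else 0 for xi in X])  # -> [0, 0, 0, 1]
```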
Kernel-based algorithms are a class of machine learning algorithms that utilize kernel
functions to implicitly map input data into a higher-dimensional space, where it may
be easier to separate classes or find patterns. These algorithms are widely used in
both supervised and unsupervised learning tasks, offering flexibility and efficiency in
handling complex data.
Support Vector Machines (SVMs):
Principle: SVMs are binary classifiers that find the optimal hyperplane separating
classes in a high-dimensional feature space.
Kernel Trick: SVMs use kernel functions to map input data into a higher-dimensional
space without explicitly computing the transformation. Common kernel functions
include linear, polynomial, radial basis function (RBF), and sigmoid kernels.
Applications: SVMs are used for classification, regression, and outlier detection tasks
in various domains, including text classification, image recognition, bioinformatics,
and finance.
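A minimal sketch of the kernel trick in action, assuming scikit-learn: an RBF-kernel SVM on data that no straight line can separate:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# kernel="rbf" separates the classes in an implicit higher-dimensional
# space; gamma controls the width of the RBF kernel.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```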
Kernel Ridge Regression:
Principle: It combines ridge (L2-regularized) linear regression with the kernel trick to model non-linear relationships.
Kernel Trick: It uses kernel functions to map input features into a high-dimensional
space, where ridge regression is performed.
Applications: Kernel Ridge Regression is used for regression tasks where the
relationship between input and output variables is non-linear, such as predicting
stock prices, modeling complex functions, and time-series forecasting.
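A minimal sketch with scikit-learn's KernelRidge, fitting a non-linear function that plain ridge regression would miss:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 80))[:, None]
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(80)

# alpha is the ridge penalty; the RBF kernel captures the non-linearity.
model = KernelRidge(kernel="rbf", alpha=0.1, gamma=0.5).fit(X, y)
print(model.predict([[2.5]]))  # roughly sin(2.5) ≈ 0.6
```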
Kernel Principal Component Analysis (Kernel PCA):
Principle: It extends PCA to capture non-linear structure by performing PCA in a kernel-induced feature space.
Kernel Trick: It applies kernel functions to compute the inner products between data
points in the high-dimensional feature space.
Applications: Kernel PCA is used for feature extraction, visualization, and data
preprocessing tasks, especially when the data has non-linear relationships or complex
structures.
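A minimal sketch with scikit-learn's KernelPCA on concentric circles, a classic non-linear structure:

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

# Inner products are computed via the RBF kernel, never forming the
# high-dimensional mapping explicitly.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
X_kpca = kpca.fit_transform(X)
print(X_kpca.shape)  # (300, 2): the circles become linearly separable
```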
Gaussian Processes:
Principle: Gaussian Processes are probabilistic models that use kernel functions to
define covariance between data points, enabling Bayesian inference and prediction.
Kernel Trick: Different kernel functions can be used to model various assumptions
about the underlying data distribution.
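A minimal sketch with scikit-learn's GaussianProcessRegressor, where an RBF kernel supplies the covariance:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.array([[1.0], [3.0], [5.0], [6.0]])
y = np.sin(X).ravel()

# The kernel defines covariance between points; predictions come with
# an uncertainty estimate (the posterior standard deviation).
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X, y)
mean, std = gp.predict([[4.0]], return_std=True)
print(mean, std)
```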
Kernel K-Means:
Principle: It extends the K-Means algorithm to non-linearly separable data by clustering in a kernel-induced feature space.
Kernel Trick: It uses kernel functions to compute the pairwise similarities or distances
between data points in a high-dimensional feature space.
Applications: Kernel K-Means is used for clustering tasks where the data is non-
linearly separable, such as image segmentation, text clustering, and pattern
recognition.
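Kernel K-Means is not part of scikit-learn's core API, so here is a hand-rolled sketch: distances to cluster means in feature space are computed entirely through the kernel matrix (all names are illustrative):

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_kmeans(K, n_clusters=2, n_iter=50, seed=0):
    n = K.shape[0]
    labels = np.random.default_rng(seed).integers(n_clusters, size=n)
    for _ in range(n_iter):
        dist = np.zeros((n, n_clusters))
        for c in range(n_clusters):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                dist[:, c] = np.inf
                continue
            # ||phi(x) - mu_c||^2 expanded via the kernel trick:
            # K(x,x) - (2/m) * sum_j K(x,j) + (1/m^2) * sum_{j,l} K(j,l)
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, mask].sum(axis=1) / m
                          + K[np.ix_(mask, mask)].sum() / m ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Toy usage: two concentric rings, which plain K-Means cannot separate.
# (Results depend on gamma and the random initialization.)
theta = np.linspace(0, 2 * np.pi, 60, endpoint=False)
ring = np.c_[np.cos(theta), np.sin(theta)]
X = np.vstack([ring, 3 * ring])
print(kernel_kmeans(rbf_kernel(X, gamma=0.5), n_clusters=2))
```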