Importance of Machine Learning, with examples:
Image Recognition:
Example: Object and face recognition in photos. ML models can classify images and detect objects or faces; photo services use this to tag people in pictures automatically.
Natural Language Processing (NLP):
Example: Sentiment analysis for customer feedback. ML models can analyze large
volumes of text data from customer reviews or social media posts to determine
sentiment (positive, negative, neutral). Companies like Airbnb use NLP to analyze
customer reviews and improve their services based on feedback.
Recommendation Systems:
Example: Personalized movie recommendations. Platforms like Netflix and Amazon use
ML algorithms to analyze user preferences and behavior to recommend movies or
products tailored to individual tastes. For example, Netflix's recommendation system
suggests movies based on a user's viewing history and ratings.
Predictive Analytics:
Example: Demand and sales forecasting. ML models learn from historical data to predict future outcomes, such as retailers forecasting product demand to plan inventory.
Autonomous Vehicles:
Example: Self-driving cars. ML models process sensor data from cameras, LiDAR, and radar to interpret the surrounding environment and make real-time driving decisions.
Fraud Detection:
Example: Credit card fraud detection. ML algorithms can analyze transaction data and
identify patterns indicative of fraudulent activity, such as unusual spending patterns
or suspicious transactions. Financial institutions use ML to detect and prevent
fraudulent transactions, protecting customers from identity theft and financial losses.
When there is only one independent feature, it is known as Simple Linear Regression, and when there is more than one feature, it is known as Multiple Linear Regression. Similarly, when there is only one dependent variable, it is considered Univariate Linear Regression, while when there is more than one dependent variable, it is known as Multivariate Regression. In the simple case the model fits a line y = b0 + b1*x; with multiple features it becomes y = b0 + b1*x1 + ... + bn*xn.
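A minimal sketch contrasting the two cases, assuming scikit-learn is available (any least-squares fit would do):

```python
# Simple vs. multiple linear regression, sketched with scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression

# Simple linear regression: one independent feature.
X_simple = np.array([[1], [2], [3], [4]])   # single feature
y = np.array([3, 5, 7, 9])                  # generated by y = 2x + 1
simple = LinearRegression().fit(X_simple, y)
print(simple.coef_, simple.intercept_)      # ~[2.0] 1.0

# Multiple linear regression: more than one independent feature.
X_multi = np.array([[1, 2], [2, 1], [3, 4], [4, 3]])
y2 = X_multi @ np.array([2.0, 0.5]) + 1.0   # y = 2*x1 + 0.5*x2 + 1
multi = LinearRegression().fit(X_multi, y2)
print(multi.coef_, multi.intercept_)        # ~[2.0, 0.5] 1.0
```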
We can understand entropy with a simple example: flipping a coin. When we flip a fair coin, there are two possible outcomes, each with a 50% probability, and there is no way to predict which one will occur. In such maximum-uncertainty situations, entropy is at its highest. This is the essence of entropy in machine learning: it measures the impurity or uncertainty of a set of outcomes.
Information gain is the reduction in entropy achieved by splitting the dataset on a particular attribute; splits that produce purer (lower-entropy) subsets have higher information gain.
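A minimal sketch of both quantities in plain Python (the helper names are illustrative):

```python
# Entropy and information gain for discrete labels.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: H = sum(p * log2(1/p))."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def information_gain(parent, subsets):
    """Entropy of the parent minus the weighted entropy of its subsets."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)

print(entropy(["H", "T"]))            # fair coin -> 1.0 bit (maximum uncertainty)
print(entropy(["H", "H", "H", "H"]))  # certain outcome -> 0.0 bits
# A split that separates the classes perfectly recovers all the entropy:
print(information_gain(["+", "+", "-", "-"], [["+", "+"], ["-", "-"]]))  # 1.0
```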
What are overfitting and underfitting in Machine Learning, and how can they be addressed?
Overfitting:
Overfitting occurs when a model learns the training data too well, capturing noise or random fluctuations rather than the underlying pattern.
Characteristics: It performs well on training data but poorly on unseen test data.
Causes: Using a model that is too complex for the amount of training data available.
Underfitting:
Underfitting happens when a model is too simple to capture the underlying structure of the data.
Characteristics: It performs poorly on both the training data and unseen test data.
Causes: Using a model that's too simple or has too few parameters.
Addressing Overfitting:
Cross-validation: Split the data into multiple training and validation sets to assess the
model's performance. Techniques like k-fold cross-validation help in this regard.
Early stopping: Stop training the model when performance on a validation set starts to
degrade, preventing it from overfitting to the training data.
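As a sketch of the cross-validation idea, assuming scikit-learn and a decision tree as the example model:

```python
# 5-fold cross-validation: each fold serves once as the validation set.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# An unconstrained tree can memorize its training folds (overfitting);
# limiting max_depth keeps the model simpler.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())  # similar scores across folds suggest no overfitting
```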
Addressing Underfitting:
Increasing model complexity: Use a more complex model architecture or increase the
number of parameters to better capture the underlying patterns in the data.
Collect more data: If possible, gather more data to provide the model with more
examples to learn from, particularly if the current dataset is small.
Parameters vs. Hyperparameters
Both parameters and hyperparameters are closely associated with the model training process, but they play two different roles in it.
Parameters
Parameters are variables that allow the model to learn the rules from the data. During
the training process, they are updated by the algorithm. We do not set optimal values
for the parameters. Instead, parameters learn their own values from data. However, in
some cases, we need to initialize parameters with some relevant values in the
beginning. Once the optimal values for the parameters are found, the model has finished training, and it is then suitable for making predictions on unseen data.
Hyperparameters
Hyperparameters are variables that control how the model is trained, and they therefore indirectly influence the values of the parameters. In other words, the optimal values of the parameters depend on the values of the hyperparameters we use. Unlike parameters, hyperparameters do not learn their values from data. We need to specify them manually before training the model. Once specified, the hyperparameter values remain fixed during the model training process.
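A minimal sketch of the distinction, using scikit-learn's logistic regression as an illustrative model:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Hyperparameters: chosen manually before training and fixed throughout
# (C is the inverse regularization strength, max_iter the optimizer budget).
model = LogisticRegression(C=1.0, max_iter=200)

# Parameters: the weights and intercepts learned from the data by fit().
model.fit(X, y)
print("Learned weights:", model.coef_)
print("Learned intercepts:", model.intercept_)
```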
GROUP – C
(5 marks each; CO 4)
… Distance metric?
… functions?
8. (a) Given six data points as (1,1), (2,1), (3,5), (4,3), (4,6), …, construct a dendrogram.
(b) Explain Linkages in brief.
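For question 8(a), a minimal sketch of building such a dendrogram with SciPy; the sixth data point is not listed above, so the sketch uses the five that are:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

points = np.array([(1, 1), (2, 1), (3, 5), (4, 3), (4, 6)])

# Single linkage: at each step, merge the two clusters with the smallest
# minimum pairwise (Euclidean) distance.
Z = linkage(points, method="single", metric="euclidean")

dendrogram(Z, labels=[f"P{i + 1}" for i in range(len(points))])
plt.xlabel("Data point")
plt.ylabel("Merge distance")
plt.show()
```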
Intrinsic Evaluation:
Cluster Quality Metrics: If the model's goal is clustering, metrics such as silhouette
score, Davies-Bouldin index, or Calinski-Harabasz index can be used to evaluate the
quality of the clusters produced.
Extrinsic Evaluation:
When ground-truth labels or a downstream task are available, the model's output can be compared against them, for example with the Adjusted Rand Index or Adjusted Mutual Information described below.
Visual Inspection:
Visualizing the results can provide insights into the effectiveness of the model. For
example, plotting clusters or visualizing the reduced-dimensional space can help
identify patterns or anomalies in the data.
Stability Evaluation:
Re-running the model on resampled or slightly perturbed versions of the data and checking that the results stay consistent gives another indication of a reliable model.
Silhouette Analysis:
Silhouette analysis can be used to assess the compactness and separation of clusters.
A higher silhouette score indicates better-defined clusters.
Domain-specific Metrics:
Depending on the application, domain-specific measures of usefulness can also be defined. Commonly used clustering metrics include the following:
Silhouette Score: This measures the compactness and separation of clusters. It ranges
from -1 to 1, where a higher score indicates better-defined clusters.
Davies-Bouldin Index: This evaluates the average similarity between each cluster and
its most similar cluster while also considering cluster dispersion. A lower index
indicates better clustering.
Calinski-Harabasz Index: Also known as the Variance Ratio Criterion, this measures
the ratio of between-cluster dispersion to within-cluster dispersion. A higher index
indicates better clustering.
Inertia: In the context of K-means clustering, inertia measures the sum of squared
distances of samples to their closest cluster center. Lower inertia indicates tighter
clusters.
Adjusted Rand Index (ARI) and Adjusted Mutual Information (AMI): These measures
quantify the similarity between the true cluster assignments and the clusters
produced by the model. Scores closer to 1 indicate better agreement.
Dunn Index: This measures the compactness and separation of clusters, similar to the
silhouette score, but it considers the ratio of the smallest distance between points in
different clusters to the largest intra-cluster distance. Higher values indicate better
clustering.
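A minimal sketch computing several of these metrics, assuming scikit-learn and synthetic blob data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (adjusted_rand_score, calinski_harabasz_score,
                             davies_bouldin_score, silhouette_score)

X, y_true = make_blobs(n_samples=300, centers=3, random_state=42)
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
labels = km.labels_

print("Silhouette (higher is better):     ", silhouette_score(X, labels))
print("Davies-Bouldin (lower is better):  ", davies_bouldin_score(X, labels))
print("Calinski-Harabasz (higher better): ", calinski_harabasz_score(X, labels))
print("Inertia (lower is better):         ", km.inertia_)
print("ARI vs. true labels:               ", adjusted_rand_score(y_true, labels))
```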
(c) Write short notes on precision, recall, and F1 score. (5 marks, CO 4)
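As a quick reference for the three metrics named in (c), a sketch computing them from raw counts:

```python
# Precision, recall, and F1 from true/false positive and negative counts.
def precision(tp, fp):
    return tp / (tp + fp)        # of all predicted positives, how many were right

def recall(tp, fn):
    return tp / (tp + fn)        # of all actual positives, how many were found

def f1(p, r):
    return 2 * p * r / (p + r)   # harmonic mean of precision and recall

# Example: 8 true positives, 2 false positives, 4 false negatives.
p, r = precision(8, 2), recall(8, 4)
print(p, r, f1(p, r))  # 0.8, ~0.667, ~0.727
```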
Applications of Artificial Neural Networks (ANNs):
Image Recognition and Computer Vision: ANNs are widely used for tasks such as
object detection, facial recognition, and image classification. Convolutional Neural
Networks (CNNs), a type of ANN specialized for processing visual data, have achieved
remarkable performance in these tasks.
Natural Language Processing (NLP): ANNs are applied to tasks like sentiment analysis,
text classification, machine translation, and speech recognition. Recurrent Neural
Networks (RNNs) and their variants, such as Long Short-Term Memory (LSTM)
networks, are commonly used for sequential data processing in NLP.
Financial Forecasting and Trading: ANNs are employed for predicting stock prices,
financial trends, and market behavior. They analyze historical data, market indicators,
and news sentiment to make predictions and inform investment decisions.
Healthcare and Medicine: ANNs are used for medical image analysis, disease
diagnosis, drug discovery, and personalized medicine. They analyze medical images
like MRI scans or X-rays, identify anomalies, and assist radiologists in diagnosis.
ANNs are also applied to genomic data analysis for understanding genetic
predispositions to diseases.
Autonomous Vehicles: ANNs play a crucial role in autonomous driving systems. They
process sensor data from cameras, LiDAR, and radar, interpret the surrounding
environment, and make real-time decisions for navigation and control.
Fraud Detection and Cybersecurity: ANNs are utilized for detecting fraudulent
transactions, cybersecurity threats, and network intrusions. They analyze patterns in
data to identify suspicious behavior and prevent fraudulent activities.
Comparison of Biological Neural Networks (BNNs) and Computer Networks (CNs):
Processing Mechanism:
BNNs process information through electrochemical signals, whereas CNs process digital data signals.
BNNs exhibit complex emergent behavior and learning capabilities, while CNs operate
based on predefined algorithms and protocols.
Learning and Adaptation:
BNNs can learn and adapt by adjusting synaptic weights based on experience and
feedback, enabling capabilities such as learning, memory, and pattern recognition.
CNs do not possess inherent learning capabilities but can be optimized and adapted
through configuration and management.
Purpose:
BNNs exist to support cognition, perception, and behavior in living organisms, whereas CNs are engineered to transmit and exchange data between computing devices.
The Perceptron model is one of the simplest types of artificial neural networks. It is a supervised learning algorithm for binary classification. It can be viewed as a single-layer neural network with four main components: input values, weights and a bias, a net (weighted) sum, and an activation function.
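A minimal NumPy sketch of those four components plus the classic perceptron learning rule (the function name and toy task are illustrative):

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    w = np.zeros(X.shape[1])  # weights
    b = 0.0                   # bias
    for _ in range(epochs):
        for xi, target in zip(X, y):
            net = xi @ w + b                  # net (weighted) sum
            pred = 1 if net >= 0 else 0       # step activation function
            w += lr * (target - pred) * xi    # update weights on mistakes
            b += lr * (target - pred)
    return w, b

# Toy usage: learn the logical AND function (linearly separable).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([1 if xi @ w + b >= 0 else 0 for xi in X])  # -> [0, 0, 0, 1]
```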
Kernel-based algorithms are a class of machine learning algorithms that utilize kernel
functions to implicitly map input data into a higher-dimensional space, where it may
be easier to separate classes or find patterns. These algorithms are widely used in
both supervised and unsupervised learning tasks, offering flexibility and efficiency in
handling complex data.
Support Vector Machines (SVMs):
Principle: SVMs are binary classifiers that find the optimal hyperplane separating
classes in a high-dimensional feature space.
Kernel Trick: SVMs use kernel functions to map input data into a higher-dimensional
space without explicitly computing the transformation. Common kernel functions
include linear, polynomial, radial basis function (RBF), and sigmoid kernels.
Applications: SVMs are used for classification, regression, and outlier detection tasks
in various domains, including text classification, image recognition, bioinformatics,
and finance.
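A minimal sketch of the kernel trick in action, assuming scikit-learn: an RBF-kernel SVM on data that no straight line can separate:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# kernel="rbf" separates the classes in an implicit higher-dimensional
# space; gamma controls the width of the RBF kernel.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```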
Kernel Ridge Regression:
Principle: It combines ridge (L2-regularized) linear regression with the kernel trick to model non-linear relationships.
Kernel Trick: It uses kernel functions to map input features into a high-dimensional
space, where ridge regression is performed.
Applications: Kernel Ridge Regression is used for regression tasks where the
relationship between input and output variables is non-linear, such as predicting
stock prices, modeling complex functions, and time-series forecasting.
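A minimal sketch with scikit-learn's KernelRidge, fitting a non-linear function that plain ridge regression would miss:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 80))[:, None]
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(80)

# alpha is the ridge penalty; the RBF kernel captures the non-linearity.
model = KernelRidge(kernel="rbf", alpha=0.1, gamma=0.5).fit(X, y)
print(model.predict([[2.5]]))  # roughly sin(2.5) ≈ 0.6
```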
Kernel Principal Component Analysis (Kernel PCA):
Principle: It extends PCA to capture non-linear structure by performing PCA in a kernel-induced feature space.
Kernel Trick: It applies kernel functions to compute the inner products between data
points in the high-dimensional feature space.
Applications: Kernel PCA is used for feature extraction, visualization, and data
preprocessing tasks, especially when the data has non-linear relationships or complex
structures.
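A minimal sketch with scikit-learn's KernelPCA on concentric circles, a classic non-linear structure:

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

# Inner products are computed via the RBF kernel, never forming the
# high-dimensional mapping explicitly.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
X_kpca = kpca.fit_transform(X)
print(X_kpca.shape)  # (300, 2): the circles become linearly separable
```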
Gaussian Processes:
Principle: Gaussian Processes are probabilistic models that use kernel functions to
define covariance between data points, enabling Bayesian inference and prediction.
Kernel Trick: Different kernel functions can be used to model various assumptions
about the underlying data distribution.
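A minimal sketch with scikit-learn's GaussianProcessRegressor, where an RBF kernel supplies the covariance:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.array([[1.0], [3.0], [5.0], [6.0]])
y = np.sin(X).ravel()

# The kernel defines covariance between points; predictions come with
# an uncertainty estimate (the posterior standard deviation).
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X, y)
mean, std = gp.predict([[4.0]], return_std=True)
print(mean, std)
```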
Kernel K-Means:
Principle: It extends the K-Means algorithm to non-linearly separable data by clustering in a kernel-induced feature space.
Kernel Trick: It uses kernel functions to compute the pairwise similarities or distances
between data points in a high-dimensional feature space.
Applications: Kernel K-Means is used for clustering tasks where the data is non-
linearly separable, such as image segmentation, text clustering, and pattern
recognition.
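Kernel K-Means is not part of scikit-learn's core API, so here is a hand-rolled sketch: distances to cluster means in feature space are computed entirely through the kernel matrix (all names are illustrative):

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_kmeans(K, n_clusters=2, n_iter=50, seed=0):
    n = K.shape[0]
    labels = np.random.default_rng(seed).integers(n_clusters, size=n)
    for _ in range(n_iter):
        dist = np.zeros((n, n_clusters))
        for c in range(n_clusters):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                dist[:, c] = np.inf
                continue
            # ||phi(x) - mu_c||^2 expanded via the kernel trick:
            # K(x,x) - (2/m) * sum_j K(x,j) + (1/m^2) * sum_{j,l} K(j,l)
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, mask].sum(axis=1) / m
                          + K[np.ix_(mask, mask)].sum() / m ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Toy usage: two concentric rings, which plain K-Means cannot separate.
# (Results depend on gamma and the random initialization.)
theta = np.linspace(0, 2 * np.pi, 60, endpoint=False)
ring = np.c_[np.cos(theta), np.sin(theta)]
X = np.vstack([ring, 3 * ring])
print(kernel_kmeans(rbf_kernel(X, gamma=0.5), n_clusters=2))
```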