Machine Learning Midterm
ASSIGNMENT
1) What is classification, and what are the different families of classification
algorithms? Explain with examples.
2) What is clustering, and what are the different types of clustering
algorithms? Provide examples of applications of clustering in different
domains.
3) What are the different types of machine learning, and how do they differ
from each other? Provide examples of algorithms for each type.
4) What is a Naive Bayes classifier, and how does it differ from other
classifiers? Provide an example of a problem that can be solved using a
Naive Bayes classifier.
5) What is a K-Nearest Neighbors classifier, and how does it work? Provide
an example of a problem that can be solved using a K-Nearest Neighbors
classifier.
Maximizing Margin:
The margin is the distance between the support vectors and the hyperplane. SVM
selects the hyperplane that maximizes this margin, as it provides the greatest margin
of safety against misclassification.
Optimization:
SVM solves an optimization problem to find the optimal hyperplane. It aims to
minimize the classification error while maximizing the margin. This optimization is
typically performed using techniques such as quadratic programming or gradient
descent.
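For reference, the optimization described above is usually written in the standard soft-margin form below. The symbols are conventional notation and are not named in these notes: w is the weight vector, b the bias, ξ_i the slack for training example i, and C a penalty parameter.

```latex
\min_{w,\,b,\,\xi}\; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\,(w^{\top} x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n
```

The margin width equals 2/||w||, so minimizing ||w||^2 maximizes the margin, while the C term penalizes examples that violate it.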
Kernel Trick:
SVM can handle non-linearly separable data by mapping the input features into a
higher-dimensional space using a kernel function. In this higher-dimensional space,
the classes may become linearly separable, allowing SVM to find a hyperplane.
Common kernel functions include polynomial kernels, radial basis function (RBF)
kernels, and sigmoid kernels.
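A minimal sketch of the kernel idea, assuming 2-D inputs. The function names (rbf_kernel, poly_kernel, explicit_quadratic_map) and the parameters gamma, degree, and coef0 are illustrative choices, not part of the notes. It shows that a degree-2 polynomial kernel gives the same value as a dot product in an explicit 6-dimensional feature space, without ever building that space.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """RBF kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def poly_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel: (x . z + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + coef0) ** degree

def explicit_quadratic_map(x):
    """Explicit feature map for 2-D input whose dot product equals
    poly_kernel(x, z, degree=2, coef0=1.0)."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]

x, z = [1.0, 2.0], [3.0, 0.5]
kernel_value = poly_kernel(x, z)
mapped_value = sum(a * b for a, b in zip(explicit_quadratic_map(x),
                                         explicit_quadratic_map(z)))
```

Here kernel_value and mapped_value agree up to floating-point rounding, which is the point of the kernel trick: the kernel evaluates the higher-dimensional dot product directly from the original features.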
Example Problem:
An example problem that can be solved using an SVM is classifying emails as spam
or non-spam based on their content (text data). The features in this problem could be
various characteristics of the emails, such as the frequency of certain words or
phrases.
Features:
Word frequency: Number of times specific words or phrases appear in the email.
Email length: Number of words or characters in the email.
Presence of attachments: Boolean indicator of whether the email contains
attachments.
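The three features listed above could be computed along these lines. This is only a sketch: the function name extract_features, the vocabulary argument, and the simple regular-expression tokenizer are assumptions made for illustration.

```python
import re

def extract_features(email_text, has_attachment, vocabulary):
    """Turn one email into a small feature dictionary."""
    # Lowercase and split into simple word tokens.
    words = re.findall(r"[a-z']+", email_text.lower())
    # Word frequency for each word we choose to track.
    features = {f"count_{w}": words.count(w) for w in vocabulary}
    # Email length in words.
    features["num_words"] = len(words)
    # Boolean indicator for attachments.
    features["has_attachment"] = bool(has_attachment)
    return features

example = extract_features("Free money! Claim your FREE prize now",
                           has_attachment=False,
                           vocabulary=["free", "prize"])
```

A feature dictionary like this would then be converted to a numeric vector before being passed to the classifier.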
Labels:
Boosting:
o Boosting is an iterative ensemble learning technique where base
learners are trained sequentially, and each subsequent learner focuses
on the examples that were misclassified by the previous ones. The key
components of boosting are as follows:
o Sequential Training: Base learners are trained sequentially, and each
subsequent learner pays more attention to the examples that were
misclassified by the previous ones.
o Weighted Data: During training, each example is assigned a weight
based on its difficulty in the training process. Misclassified examples
are given higher weights to ensure that subsequent base learners focus
more on correcting these mistakes.
o Combining Predictions: Unlike bagging, where predictions are
combined using simple averaging or voting, boosting assigns weights
to each base learner's prediction based on its performance during
training. Final predictions are made by combining these weighted
predictions.
o Boosting primarily aims to reduce bias, and often variance as well, by
iteratively focusing on the examples that are difficult to classify.
Popular boosting algorithms include AdaBoost (Adaptive Boosting) and
Gradient Boosting Machines (GBM), which differ in how they weight
training examples and base learners and in how they combine predictions.
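The components above can be sketched with a small AdaBoost-style loop. This is an illustration under simplifying assumptions, not a full implementation: inputs are 1-D, labels are -1 or +1, the base learner is a threshold "stump", and all function names are hypothetical.

```python
import math

def train_stump(xs, ys, weights):
    """Find the threshold stump with the lowest weighted error.
    A stump predicts `polarity` when x < threshold, else -polarity."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x < threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n              # weighted data: start uniform
    ensemble = []
    for _ in range(rounds):              # sequential training
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # learner weight
        ensemble.append((alpha, threshold, polarity))
        preds = [polarity if x < threshold else -polarity for x in xs]
        # Misclassified examples (y * p == -1) get larger weights.
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Combine predictions: weighted vote of all stumps."""
    score = sum(alpha * (polarity if x < threshold else -polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]   # no single threshold separates these labels
ensemble = adaboost(xs, ys, rounds=3)
predictions = [predict(ensemble, x) for x in xs]
```

On this toy data no single stump classifies every point correctly, but the weighted vote of three stumps does.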
4. Explain the decision tree algorithm in detail. [X2] IMP
Decision Tree is a supervised learning technique that can be used for both
classification and regression problems, but it is mostly preferred for solving
classification problems. It is a tree-structured classifier, where internal nodes
represent the features of a dataset, branches represent the decision rules, and
each leaf node represents the outcome.
In a decision tree, there are two types of nodes: the Decision Node and the Leaf
Node. Decision nodes are used to make a decision and have multiple
branches, whereas leaf nodes are the outputs of those decisions and do not
contain any further branches.
The decisions or tests are performed on the basis of the features of the given
dataset.
It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
It is called a decision tree because, similar to a tree, it starts with the root node,
which expands on further branches and constructs a tree-like structure.
In order to build a tree, we use the CART algorithm, which stands for
Classification and Regression Tree algorithm.
A decision tree simply asks a question and, based on the answer (Yes/No), it
further splits the tree into subtrees.
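To make the "ask a question, then split" step concrete, here is a minimal sketch of how a single split can be chosen. It assumes numeric features and uses Gini impurity, the splitting criterion CART uses for classification. The function names gini and best_split are illustrative.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) for the best
    yes/no question of the form 'feature <= threshold'."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send examples to both branches
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best

rows = [[2.0, 1.0], [3.0, 1.5], [10.0, 1.2], [11.0, 0.9]]
labels = ["no", "no", "yes", "yes"]
split = best_split(rows, labels)
```

A full tree is built by applying the same search recursively to the left and right subsets until the leaves are pure or a stopping rule is reached.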
b.
4) Making Predictions:
a. Once the posterior probabilities are computed for each class, the class
with the highest posterior probability is selected as the predicted class
label for the given set of features.
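In symbols, this prediction rule can be written as follows. The notation is an assumption introduced here: x is the feature vector and c ranges over the class labels.

```latex
\hat{y} \;=\; \arg\max_{c}\; P(c \mid x)
        \;=\; \arg\max_{c}\; \frac{P(x \mid c)\,P(c)}{P(x)}
        \;=\; \arg\max_{c}\; P(x \mid c)\,P(c)
```

The evidence P(x) is the same for every class, so it can be dropped when comparing classes.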
Types of Generative Models:
Naive Bayes Classifier:
o Naive Bayes is one of the most commonly used generative probabilistic
classifiers. It assumes that the features are conditionally independent given
the class label, which simplifies the modeling process.
Gaussian Naive Bayes:
o Gaussian Naive Bayes is a variant of the Naive Bayes classifier that
assumes that the class-conditional distributions of the features given the
class labels are Gaussian (normal) distributions.
Linear Discriminant Analysis (LDA):
o LDA is another generative model that assumes that the class-conditional
distributions of the features given the class labels are multivariate
Gaussian distributions with a shared covariance matrix.
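As a rough illustration of the Gaussian assumption, the sketch below fits a per-class mean and variance for each feature and classifies by the highest log-posterior. The function names and the small variance floor (1e-9) are assumptions for the example, and each feature is treated independently as in Gaussian Naive Bayes (not the shared covariance matrix of LDA).

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate the class prior and per-feature (mean, variance) per class."""
    grouped = defaultdict(list)
    for row, y in zip(rows, labels):
        grouped[y].append(row)
    model = {}
    for y, members in grouped.items():
        prior = len(members) / len(rows)
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            # Small floor keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[y] = (prior, stats)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log-posterior for `row`."""
    def log_posterior(prior, stats):
        total = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            total += (-0.5 * math.log(2 * math.pi * var)
                      - (value - mean) ** 2 / (2 * var))
        return total
    return max(model, key=lambda y: log_posterior(*model[y]))

rows = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
        [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
labels = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(rows, labels)
```

Because the model stores a distribution for each class, it is generative: it describes how the features of each class are distributed, and classification follows from Bayes' rule.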
9. What are the goals of machine learning? [X2]
The goals of machine learning can be broadly categorized into two main
objectives:
Prediction:
o Prediction is one of the primary goals of machine learning, where the
aim is to develop models that can accurately predict future outcomes or
behaviors based on historical data.
o This includes tasks such as regression, classification, and time series
forecasting, where the model learns patterns and relationships from
data to make predictions about unseen instances.
Understanding and Interpretation:
o Another important goal of machine learning is to gain insights and
understanding from data, helping humans comprehend complex
phenomena or make informed decisions.
o This involves developing models that not only provide accurate
predictions but also offer explanations or interpretations of how and
why certain predictions are made.
o Interpretability is crucial in domains like healthcare, finance, and law,
where decisions based on machine learning models need to be
transparent and explainable.
10. What is learning? Discuss any four Learning Techniques. [X2]
Learning, in the context of machine learning, refers to the process of training a
model to recognize patterns or relationships in data and make predictions or
decisions based on those patterns. It involves the model adjusting its
parameters or internal representations through exposure to examples (training
data), with the goal of improving its performance on unseen data.
Four Learning Techniques:
Supervised Learning:
o Supervised learning is a type of machine learning where the model is
trained on labeled data, meaning each input is paired with an output
label. The goal is to learn a mapping from inputs to outputs, such that
the model can accurately predict the output for new inputs.
o Common algorithms include:
o Linear Regression: Predicts a continuous output variable based on
input features.
o Logistic Regression: Predicts the probability of an input belonging to a
certain class.
o Support Vector Machines (SVM): Finds the optimal hyperplane that
separates different classes in feature space.
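As a small supervised learning example, the sketch below fits a straight line to labeled (input, output) pairs using the closed-form least-squares solution for one feature. The function name fit_line and the toy data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Labeled training data: each input x is paired with an output y.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept   # predict the output for a new input
```

The learned mapping is then applied to an input that was not in the training data, which is the goal stated above.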
Unsupervised Learning:
o Unsupervised learning involves training a model on unlabeled data,
where the goal is to uncover hidden patterns or structures in the data
without explicit guidance.
o Common techniques include:
o Clustering: Groups similar data points together based on their
characteristics, such as K-means clustering and hierarchical clustering.
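A minimal K-means sketch (Lloyd's algorithm) is shown below. It assumes the starting centroids are supplied by the caller. The function name kmeans and the toy points are illustrative only.

```python
def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c))
                         for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(
                    tuple(sum(col) / len(cluster) for col in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break                          # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

No labels are used anywhere: the grouping comes only from distances between the points.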
Reinforcement Learning:
o Reinforcement learning is a type of learning where an agent learns to
interact with an environment by performing actions and receiving
rewards or penalties in return. The goal is to learn a policy that
maximizes cumulative rewards over time.
o Key components include:
o Agent: The learner or decision-maker that interacts with the
environment.
o Environment: The external system or world that the agent interacts
with.
o Rewards: Feedback signals provided by the environment to indicate
the desirability of actions taken by the agent.
o Policy: Strategy or behavior that the agent uses to select actions in
different states.
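The four components above can be seen together in a tiny tabular Q-learning sketch. Everything specific here is an assumption for illustration: the environment is a 5-state corridor with a reward of 1 at the right end, and the values of alpha, gamma, and epsilon are arbitrary.

```python
import random

N_STATES = 5          # corridor states 0..4; state 4 is the goal
ACTIONS = (-1, +1)    # action 0 moves left, action 1 moves right

def step(state, action):
    """Environment: apply the move and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def q_learning(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Policy: epsilon-greedy, with random choice on ties.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Agent update: move q toward reward + discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

def greedy_policy(q):
    """Best action in each non-terminal state according to q."""
    return ["left" if left > right else "right" for left, right in q[:-1]]

q = q_learning()
policy = greedy_policy(q)
```

After training, the greedy policy moves right in every non-terminal state, which is the behavior that maximizes cumulative reward in this corridor.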
Semi-Supervised Learning:
o Semi-supervised learning combines elements of supervised and
unsupervised learning, where the model is trained on a combination of
labeled and unlabeled data.
o This approach leverages the abundance of unlabeled data and a smaller
amount of labeled data to improve model performance.
o Techniques include:
o Self-training: Initially, the model is trained on labeled data. Then, it
uses its predictions on unlabeled data to generate pseudo-labels, which
are used to retrain the model iteratively.
o Co-training: The model is trained on different subsets of features or
views of the data. Each subset of features contributes to the learning
process independently.
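Self-training can be sketched as follows. This is a simplified illustration with assumptions stated up front: inputs are single numbers, there are at least two classes, the base model is a nearest-centroid classifier, and a prediction counts as "confident" when its margin is at least min_margin. All names are hypothetical.

```python
def centroids(points, labels):
    """Mean of the points belonging to each label."""
    result = {}
    for label in set(labels):
        members = [p for p, y in zip(points, labels) if y == label]
        result[label] = sum(members) / len(members)
    return result

def predict_with_confidence(model, x):
    """Return (label, margin); margin is the gap between the distances
    to the second-closest and closest centroids."""
    ranked = sorted(model, key=lambda label: abs(x - model[label]))
    margin = abs(x - model[ranked[1]]) - abs(x - model[ranked[0]])
    return ranked[0], margin

def self_train(labeled_x, labeled_y, unlabeled_x, min_margin=1.0, max_rounds=10):
    xs, ys, pool = list(labeled_x), list(labeled_y), list(unlabeled_x)
    for _ in range(max_rounds):
        model = centroids(xs, ys)               # train on current labels
        scored = [(x, predict_with_confidence(model, x)) for x in pool]
        accepted = [(x, label) for x, (label, margin) in scored
                    if margin >= min_margin]    # confident pseudo-labels
        if not accepted:
            break
        for x, label in accepted:
            xs.append(x)
            ys.append(label)
            pool.remove(x)
    return centroids(xs, ys)

model = self_train([0.0, 10.0], ["low", "high"], [1.0, 2.0, 8.0, 9.0, 5.2])
```

The point 5.2 is never pseudo-labeled because the model is not confident about it, which is the usual safeguard against reinforcing wrong labels.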
11. What do you understand by Reinforcement Learning?
Reinforcement Learning (RL) is the science of decision making. It is about
learning the optimal behavior in an environment to obtain maximum reward.
In RL, the data is accumulated by the learning system itself through trial and
error. Unlike in supervised or unsupervised machine learning, the data is not
supplied as part of the input.
Reinforcement learning uses algorithms that learn from outcomes and decide
which action to take next. After each action, the algorithm receives feedback
that helps it determine whether the choice it made was correct, neutral or
incorrect. It is a good technique to use for automated systems that have to
make a lot of small decisions without human guidance.
Reinforcement learning is an autonomous, self-teaching system that
essentially learns by trial and error. It performs actions with the aim of
maximizing rewards, or in other words, it is learning by doing in order to
achieve the best outcomes.
Prediction Phase:
o When given a new instance to classify, the classifier calculates the
posterior probability of each class label given the features using
Bayes' theorem together with the naive independence assumption.
o The class label with the highest posterior probability is assigned as the
predicted label for the new instance.
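The prediction phase can be sketched for a word-count (multinomial) Naive Bayes text classifier. The add-one (Laplace) smoothing, the use of log-probabilities, and the function names are assumptions introduced for this example. They are common choices but are not stated in the notes.

```python
import math
from collections import Counter, defaultdict

def train_nb(documents, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in zip(documents, labels):
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary

def predict_nb(model, words):
    """Return the class with the highest (log) posterior score."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(count / total_docs)           # log prior
        for w in words:
            if w in vocabulary:                        # skip unseen words
                # Add-one smoothing avoids zero probabilities.
                score += math.log((word_counts[label][w] + 1)
                                  / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

documents = [["free", "prize", "now"], ["win", "free", "money"],
             ["meeting", "at", "noon"], ["project", "meeting", "notes"]]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(documents, labels)
```

Summing log-probabilities is equivalent to multiplying the probabilities, but it avoids numerical underflow on long documents.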
Advantages of Naive Bayes Classifier:
o Simplicity: Naive Bayes classifiers are simple and easy to implement.
They have few parameters to tune and are computationally efficient.
o Scalability: Naive Bayes classifiers can handle large datasets with high
dimensionality efficiently, making them suitable for big data
applications.
o Robustness to Irrelevant Features: Naive Bayes classifiers are robust to
irrelevant features and noise in the data due to the conditional
independence assumption.
Applications of Naive Bayes Classifier:
o Text Classification: Naive Bayes classifiers are widely used for text
classification tasks such as spam detection, sentiment analysis, and
document categorization.
o Medical Diagnosis: Naive Bayes classifiers can be applied to medical
diagnosis tasks, such as predicting the presence or absence of a disease
based on patient symptoms.
o Recommendation Systems: Naive Bayes classifiers can be used in
recommendation systems to predict user preferences or recommend
products based on user behavior.
23. What do you understand by noise in data? What could be the implications for
the result if noise is not treated properly?
24. When should we use classification over regression? Explain using an
example.
The choice between classification and regression depends on the nature of the
problem and the type of output variable being predicted. Here are some
guidelines for when to use classification over regression:
Nature of the Output Variable:
o If the output variable is categorical or qualitative in nature (e.g., classes
or labels), classification is typically more appropriate. For example,
predicting whether an email is spam or not spam, classifying images
into different categories, or identifying whether a patient has a certain
disease.
Discrete Output Space:
o Classification is suitable when the output space is discrete and consists
of a finite number of distinct classes or categories. Each instance is
assigned to one of these predefined classes based on its features.
Interpretability of Results:
o In many cases, classification models provide more interpretable results
than regression models. Class labels are often easier to understand and
communicate, making classification outputs more intuitive for
decision-making.
Imbalanced Data:
o If the quantity of interest is a rare event, so that one class is
significantly outnumbered by the others, framing the problem as
classification lets the model and its evaluation focus on accurately
identifying instances of the minority class, for example through class
weighting or resampling. This is of particular interest in applications
such as fraud detection or medical diagnosis.
Error Analysis:
o Classification models are often evaluated using metrics such as
accuracy, precision, recall, and F1-score, which provide insights into
the performance of the model in terms of correctly classifying
instances into their respective classes. These metrics are well-suited for
evaluating classification tasks.
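These metrics can be computed directly from the counts of true and false positives and negatives. The sketch below assumes binary labels with 1 as the positive class. The function name classification_metrics is illustrative.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual positives
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

In this example there are 2 true positives, 1 false positive, 1 false negative, and 4 true negatives, so accuracy is 0.75 while precision, recall, and F1 are each 2/3.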
25. Define the terms precision, recall, F1-score, and accuracy.
26. Define LDA and any two of its limitations.
27. Explain Bayesian estimation and maximum likelihood estimation in
generative learning.
28. Write a short note on: Support Vector Machine (SVM).
29. Given a data set and a set of machine learning algorithms, how do we
choose an appropriate algorithm?