Machine Learning
Machine Learning is the study of algorithms that learn patterns from data
rather than being explicitly programmed. The data set consists of samples of
all variables and is divided into training, selection, and test samples. A
sample contains one or more features and possibly a label; samples can be
labeled or unlabeled. Deep Learning, a subfield of machine learning, deals
with algorithms based on multi-layered Artificial Neural Networks (ANNs)
that are inspired by the structure of the human brain.
Evaluating a model ensures that it performs well and meets the desired
objectives. The main steps are:
1. Define the Task and Metrics:-
Classification:-
Accuracy, Precision, Recall, and F1 Score (covered in detail under
Evaluation Metrics below).
Regression:-
Mean Absolute Error (MAE): MAE = (1/n) * Σ|yi - ŷi|. Measures average
absolute errors.
Mean Squared Error (MSE): MSE = (1/n) * Σ(yi - ŷi)². Emphasizes larger
errors.
Root Mean Squared Error (RMSE): RMSE = √MSE. Provides error in
the same units as the target variable.
R-Squared (R²): R² = 1 - (SS_res / SS_tot). Indicates the proportion of
variance explained by the model.
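These regression metrics can be computed directly from their definitions; a minimal pure-Python sketch on hypothetical values:

```python
# Hypothetical true and predicted values
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
n = len(y_true)

mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n    # Mean Absolute Error
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n  # Mean Squared Error
rmse = mse ** 0.5                                            # Root Mean Squared Error
mean_y = sum(y_true) / n
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # residual sum of squares
ss_tot = sum((t - mean_y) ** 2 for t in y_true)              # total sum of squares
r2 = 1 - ss_res / ss_tot                                     # R-squared
```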
Clustering:-
Silhouette Score: Measures how similar each point is to its own cluster
compared with other clusters (ranges from -1 to 1).
Ranking:-
Mean Average Precision (MAP): Averages the precision computed at the
position of each relevant item, then takes the mean across queries.
Normalized Discounted Cumulative Gain (NDCG): Evaluates the
quality of ranked results based on relevance.
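As a sketch, NDCG can be computed from a list of graded relevances in ranked order (the relevance grades below are hypothetical):

```python
import math

def dcg(relevances):
    # Discounted Cumulative Gain: each relevance is discounted by log2(rank + 1)
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    # Normalize by the DCG of the ideal (descending-relevance) ordering
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

score = ndcg([3, 2, 3, 0, 1, 2])  # relevances in the order the system ranked them
```

A perfectly ordered ranking scores 1.0; any misordering of relevant items lowers the score.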
2. Cross-Validation:-
k-Fold Cross-Validation: Split the data into k folds; train on k-1 folds
and validate on the held-out fold, rotating through all k folds and
averaging the scores.
3. Train-Test Split:-
Holdout Method: Split the dataset into training and testing sets (e.g.,
70% training, 30% testing). Evaluate the model on the test set to assess
its generalization ability.
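The holdout method above can be sketched in pure Python (the 70/30 ratio and the fixed seed are illustrative choices):

```python
import random

def holdout_split(X, y, test_ratio=0.3, seed=42):
    # Shuffle indices reproducibly, then carve off test_ratio of them as the test set
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(X) * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])

X = [[float(i)] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = holdout_split(X, y)  # 7 train / 3 test
```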
4. Analyze Results:-
For a classification task, a typical evaluation approach is to train the
model on the training split, predict on the held-out test split, and report
accuracy, precision, recall, and F1 score.
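A minimal pure-Python sketch of such an evaluation (the true and predicted labels below are hypothetical):

```python
def evaluate_binary(y_true, y_pred):
    # Tally the four confusion-matrix cells, then derive the standard metrics
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

metrics = evaluate_binary([1, 1, 1, 0, 0, 0, 1, 0], [1, 0, 1, 0, 0, 1, 1, 0])
```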
2. Regression Models:-
Purpose: To predict continuous numerical values.
Examples:
Linear Regression: Predicts one output variable using one or more input
variables. The representation of linear regression is a linear equation,
which combines a set of input values (x) to produce the predicted
output (y) for that set of input values. With a single input it is
represented in the form of a line:
y = bx + c
where b is the slope and c is the intercept.
Polynomial Regression: Extends linear regression to model non-linear
relationships using polynomial terms.
Support Vector Regression (SVR): Predicts continuous values with a
margin of tolerance.
Output:
Continuous Value: The predicted numerical value (e.g., house price,
temperature).
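As a sketch, the line y = bx + c can be fitted by ordinary least squares; the data points below are hypothetical:

```python
# Hypothetical 1-D training data, roughly following y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Closed-form least-squares estimates for slope b and intercept c
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
c = mean_y - b * mean_x

def predict(x):
    return b * x + c
```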
3. Clustering Models:-
Purpose: To group similar data points into clusters without predefined labels.
Examples:
K-Means Clustering: Assigns data points to a fixed number of clusters
(k) based on feature similarity.
Hierarchical Clustering: Builds a tree of clusters, representing data
point relationships at various levels of granularity.
DBSCAN: Identifies clusters of varying shapes and sizes based on
density.
Output:
Cluster Labels: The assigned cluster for each data point.
Cluster Centers: In methods like K-Means, the centroid of each cluster.
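A pure-Python sketch of K-Means (Lloyd's algorithm) on hypothetical 2-D points; real use would typically rely on a library implementation:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    # Lloyd's algorithm: assign each point to its nearest centroid, then
    # recompute centroids as cluster means, for a fixed number of iterations.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            clusters[j].append(p)
        for j, cl in enumerate(clusters):
            if cl:
                centroids[j] = tuple(sum(coord) / len(cl) for coord in zip(*cl))
    labels = [min(range(k),
                  key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
              for p in points]
    return labels, centroids

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, 2)  # two well-separated groups
```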
Binary Classification:-
1. Classes:
o Positive Class: One of the two categories the model is predicting.
Often, this is the class of interest.
o Negative Class: The other category, representing cases where the
event or characteristic of interest is absent.
2. Outcomes:
True Positive (TP): The patient is diseased and the model predicts
"diseased".
False Positive (FP): The patient is healthy but the model predicts
"diseased".
True Negative (TN): The patient is healthy and the model predicts
"healthy".
False Negative (FN): The patient is diseased and the model predicts
"healthy".
After obtaining these values, we can compute the accuracy score of the
binary classifier as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
3. Evaluation Metrics:
o Accuracy: The ratio of correctly classified instances to the total
number of instances. While useful, it can be misleading in
imbalanced datasets.
o Precision (Positive Predictive Value): The ratio of true positive
predictions to the sum of true positive and false positive predictions.
It measures how many of the predicted positives are actually
positive.
o Recall (Sensitivity, True Positive Rate): The ratio of true
positive predictions to the sum of true positive and false negative
predictions. It measures how many of the actual positives were
captured by the model.
o F1 Score: The harmonic mean of precision and recall. It balances
the trade-off between precision and recall.
o ROC Curve (Receiver Operating Characteristic Curve): A
graphical plot of the true positive rate against the false positive rate
at various threshold settings.
o AUC (Area Under the ROC Curve): A single value that
summarizes the performance of the model across all classification
thresholds.
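AUC can be sketched via its rank interpretation: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative (the scores below are hypothetical):

```python
def auc_score(y_true, scores):
    # Mann-Whitney formulation: count positive/negative pairs where the
    # positive outscores the negative; ties count half.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation of the classes.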
Common Uses: Spam detection, medical diagnosis (diseased vs. healthy),
fraud detection, and customer churn prediction.
Example:-
1. Data Collection: Gather labeled data with features (inputs) and binary
labels (0 or 1).
2. Data Preprocessing: Clean and prepare the data by handling missing
values, encoding categorical variables, and normalizing numerical
features.
3. Model Training: Choose a suitable binary classification algorithm and
train the model on the training dataset.
4. Model Evaluation: Assess the model's performance using metrics such
as accuracy, precision, recall, and F1 score. Use a validation set or cross-
validation to tune hyperparameters.
5. Model Testing: Evaluate the model on a test set to estimate its
performance on unseen data.
6. Deployment: Deploy the model to make predictions on new, real-world
data.
7. Monitoring and Maintenance: Continuously monitor the model's
performance and update it as needed based on new data or changes in
the underlying patterns.
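The workflow above can be sketched end to end on a hypothetical synthetic dataset; a simple threshold model stands in for a real algorithm, and preprocessing, deployment, and monitoring are elided:

```python
import random

# Steps 1-5 on hypothetical synthetic data: class 0 clustered near x = 0,
# class 1 near x = 5.
random.seed(0)
data = [(random.gauss(0, 1), 0) for _ in range(50)] + \
       [(random.gauss(5, 1), 1) for _ in range(50)]   # 1. data collection
random.shuffle(data)
train, test = data[:70], data[70:]                     # holdout split (70/30)
# 3. "Training": place the decision threshold halfway between the class means
mean0 = sum(x for x, label in train if label == 0) / sum(1 for _, label in train if label == 0)
mean1 = sum(x for x, label in train if label == 1) / sum(1 for _, label in train if label == 1)
threshold = (mean0 + mean1) / 2
# 4-5. Evaluation on the held-out test set
preds = [1 if x > threshold else 0 for x, _ in test]
accuracy = sum(p == label for p, (_, label) in zip(preds, test)) / len(test)
```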