Issues in Machine Learning and Generative Algorithms


Issues in Machine Learning

• Machine Learning is the study of computer algorithms that automatically
construct computer software through past experience and training data.
• Machine Learning is one of the most popular technologies among data
scientists.
• It is an effective Artificial Intelligence technique for building automated
learning systems that make future decisions without being explicitly
programmed for each one. It is used in every industry: healthcare,
education, finance, automobiles, marketing, shipping, infrastructure,
automation, etc.
Commonly used Algorithms in Machine Learning

• Linear Regression
• Logistic Regression
• Decision Tree
• Bayes Theorem and Naïve Bayes Classification
• Support Vector Machine (SVM) Algorithm
• K-Nearest Neighbor (KNN) Algorithm
• K-Means
• Gradient Boosting algorithms
• Dimensionality Reduction Algorithms
• Random Forest
Common issues in Machine Learning

Although machine learning is used in every industry and helps
organizations make more informed, data-driven choices that are more
effective than classical methodologies, it still has many problems that
cannot be ignored. Here are some common issues in Machine Learning
that professionals face while building ML skills and creating
applications from scratch.
Common issues in Machine Learning

• 1. Inadequate Training Data
– Noisy Data
– Incorrect data
– Generalizing of output data
• 2. Poor quality of data
• 3. Non-representative training data
• 4. Overfitting and Underfitting
• 5. Monitoring and maintenance
• 6. Getting bad recommendations
1. Inadequate Training Data
The major issue that arises when using machine learning algorithms is a
lack of quality as well as quantity of data. Although data plays a vital role in
the processing of machine learning algorithms, many data scientists report
that inadequate, noisy, and unclean data severely degrade the performance
of machine learning algorithms. For example, a simple task requires
thousands of sample data points, and an advanced task such as speech or
image recognition needs millions of examples. Data quality is also essential
for the algorithms to work well, yet poor data quality is common in Machine
Learning applications. Data quality can be affected by factors such as:
– Noisy data - leads to inaccurate predictions, affecting decisions as well
as accuracy in classification tasks.
– Incorrect data - produces faulty results from machine learning models
and may therefore also reduce the accuracy of those results.
– Generalizing of output data - generalizing from the output data can
become complex, resulting in comparatively poor future actions.
2. Poor quality of data

• Data plays a significant role in machine learning, and it must be of
good quality. Noisy, incomplete, inaccurate, and unclean data lead to
lower classification accuracy and low-quality results. Hence, data
quality is another major problem in building machine learning
systems; a minimal cleaning pass is sketched below.
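As an illustration of basic cleaning, here is a minimal sketch using pandas. The records, column names, and valid-age range are hypothetical placeholders, not a prescribed procedure.

```python
import pandas as pd

# Hypothetical raw records with a duplicate, missing values, and an
# obviously incorrect age.
df = pd.DataFrame({
    "age":    [25, 25, None, 31, 460],
    "income": [40_000, 40_000, 52_000, None, 38_000],
})

df = df.drop_duplicates()                  # remove duplicate rows
df = df.dropna(subset=["age", "income"])   # drop rows missing key fields
df = df[df["age"].between(0, 120)]         # filter obviously incorrect values

print(df)                                  # sanity-check the cleaned data
```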
3. Non-representative training data

• To make sure our trained model generalizes well, we have to ensure
that the sample training data is representative of the new cases we
need to generalize to. The training data must cover the cases that
have already occurred as well as those occurring now.
• Non-representative training data results in less accurate predictions.
A machine learning model is said to be ideal if it predicts well for
generalized cases and provides accurate decisions. If there is too
little training data, sampling noise produces a non-representative
training set, the model won't be accurate in its predictions, and it
will be biased toward one class or group. To overcome this, make
the training sample representative of all the cases the model must
handle.
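A minimal sketch of one remedy, assuming scikit-learn: a stratified split keeps the class proportions in the training sample representative of the full dataset.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y preserves the class distribution in both splits,
# reducing sampling noise from a non-representative training set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```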
4. Overfitting and Underfitting
• Overfitting:
• Overfitting is one of the most common issues faced by Machine Learning engineers and data
scientists. During training, a machine learning model can start capturing the noise and
inaccurate values in the training data set, which negatively affects the performance of the
model. Consider a simple example with a training set of 1000 mangoes, 1000 apples, 1000
bananas, and 5000 papayas. There is then a considerable probability of identifying an apple
as a papaya, because the training data set is heavily biased toward papayas, so predictions
are negatively affected. A common cause of overfitting is using highly flexible non-linear
methods, as they can build unrealistic data models. We can often reduce overfitting by using
linear and parametric algorithms in machine learning models.
• Methods to reduce overfitting (two of these are sketched below):
• Increase the amount of training data in the dataset.
• Reduce model complexity by selecting a model with fewer parameters.
• Apply Ridge (L2) or Lasso (L1) regularization.
• Use early stopping during the training phase.
• Reduce the noise in the data.
• Reduce the number of attributes in the training data.
• Constrain the model.
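A minimal sketch of two of the remedies above, assuming scikit-learn: Ridge (L2) regularization and early stopping. The dataset is synthetic and the hyperparameter values are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, SGDRegressor

X, y = make_regression(n_samples=200, n_features=50, noise=10.0,
                       random_state=0)

# Ridge regularization: alpha penalizes large weights (constrains the model).
ridge = Ridge(alpha=1.0).fit(X, y)

# Early stopping: halt training when the validation score stops improving.
sgd = SGDRegressor(early_stopping=True, validation_fraction=0.2,
                   n_iter_no_change=5, random_state=0).fit(X, y)
```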
Underfitting:
• Underfitting is just the opposite of overfitting. When a machine
learning model is trained on too little data, it produces incomplete
and inaccurate results, hurting the accuracy of the model.
• Underfitting occurs when our model is too simple to capture the
underlying structure of the data, just like an undersized pair of pants.
This generally happens when we have limited data in the data set and
try to fit a linear model to non-linear data. In such scenarios the
model lacks the needed complexity, its rules are too simple for the
data set, and it starts making wrong predictions.
• Methods to reduce underfitting (a brief sketch follows):
• Increase model complexity.
• Remove noise from the data.
• Train on more and better features.
• Reduce the constraints on the model.
• Increase the number of training epochs.
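A minimal sketch of increasing model complexity to fix underfitting, assuming scikit-learn and NumPy: a plain linear fit on non-linear (quadratic) data versus the same fit on polynomial features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.5, size=200)   # quadratic relationship

linear = LinearRegression().fit(X, y)             # too simple: underfits
poly = make_pipeline(PolynomialFeatures(degree=2),
                     LinearRegression()).fit(X, y)

print(f"linear R^2: {linear.score(X, y):.2f}")    # low score
print(f"poly   R^2: {poly.score(X, y):.2f}")      # much higher score
```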
5. Monitoring and maintenance

• Generalized output is mandatory for any machine learning model, so
regular monitoring and maintenance are compulsory. Different results
for different actions require changes to the data; hence, editing the
code, and the resources to monitor it, also become necessary.
6. Getting bad recommendations

• A machine learning model operates within a specific context; when
that context shifts, the model starts giving bad recommendations.
For example, at one point in time a customer is looking for certain
gadgets; the customer's requirements change over time, but the
model keeps showing the same recommendations even though the
customer's expectations have changed. This phenomenon is called
drift (data drift, or concept drift). It generally occurs when new data
is introduced or the interpretation of the data changes. We can
overcome it by regularly monitoring and updating the data according
to current expectations; one simple drift check is sketched below.
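A minimal drift-check sketch, assuming SciPy and NumPy: compare a feature's training distribution against recent live data with a two-sample Kolmogorov-Smirnov test. The arrays here are synthetic stand-ins for real feature values, and the 0.05 threshold is an illustrative choice.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)   # training data
live_feature = rng.normal(loc=0.5, scale=1.0, size=1000)    # shifted mean

# The KS test flags a significant difference between the two samples.
stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.05:
    print("Distribution shift detected: consider retraining the model.")
```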
7. Lack of skilled resources

• Although Machine Learning and Artificial Intelligence are continuously
growing in the market, these industries are still young compared to
others. The absence of skilled people is also an issue: we need
practitioners with in-depth knowledge of mathematics, science, and
technology to develop and manage machine learning systems.
8. Customer Segmentation

• Customer segmentation is another important issue when deploying a
machine learning system: we must distinguish the customers who act
on the recommendations shown by the model from those who never
even check them. Hence, an algorithm is needed to recognize
customer behavior and trigger relevant recommendations for each
user based on past behavior; a simple clustering sketch follows.
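A minimal segmentation sketch, assuming scikit-learn: k-means clustering on two hypothetical customer features (annual spend and visits per month). The data is synthetic and the cluster count is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical features: annual spend (0-5000) and visits/month (0-30).
customers = rng.uniform([0, 0], [5000, 30], size=(300, 2))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)

# Each customer gets a segment label that can drive recommendations.
print(kmeans.labels_[:10])
print(kmeans.cluster_centers_)
```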
9. Process Complexity of Machine Learning

• The machine learning process is very complex, which is another major
issue faced by machine learning engineers and data scientists.
Machine Learning and Artificial Intelligence are still relatively new
technologies, largely in an experimental phase and continuously
changing over time. Much of the work proceeds by trial and error, so
the probability of mistakes is higher than expected. Further, the
process includes analyzing the data, removing data bias, training the
model, applying complex mathematical calculations, etc., making the
procedure complicated and quite tedious.
10. Data Bias
• Data bias is another big challenge in Machine Learning. These errors
occur when certain elements of the dataset are weighted more heavily
or given more importance than others. Biased data leads to inaccurate
results, skewed outcomes, and other analytical errors. We can address
this by determining where the data is actually biased in the dataset
and then taking the necessary steps to reduce the bias.
• Methods to remove data bias (a simple balance check is sketched
after this list):
• Research more for customer segmentation.
• Be aware of your general use cases and potential outliers.
• Combine inputs from multiple sources to ensure data diversity.
• Include bias testing in the development process.
• Analyze data regularly and keep tracking errors to resolve them easily.
• Review the collected and annotated data.
• Use multi-pass annotation such as sentiment analysis, content
moderation, and intent recognition.
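A minimal sketch of one bias test from the list above: checking whether one class dominates the dataset. It uses only NumPy; the labels echo the earlier fruit example, and the 75% threshold is an arbitrary illustration.

```python
import numpy as np

# Synthetic labels mirroring the earlier biased fruit data set.
labels = np.array(["apple"] * 1000 + ["papaya"] * 5000)

values, counts = np.unique(labels, return_counts=True)
proportions = counts / counts.sum()
for v, p in zip(values, proportions):
    print(f"{v}: {p:.1%}")

# Flag heavy imbalance so it can be fixed by resampling or reweighting.
if proportions.max() > 0.75:
    print("Warning: dataset is heavily skewed toward one class.")
```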
11. Lack of Explainability

• This basically means the model's outputs cannot be easily
comprehended, since the model is trained in ways that make its
behavior hard to trace back to specific conditions. This lack of
explainability in machine learning algorithms reduces their
credibility; one common remedy is sketched below.
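One common remedy, sketched minimally with scikit-learn's permutation importance, which scores how much each feature drives a model's predictions. The model and dataset here are illustrative stand-ins.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each feature and measure how much the score drops.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature {i}: importance {score:.3f}")
```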
12. Slow implementations and results

• This issue is also very common in machine learning models. Machine
learning models can be highly effective at producing accurate results,
but they are time-consuming. Slow programs, excessive requirements,
and overloaded data take more time than expected to produce
accurate results. This requires continuous maintenance and
monitoring of the model in order to keep delivering accurate results.
13. Irrelevant features

• Although machine learning models are intended to give the best
possible outcome, if we feed garbage data as input, the result will
also be garbage. Hence, we should use only relevant features in our
training sample. A machine learning model is considered good when
the training data has a good set of features with few to no irrelevant
ones; a feature-selection sketch follows.
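A minimal feature-selection sketch, assuming scikit-learn: univariate selection keeps the k most informative features and drops the irrelevant ones. The dataset is synthetic with a known number of informative features.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# 20 features, only 5 of which are actually informative.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

selector = SelectKBest(score_func=f_classif, k=5).fit(X, y)
X_selected = selector.transform(X)

print(X_selected.shape)                      # (500, 5)
print(selector.get_support(indices=True))    # indices of the kept features
```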
• In summary, the common issues in machine learning include:
• Data Quality: Ensuring the data is clean, relevant, and unbiased.
• Overfitting: When a model learns the training data too well, including
noise and outliers.
• Underfitting: When a model is too simple and cannot capture the
underlying trend of the data.
• Model Generalization: Making sure the model performs well on new,
unseen data.
• Computational Complexity: Managing the trade-off between model
complexity and computation time.
• Ethical Concerns: Addressing issues like privacy, security, and fairness
in algorithms.
Basic design issues and approaches in machine learning include

• Data quality: Ensuring clean, relevant, and representative data.
• Concept drift: Handling changes in data distribution
over time.
• Reproducibility: Documenting and sharing code and
data to reproduce results.
• Bias: Addressing biases in data and algorithms.
• Explainability: Making ML models interpretable.
• Subjective elements: Recognizing that design involves
more than just data.
Generative algorithms
• Generative algorithms in machine learning are
designed to create new data instances that
resemble your training data.
• Examples include Generative Adversarial Networks (GANs),
Variational Autoencoders (VAEs), and more. They're often used for
tasks like image generation and text-to-image synthesis.
What is Generative Machine Learning?

• Generative Machine Learning is an interesting subset of artificial
intelligence in which models are trained to generate new data
samples similar to the original training data. Here we'll explore the
fundamentals of generative machine learning, compare it with
discriminative models, look at its applications, and conclude with its
significance in the AI landscape.
• Generative machine learning involves the development of
models that learn the underlying distribution of the training data.
These models are capable of generating new data samples,
which have similar characteristics to the original dataset.
Fundamentally, generative models aim to understand the core of
the data in order to generate unique and diverse outputs.
• The basic components of generative learning are probability
distributions, which are used to carry out the process of generating
sample data. GANs, VAEs, and MCMC methods are among the most
popular techniques employed in generative learning; a minimal
generative-sampling sketch follows.
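A minimal generative-sampling sketch, assuming scikit-learn: fit a Gaussian Mixture Model to data, then sample new, synthetic points from the learned distribution (a much simpler stand-in for GANs or VAEs, but the same generative idea).

```python
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

X, _ = load_iris(return_X_y=True)

# Learn the underlying distribution of the data as a mixture of Gaussians.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

# Sampling the fitted model produces new, synthetic data points.
X_new, component_labels = gmm.sample(10)
print(X_new)
```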
Generative vs Discriminative Models
• One of the main distinctions between machine learning models is
whether they are generative or discriminative. Discriminative models
use a boundary to separate different classes or categories in the
data. For instance, a classifier discriminating between cats and dogs
learns to do so from their features (such as size and color).
• By contrast, generative models learn the underlying distribution of
the data, not just the class boundaries. This lets generative models
create new data points consistent with the training data, which is
very helpful in applications such as data augmentation, image
synthesis, and other areas of artificial intelligence.
Generative Algorithms

• Algorithms of this type try to model “how the dataset was
populated.” Sampling the model yields generated, synthetic data points.
• We estimate probability distributions. Formally, the generative model
estimates the conditional probability P(X | Y = y) for a given target y.
For example, the Naive Bayes algorithm models P(X | Y) and P(Y) and
then transforms these into the conditional probability P(Y | X) by
applying Bayes' rule (a minimal sketch follows the list below).
• Some popular generative algorithms are:
– Naive Bayes Classifier
– Generative Adversarial Networks
– Gaussian Mixture Model
– Hidden Markov Model
– Probabilistic context-free grammar
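A minimal sketch of a generative classifier, assuming scikit-learn: Gaussian Naive Bayes learns P(X | Y) and the priors P(Y), and exposes the posterior P(Y | X) through predict_proba.

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
nb = GaussianNB().fit(X, y)

# predict_proba returns the posterior P(Y | X) computed via Bayes' rule.
print(nb.predict_proba(X[:3]))
print(nb.class_prior_)   # the learned priors P(Y)
```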
Discriminative Algorithms
• Discriminative algorithms focus on modeling a direct solution. For
example, the logistic regression algorithm models a decision boundary.
Then it decides on the outcome of an observation based on where it
stands relative to the decision boundary.
• Discriminative algorithms estimate posterior probabilities. Unlike
generative algorithms, they don't model the underlying probability
distributions. Formally, we model the conditional probability of the
target given an observation, P(Y | X = x) (a minimal sketch follows
the list below).
• Some popular discriminative algorithms are:
– k-nearest neighbors (k-NN)
– Logistic regression
– Support Vector Machines (SVMs)
– Decision Trees
– Random Forest
– Artificial Neural Networks (ANNs)
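For contrast, a minimal sketch of a discriminative classifier, assuming scikit-learn: logistic regression models P(Y | X) directly through a learned decision boundary, without modeling how the data was generated.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# The learned coefficients define the decision boundary; a prediction
# depends on which side of the boundary an observation falls.
print(clf.coef_)
print(clf.predict(X[:3]))
```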
• The generative approach focuses on modeling, whereas the
discriminative approach focuses on a solution. So, we can use
generative algorithms to generate new data points.
Discriminative algorithms don’t serve that purpose.
• Still, discriminative algorithms generally perform better for
classification tasks. That’s because they focus on solving the
actual problem directly instead of solving a more general
problem first.
• Yet, the real strength of generative algorithms lies in their
ability to express complex relationships between variables. In
other words, they have explanatory power. As a result, they
have successful use cases in NLP and medicine.
                  Generative Model                 Discriminative Model

Learns            Probabilistic model of the data  Decision boundary

Estimates         P(X, Y)                          P(Y | X)

Strength          Converges faster                 Smaller error

Explainability    Can express complex              Low to none
                  relationships

Examples          Naive Bayes Classifier, GAN      Linear Regression, SVM
