Issues in Machine Learning and Generative Algorithms


Issues in Machine Learning

• Machine Learning is the study of computer algorithms that automatically
construct computer software through past experience and training data.
• Machine Learning is one of the most popular technologies among data
scientists.
• It is an effective Artificial Intelligence technique for building automated
learning systems that make future decisions without being explicitly
programmed for each one. It is used in every industry: healthcare,
education, finance, automobiles, marketing, shipping, infrastructure,
automation, etc.
Commonly used Algorithms in Machine Learning

• Linear Regression
• Logistic Regression
• Decision Tree
• Bayes Theorem and Naïve Bayes Classification
• Support Vector Machine (SVM) Algorithm
• K-Nearest Neighbor (KNN) Algorithm
• K-Means
• Gradient Boosting algorithms
• Dimensionality Reduction Algorithms
• Random Forest
Common issues in Machine Learning

Although machine learning is used in every industry and helps
organizations make more informed, data-driven choices that are more
effective than classical methodologies, it still has many problems that
cannot be ignored. Here are some common issues in Machine Learning
that professionals face while building ML skills and creating
applications from scratch.
Common issues in Machine Learning

• 1. Inadequate Training Data
– Noisy Data
– Incorrect data
– Generalizing of output data
• 2. Poor quality of data
• 3. Non-representative training data
• 4. Overfitting and Underfitting
• 5. Monitoring and maintenance
• 6. Getting bad recommendations
1. Inadequate Training Data
The major issue that arises when using machine learning algorithms is a
lack of quality as well as quantity of data. Although data plays a vital role in
the processing of machine learning algorithms, many data scientists report
that inadequate, noisy, and unclean data severely degrade the performance
of machine learning algorithms. For example, a simple task requires
thousands of sample data points, and an advanced task such as speech or
image recognition needs millions of examples. Data quality is also essential
for the algorithms to work well, yet poor data quality is common in Machine
Learning applications. Data quality can be affected by factors such as:
– Noisy data - leads to inaccurate predictions, affecting decisions as well
as accuracy in classification tasks.
– Incorrect data - produces faulty results from machine learning models
and may therefore also reduce the accuracy of those results.
– Generalizing of output data - generalizing from the output data can
become complex, resulting in comparatively poor future actions.
2. Poor quality of data

• Data plays a significant role in machine learning, and it must be of
good quality. Noisy, incomplete, inaccurate, and unclean data lead to
lower classification accuracy and low-quality results. Hence, data
quality is another major problem in building machine learning
systems; a minimal cleaning pass is sketched below.
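As an illustration of basic cleaning, here is a minimal sketch using pandas. The records, column names, and valid-age range are hypothetical placeholders, not a prescribed procedure.

```python
import pandas as pd

# Hypothetical raw records with a duplicate, missing values, and an
# obviously incorrect age.
df = pd.DataFrame({
    "age":    [25, 25, None, 31, 460],
    "income": [40_000, 40_000, 52_000, None, 38_000],
})

df = df.drop_duplicates()                  # remove duplicate rows
df = df.dropna(subset=["age", "income"])   # drop rows missing key fields
df = df[df["age"].between(0, 120)]         # filter obviously incorrect values

print(df)                                  # sanity-check the cleaned data
```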
3. Non-representative training data

• To make sure our trained model generalizes well, we have to ensure
that the sample training data is representative of the new cases we
need to generalize to. The training data must cover the cases that
have already occurred as well as those occurring now.
• Non-representative training data results in less accurate predictions.
A machine learning model is said to be ideal if it predicts well for
generalized cases and provides accurate decisions. If there is too
little training data, sampling noise produces a non-representative
training set, the model won't be accurate in its predictions, and it
will be biased toward one class or group. To overcome this, make
the training sample representative of all the cases the model must
handle.
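A minimal sketch of one remedy, assuming scikit-learn: a stratified split keeps the class proportions in the training sample representative of the full dataset.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y preserves the class distribution in both splits,
# reducing sampling noise from a non-representative training set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```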
4. Overfitting and Underfitting
• Overfitting:
• Overfitting is one of the most common issues faced by Machine Learning engineers and data
scientists. During training, a machine learning model can start capturing the noise and
inaccurate values in the training data set, which negatively affects the performance of the
model. Consider a simple example with a training set of 1000 mangoes, 1000 apples, 1000
bananas, and 5000 papayas. There is then a considerable probability of identifying an apple
as a papaya, because the training data set is heavily biased toward papayas, so predictions
are negatively affected. A common cause of overfitting is using highly flexible non-linear
methods, as they can build unrealistic data models. We can often reduce overfitting by using
linear and parametric algorithms in machine learning models.
• Methods to reduce overfitting (two of these are sketched below):
• Increase the amount of training data in the dataset.
• Reduce model complexity by selecting a model with fewer parameters.
• Apply Ridge (L2) or Lasso (L1) regularization.
• Use early stopping during the training phase.
• Reduce the noise in the data.
• Reduce the number of attributes in the training data.
• Constrain the model.
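A minimal sketch of two of the remedies above, assuming scikit-learn: Ridge (L2) regularization and early stopping. The dataset is synthetic and the hyperparameter values are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, SGDRegressor

X, y = make_regression(n_samples=200, n_features=50, noise=10.0,
                       random_state=0)

# Ridge regularization: alpha penalizes large weights (constrains the model).
ridge = Ridge(alpha=1.0).fit(X, y)

# Early stopping: halt training when the validation score stops improving.
sgd = SGDRegressor(early_stopping=True, validation_fraction=0.2,
                   n_iter_no_change=5, random_state=0).fit(X, y)
```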
Underfitting:
• Underfitting is just the opposite of overfitting. When a machine
learning model is trained on too little data, it produces incomplete
and inaccurate results, hurting the accuracy of the model.
• Underfitting occurs when our model is too simple to capture the
underlying structure of the data, just like an undersized pair of pants.
This generally happens when we have limited data in the data set and
try to fit a linear model to non-linear data. In such scenarios the
model lacks the needed complexity, its rules are too simple for the
data set, and it starts making wrong predictions.
• Methods to reduce underfitting (a brief sketch follows):
• Increase model complexity.
• Remove noise from the data.
• Train on more and better features.
• Reduce the constraints on the model.
• Increase the number of training epochs.
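A minimal sketch of increasing model complexity to fix underfitting, assuming scikit-learn and NumPy: a plain linear fit on non-linear (quadratic) data versus the same fit on polynomial features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.5, size=200)   # quadratic relationship

linear = LinearRegression().fit(X, y)             # too simple: underfits
poly = make_pipeline(PolynomialFeatures(degree=2),
                     LinearRegression()).fit(X, y)

print(f"linear R^2: {linear.score(X, y):.2f}")    # low score
print(f"poly   R^2: {poly.score(X, y):.2f}")      # much higher score
```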
5. Monitoring and maintenance

• Generalized output is mandatory for any machine learning model, so
regular monitoring and maintenance are compulsory. Different results
for different actions require changes to the data; hence, editing the
code, and the resources to monitor it, also become necessary.
6. Getting bad recommendations

• A machine learning model operates within a specific context; when
that context shifts, the model starts giving bad recommendations.
For example, at one point in time a customer is looking for certain
gadgets; the customer's requirements change over time, but the
model keeps showing the same recommendations even though the
customer's expectations have changed. This phenomenon is called
drift (data drift, or concept drift). It generally occurs when new data
is introduced or the interpretation of the data changes. We can
overcome it by regularly monitoring and updating the data according
to current expectations; one simple drift check is sketched below.
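A minimal drift-check sketch, assuming SciPy and NumPy: compare a feature's training distribution against recent live data with a two-sample Kolmogorov-Smirnov test. The arrays here are synthetic stand-ins for real feature values, and the 0.05 threshold is an illustrative choice.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)   # training data
live_feature = rng.normal(loc=0.5, scale=1.0, size=1000)    # shifted mean

# The KS test flags a significant difference between the two samples.
stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.05:
    print("Distribution shift detected: consider retraining the model.")
```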
7. Lack of skilled resources

• Although Machine Learning and Artificial Intelligence are continuously
growing in the market, these industries are still young compared to
others. The absence of skilled people is also an issue: we need
practitioners with in-depth knowledge of mathematics, science, and
technology to develop and manage machine learning systems.
8. Customer Segmentation

• Customer segmentation is another important issue when deploying a
machine learning system: we must distinguish the customers who act
on the recommendations shown by the model from those who never
even check them. Hence, an algorithm is needed to recognize
customer behavior and trigger relevant recommendations for each
user based on past behavior; a simple clustering sketch follows.
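A minimal segmentation sketch, assuming scikit-learn: k-means clustering on two hypothetical customer features (annual spend and visits per month). The data is synthetic and the cluster count is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical features: annual spend (0-5000) and visits/month (0-30).
customers = rng.uniform([0, 0], [5000, 30], size=(300, 2))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)

# Each customer gets a segment label that can drive recommendations.
print(kmeans.labels_[:10])
print(kmeans.cluster_centers_)
```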
9. Process Complexity of Machine Learning

• The machine learning process is very complex, which is another major
issue faced by machine learning engineers and data scientists.
Machine Learning and Artificial Intelligence are still relatively new
technologies, largely in an experimental phase and continuously
changing over time. Much of the work proceeds by trial and error, so
the probability of mistakes is higher than expected. Further, the
process includes analyzing the data, removing data bias, training the
model, applying complex mathematical calculations, etc., making the
procedure complicated and quite tedious.
10. Data Bias
• Data bias is another big challenge in Machine Learning. These errors
occur when certain elements of the dataset are weighted more heavily
or given more importance than others. Biased data leads to inaccurate
results, skewed outcomes, and other analytical errors. We can address
this by determining where the data is actually biased in the dataset
and then taking the necessary steps to reduce the bias.
• Methods to remove data bias (a simple balance check is sketched
after this list):
• Research more for customer segmentation.
• Be aware of your general use cases and potential outliers.
• Combine inputs from multiple sources to ensure data diversity.
• Include bias testing in the development process.
• Analyze data regularly and keep tracking errors to resolve them easily.
• Review the collected and annotated data.
• Use multi-pass annotation such as sentiment analysis, content
moderation, and intent recognition.
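A minimal sketch of one bias test from the list above: checking whether one class dominates the dataset. It uses only NumPy; the labels echo the earlier fruit example, and the 75% threshold is an arbitrary illustration.

```python
import numpy as np

# Synthetic labels mirroring the earlier biased fruit data set.
labels = np.array(["apple"] * 1000 + ["papaya"] * 5000)

values, counts = np.unique(labels, return_counts=True)
proportions = counts / counts.sum()
for v, p in zip(values, proportions):
    print(f"{v}: {p:.1%}")

# Flag heavy imbalance so it can be fixed by resampling or reweighting.
if proportions.max() > 0.75:
    print("Warning: dataset is heavily skewed toward one class.")
```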
11. Lack of Explainability

• This basically means the model's outputs cannot be easily
comprehended, since the model is trained in ways that make its
behavior hard to trace back to specific conditions. This lack of
explainability in machine learning algorithms reduces their
credibility; one common remedy is sketched below.
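One common remedy, sketched minimally with scikit-learn's permutation importance, which scores how much each feature drives a model's predictions. The model and dataset here are illustrative stand-ins.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each feature and measure how much the score drops.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature {i}: importance {score:.3f}")
```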
12. Slow implementations and results

• This issue is also very common in machine learning models. Machine
learning models can be highly effective at producing accurate results,
but they are time-consuming. Slow programs, excessive requirements,
and overloaded data take more time than expected to produce
accurate results. This requires continuous maintenance and
monitoring of the model in order to keep delivering accurate results.
13. Irrelevant features

• Although machine learning models are intended to give the best
possible outcome, if we feed garbage data as input, the result will
also be garbage. Hence, we should use only relevant features in our
training sample. A machine learning model is considered good when
the training data has a good set of features with few to no irrelevant
ones; a feature-selection sketch follows.
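A minimal feature-selection sketch, assuming scikit-learn: univariate selection keeps the k most informative features and drops the irrelevant ones. The dataset is synthetic with a known number of informative features.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# 20 features, only 5 of which are actually informative.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

selector = SelectKBest(score_func=f_classif, k=5).fit(X, y)
X_selected = selector.transform(X)

print(X_selected.shape)                      # (500, 5)
print(selector.get_support(indices=True))    # indices of the kept features
```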
• In summary, the common issues in machine learning include:
• Data Quality: Ensuring the data is clean, relevant, and unbiased.
• Overfitting: When a model learns the training data too well, including
noise and outliers.
• Underfitting: When a model is too simple and cannot capture the
underlying trend of the data.
• Model Generalization: Making sure the model performs well on new,
unseen data.
• Computational Complexity: Managing the trade-off between model
complexity and computation time.
• Ethical Concerns: Addressing issues like privacy, security, and fairness
in algorithms.
Basic design issues and approaches in machine learning include

• Data quality: Ensuring clean, relevant, and representative data.
• Concept drift: Handling changes in data distribution
over time.
• Reproducibility: Documenting and sharing code and
data to reproduce results.
• Bias: Addressing biases in data and algorithms.
• Explainability: Making ML models interpretable.
• Subjective elements: Recognizing that design involves
more than just data.
Generative algorithms
• Generative algorithms in machine learning are
designed to create new data instances that
resemble your training data.
• Examples include Generative Adversarial Networks (GANs),
Variational Autoencoders (VAEs), and more. They're often used for
tasks like image generation and text-to-image synthesis.
What is Generative Machine Learning?

• Generative Machine Learning is an interesting subset of artificial
intelligence in which models are trained to generate new data
samples similar to the original training data. Here we'll explore the
fundamentals of generative machine learning, compare it with
discriminative models, look at its applications, and conclude with its
significance in the AI landscape.
• Generative machine learning involves the development of
models that learn the underlying distribution of the training data.
These models are capable of generating new data samples,
which have similar characteristics to the original dataset.
Fundamentally, generative models aim to understand the core of
the data in order to generate unique and diverse outputs.
• The basic components of generative learning are probability
distributions, which are used to carry out the process of generating
sample data. GANs, VAEs, and MCMC methods are among the most
popular techniques employed in generative learning; a minimal
generative-sampling sketch follows.
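A minimal generative-sampling sketch, assuming scikit-learn: fit a Gaussian Mixture Model to data, then sample new, synthetic points from the learned distribution (a much simpler stand-in for GANs or VAEs, but the same generative idea).

```python
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

X, _ = load_iris(return_X_y=True)

# Learn the underlying distribution of the data as a mixture of Gaussians.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

# Sampling the fitted model produces new, synthetic data points.
X_new, component_labels = gmm.sample(10)
print(X_new)
```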
Generative vs Discriminative Models
• One of the main distinctions between machine learning models is
whether they are generative or discriminative. Discriminative models
use a boundary to separate different classes or categories in the
data. For instance, a classifier discriminating between cats and dogs
learns to do so from their features (such as size and color).
• By contrast, generative models learn the underlying distribution of
the data, not just the class boundaries. This lets generative models
create new data points consistent with the training data, which is
very helpful in applications such as data augmentation, image
synthesis, and other areas of artificial intelligence.
Generative Algorithms

• Algorithms of this type try to model “how the dataset was
populated.” Sampling the model yields generated, synthetic data points.
• We estimate probability distributions. Formally, the generative model
estimates the conditional probability P(X | Y = y) for a given target y.
For example, the Naive Bayes algorithm models P(X | Y) and P(Y) and
then transforms these into the conditional probability P(Y | X) by
applying Bayes' rule (a minimal sketch follows the list below).
• Some popular generative algorithms are:
– Naive Bayes Classifier
– Generative Adversarial Networks
– Gaussian Mixture Model
– Hidden Markov Model
– Probabilistic context-free grammar
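A minimal sketch of a generative classifier, assuming scikit-learn: Gaussian Naive Bayes learns P(X | Y) and the priors P(Y), and exposes the posterior P(Y | X) through predict_proba.

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
nb = GaussianNB().fit(X, y)

# predict_proba returns the posterior P(Y | X) computed via Bayes' rule.
print(nb.predict_proba(X[:3]))
print(nb.class_prior_)   # the learned priors P(Y)
```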
Discriminative Algorithms
• Discriminative algorithms focus on modeling a direct solution. For
example, the logistic regression algorithm models a decision boundary.
Then it decides on the outcome of an observation based on where it
stands relative to the decision boundary.
• Discriminative algorithms estimate posterior probabilities. Unlike
generative algorithms, they don't model the underlying probability
distributions. Formally, we model the conditional probability of the
target given an observation, P(Y | X = x) (a minimal sketch follows
the list below).
• Some popular discriminative algorithms are:
– k-nearest neighbors (k-NN)
– Logistic regression
– Support Vector Machines (SVMs)
– Decision Trees
– Random Forest
– Artificial Neural Networks (ANNs)
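For contrast, a minimal sketch of a discriminative classifier, assuming scikit-learn: logistic regression models P(Y | X) directly through a learned decision boundary, without modeling how the data was generated.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# The learned coefficients define the decision boundary; a prediction
# depends on which side of the boundary an observation falls.
print(clf.coef_)
print(clf.predict(X[:3]))
```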
• The generative approach focuses on modeling, whereas the
discriminative approach focuses on a solution. So, we can use
generative algorithms to generate new data points.
Discriminative algorithms don’t serve that purpose.
• Still, discriminative algorithms generally perform better for
classification tasks. That’s because they focus on solving the
actual problem directly instead of solving a more general
problem first.
• Yet, the real strength of generative algorithms lies in their
ability to express complex relationships between variables. In
other words, they have explanatory power. As a result, they
have successful use cases in NLP and medicine.
                  Generative Model                 Discriminative Model

Learns            Probabilistic model of the data  Decision boundary

Estimates         P(X, Y)                          P(Y | X)

Strength          Converges faster                 Smaller error

Explainability    Can express complex              Low to none
                  relationships

Examples          Naive Bayes Classifier, GAN      Linear Regression, SVM
