
Business Data Mining Week 5

Business Data Mining

Uploaded by

pm6566

Week 5 - LAQ's

Explain in detail the types of Machine Learning.

Machine learning is the branch of Artificial Intelligence that focuses on developing
models and algorithms that let computers learn from data and improve from previous
experience without being explicitly programmed for every task. In simple words, ML lets
systems learn patterns from data instead of following hand-written rules.

Types of Machine Learning


There are several types of machine learning, each with special characteristics and
applications. Some of the main types of machine learning algorithms are as follows:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning

1. Supervised Machine Learning


Supervised learning is when a model is trained on a “Labelled Dataset”. Labelled
datasets contain both input and output parameters. In supervised learning, algorithms
learn a mapping from inputs to the correct outputs. Both the training and validation
datasets are labelled.

Pragalath EA2252001010013 1
Supervised Learning

Let’s understand it with the help of an example.


Example: Consider a scenario where you have to build an image classifier to differentiate
between cats and dogs. If you feed labelled images of dogs and cats to the algorithm, the
machine will learn to distinguish a dog from a cat from these labeled images. When we
input new dog or cat images that it has never seen before, it will use the learned model
to predict whether each one is a dog or a cat. This is how supervised learning works, and
this particular task is image classification.
There are two main categories of supervised learning that are mentioned below:
 Classification
 Regression

Classification
Classification deals with predicting categorical target variables, which represent discrete
classes or labels. For instance, classifying emails as spam or not spam, or predicting
whether a patient has a high risk of heart disease. Classification algorithms learn to map
the input features to one of the predefined classes.
Here are some classification algorithms:
 Logistic Regression
 Support Vector Machine
 Random Forest

 Decision Tree
 K-Nearest Neighbors (KNN)
 Naive Bayes
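As a rough illustration of classification, the sketch below implements a tiny K-Nearest Neighbors classifier using only the Python standard library. The function name `knn_predict`, the two features, and the toy email data are assumptions made for this example and do not come from the notes.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: (number of links, number of exclamation marks) per email.
emails = [(0, 0), (1, 0), (0, 1), (7, 5), (8, 6), (6, 7)]
labels = ["not spam", "not spam", "not spam", "spam", "spam", "spam"]

print(knn_predict(emails, labels, (7, 6)))
print(knn_predict(emails, labels, (1, 1)))
```

The labelled pairs play the role of the training set, and the two queries are unseen inputs mapped to one of the predefined classes.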

Regression
Regression, on the other hand, deals with predicting continuous target variables, which
represent numerical values. For example, predicting the price of a house based on its size,
location, and amenities, or forecasting the sales of a product. Regression algorithms learn
to map the input features to a continuous numerical value.
Here are some regression algorithms:
 Linear Regression
 Polynomial Regression
 Ridge Regression
 Lasso Regression
 Decision tree
 Random Forest
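To make regression concrete, here is a minimal sketch of simple linear regression with one input feature, fitted by ordinary least squares. The helper `fit_line` and the house sizes and prices are hypothetical values chosen for illustration only.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: house size in square metres and price in thousands.
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 90 + intercept  # continuous prediction for an unseen size
print(slope, intercept, predicted_price)
```

Unlike a classifier, the model returns a continuous number rather than one of a fixed set of classes.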

Advantages of Supervised Machine Learning


 Supervised Learning models can have high accuracy as they are trained on labelled
data.
 The process of decision-making in supervised learning models is often interpretable.
 Pre-trained supervised models can often be reused, which saves time and resources
compared with developing new models from scratch.

Disadvantages of Supervised Machine Learning


 It may struggle with unseen or unexpected patterns that are not present in the
training data.
 It can be time-consuming and costly because it relies on labeled data.
 It may generalize poorly to new data.

Applications of Supervised Learning
Supervised learning is used in a wide variety of applications, including:
 Image classification: Identify objects, faces, and other features in images.
 Natural language processing: Extract information from text, such as sentiment,
entities, and relationships.
 Speech recognition: Convert spoken language into text.
 Recommendation systems: Make personalized recommendations to users.
 Predictive analytics: Predict outcomes, such as sales, customer churn, and stock
prices.
 Medical diagnosis: Detect diseases and other medical conditions.
 Fraud detection: Identify fraudulent transactions.
 Autonomous vehicles: Recognize and respond to objects in the environment.
 Email spam detection: Classify emails as spam or not spam.
 Quality control in manufacturing: Inspect products for defects.
 Credit scoring: Assess the risk of a borrower defaulting on a loan.
 Gaming: Recognize characters, analyze player behavior, and create NPCs.
 Customer support: Automate customer support tasks.
 Weather forecasting: Make predictions for temperature, precipitation, and other
meteorological parameters.
 Sports analytics: Analyze player performance, make game predictions, and optimize
strategies.

2. Unsupervised Machine Learning


Unsupervised learning is a type of machine learning technique in which an algorithm
discovers patterns and relationships using unlabeled data. Unlike supervised learning,
unsupervised learning doesn’t involve providing the algorithm with labeled target
outputs. The primary goal of unsupervised learning is often to discover hidden patterns,
similarities, or clusters within the data, which can then be used for various purposes,
such as data exploration, visualization, and dimensionality reduction.

Unsupervised Learning

Let’s understand it with the help of an example.


Example: Consider a dataset that contains information about the purchases customers
made from a shop. Through clustering, the algorithm can group customers with similar
purchasing behavior, which reveals customer segments without predefined labels. This
type of information can help businesses target customers as well as identify outliers.
There are two main categories of unsupervised learning that are mentioned below:
 Clustering
 Association

Clustering
Clustering is the process of grouping data points into clusters based on their similarity.
This technique is useful for identifying patterns and relationships in data without the need
for labeled examples.
Here are some clustering algorithms:
 K-Means Clustering algorithm
 Mean-shift algorithm
 DBSCAN Algorithm
Two related unsupervised techniques perform dimensionality reduction rather than
clustering:
 Principal Component Analysis
 Independent Component Analysis
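The clustering idea can be sketched with a small K-Means implementation. This is a simplified version under stated assumptions: the first k points serve as initial centroids (real implementations usually pick them more carefully), and the customer points are made-up values.

```python
import math


def kmeans(points, k, iterations=10):
    """Group points into k clusters by alternating assignment and centroid update."""
    centroids = list(points[:k])  # assumption: naive initialisation from the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customers described by (visits per month, items per visit).
customers = [(1, 2), (9, 8), (2, 1), (8, 9), (1, 1), (9, 9)]
centroids, clusters = kmeans(customers, k=2)
print(clusters)
```

No labels are supplied; the two groups emerge only from the distances between points.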

Association
Association rule learning is a technique for discovering relationships between items in a
dataset. It identifies rules stating that the presence of one item implies the presence of
another item with a specific probability.
Here are some association rule learning algorithms:
 Apriori Algorithm
 Eclat
 FP-growth Algorithm
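Association rules are usually scored with support (how often an itemset appears) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both directly; it is not an implementation of Apriori, Eclat, or FP-growth, and the basket contents are invented for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of `consequent` given `antecedent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

print(support(baskets, {"bread", "butter"}))       # share of baskets with both items
print(confidence(baskets, {"bread"}, {"butter"}))  # rule: bread -> butter
```

Algorithms such as Apriori exist mainly to find the high-support itemsets efficiently, without scoring every possible combination.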

Advantages of Unsupervised Machine Learning


 It helps to discover hidden patterns and various relationships between the data.
 Used for tasks such as customer segmentation, anomaly detection, and data
exploration.
 It does not require labeled data and reduces the effort of data labeling.

Disadvantages of Unsupervised Machine Learning


 Without labels, it can be difficult to evaluate the quality of the model’s output.
 Clusters may be hard to interpret and may not correspond to meaningful groups.
 Extracting meaningful features from raw data often requires additional techniques
such as autoencoders and dimensionality reduction.

Applications of Unsupervised Learning


Here are some common applications of unsupervised learning:
 Clustering: Group similar data points into clusters.
 Anomaly detection: Identify outliers or anomalies in data.
 Dimensionality reduction: Reduce the dimensionality of data while preserving its
essential information.

 Recommendation systems: Suggest products, movies, or content to users based on
their historical behavior or preferences.
 Topic modeling: Discover latent topics within a collection of documents.
 Density estimation: Estimate the probability density function of data.
 Image and video compression: Reduce the amount of storage required for
multimedia content.
 Data preprocessing: Help with data preprocessing tasks such as data cleaning,
imputation of missing values, and data scaling.
 Market basket analysis: Discover associations between products.
 Genomic data analysis: Identify patterns or group genes with similar expression
profiles.
 Image segmentation: Segment images into meaningful regions.
 Community detection in social networks: Identify communities or groups of
individuals with similar interests or connections.
 Customer behavior analysis: Uncover patterns and insights for better marketing and
product recommendations.
 Content recommendation: Classify and tag content to make it easier to recommend
similar items to users.
 Exploratory data analysis (EDA): Explore data and gain insights before defining
specific tasks.

3. Semi-Supervised Learning
Semi-supervised learning is a machine learning approach that sits between supervised
and unsupervised learning, so it uses both labelled and unlabelled data. It is
particularly useful when obtaining labeled data is costly, time-consuming, or
resource-intensive, for example when labeling requires specialist skills.
We use these techniques when only a small portion of the data is labeled and the
remaining large portion is unlabeled. We can use unsupervised techniques to predict
labels and then feed these labels to supervised techniques. This approach is mostly
applicable to image datasets, where usually not all images are labeled.

Semi-Supervised Learning

Let’s understand it with the help of an example.


Example: Consider that we are building a language translation model. Having labeled
translations for every sentence pair can be resource-intensive. Semi-supervised learning
allows the model to learn from both labeled and unlabeled sentence pairs, making it more
accurate. This technique has led to significant improvements in the quality of machine
translation services.

Types of Semi-Supervised Learning Methods


There are a number of different semi-supervised learning methods each with its own
characteristics. Some of the most common ones include:
 Graph-based semi-supervised learning: This approach uses a graph to represent the
relationships between the data points. The graph is then used to propagate labels from
the labeled data points to the unlabeled data points.
 Label propagation: This approach iteratively propagates labels from the labeled data
points to the unlabeled data points, based on the similarities between the data points.
 Co-training: This approach trains two different machine learning models on different
views (feature subsets) of the labeled data. Each model then labels unlabeled examples,
and its most confident predictions are added to the other model’s training set.
 Self-training: This approach trains a machine learning model on the labeled data and
then uses the model to predict labels for the unlabeled data. The model is then
retrained on the labeled data and the predicted labels for the unlabeled data.

 Generative adversarial networks (GANs): GANs are a type of deep learning
algorithm that can be used to generate synthetic data. GANs can be used to generate
unlabeled data for semi-supervised learning by training two neural networks, a
generator and a discriminator.
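Of the methods listed, self-training is the easiest to sketch. The example below uses a deliberately simple one-dimensional nearest-centroid classifier as the base model. The function names, the `min_margin` confidence rule, and the data are all assumptions made for illustration.

```python
def fit_centroids(xs, ys):
    """Base model: mean feature value per class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_with_margin(centroids, x):
    """Return (label, margin); a larger margin means a more confident prediction."""
    ranked = sorted((abs(x - c), label) for label, c in centroids.items())
    return ranked[0][1], ranked[1][0] - ranked[0][0]


def self_train(labeled_x, labeled_y, unlabeled_x, min_margin=2.0, rounds=5):
    xs, ys, pool = list(labeled_x), list(labeled_y), list(unlabeled_x)
    for _ in range(rounds):
        centroids = fit_centroids(xs, ys)  # train on everything labeled so far
        confident, remaining = [], []
        for x in pool:
            label, margin = predict_with_margin(centroids, x)
            (confident if margin >= min_margin else remaining).append((x, label))
        if not confident:
            break  # nothing new to learn from
        for x, label in confident:  # adopt confident predictions as pseudo-labels
            xs.append(x)
            ys.append(label)
        pool = [x for x, _ in remaining]
    return fit_centroids(xs, ys), pool


# Two labeled points and five unlabeled ones (hypothetical values).
model, still_unlabeled = self_train([1.0, 9.0], ["low", "high"], [2.0, 3.0, 8.0, 7.0, 5.2])
print(model, still_unlabeled)
```

Only confident predictions are adopted as pseudo-labels, so an ambiguous point stays unlabeled instead of possibly reinforcing a wrong guess.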

Advantages of Semi- Supervised Machine Learning


 It can lead to better generalization than supervised learning on the labeled data
alone, as it learns from both labeled and unlabeled data.
 Can be applied to a wide range of data.

Disadvantages of Semi- Supervised Machine Learning


 Semi-supervised methods can be more complex to implement compared to other
approaches.
 It still requires some labeled data that might not always be available or easy to obtain.
 Noisy or unrepresentative unlabeled data can degrade the model performance.

Applications of Semi-Supervised Learning


Here are some common applications of semi-supervised learning:
 Image Classification and Object Recognition: Improve the accuracy of models by
combining a small set of labeled images with a larger set of unlabeled images.
 Natural Language Processing (NLP): Enhance the performance of language models
and classifiers by combining a small set of labeled text data with a vast amount of
unlabeled text.
 Speech Recognition: Improve the accuracy of speech recognition by leveraging a
limited amount of transcribed speech data and a more extensive set of unlabeled audio.
 Recommendation Systems: Improve the accuracy of personalized recommendations
by supplementing a sparse set of user-item interactions (labeled data) with a wealth
of unlabeled user behavior data.
 Healthcare and Medical Imaging: Enhance medical image analysis by utilizing a
small set of labeled medical images alongside a larger set of unlabeled images.

4. Reinforcement Machine Learning
Reinforcement learning is a learning method in which an agent interacts with an
environment by taking actions and observing the resulting rewards or penalties.
Trial-and-error search and delayed reward are the most relevant characteristics of
reinforcement learning. In this technique, the model keeps improving its performance by
using reward feedback to learn a behavior or pattern. These algorithms are usually
tailored to a particular problem, e.g. Google's self-driving car, or AlphaGo, where a bot
competes with humans and even itself to become a better and better Go player. Rather
than being given a fixed labeled training set, the agent learns from the experience it
gathers, so the more it interacts with the environment, the better trained it becomes.
Here are some of the most common reinforcement learning algorithms:
 Q-learning: Q-learning is a model-free RL algorithm that learns a Q-function, which
maps each state-action pair to a value. The Q-function estimates the expected
cumulative (discounted) reward of taking a particular action in a given state.
 SARSA (State-Action-Reward-State-Action): SARSA is another model-free RL
algorithm that learns a Q-function. However, unlike Q-learning, SARSA updates the
Q-function using the next action actually chosen by the current policy, rather than
the greedy (highest-valued) next action.
 Deep Q-learning: Deep Q-learning is a combination of Q-learning and deep learning.
Deep Q-learning uses a neural network to represent the Q-function, which allows it to
learn complex relationships between states and actions.
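The difference between Q-learning and SARSA is easiest to see in their update rules. The sketch below applies one update of each to the same transition. The table layout (a dictionary keyed by state-action pairs), the learning rate `alpha`, the discount `gamma`, and all the numbers are assumptions for illustration.

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Off-policy update: bootstrap from the best action available in the next state."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """On-policy update: bootstrap from the action the agent will actually take next."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])


actions = ["left", "right"]


def fresh_table():
    # Hypothetical Q-table for two states; "right" in state 1 is known to be valuable.
    return {(0, "left"): 0.0, (0, "right"): 0.0, (1, "left"): 0.0, (1, "right"): 10.0}


# Same transition for both: in state 0 take "right", receive reward 1, land in state 1.
q_table = fresh_table()
q_learning_update(q_table, 0, "right", 1.0, 1, actions)

sarsa_table = fresh_table()
# Assume the policy explores and picks "left" as the next action in state 1.
sarsa_update(sarsa_table, 0, "right", 1.0, 1, "left")

print(q_table[(0, "right")], sarsa_table[(0, "right")])
```

Q-learning credits the transition with the value of the best follow-up action, while SARSA credits it with the value of the exploratory action that was really chosen, so the two estimates differ.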

Reinforcement Machine Learning

Let’s understand it with the help of examples.

Example: Consider that you are training an AI agent to play a game like chess. The agent
explores different moves and receives positive or negative feedback based on the
outcome. Reinforcement learning also applies to other settings in which agents learn to
perform tasks by interacting with their surroundings.

Types of Reinforcement Machine Learning


There are two main types of reinforcement learning:
Positive reinforcement
 Rewards the agent for taking a desired action.
 Encourages the agent to repeat the behavior.
 Examples: Giving a treat to a dog for sitting, providing a point in a game for a correct
answer.

Negative reinforcement
 Removes an undesirable stimulus to encourage a desired behavior.
 Encourages the agent to repeat the behavior that removed the stimulus.
 Examples: Turning off a loud buzzer when a lever is pressed, avoiding a penalty by
completing a task.

Advantages of Reinforcement Machine Learning


 It supports autonomous decision-making and is well-suited to tasks that require
learning a sequence of decisions, like robotics and game-playing.
 This technique is preferred for achieving long-term results that are otherwise very
difficult to achieve.
 It is used to solve complex problems that cannot be solved by conventional
techniques.

Disadvantages of Reinforcement Machine Learning


 Training Reinforcement Learning agents can be computationally expensive and time-
consuming.

 Reinforcement learning is not preferable for solving simple problems.
 It needs a lot of data and a lot of computation, which can make it impractical and
costly.

Applications of Reinforcement Machine Learning


Here are some applications of reinforcement learning:
 Game Playing: RL can teach agents to play games, even complex ones.
 Robotics: RL can teach robots to perform tasks autonomously.
 Autonomous Vehicles: RL can help self-driving cars navigate and make decisions.
 Recommendation Systems: RL can enhance recommendation algorithms by learning
user preferences.
 Healthcare: RL can be used to optimize treatment plans and drug discovery.
 Natural Language Processing (NLP): RL can be used in dialogue systems and
chatbots.
 Finance and Trading: RL can be used for algorithmic trading.
 Supply Chain and Inventory Management: RL can be used to optimize supply
chain operations.
 Energy Management: RL can be used to optimize energy consumption.
 Game AI: RL can be used to create more intelligent and adaptive NPCs in video
games.
 Adaptive Personal Assistants: RL can be used to improve personal assistants.
 Virtual Reality (VR) and Augmented Reality (AR): RL can be used to create
immersive and interactive experiences.
 Industrial Control: RL can be used to optimize industrial processes.
 Education: RL can be used to create adaptive learning systems.
 Agriculture: RL can be used to optimize agricultural operations.

5. Self-Supervised Learning
Self-supervised learning (SSL) is a type of machine learning where the model is trained
without explicit human-labeled data. Instead, the model generates its own labels from the
input data by exploiting the inherent structure or context of the data. This approach falls
under the broader category of unsupervised learning but is distinct in that it derives its
supervision signal from the data itself.

Example of Self-Supervised Learning


A common example of SSL is in the domain of natural language processing. Consider the
task of predicting missing words in a sentence. A model such as BERT (Bidirectional
Encoder Representations from Transformers) is given sentences where some words are
masked. The model's job is to predict the masked words based on the context of the other
unmasked words in the sentence.
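The way such a task creates its own labels can be sketched in a few lines. The helper below masks each word in turn for clarity, whereas BERT masks a random subset of tokens. The function name and the `[MASK]` placeholder string are illustrative choices, and no model is trained here.

```python
def make_masked_examples(sentence, mask_token="[MASK]"):
    """Turn one unlabeled sentence into (masked input, target word) training pairs."""
    words = sentence.split()
    examples = []
    for i, word in enumerate(words):
        masked = words[:i] + [mask_token] + words[i + 1:]
        examples.append((" ".join(masked), word))  # the hidden word is the label
    return examples


examples = make_masked_examples("the cat sat")
for masked_input, target in examples:
    print(masked_input, "->", target)
```

Every label comes from the sentence itself, so no human annotation is needed to build the training set.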

Categories of Self-Supervised Learning


 Generative SSL: The model learns to generate or reconstruct parts of the input data. For
instance, an image processing model might be trained to reconstruct an image with
some parts removed.
 Contrastive SSL: The model learns by contrasting similar and dissimilar instances. For
example, in image processing, the model is trained to recognize that two different views
of the same object are more similar than views of different objects.
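The contrastive idea can be illustrated with a small loss function over embedding vectors. This is a simplified, InfoNCE-style sketch: the two-dimensional embeddings are invented, the `temperature` value is an arbitrary assumption, and a real system would produce the embeddings with a trained encoder.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """Low when the anchor is closer to its positive view than to the negatives."""
    pos = math.exp(cosine_similarity(anchor, positive) / temperature)
    neg = sum(math.exp(cosine_similarity(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Hypothetical embeddings: one anchor view and views of two other objects.
anchor = (1.0, 0.0)
other_objects = [(0.0, 1.0), (-1.0, 0.0)]

loss_when_aligned = contrastive_loss(anchor, (0.9, 0.1), other_objects)
loss_when_misaligned = contrastive_loss(anchor, (0.0, 1.0), other_objects)
print(loss_when_aligned, loss_when_misaligned)
```

Minimizing this loss pulls the two views of the same object together and pushes views of different objects apart.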

Advantages of Self-Supervised Learning


 Reduced Need for Labelled Data: SSL significantly reduces the reliance on large,
expensive and time-consuming labeled datasets.
 Better Generalization: By learning from the data's inherent structure, SSL models can
generalize better to new, unseen data than models trained on narrow, human-labeled
datasets.
 Flexible and Scalable: SSL can be applied to any data type without needing specific
annotations, making it flexible across different domains and scalable to large datasets.
 Robust Features: Models trained using SSL often learn more robust and comprehensive
features that can be useful for multiple tasks beyond the one they were trained for.
 Efficiency in Data Utilization: SSL maximizes the utility of available data, extracting
meaningful patterns and structures without needing explicit labels.

Disadvantages of Self-Supervised Learning


 Dependency on Data Quality: SSL's success heavily depends on the quality and
diversity of the input data. Poor data quality can lead to poor model performance.
 Complex Model Architectures: SSL often requires more complex model architectures
and training processes to learn from unlabeled data effectively.
 Limited by Data Intrinsic Structure: If the intrinsic structure of the data does not provide
meaningful information for learning, SSL may not perform effectively.

Applications of Self-Supervised Learning


 Natural Language Processing: SSL is used in models like BERT for tasks such as
sentence completion, translation, and sentiment analysis.
 Computer Vision: SSL techniques are used to improve image classification, object
detection, and even medical image analysis by learning from unlabeled images.
 Speech Recognition: SSL helps develop models to understand and transcribe speech by
learning from raw audio data.
 Robotics: Robots can use SSL to learn from their interactions with the environment,
improving their understanding and interaction capabilities without human intervention.
 Anomaly Detection: SSL can be used to understand what 'normal' data looks like and
identify outliers or anomalies, which is crucial in cybersecurity and fraud detection.

Understanding the Impact of Machine Learning


Machine Learning has had a transformative impact across various industries,
revolutionizing traditional processes and paving the way for innovation. Let's explore some
of these impacts:

“Machine learning is the most transformative technology of our time. It’s going to
transform every single vertical.”

- Satya Nadella, CEO at Microsoft

Healthcare
In healthcare, machine learning is used to predict disease outbreaks, personalize patient
treatment plans, and improve medical imaging accuracy. For instance, Google's
DeepMind Health is working with doctors to build machine learning models to detect
diseases earlier and improve patient care.

Finance
The finance sector has also greatly benefited from machine learning. It's used for credit
scoring, algorithmic trading, and fraud detection. A recent survey found that 56% of global
executives said that artificial intelligence (AI) and machine learning have been
implemented into financial crime compliance programs.

Transportation
Machine learning is at the heart of the self-driving car revolution. Companies like Tesla
and Waymo use machine learning algorithms to interpret sensor data in real-time, allowing
their vehicles to recognize objects, make decisions, and navigate roads autonomously.
Similarly, the Swedish Transport Administration recently started working with computer
vision and machine learning specialists to optimize the country’s road infrastructure
management.

Some Applications of Machine Learning


Machine learning applications are all around us, often working behind the scenes to
enhance our daily lives. Here are some real-world examples:

Recommendation systems
Recommendation systems are one of the most visible applications of machine learning.
Companies like Netflix and Amazon use machine learning to analyze your past behavior
and recommend products or movies you might like.

Voice assistants
Voice assistants like Siri, Alexa, and Google Assistant use machine learning to understand
your voice commands and provide relevant responses. They continually learn from your
interactions to improve their performance.

Fraud detection
Banks and credit card companies use machine learning to detect fraudulent transactions.
By analyzing patterns of normal and abnormal behavior, they can flag suspicious activity
in real-time.

Social media
Social media platforms use machine learning for a variety of tasks, from personalizing your
feed to filtering out inappropriate content.


Machine Learning Tools


In the world of machine learning, having the right tools is just as important as understanding
the concepts. These tools, which include programming languages and libraries, provide the
building blocks to implement and deploy machine learning algorithms. Let's explore some
of the most popular tools in machine learning:

Python for machine learning
Python is a popular language for machine learning due to its simplicity and readability,
making it a great choice for beginners. It also has a strong ecosystem of libraries that are
tailored for machine learning.

Libraries such as NumPy and Pandas are used for data manipulation and analysis, while
Matplotlib is used for data visualization. Scikit-learn provides a wide range of machine
learning algorithms, and TensorFlow and PyTorch are used for building and training neural
networks.

Resources to get you started


 Machine Learning Fundamentals with Python Skill Track
 Machine Learning Scientist with Python Career Track
 Introduction to Machine Learning in Python Tutorial

R for machine learning


R is another language widely used in machine learning, particularly for statistical analysis.
It has a rich ecosystem of packages that make it easy to implement machine learning
algorithms.

Packages like caret, mlr, and randomForest provide a variety of machine learning
algorithms, from regression and classification to clustering and dimensionality reduction.

Resources to get you started


 Machine Learning Fundamentals in R Skill Track
 Machine Learning Scientist with R Career Track
 Machine Learning in R for beginners Tutorial

TensorFlow
TensorFlow is a powerful open-source library for numerical computation, particularly well-
suited for large-scale machine learning. It was developed by the Google Brain team and
supports both CPUs and GPUs.

TensorFlow allows you to build and train complex neural networks, making it a popular
choice for deep learning applications.

Resources to get you started


 Introduction to TensorFlow in Python Course
 TensorFlow Tutorial For Beginners
 Python Convolutional Neural Networks (CNN) with TensorFlow Tutorial

Scikit-learn
Scikit-learn is a Python library that provides a wide range of machine learning algorithms
for both supervised and unsupervised learning. It's known for its clear API and detailed
documentation.

Scikit-learn is often used for data mining and data analysis, and it integrates well with other
Python libraries like NumPy and Pandas.

Resources to get you started


 Machine Learning with scikit-learn Course | DataCamp
 Supervised Learning with scikit-learn Course | DataCamp
 Python Machine Learning: Scikit-Learn Tutorial
 Scikit-Learn Cheat Sheet: Python Machine Learning

Keras
Keras is a high-level neural networks API written in Python. It originally ran on top of
TensorFlow, CNTK, or Theano; Keras 3 supports TensorFlow, JAX, and PyTorch as
backends. It was developed with a focus on enabling fast experimentation.

Keras provides a user-friendly interface for building and training neural networks, making
it a great choice for beginners in deep learning.

Resources to get you started


 Introduction to Deep Learning with Keras Course
 Advanced Deep Learning with Keras Course
 Keras Tutorial: Deep Learning in Python

 Keras Cheat Sheet: Neural Networks in Python

PyTorch
PyTorch is an open-source machine learning library based on the Torch library. It's known
for its flexibility and efficiency, making it popular among researchers.

PyTorch supports a wide range of applications, from computer vision to natural language
processing. One of its key features is the dynamic computational graph, which allows for
flexible and optimized computation.

Resources to get you started


 Introduction to Deep Learning in PyTorch Course
 Deep Learning with PyTorch Course
 PyTorch Tutorial: Building a Simple Neural Network From Scratch
 PyTorch 2.0: Unveiling the Latest Updates and Insights with Code Examples
