Business Data Mining Week 5
Pragalath EA2252001010013 1
1. Supervised Learning
Classification
Classification deals with predicting categorical target variables, which represent discrete
classes or labels. For instance, classifying emails as spam or not spam, or predicting
whether a patient has a high risk of heart disease. Classification algorithms learn to map
the input features to one of the predefined classes.
Here are some classification algorithms:
Logistic Regression
Support Vector Machine
Random Forest
Decision Tree
K-Nearest Neighbors (KNN)
Naive Bayes
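As an illustration of how a classifier maps input features to a predefined class, here is a minimal sketch of K-Nearest Neighbors using only the Python standard library. The email features (number of links, number of exclamation marks) and the data values are hypothetical and chosen only for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    nearest_labels = [train_labels[i] for i in order[:k]]
    # The most frequent label among the k neighbors is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: (number of links, number of exclamation marks).
emails = [(0, 0), (1, 0), (0, 1), (7, 5), (8, 6), (6, 7)]
labels = ["not spam", "not spam", "not spam", "spam", "spam", "spam"]

print(knn_predict(emails, labels, (7, 6)))  # spam
print(knn_predict(emails, labels, (1, 1)))  # not spam
```

In practice a library such as scikit-learn would normally be used; the sketch only shows the idea of predicting a discrete label from labeled examples.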
Regression
Regression, on the other hand, deals with predicting continuous target variables, which
represent numerical values. For example, predicting the price of a house based on its size,
location, and amenities, or forecasting the sales of a product. Regression algorithms learn
to map the input features to a continuous numerical value.
Here are some regression algorithms:
Linear Regression
Polynomial Regression
Ridge Regression
Lasso Regression
Decision Tree
Random Forest
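To make the idea of mapping features to a continuous value concrete, the following is a minimal sketch of simple linear regression fitted by ordinary least squares. The house sizes and prices are made-up numbers, and a single feature is assumed to keep the arithmetic visible.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: house size in square meters and price in thousands.
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 90 + intercept  # prediction for a 90 square meter house
print(slope, intercept, predicted_price)
```

Because the toy prices are exactly three times the size, the fitted slope is 3 and the intercept is 0, so the prediction for 90 square meters is 270.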
Applications of Supervised Learning
Supervised learning is used in a wide variety of applications, including:
Image classification: Identify objects, faces, and other features in images.
Natural language processing: Extract information from text, such as sentiment,
entities, and relationships.
Speech recognition: Convert spoken language into text.
Recommendation systems: Make personalized recommendations to users.
Predictive analytics: Predict outcomes, such as sales, customer churn, and stock
prices.
Medical diagnosis: Detect diseases and other medical conditions.
Fraud detection: Identify fraudulent transactions.
Autonomous vehicles: Recognize and respond to objects in the environment.
Email spam detection: Classify emails as spam or not spam.
Quality control in manufacturing: Inspect products for defects.
Credit scoring: Assess the risk of a borrower defaulting on a loan.
Gaming: Recognize characters, analyze player behavior, and create NPCs.
Customer support: Automate customer support tasks.
Weather forecasting: Make predictions for temperature, precipitation, and other
meteorological parameters.
Sports analytics: Analyze player performance, make game predictions, and optimize
strategies.
2. Unsupervised Learning
Clustering
Clustering is the process of grouping data points into clusters based on their similarity.
This technique is useful for identifying patterns and relationships in data without the need
for labeled examples.
Here are some clustering algorithms:
K-Means Clustering algorithm
Mean-shift algorithm
DBSCAN Algorithm
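Principal Component Analysis (PCA)
Independent Component Analysis (ICA)
Note: PCA and ICA are dimensionality reduction techniques rather than clustering algorithms.
They are unsupervised methods that are often applied before clustering.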
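The grouping-by-similarity idea behind K-Means can be sketched in a few lines of standard-library Python. This is a simplified version under stated assumptions: the first k points serve as the initial centroids (real implementations usually randomize this), the number of iterations is fixed, and the data points are hypothetical.

```python
import math


def k_means(points, k, iterations=10):
    """Group points into k clusters; returns (centroids, assignments)."""
    # Assumption: the first k points are used as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                mean = tuple(sum(coord) / len(members) for coord in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        centroids = new_centroids
    assignments = [
        min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
    ]
    return centroids, assignments


# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centroids, assignments = k_means(points, k=2)
print(assignments)  # [0, 1, 0, 1, 0, 1]
```

No labels are supplied anywhere: the cluster indices come purely from the distances between points.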
Association
Association rule learning is a technique for discovering relationships between items in a
dataset. It identifies rules of the form "if item A is present, item B is also likely to be
present", each holding with a specific probability.
Here are some association rule learning algorithms:
Apriori Algorithm
Eclat
FP-growth Algorithm
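The "specific probability" attached to a rule is usually expressed through two standard measures, support and confidence. The sketch below computes both for a hypothetical set of shopping baskets. It illustrates the measures only and is not an implementation of Apriori, Eclat, or FP-growth.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    matching = sum(1 for t in transactions if itemset <= set(t))
    return matching / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of consequent given antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical market baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "butter"})
rule_confidence = confidence(baskets, {"bread"}, {"butter"})
print(rule_support)     # 0.5: bread and butter appear together in 2 of 4 baskets
print(rule_confidence)  # about 0.667: 2 of the 3 bread baskets also contain butter
```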
Recommendation systems: Suggest products, movies, or content to users based on
their historical behavior or preferences.
Topic modeling: Discover latent topics within a collection of documents.
Density estimation: Estimate the probability density function of data.
Image and video compression: Reduce the amount of storage required for
multimedia content.
Data preprocessing: Help with data preprocessing tasks such as data cleaning,
imputation of missing values, and data scaling.
Market basket analysis: Discover associations between products.
Genomic data analysis: Identify patterns or group genes with similar expression
profiles.
Image segmentation: Segment images into meaningful regions.
Community detection in social networks: Identify communities or groups of
individuals with similar interests or connections.
Customer behavior analysis: Uncover patterns and insights for better marketing and
product recommendations.
Content recommendation: Classify and tag content to make it easier to recommend
similar items to users.
Exploratory data analysis (EDA): Explore data and gain insights before defining
specific tasks.
3. Semi-Supervised Learning
Semi-supervised learning sits between supervised and unsupervised learning: it trains on
both labeled and unlabeled data. It is particularly useful when obtaining labeled data is
costly, time-consuming, or resource-intensive, for example when labeling requires
specialized skills.
We use these techniques when only a small part of the data is labeled and the large
remainder is unlabeled. One approach is to use unsupervised techniques to predict labels
for the unlabeled data and then feed these labels to supervised techniques. This is most
common with image datasets, where usually not all images are labeled.
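The predict-labels-then-train idea can be sketched as pseudo-labeling. This is one simple variant chosen for illustration, not the only way to do it: a nearest-centroid rule stands in for both the label-prediction step and the final supervised model, and all data points and class names are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def fit_centroids(points, labels):
    """Compute one centroid per label from labeled points."""
    grouped = {}
    for point, label in zip(points, labels):
        grouped.setdefault(label, []).append(point)
    return {label: centroid(members) for label, members in grouped.items()}


def nearest_label(centroids, point):
    """Return the label whose centroid is closest to point."""
    return min(centroids, key=lambda label: math.dist(centroids[label], point))


# Hypothetical data: two labeled points and four unlabeled ones.
labeled_points = [(1.0, 1.0), (9.0, 9.0)]
labeled_targets = ["cat", "dog"]
unlabeled_points = [(1.5, 2.0), (2.0, 1.0), (8.0, 8.5), (9.5, 8.0)]

# Step 1: predict pseudo-labels for the unlabeled points.
initial_model = fit_centroids(labeled_points, labeled_targets)
pseudo_labels = [nearest_label(initial_model, p) for p in unlabeled_points]

# Step 2: retrain on the labeled and pseudo-labeled data together.
final_model = fit_centroids(
    labeled_points + unlabeled_points, labeled_targets + pseudo_labels
)
print(pseudo_labels)                            # ['cat', 'cat', 'dog', 'dog']
print(nearest_label(final_model, (3.0, 3.0)))   # cat
```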
Generative adversarial networks (GANs): GANs are a type of deep learning
algorithm that can be used to generate synthetic data. GANs can be used to generate
unlabeled data for semi-supervised learning by training two neural networks, a
generator and a discriminator.
4. Reinforcement Machine Learning
Reinforcement learning is a learning method in which an agent interacts with an
environment by taking actions and learning from the resulting rewards or errors.
Trial-and-error search and delayed reward are its most relevant characteristics. The model
keeps improving its performance by using reward feedback to learn a behavior or pattern.
These algorithms are usually tailored to a particular problem, e.g. Google's self-driving
car, or AlphaGo, where a bot competes with humans and even with itself to become a better
and better Go player. Each time we feed in data, the agent learns from it and adds it to its
knowledge, which serves as training data. So the more it learns, the better trained and more
experienced it becomes.
Here are some of the most common reinforcement learning algorithms:
Q-learning: Q-learning is a model-free RL algorithm that learns a Q-function, which
maps each state-action pair to a value. The Q-function estimates the expected cumulative
reward of taking a particular action in a given state.
SARSA (State-Action-Reward-State-Action): SARSA is another model-free RL
algorithm that learns a Q-function. However, unlike Q-learning, SARSA updates the
Q-function using the next action that was actually taken (on-policy), rather than the best
available next action (off-policy).
Deep Q-learning: Deep Q-learning is a combination of Q-learning and deep learning.
Deep Q-learning uses a neural network to represent the Q-function, which allows it to
learn complex relationships between states and actions.
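The difference between Q-learning and SARSA shows up in a single update step. The sketch below uses a tabular Q-function stored in a dictionary. The state names, action names, learning rate alpha, and discount factor gamma are assumptions made for the example.

```python
def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Off-policy update: bootstrap from the best action in next_state."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.5, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken next."""
    next_value = q.get((next_state, next_action), 0.0)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * next_value - old)


# Hypothetical situation: in state "s1", going right is worth far more than left.
actions = ["left", "right"]
start_values = {("s1", "left"): 0.0, ("s1", "right"): 10.0}

q_table_a = dict(start_values)
q_table_b = dict(start_values)

# Same transition for both: from "s0", action "right", reward 1.0, landing in "s1".
q_learning_update(q_table_a, "s0", "right", 1.0, "s1", actions)
# SARSA additionally needs the next action the agent really chose ("left" here).
sarsa_update(q_table_b, "s0", "right", 1.0, "s1", "left")

print(q_table_a[("s0", "right")])  # uses the value 10.0 of the best next action
print(q_table_b[("s0", "right")])  # uses the value 0.0 of the action actually taken
```

With these numbers Q-learning moves the estimate to about 5.0 while SARSA moves it to 0.5, because the agent's actual next action was the poorer one.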
Example: Suppose you are training an AI agent to play a game like chess. The agent
explores different moves and receives positive or negative feedback based on the
outcome. Reinforcement learning is also applied wherever agents learn to perform
tasks by interacting with their surroundings.
Negative reinforcement
Removes an undesirable stimulus to encourage a desired behavior.
Encourages the agent to repeat the behavior that removed the stimulus.
Examples: Turning off a loud buzzer when a lever is pressed, avoiding a penalty by
completing a task.
Reinforcement learning is not well suited to simple problems.
It needs a lot of data and computation, which can make it impractical and costly.
5. Self-Supervised Learning
Self-supervised learning (SSL) is a type of machine learning where the model is trained
without explicit human-labeled data. Instead, the model generates its own labels from the
input data by exploiting the inherent structure or context of the data. This approach falls
under the broader category of unsupervised learning but is distinct in deriving its
supervision signal from the data itself.
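One common way labels can come from the data itself is next-word prediction, where each word serves as the label for the words that precede it. The sketch below only builds such (context, label) training pairs and does not train a model. The function name, the context size, and the sample sentence are hypothetical.

```python
def next_word_pairs(text, context_size=2):
    """Build (context, label) pairs where the label is the word that follows."""
    words = text.split()
    pairs = []
    for i in range(context_size, len(words)):
        context = tuple(words[i - context_size:i])
        label = words[i]  # the label is taken from the data, not from a human
        pairs.append((context, label))
    return pairs


pairs = next_word_pairs("the cat sat on the mat")
for context, label in pairs:
    print(context, "->", label)
# ('the', 'cat') -> sat
# ('cat', 'sat') -> on
# ('sat', 'on') -> the
# ('on', 'the') -> mat
```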
diversity of the input data. Poor data quality can lead to poor model performance.
Complex Model Architectures: SSL often requires more complex model architectures
and training processes to learn from unlabeled data effectively.
Limited by Data Intrinsic Structure: If the intrinsic structure of the data does not provide
meaningful information for learning, SSL may not perform effectively.
“Machine learning is the most transformative technology of our time. It’s going to
transform every single vertical.”
Healthcare
In healthcare, machine learning is used to predict disease outbreaks, personalize patient
treatment plans, and improve medical imaging accuracy. For instance, Google's
DeepMind Health is working with doctors to build machine learning models to detect
diseases earlier and improve patient care.
Finance
The finance sector has also greatly benefited from machine learning. It's used for credit
scoring, algorithmic trading, and fraud detection. A recent survey found that 56% of global
executives said that artificial intelligence (AI) and machine learning have been
implemented into financial crime compliance programs.
Transportation
Machine learning is at the heart of the self-driving car revolution. Companies like Tesla
and Waymo use machine learning algorithms to interpret sensor data in real-time, allowing
their vehicles to recognize objects, make decisions, and navigate roads autonomously.
Similarly, the Swedish Transport Administration recently started working with computer
vision and machine learning specialists to optimize the country’s road infrastructure
management.
Recommendation systems
Recommendation systems are one of the most visible applications of machine learning.
Companies like Netflix and Amazon use machine learning to analyze your past behavior
and recommend products or movies you might like.
Voice assistants
Voice assistants like Siri, Alexa, and Google Assistant use machine learning to understand
your voice commands and provide relevant responses. They continually learn from your
interactions to improve their performance.
Fraud detection
Banks and credit card companies use machine learning to detect fraudulent transactions.
By analyzing patterns of normal and abnormal behavior, they can flag suspicious activity
in real-time.
Social media
Social media platforms use machine learning for a variety of tasks, from personalizing your
feed to filtering out inappropriate content.
Python for machine learning
Python is a popular language for machine learning due to its simplicity and readability,
making it a great choice for beginners. It also has a strong ecosystem of libraries that are
tailored for machine learning.
Libraries such as NumPy and Pandas are used for data manipulation and analysis, while
Matplotlib is used for data visualization. Scikit-learn provides a wide range of machine
learning algorithms, and TensorFlow and PyTorch are used for building and training neural
networks.
In R, packages like caret, mlr, and randomForest provide a variety of machine learning
algorithms, from regression and classification to clustering and dimensionality reduction.
TensorFlow
TensorFlow is a powerful open-source library for numerical computation, particularly well-
suited for large-scale machine learning. It was developed by the Google Brain team and
supports both CPUs and GPUs.
TensorFlow allows you to build and train complex neural networks, making it a popular
choice for deep learning applications.
Scikit-learn
Scikit-learn is a Python library that provides a wide range of machine learning algorithms
for both supervised and unsupervised learning. It's known for its clear API and detailed
documentation.
Scikit-learn is often used for data mining and data analysis, and it integrates well with other
Python libraries like NumPy and Pandas.
Keras
Keras is a high-level neural networks API written in Python. Earlier versions could run on
top of TensorFlow, CNTK, or Theano; Keras 3 instead supports TensorFlow, JAX, and
PyTorch as backends. It was developed with a focus on enabling fast experimentation.
Keras provides a user-friendly interface for building and training neural networks, making
it a great choice for beginners in deep learning.
PyTorch
PyTorch is an open-source machine learning library based on the Torch library. It's known
for its flexibility and efficiency, making it popular among researchers.
PyTorch supports a wide range of applications, from computer vision to natural language
processing. One of its key features is the dynamic computational graph, which allows for
flexible and optimized computation.