Module 4 ISML
Algorithms
Gahan A V
Assistant Professor
Department of Electronics and Communication Engineering
Bangalore Institute of Technology
Machine Learning (ML) is when computers learn from data to make decisions or predictions without being explicitly
programmed.
•Examples: Spam filters, recommendations, and voice search.
•What ML is NOT: Downloading data (like Wikipedia) doesn’t mean the computer "learns." Learning involves
recognizing patterns or making predictions.
Key types of ML:
1.Supervised Learning: Learning with labeled data (e.g., email marked as "spam" or "not spam").
2.Unsupervised Learning: Finding patterns in unlabeled data (e.g., grouping similar customers).
3.Online vs. Batch Learning: online systems learn incrementally as new data arrives; batch systems train on all the data at once and must be retrained to incorporate new data.
4.Instance-based vs. Model-based: Instance uses specific examples; model creates general rules.
ML projects include collecting data, training models, evaluating them, and fine-tuning for better performance.
Machine Learning (ML) is about teaching computers to learn from data and improve at tasks without being explicitly
programmed.
Key points:
1.Definition: ML is when a computer improves at a task (T) using experience (E), with improvement measured by a performance measure (P).
1. Example: A spam filter learns to detect spam (T) by analyzing training data (E) and is judged by its accuracy
(P).
2.How it works: ML uses a training set (examples it learns from) and measures improvement over time.
3.What ML is NOT: Just having more data (like downloading Wikipedia) doesn’t mean a computer "learns." It must
apply data to improve a specific task.
Traditional programming to build a spam filter involves manually creating rules based on observed patterns in spam
emails. Here's how it works:
1.Analyze Patterns: Identify common spam features like specific words (“free,” “credit card”) or unusual sender details.
2.Write Rules: Manually code algorithms to detect these patterns.
3.Test and Refine: Test the program, adjust rules, and repeat until it performs well.
This approach is time-consuming and inflexible because new patterns require rewriting the code. In contrast, Machine
Learning automates this by letting the program learn patterns from data.
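To make the contrast concrete, here is a minimal sketch of such a hand-coded rule-based filter; the keyword list and threshold are illustrative assumptions, not real production rules.

```python
# A minimal sketch of the traditional rule-based approach. The keyword
# list and the threshold are illustrative assumptions, not real rules.
SPAM_KEYWORDS = {"free", "credit card", "winner", "4u"}

def looks_like_spam(email_text, threshold=2):
    """Flag an email if it contains enough hand-coded spam patterns."""
    text = email_text.lower()
    hits = sum(1 for kw in SPAM_KEYWORDS if kw in text)
    return hits >= threshold

print(looks_like_spam("You are a WINNER! Claim your FREE prize"))  # True
print(looks_like_spam("Meeting moved to 3pm tomorrow"))            # False
```

Every new spam trick means editing SPAM_KEYWORDS by hand, which is exactly the maintenance burden ML removes.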
Machine Learning excels in tasks that are too complex for traditional programming or where no clear rules exist.
1.Spam Filtering:
1. It notices new patterns, like "For U" becoming common in spam, and adapts automatically based on user feedback.
2. No manual updates are needed, unlike rule-based systems.
2.Speech Recognition:
1. A traditional program might detect specific features (e.g., high-pitch sounds for "T"), but this doesn't scale for thousands of
words across accents, noise, and languages.
2. ML instead learns patterns from many recordings, making it effective and adaptable for diverse speech inputs.
This ability to learn and adapt makes ML ideal for dynamic and complex problems.
Machine Learning doesn't just solve problems—it also helps humans learn:
1.Understanding Patterns: ML models can show what they’ve learned, like the words or combinations that best predict spam.
1. Example: A trained spam filter might reveal unexpected patterns, such as new spam trends.
2.Discovering Insights: By analyzing large datasets, ML can uncover hidden patterns or relationships that humans might miss.
1. This process is called data mining, and it’s used to gain deeper insights into complex problems.
ML not only automates tasks but also enhances human understanding through its ability to analyze and reveal data-driven insights.
Types of Machine Learning Systems can be classified based on the following criteria:
1.Supervision in Training:
1. Supervised Learning: Trained with labeled data (e.g., spam vs. not spam).
2. Unsupervised Learning: Trained on unlabeled data to find patterns (e.g., grouping customers).
3. Semi-supervised Learning: Uses a mix of labeled and unlabeled data.
4. Reinforcement Learning: Learns by trial and error, receiving feedback as rewards or penalties.
2. Learning Style:
1. Online Learning: Learns incrementally as new data arrives.
2. Batch Learning: Trains on all the data at once and doesn’t update until retrained.
3. Approach to Learning:
1. Instance-Based Learning: Compares new data directly to known data (e.g., k-Nearest Neighbors).
2. Model-Based Learning: Detects patterns in training data and builds a predictive model (e.g., neural networks).
These categories can combine. For example, a spam filter could be supervised, online, and model-based, learning incrementally
with labeled data using a neural network.
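As a sketch of that exact combination (supervised, online, model-based), scikit-learn's MLPClassifier, a small neural network, can be updated incrementally with partial_fit; the toy emails and labels below are assumptions for illustration.

```python
# Sketch: a spam filter that is supervised (labeled emails), online
# (incremental partial_fit updates), and model-based (a small neural net).
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.neural_network import MLPClassifier

vectorizer = HashingVectorizer(n_features=2**12)   # stateless, stream-friendly
model = MLPClassifier(hidden_layer_sizes=(16,), random_state=42)

batches = [                                # labeled data arriving over time
    (["win a free prize now", "meeting at 10am"], [1, 0]),
    (["cheap credit card offer", "lunch tomorrow?"], [1, 0]),
]
for texts, labels in batches:
    X = vectorizer.transform(texts)
    model.partial_fit(X, labels, classes=np.array([0, 1]))

# likely [1] (spam), though a toy set this small gives no guarantees
print(model.predict(vectorizer.transform(["free credit card"])))
```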
•Unsupervised Learning:
•The system gets unlabeled data and figures out patterns on its own.
•Like a detective finding hidden groups or trends in a dataset.
•Example: Grouping customers into similar segments for marketing.
•Semi-supervised Learning:
•A mix of both: some data is labeled, and the rest is not.
•Like a student learning with a few examples and figuring out the rest by guessing.
•Example: Identifying objects in photos when only a few are labeled.
•Reinforcement Learning:
•The system learns by trial and error and gets rewards for good actions.
•Think of training a dog: you reward it when it performs the right trick.
•Example: Teaching a robot to navigate a maze by rewarding it for reaching the exit.
Supervised learning
In supervised learning, the algorithm learns using training data that includes both the
input (example) and the correct output (label).
For example, a spam filter:
•You give it a bunch of emails labeled as "spam" or "not spam" (ham).
•The system studies these examples and learns to recognize patterns.
•Later, it uses what it learned to decide if a new email is spam or not.
This process of learning to classify things (like spam or not spam) is called classification.
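A minimal classification sketch, assuming a toy labeled corpus and a bag-of-words Naive Bayes model (one standard choice, not the only one):

```python
# Learn "spam" vs "ham" from labeled examples, then classify a new email.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["win a free prize", "free credit card offer",
          "project meeting at 10", "lunch tomorrow?"]
labels = ["spam", "spam", "ham", "ham"]            # the supervision signal

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(emails, labels)                            # learn patterns from examples

print(clf.predict(["claim your free prize"]))      # -> ['spam'] on this toy set
```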
In regression, the goal is to predict a numeric value (like a car's price) based on given features (like mileage, age, or brand).
For example, predicting a car's price:
•You give the system many examples of cars with their features (predictors) and their actual prices (labels).
•The system learns patterns to estimate the price of a new car based on its features.
Key Points:
•Regression predicts numbers (e.g., prices).
•Some algorithms, like Logistic Regression, can also do classification. For instance, it can predict the probability of an email
being spam (e.g., 20%).
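A regression sketch with invented car data (the feature values, prices, and choice of a linear model are assumptions for illustration):

```python
# Predict a car's price (a number) from its features with a linear model.
import numpy as np
from sklearn.linear_model import LinearRegression

# columns: [mileage in 1000 km, age in years]; prices are invented labels
X = np.array([[30, 2], [60, 4], [90, 6], [120, 8]])
y = np.array([18000, 14000, 10000, 6000])

reg = LinearRegression().fit(X, y)
print(reg.predict([[75, 5]]))      # -> about 12000, the estimated price
```

For the classification-with-probabilities case, LogisticRegression's predict_proba method returns a class probability, e.g. a 20% chance that an email is spam.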
Unsupervised learning
In unsupervised learning, the training data does not have labels (no answers are provided). The system figures out patterns or
structures on its own, like exploring a puzzle without guidance.
Key Tasks and Algorithms:
1.Clustering: Grouping similar items together.
1. Examples:
1. K-Means: Groups data into clusters based on similarity.
2. DBSCAN: Finds clusters of any shape.
3. HCA (Hierarchical Cluster Analysis): Creates a tree of clusters.
2.Anomaly Detection & Novelty Detection: Spotting unusual data points.
1. Examples:
1. One-class SVM: Identifies outliers.
2. Isolation Forest: Detects anomalies by isolating them in the data.
3. Visualization & Dimensionality Reduction: Simplifies data to make it easier to visualize or work with.
1. Examples:
1. PCA (Principal Component Analysis): Reduces data dimensions.
2. t-SNE: Visualizes data in 2D or 3D.
4. Association Rule Learning: Finding relationships between items in data.
1. Examples:
1. Apriori: Discovers itemset rules (e.g., "People who buy bread often buy butter").
2. Eclat: Similar to Apriori but uses a different approach to find associations.
Unsupervised learning is like finding hidden patterns or insights in raw data without any predefined guidance!
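A clustering sketch with k-means; the 2-D points and the choice of two clusters are illustrative assumptions:

```python
# Group unlabeled points with k-means; no labels are ever provided.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],    # one blob
              [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]])   # another blob

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # e.g. [0 0 0 1 1 1]: groups found on its own
print(kmeans.cluster_centers_)   # the center of each discovered group
```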
In unsupervised learning, like clustering, the system organizes data into groups without any labels or predefined categories.
Example: Blog Visitors
You have data about your blog's visitors but don't know much about their preferences.
•A clustering algorithm analyzes the data and finds patterns, grouping visitors with similar traits.
•For example, it might identify:
• 40% are males who love comic books and read in the evenings.
• 20% are young sci-fi fans who visit on weekends.
Using this insight, you can:
•Write posts tailored for each group.
•If you use a hierarchical clustering algorithm, it can break these groups into smaller, more specific subgroups, helping you
refine your targeting even further.
This way, you better understand and engage with your audience!
Visualization algorithms are a type of unsupervised learning that help you understand complex, unlabeled data by converting it
into a simpler form, like a 2D or 3D plot.
How It Works:
•You give the algorithm a lot of data.
•It processes the data and creates a visual representation, trying to maintain the structure and relationships within the data.
•For example, if there are clusters in the data, it ensures they stay distinct in the visualization.
Why It’s Useful:
•Helps you see how data is organized.
•Makes it easier to spot patterns or trends you didn’t know existed.
Example:
•t-SNE or PCA might turn a large dataset of customer behavior into a 2D graph where similar customers are grouped together.
This helps you understand your audience better or find new insights!
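A dimensionality-reduction sketch with PCA; the random matrix below stands in for a real customer dataset:

```python
# Project 5-D data down to 2-D with PCA so it can be plotted.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # pretend: 100 customers, 5 features

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)              # 2-D coordinates for a scatter plot
print(X_2d.shape)                        # (100, 2)
print(pca.explained_variance_ratio_)     # how much structure each axis keeps
```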
Anomaly detection is an unsupervised learning task where the system identifies unusual or rare data points that don’t fit the
normal pattern.
Examples:
•Detecting fraudulent credit card transactions, spotting manufacturing defects in products, and removing outliers from a dataset to improve analysis.
How It Works:
•During training, the system mostly sees normal data and learns its typical patterns.
•When it encounters new data, it checks if it looks normal or if it might be an anomaly.
Novelty Detection vs. Anomaly Detection:
•Novelty Detection:
• Trained with only normal data.
• Used to detect completely new or unusual patterns.
•Anomaly Detection:
• Tolerates a small percentage of anomalies in the
training set.
• Designed to spot rare outliers in a broader context.
Both tasks are critical in areas like fraud prevention, quality
control, and data preprocessing.
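An anomaly-detection sketch with an Isolation Forest; the single-feature transaction amounts are invented, and a real system would use many features per transaction:

```python
# Flag unusual transactions: mostly normal amounts plus one extreme value.
import numpy as np
from sklearn.ensemble import IsolationForest

amounts = np.array([[25], [30], [22], [28], [26], [31], [24], [950]])

detector = IsolationForest(contamination=0.1, random_state=0).fit(amounts)
print(detector.predict(amounts))  # 1 = normal, -1 = anomaly (950 should stand out)
```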
Semisupervised learning
Semisupervised learning is a type of machine learning where the system learns from a mix of labeled and unlabeled data, often
with a small amount of labeled data and a large amount of unlabeled data.
Example: Google Photos
1.Unsupervised part:
1. The system groups photos by recognizing patterns, like identifying that the same person appears in photos 1, 5, and 11
(clustering).
2.Supervised part:
1. You provide a label for each person (e.g., "This is John").
2. The system learns from your labels and identifies the same person in all other photos.
Why Use Semisupervised Learning?
•Labeled data is expensive and time-consuming to create, but unlabeled data is abundant.
•Combining both helps improve learning with minimal labeled data.
Example Algorithm:
•Deep Belief Networks (DBNs):
• Start with unsupervised learning to train parts of the system (e.g., Restricted Boltzmann Machines).
• Then, refine the entire system with supervised learning for better accuracy.
Semisupervised learning bridges the gap between unsupervised and supervised learning, making it a powerful tool in
practical applications.
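A semisupervised sketch using scikit-learn's LabelSpreading, where -1 marks unlabeled points; the 1-D data and single label per cluster are assumptions:

```python
# A few labeled points plus many unlabeled ones; labels spread to neighbors.
import numpy as np
from sklearn.semi_supervised import LabelSpreading

X = np.array([[1.0], [1.2], [0.9], [1.1],     # cluster A
              [5.0], [5.2], [4.9], [5.1]])    # cluster B
y = np.array([0, -1, -1, -1,                  # one labeled point per cluster
              1, -1, -1, -1])

model = LabelSpreading().fit(X, y)
print(model.transduction_)   # inferred labels, e.g. [0 0 0 0 1 1 1 1]
```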
Reinforcement Learning
Reinforcement Learning (RL) is a unique type of learning where an agent learns by interacting with its environment and
receiving feedback in the form of rewards (positive) or penalties (negative).
Key Concepts:
•Agent: The learner or decision-maker.
•Environment: The system the agent interacts with.
•Actions: Choices the agent can make.
•Policy: A strategy that tells the agent which action to take in a given situation.
•Goal: Maximize total rewards over time by learning the best policy.
Example:
•A robot learning to walk:
• It tries different movements, gets rewards for successful steps, and penalties for falling.
• Over time, it learns the best way to walk steadily.
•AlphaGo (by DeepMind):
• Learned to play the game of Go by analyzing millions of games and practicing against itself.
• During actual matches (e.g., against world champion Ke Jie), it followed the policy it had already learned to make
winning moves.
Why It’s Unique:
Reinforcement Learning is about trial and error, exploring actions to learn the best strategy without explicit instructions,
making it ideal for complex tasks like robotics, games, and autonomous systems.
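A tabular Q-learning sketch for the maze-style example: an agent on a 1-D corridor learns to walk right to the exit. The states, rewards, and hyperparameters are illustrative assumptions:

```python
# Trial and error: the agent explores, gets +100 at the exit and -1 per step,
# and gradually learns the best action (policy) for each state.
import numpy as np

n_states, actions = 6, [-1, +1]           # positions 0..5; move left or right
GOAL = n_states - 1
Q = np.zeros((n_states, len(actions)))    # value of each (state, action) pair
alpha, gamma, epsilon = 0.5, 0.9, 0.2     # learning rate, discount, exploration
rng = np.random.default_rng(0)

for episode in range(200):
    s = 0
    while s != GOAL:
        a = rng.integers(2) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = min(max(s + actions[a], 0), n_states - 1)
        reward = 100 if s_next == GOAL else -1
        # Q-learning update: nudge Q toward reward + discounted future value
        Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))   # 1 ("move right") in every non-terminal state
```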
2. Online Learning
•How it works:
• The system learns incrementally, processing data instances one at a time or in small groups (mini-batches).
• It can adapt to new data on the fly, making it suitable for real-time or continuously changing environments.
•Advantages:
• Efficient and fast for large datasets or streaming data.
• Requires less memory as old data can be discarded after learning.
• Ideal for scenarios like stock price prediction or autonomous systems with limited resources.
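An online-learning sketch with SGDClassifier, whose partial_fit updates the model one mini-batch at a time; the simulated stream and its underlying rule are assumptions:

```python
# Each mini-batch updates the model, then the data can be discarded.
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(loss="log_loss", random_state=0)
rng = np.random.default_rng(0)

for step in range(50):                        # pretend data keeps arriving
    X = rng.normal(size=(20, 3))              # mini-batch of 20 instances
    y = (X[:, 0] + X[:, 1] > 0).astype(int)   # the rule the stream follows
    model.partial_fit(X, y, classes=np.array([0, 1]))

print(model.predict(rng.normal(size=(3, 3))))  # predictions on fresh data
```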
Generalization in Machine Learning refers to the ability of a system to apply what it has learned from training data to new,
unseen examples. The goal is not just to perform well on the training data, but to make accurate predictions on new data as
well. There are two primary approaches to generalization:
1. Instance-based Learning
•How it works:
•The system "learns by heart" and memorizes the training examples.
•To make predictions, the system compares new instances to the training examples using a similarity measure.
•For example, a spam filter might flag an email as spam if it is similar to previously flagged spam emails.
•The system doesn't build a model but instead relies on finding the closest matches in the training data.
•Example:
•If you have a set of known spam emails, the filter will flag a new email as spam if it shares many common words with the
spam emails.
•Advantages:
•Simple and intuitive.
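An instance-based sketch with k-Nearest Neighbors, which stores the training examples and classifies by similarity to them; the 2-D points and k=3 are assumptions:

```python
# kNN "memorizes" the training set and votes among the closest matches.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[1, 1], [1, 2], [2, 1],     # class 0 examples
           [6, 6], [6, 7], [7, 6]]     # class 1 examples
y_train = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(knn.predict([[1.5, 1.5], [6.5, 6.5]]))   # -> [0 1], by nearest neighbors
```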
2. Model-based Learning
•How it works:
•The system learns a model from the training data, which can then be used to make predictions on new data.
•Instead of memorizing the data, the system tries to generalize by creating a model (like a decision tree, neural
network, or linear regression).
•This model can then be applied to new, unseen data to make predictions.
•Advantages:
•More scalable and efficient for large datasets.
•Builds a more flexible, generalizable model that can handle variations in new data.
•Disadvantages:
•The model may not always generalize well, especially if it's too complex (leading to overfitting) or too simple (leading to
underfitting).
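A model-based sketch: fit a linear model once, then predict from the model's learned parameters rather than from stored examples. The GDP and satisfaction numbers are toy values:

```python
# The model compresses the training data into a "general rule" (a line).
import numpy as np
from sklearn.linear_model import LinearRegression

gdp_per_capita = np.array([[20000], [30000], [40000], [50000]])
life_satisfaction = np.array([5.5, 6.0, 6.5, 7.0])     # invented labels

model = LinearRegression().fit(gdp_per_capita, life_satisfaction)
print(model.coef_, model.intercept_)   # the learned parameters
print(model.predict([[35000]]))        # -> about 6.25, from the rule alone
```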
Summary:
•Instance-based Learning: Directly compares new data to training data using similarity, useful for simpler tasks but can be
inefficient and less flexible.
•Model-based Learning: Builds a model from training data and uses it for predictions, more scalable and flexible for complex
tasks.
"Bad data" refers to data that is incomplete, incorrect, or poorly structured, which can negatively affect the
performance of a learning algorithm. Examples include:
1.Missing values: Some data points might have missing information.
2.Incorrect labels: In supervised learning, mislabeled data can confuse the model.
3.Noise: Data with a lot of irrelevant or random information can lead to poor results.
4.Outliers: Extreme values that don't represent the majority of the data can distort the model.
These issues make it harder for the algorithm to learn patterns accurately, leading to poor predictions.
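One common fix for the missing-values case, sketched with scikit-learn's SimpleImputer; the small table is invented:

```python
# Replace missing values (NaN) with each column's median before training.
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[25.0, 50000.0],
              [np.nan, 62000.0],    # missing age
              [40.0, np.nan],       # missing income
              [35.0, 58000.0]])

imputer = SimpleImputer(strategy="median")
print(imputer.fit_transform(X))     # NaNs filled in with column medians
```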
• A toddler can quickly learn what an apple is just by being shown one and hearing the word "apple." They can then
recognize apples of different colors and shapes.
• Machine learning, however, needs much more data to learn. For even simple problems, algorithms typically
require thousands of examples to perform well.
• For complex tasks like recognizing images or speech, millions of examples are often needed unless you can use
parts of an already trained model to help.
By filling in missing data and making sure that the training dataset is representative of the variety of new cases you want to generalize to, you help the model generalize better to new data, avoiding biases or gaps in its predictions.
When you train a linear model on incomplete or nonrepresentative data, the model may not generalize well to new, unseen
cases. This happens because the model is trying to fit a simple line to data that doesn’t actually follow a simple trend.
Example:
•You train a linear model on data about countries' happiness versus their income.
•In the accompanying plot, the solid line represents the updated model after adding the missing countries, while the dotted line is the old model trained on incomplete data.
•The new model reveals that:
• Very rich countries don’t seem much happier than moderately rich ones.
• Some poor countries appear happier than many rich countries.
This is a key problem: the old model, based on incomplete data, over-simplifies the relationship between income and happiness,
and fails to capture the complexity of the actual data.
Key Takeaways:
1.Nonrepresentative Data: Using a training set that is missing data or doesn't reflect the diversity of new cases will likely
lead to a model that doesn't make accurate predictions for certain groups (e.g., very rich or very poor countries).
2.Sampling Bias: If the data collection process is flawed or incomplete, it can introduce sampling bias:
1. Small sample: Leads to sampling noise, where the data might not represent the full range of possibilities.
2. Large sample with flawed sampling: Even a large dataset can be nonrepresentative if the method of sampling is
biased.
3.Representative Training Set: To build a reliable model, the training set must be representative of the full range of
data the model will encounter in the real world. Otherwise, the model may struggle to generalize correctly to new or
unusual cases.
In summary, it's essential to ensure that your training data covers the full spectrum of scenarios, especially when predicting
real-world outcomes that can be influenced by many factors.
Poor-Quality Data
When training a machine learning model, the quality of your training data plays a critical role in the performance of your
model. If the data is noisy, contains errors, or has outliers, the model will have a harder time identifying the true underlying
patterns, and its performance may suffer.
Key Steps to Improve Data Quality:
1.Dealing with Outliers:
1. Outliers are data points that are significantly different from others and may distort the model.
2. If outliers are due to errors, they can often be discarded or corrected manually.
3. If they are genuine, you may need to decide whether they should be kept based on their relevance to the problem.
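A sketch of one common outlier check, the 1.5 x IQR rule (the cutoff is a convention, and whether to drop a flagged point still needs human judgment):

```python
# Flag values far outside the interquartile range before training.
import numpy as np

values = np.array([12, 14, 13, 15, 14, 13, 95])   # 95 looks suspicious
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
mask = (values >= q1 - 1.5 * iqr) & (values <= q3 + 1.5 * iqr)

print(values[mask])      # the outlier is dropped (or inspect it manually)
```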
The saying "garbage in, garbage out" highlights that the quality of the training data directly affects how well a
machine learning model will perform. If the data is full of errors or irrelevant information, the model will struggle to
learn useful patterns.
A critical part of building a successful machine learning system is ensuring that the training data contains the right
information (relevant features). This is where feature engineering comes in, which involves:
1. Feature Selection:
•This means choosing the most useful features from the existing data. For example, in a dataset predicting house
prices, features like square footage and location are useful, while color of the house might not be.
2. Feature Extraction:
•This involves combining or transforming existing features to create more useful ones. For example, you might
combine height and weight into a new feature, body mass index (BMI), which could be more informative for a health
prediction model.
Why It Matters:
•Good features lead to better models, so spending time on feature engineering is crucial.
•If your features are irrelevant or poorly chosen, the model will not learn useful patterns, leading to poor predictions or
performance.
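A feature-extraction sketch for the BMI example; the records are invented:

```python
# Combine height and weight into one potentially more informative feature.
import pandas as pd

df = pd.DataFrame({"height_m": [1.60, 1.75, 1.82],
                   "weight_kg": [55.0, 72.0, 95.0]})

df["bmi"] = df["weight_kg"] / df["height_m"] ** 2   # the extracted feature
print(df)
```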
Next, after preparing the data, it’s important to also ensure that the algorithms you use are suitable for the task. Just having
clean and relevant data isn’t enough if the algorithm is flawed.
• Overfitting in machine learning is when a model learns the details and noise in the training data too well, but fails to
generalize to new, unseen data.
• It's like memorizing answers for a test without understanding the material — you may do well on the test you studied
for, but struggle with any new questions.
Example:
• Imagine you're trying to predict life satisfaction using a model. If you use a high-degree polynomial (a very complex
model), it might fit the training data very closely, showing perfect results on the data you trained it with.
• But, this can lead to overfitting, as it captures every small fluctuation in the data, including noise or random variations
that don’t reflect real-world patterns.
Underfitting occurs when a model is too simple to capture the underlying patterns in the data. It happens when the model doesn't
learn enough from the training data, resulting in poor performance both on the training data and new data.
Example:
If you try to predict life satisfaction using a simple linear model (a straight line), the model may not capture the complexity of
reality, since life satisfaction depends on many factors (e.g., income, health, social relationships). The linear model will give
inaccurate predictions because it’s too simplistic.
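A sketch contrasting both failure modes on the same noisy data, assuming a sine-shaped trend and polynomial models of increasing degree:

```python
# Degree 1 underfits the curved trend; degree 15 chases the noise (overfits).
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
x = np.linspace(0, 3, 20)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)   # curved trend + noise

for degree in (1, 3, 15):
    p = Polynomial.fit(x, y, deg=degree)
    train_error = np.mean((p(x) - y) ** 2)
    print(degree, round(train_error, 5))
# training error keeps shrinking as the degree grows, but a low training
# error alone does NOT mean the model will generalize to new data
```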
Definition: A computer program is said to learn from experience (E) with respect to some class of tasks (T) and performance measure (P) if its performance at tasks in T, as measured by P, improves with experience E.
Summary:
In these examples, machine learning helps systems improve by learning from experience, whether it's recognizing
speech, driving a car, analyzing space data, or playing games.
When designing a program to learn to play checkers, there are key decisions to make about the training process:
1.Direct vs. Indirect Feedback:
1. Direct feedback involves learning from specific board states and correct moves.
2. Indirect feedback gives information based on game outcomes, requiring the system to figure out which moves were
responsible for winning or losing (this is called the "credit assignment problem").
2.Control over Training:
1. The learner can either rely on a teacher to select training examples, or the learner can generate its own examples by
playing against itself, making the learning process more autonomous.
3.Representing the Distribution of Training Data:
1. The training data should ideally represent the types of situations the system will face during the test (e.g., in the world
checkers tournament). If the training data is not representative (such as only playing against itself), the system may
perform poorly against real opponents.
In this case, the system will learn by playing games against itself, generating training data independently, and allowing for
unlimited practice.
3.Defining V:
1. If the game is won, the board state gets a score of +100.
2. If the game is lost, the score is -100.
3. If the game is drawn, the score is 0.
4. If the game is still ongoing, the score of the current board state is the value of the best final board state reachable from it (written formally below).
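Written compactly, this is Tom Mitchell's target function for checkers:

```latex
V(b) =
\begin{cases}
+100 & \text{if } b \text{ is a final board state that is won} \\
-100 & \text{if } b \text{ is a final board state that is lost} \\
0 & \text{if } b \text{ is a final board state that is drawn} \\
V(b') & \text{otherwise, where } b' \text{ is the best final board state} \\
& \text{reachable from } b \text{ assuming optimal play by both sides}
\end{cases}
```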
4. Challenge of Efficient Computation:
This evaluation function requires checking all possible future moves, which is computationally expensive. The goal of the
learning process is to find an efficient version of the evaluation function that can be used quickly, which is known as function
approximation.
In summary, the program will learn an evaluation function (V) that helps it choose the best move without needing to compute
the entire game’s outcome.
• Machine learning can be thought of as searching for the best solution from a large set of possibilities (hypotheses).
• The goal is to find a hypothesis that matches the given training data and any prior knowledge.
Different learning algorithms (like for decision trees, neural networks, or linear functions) explore the hypothesis space using
strategies suited to their structure. This search process is guided by:
1.The size of the hypothesis space.
2.The amount of training data.
3.The confidence that the hypothesis will work well on new, unseen examples.
In summary, machine learning is about finding patterns (hypotheses) in data by systematically searching through possible
solutions.
Concept learning is about understanding general ideas (concepts) from specific examples. For instance, we might learn
the concept of "bird" by looking at animals labeled as either "bird" or "not bird."
Each concept can be seen as:
1.A group of objects (e.g., animals that are birds).
2.A yes-or-no function that says "true" for birds and "false" for everything else.
The goal of concept learning is to figure out this function based on examples. For example, given labeled examples of
birds and non-birds, the system learns the rules that define what makes an animal a bird.
The CANDIDATE-ELIMINATION algorithm works by refining a set of hypotheses that are consistent with the training
data. As more training examples are observed, the algorithm narrows down the space of possible hypotheses. The version
space, which is the set of all hypotheses that match the training data, converges to the correct target concept when:
1.No errors in the training data: The algorithm assumes the data is correct.
2.The target concept is within the hypothesis space: There is at least one hypothesis in the space that describes the target
concept.
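As a sketch, the brute-force LIST-THEN-ELIMINATE variant below makes the version space tangible (full CANDIDATE-ELIMINATION tracks only the most general and most specific boundary hypotheses instead of the whole list). The attribute domains follow the EnjoySport-style example, and the two training examples are assumptions:

```python
# Enumerate every conjunctive hypothesis ('?' = wildcard) and keep the
# ones consistent with the training data: that set is the version space.
from itertools import product

DOMAINS = [("Sunny", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Light"), ("Warm", "Cool"), ("Same", "Change")]

def matches(h, x):
    """A hypothesis is a tuple of attribute values and '?' wildcards."""
    return all(hv in ("?", xv) for hv, xv in zip(h, x))

def consistent(h, examples):
    return all(matches(h, x) == label for x, label in examples)

examples = [  # (instance, classified-positive?)
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
]

all_hypotheses = list(product(*[vals + ("?",) for vals in DOMAINS]))
version_space = [h for h in all_hypotheses if consistent(h, examples)]
print(len(version_space))    # hypotheses still consistent with the data
print(version_space[:3])
```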
• In this scenario, the learner can conduct experiments by choosing instances (queries) and then getting feedback
(classification) from an external oracle (like a teacher or nature).
• The goal is to use these queries to reduce the space of possible hypotheses (called the version space) by discriminating
between competing hypotheses.
Query Strategy:
The best query is one that splits the version space roughly in half. This means the learner should choose an instance that is
classified as positive by some hypotheses and negative by others. This way, after the oracle classifies it, the learner can
eliminate half of the hypotheses, quickly narrowing down the target concept.
For example, if the learner asks about an instance like (Sunny, Warm, Normal, Light, Warm, Same), which satisfies half of
the hypotheses, the classification (positive or negative) will help eliminate half of the remaining hypotheses in the version
space.
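A companion sketch of that query strategy: score each candidate instance by how evenly the current version space disagrees about it, and ask the oracle about the most balanced one. The three remaining hypotheses below are assumed for illustration:

```python
# Prefer the query whose answer would split the version space roughly in half.
from itertools import product

DOMAINS = [("Sunny", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Light"), ("Warm", "Cool"), ("Same", "Change")]

def matches(h, x):
    return all(hv in ("?", xv) for hv, xv in zip(h, x))

# Pretend these hypotheses are what remains after some training examples:
version_space = [
    ("Sunny", "Warm", "?", "?", "?", "?"),
    ("Sunny", "?", "?", "?", "?", "?"),
    ("?", "Warm", "?", "?", "?", "?"),
]

def split_score(x):
    positives = sum(matches(h, x) for h in version_space)
    return abs(2 * positives - len(version_space))   # 0 = perfect 50/50 split

candidates = list(product(*DOMAINS))                 # all possible instances
best = min(candidates, key=split_score)
print(best, split_score(best))   # its answer eliminates about half the hypotheses
```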