Intelligent Systems and Machine Learning Algorithms

Gahan A V
Assistant Professor
Department of Electronics and Communication Engineering
Bangalore Institute of Technology

Module 4


Machine Learning (ML) is when computers learn from data to make decisions or predictions without being explicitly
programmed.
•Examples: Spam filters, recommendations, and voice search.
•What ML is NOT: Downloading data (like Wikipedia) doesn’t mean the computer "learns." Learning involves
recognizing patterns or making predictions.
Key types of ML:
1.Supervised Learning: Learning with labeled data (e.g., email marked as "spam" or "not spam").
2.Unsupervised Learning: Finding patterns in unlabeled data (e.g., grouping similar customers).
3.Online vs. Batch Learning: Online learning updates the model incrementally as new data arrives; batch learning trains on the full dataset at once.
4.Instance-based vs. Model-based: Instance uses specific examples; model creates general rules.

ML projects include collecting data, training models, evaluating them, and fine-tuning for better performance.

What Is Machine Learning?


Machine Learning (ML) is about teaching computers to learn from data and improve at tasks without being explicitly
programmed.
Key points:
1.Definition: ML is when a computer improves at a task (T) using experience (E) and is measured by performance (P).
1. Example: A spam filter learns to detect spam (T) by analyzing training data (E) and is judged by its accuracy
(P).
2.How it works: ML uses a training set (examples it learns from) and measures improvement over time.
3.What ML is NOT: Just having more data (like downloading Wikipedia) doesn’t mean a computer "learns." It must
apply data to improve a specific task.
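
As a rough illustration of the performance measure P, the sketch below computes accuracy for a spam filter. The predictions and labels are made up for this example, and `accuracy` is a hypothetical helper name, not part of any library.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels (the performance measure P)."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)

# Hypothetical filter output versus the true labels for five emails.
predicted = ["spam", "ham", "spam", "ham", "ham"]
actual = ["spam", "ham", "ham", "ham", "ham"]
print(accuracy(predicted, actual))  # 4 of 5 predictions are correct
```

If the filter is learning, this number should rise as it sees more training data (E).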


Why Use Machine Learning?

Traditional programming to build a spam filter involves manually creating rules based on observed patterns in spam
emails. Here's how it works:
1.Analyze Patterns: Identify common spam features like specific words (“free,” “credit card”) or unusual sender details.
2.Write Rules: Manually code algorithms to detect these patterns.
3.Test and Refine: Test the program, adjust rules, and repeat until it performs well.
This approach is time-consuming and inflexible because new patterns require rewriting the code. In contrast, Machine
Learning automates this by letting the program learn patterns from data.
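
A minimal sketch of the hand-written approach described above, assuming a tiny illustrative phrase list. `SPAM_PHRASES` and `is_spam_by_rules` are hypothetical names chosen for this example.

```python
# Hand-written rules: every new spam pattern means editing this list by hand.
SPAM_PHRASES = ["free", "credit card", "4u"]  # illustrative patterns only

def is_spam_by_rules(email_text, threshold=1):
    """Flag an email when it contains at least `threshold` known spam phrases."""
    text = email_text.lower()
    hits = sum(1 for phrase in SPAM_PHRASES if phrase in text)
    return hits >= threshold

print(is_spam_by_rules("Get a FREE credit card today"))  # matches two rules
print(is_spam_by_rules("Meeting moved to 3 pm"))         # matches none
print(is_spam_by_rules("Special offer For U"))           # new wording slips through
```

The last call shows the inflexibility: the filter misses the new wording until someone adds another rule.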


A Machine Learning-based spam filter works differently:


1.Automatic Learning: Instead of manually writing rules, the filter learns patterns from data, identifying words or phrases (e.g.,
"4U") that appear more often in spam compared to regular emails.
2.Easier Maintenance: The program is typically much shorter, easier to maintain, and often more accurate.
3.Adaptive: If spammers switch from "4U" to "For U," the ML model can automatically adapt by learning from new spam data,
avoiding the need for constant manual updates.
This makes ML-based systems more robust and efficient compared to traditional programming approaches.
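
The idea of learning which words are unusually frequent in spam can be sketched as below. This is a simplified word-count ratio, not a production filter; the training emails, the `min_ratio` parameter and the function name are assumptions made for illustration.

```python
from collections import Counter

def learn_spam_words(emails, min_ratio=2.0):
    """Return words that occur at least `min_ratio` times more often in spam than in ham.

    `emails` is a list of (text, label) pairs with label "spam" or "ham".
    """
    spam_counts, ham_counts = Counter(), Counter()
    for text, label in emails:
        counts = spam_counts if label == "spam" else ham_counts
        counts.update(text.lower().split())
    # Add 1 to the ham count so words never seen in ham do not divide by zero.
    return {w for w, c in spam_counts.items() if c / (ham_counts[w] + 1) >= min_ratio}

training = [
    ("win cash 4u now", "spam"),
    ("cheap loans 4u", "spam"),
    ("lunch at noon", "ham"),
    ("see you at lunch", "ham"),
]
print(learn_spam_words(training))  # only "4u" is frequent enough

# Spammers switch wording; retraining on new labeled data picks it up.
training += [("cash for u", "spam"), ("loans for u", "spam")]
print(sorted(learn_spam_words(training)))
```

No rule was edited between the two calls: the second result changes only because the training data changed.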


Machine Learning excels in tasks that are too complex for traditional programming or where no clear rules exist.
1.Spam Filtering:
1. It notices new patterns, like "For U" becoming common in spam, and adapts automatically based on user feedback.
2. No manual updates are needed, unlike rule-based systems.
2.Speech Recognition:
1. A traditional program might detect specific features (e.g., high-pitch sounds for "T"), but this doesn't scale for thousands of
words across accents, noise, and languages.
2. ML instead learns patterns from many recordings, making it effective and adaptable for diverse speech inputs.
This ability to learn and adapt makes ML ideal for dynamic and complex problems.

Machine Learning doesn't just solve problems—it also helps humans learn:
1.Understanding Patterns: ML models can show what they’ve learned, like the words or combinations that best predict spam.
1. Example: A trained spam filter might reveal unexpected patterns, such as new spam trends.
2.Discovering Insights: By analyzing large datasets, ML can uncover hidden patterns or relationships that humans might miss.
1. This process is called data mining, and it’s used to gain deeper insights into complex problems.
ML not only automates tasks but also enhances human understanding through its ability to analyze and reveal data-driven insights.


In summary, Machine Learning is ideal for:


1.Simplifying Rule-Based Solutions: It reduces the need for extensive manual coding and often performs better.
2.Solving Complex Problems: ML provides solutions where traditional methods fail.
3.Adapting to Change: ML systems adjust to new data in dynamic environments.
4.Revealing Insights: It helps uncover patterns and insights in large, complex datasets.
This makes ML a powerful tool for automation, problem-solving, and data analysis.


Types of Machine Learning Systems

Machine Learning systems can be classified based on the following criteria:
1.Supervision in Training:
1. Supervised Learning: Trained with labeled data (e.g., spam vs. not spam).
2. Unsupervised Learning: Trained on unlabeled data to find patterns (e.g., grouping customers).
3. Semi-supervised Learning: Uses a mix of labeled and unlabeled data.
4. Reinforcement Learning: Learns by trial and error, receiving feedback as rewards or penalties.


2. Learning Style:
1. Online Learning: Learns incrementally as new data arrives.
2. Batch Learning: Trains on all the data at once and doesn’t update until retrained.
3. Approach to Learning:
1. Instance-Based Learning: Compares new data directly to known data (e.g., k-Nearest Neighbors).
2. Model-Based Learning: Detects patterns in training data and builds a predictive model (e.g., neural networks).
These categories can combine. For example, a spam filter could be supervised, online, and model-based, learning incrementally
with labeled data using a neural network.

Supervised/Unsupervised Learning
•Supervised Learning:
•The system learns from labeled examples (data with correct answers).
•Think of it like a teacher showing flashcards and telling the student the right answer.
•Example: Predicting house prices based on features like size and location.

•Unsupervised Learning:
•The system gets unlabeled data and figures out patterns on its own.
•Like a detective finding hidden groups or trends in a dataset.
•Example: Grouping customers into similar segments for marketing.

•Semi-supervised Learning:
•A mix of both: some data is labeled, and the rest is not.
•Like a student learning with a few examples and figuring out the rest by guessing.
•Example: Identifying objects in photos when only a few are labeled.

•Reinforcement Learning:
•The system learns by trial and error and gets rewards for
good actions.
•Think of training a dog: you reward it when it performs the
right trick.
•Example: Teaching a robot to navigate a maze by rewarding it
for reaching the exit.


Supervised learning

In supervised learning, the algorithm learns using training data that includes both the
input (example) and the correct output (label).
For example, a spam filter:
•You give it a bunch of emails labeled as "spam" or "not spam" (ham).
•The system studies these examples and learns to recognize patterns.
•Later, it uses what it learned to decide if a new email is spam or not.
This process of learning to classify things (like spam or not spam) is called classification.


In regression, the goal is to predict a numeric value (like a car's price) based on given features (like mileage, age, or brand).
For example, predicting a car's price:
•You give the system many examples of cars with their features (predictors) and their actual prices (labels).
•The system learns patterns to estimate the price of a new car based on its features.
Key Points:
•Regression predicts numbers (e.g., prices).
•Some regression algorithms can also be used for classification. For instance, Logistic Regression outputs the probability that an email
belongs to a class (e.g., a 20% chance of being spam).
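
A minimal regression sketch, assuming a single feature (car age) and made-up prices. It fits a straight line by ordinary least squares; `fit_line` is a hypothetical helper written for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up training set: car age in years (feature) and price in thousands (label).
ages = [1, 2, 3, 4, 5]
prices = [18, 16, 14, 12, 10]
slope, intercept = fit_line(ages, prices)
predicted = slope * 6 + intercept  # estimated price of a 6-year-old car
print(slope, intercept, predicted)
```

With this toy data the fitted line loses 2 (thousand) per year of age, so the estimate for a 6-year-old car is 8.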

Unsupervised learning
In unsupervised learning, the training data does not have labels (no answers are provided). The system figures out patterns or
structures on its own, like exploring a puzzle without guidance.
Key Tasks and Algorithms:
1.Clustering: Grouping similar items together.
1. Examples:
1. K-Means: Groups data into clusters based on similarity.
2. DBSCAN: Finds clusters of any shape.
3. HCA (Hierarchical Cluster Analysis): Creates a tree of clusters.
2.Anomaly Detection & Novelty Detection: Spotting unusual data points.
1. Examples:
1. One-class SVM: Identifies outliers.
2. Isolation Forest: Detects anomalies by isolating them in the data.


3. Visualization & Dimensionality Reduction: Simplifies data to make it easier to visualize or work with.
1. Examples:
1. PCA (Principal Component Analysis): Reduces data dimensions.
2. t-SNE: Visualizes data in 2D or 3D.
4. Association Rule Learning: Finding relationships between items in data.
1. Examples:
1. Apriori: Discovers itemset rules (e.g., "People who buy bread often buy butter").
2. Eclat: Similar to Apriori but uses a different approach to find associations.
Unsupervised learning is like finding hidden patterns or insights in raw data without any predefined guidance!


In unsupervised learning, like clustering, the system organizes data into groups without any labels or predefined categories.
Example: Blog Visitors
You have data about your blog's visitors but don't know much about their preferences.
•A clustering algorithm analyzes the data and finds patterns, grouping visitors with similar traits.
•For example, it might identify:
• 40% are males who love comic books and read in the evenings.
• 20% are young sci-fi fans who visit on weekends.
Using this insight, you can write posts tailored to each group.
If you use a hierarchical clustering algorithm, it may also break these groups into smaller, more specific subgroups, helping you
refine your targeting even further.
This way, you better understand and engage with your audience!
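
To make the grouping step concrete, here is a small sketch of K-Means on a single numeric feature. The visitor ages and starting centroids are invented, and this is plain (flat) K-Means, not the hierarchical variant mentioned above.

```python
def k_means_1d(values, centroids, iterations=10):
    """Plain k-means on one numeric feature, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Assign each value to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical visitor ages; no labels are given.
ages = [16, 17, 19, 41, 44, 45]
centroids, clusters = k_means_1d(ages, centroids=[10, 50])
print(centroids)
print(clusters)
```

The algorithm is never told which visitors are "young" or "older"; the two groups emerge from the distances alone.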


Visualization algorithms are a type of unsupervised learning that help you understand complex, unlabeled data by converting it
into a simpler form, like a 2D or 3D plot.
How It Works:
•You give the algorithm a lot of data.
•It processes the data and creates a visual representation, trying to maintain the structure and relationships within the data.
•For example, if there are clusters in the data, it ensures they stay distinct in the visualization.
Why It’s Useful:
•Helps you see how data is organized.
•Makes it easier to spot patterns or trends you didn’t know existed.
Example:
•t-SNE or PCA might turn a large dataset of customer behavior into a 2D graph where similar customers are grouped together.
This helps you understand your audience better or find new insights!
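
As a hedged sketch of dimensionality reduction, the code below performs PCA on 2D points and keeps only the first principal component. It uses the closed-form orientation of the leading eigenvector of a 2x2 covariance matrix, which only works for two features; the data and function names are assumptions for illustration.

```python
import math

def first_principal_axis(points):
    """Unit vector of the direction with the largest variance in 2D data."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Closed-form orientation of the leading eigenvector of a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (math.cos(angle), math.sin(angle)), (mean_x, mean_y)

def project(points):
    """Reduce each 2D point to one coordinate along the first principal axis."""
    (ux, uy), (mx, my) = first_principal_axis(points)
    return [(x - mx) * ux + (y - my) * uy for x, y in points]

data = [(1, 1), (2, 2), (3, 3), (4, 4)]
print(project(data))
```

Because these points lie on a straight line, one coordinate per point preserves all of their spacing; real data loses some information in the projection.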


Anomaly detection is an unsupervised learning task where the system identifies unusual or rare data points that don’t fit the
normal pattern.
Examples:
•Detecting fraudulent credit card transactions.
•Spotting manufacturing defects in products.
•Removing outliers from a dataset to improve analysis.
How It Works:
•During training, the system mostly sees normal data and learns its typical patterns.
•When it encounters new data, it checks if it looks normal or if it might be an anomaly.
Novelty Detection vs. Anomaly Detection:
•Novelty Detection:
• Trained with only normal data.
• Used to detect completely new or unusual patterns.


•Anomaly Detection:
• Tolerates a small percentage of anomalies in the
training set.
• Designed to spot rare outliers in a broader context.
Both tasks are critical in areas like fraud prevention, quality
control, and data preprocessing.
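
The "learn what normal looks like, then flag what deviates" idea can be sketched with a simple z-score rule. This is a deliberately basic stand-in, not One-class SVM or Isolation Forest; the class name, the threshold of 3 standard deviations and the transaction amounts are assumptions.

```python
import statistics

class ZScoreDetector:
    """Flags values that lie far from the mean of the (mostly normal) training data."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold

    def fit(self, values):
        self.mean = statistics.mean(values)
        self.std = statistics.stdev(values)
        return self

    def is_anomaly(self, value):
        return abs(value - self.mean) / self.std > self.threshold

# Hypothetical transaction amounts seen during training.
normal_amounts = [20, 22, 19, 21, 20, 23, 18, 21]
detector = ZScoreDetector().fit(normal_amounts)
print(detector.is_anomaly(21))   # typical amount
print(detector.is_anomaly(500))  # far outside the usual range
```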


Semisupervised learning


Semisupervised learning is a type of machine learning where the system learns from a mix of labeled and unlabeled data, often
with a small amount of labeled data and a large amount of unlabeled data.
Example: Google Photos
1.Unsupervised part:
1. The system groups photos by recognizing patterns, like identifying that the same person appears in photos 1, 5, and 11
(clustering).
2.Supervised part:
1. You provide just one label per person (e.g., "This is John").
2. The system learns from your labels and identifies the same person in all other photos.
Why Use Semisupervised Learning?
•Labeled data is expensive and time-consuming to create, but unlabeled data is abundant.
•Combining both helps improve learning with minimal labeled data.
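
A minimal sketch of the photo example, assuming the clustering step has already run and produced a cluster id per photo. `propagate_labels`, the cluster ids and the names are hypothetical.

```python
def propagate_labels(cluster_of, known_labels):
    """Spread one known label per cluster to every item in that cluster.

    cluster_of:   item id -> cluster id (assumed to come from a clustering step)
    known_labels: item id -> label, given for only a few items
    """
    cluster_label = {cluster_of[item]: label for item, label in known_labels.items()}
    # Items whose cluster has no labeled member stay unlabeled (None).
    return {item: cluster_label.get(cluster) for item, cluster in cluster_of.items()}

# Hypothetical clustering result: photos 1, 5 and 11 show the same face.
cluster_of = {1: "A", 5: "A", 11: "A", 2: "B", 7: "B", 9: "C"}
known_labels = {1: "John", 7: "Maria"}  # the user names one photo per person
print(propagate_labels(cluster_of, known_labels))
```

Two labels are enough to name five photos here; photo 9 stays unlabeled because nobody has named anyone in its cluster.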


Example Algorithm:
•Deep Belief Networks (DBNs):
• Start with unsupervised learning to train parts of the system (e.g., Restricted Boltzmann Machines).
• Then, refine the entire system with supervised learning for better accuracy.
Semisupervised learning bridges the gap between unsupervised and supervised learning, making it a powerful tool in
practical applications.


Reinforcement Learning


Reinforcement Learning (RL) is a unique type of learning where an agent learns by interacting with its environment and
receiving feedback in the form of rewards (positive) or penalties (negative).
Key Concepts:
•Agent: The learner or decision-maker.
•Environment: The system the agent interacts with.
•Actions: Choices the agent can make.
•Policy: A strategy that tells the agent which action to take in a given situation.
•Goal: Maximize total rewards over time by learning the best policy.


Example:
•A robot learning to walk:
• It tries different movements, gets rewards for successful steps, and penalties for falling.
• Over time, it learns the best way to walk steadily.
•AlphaGo (by DeepMind):
• Learned to play the game of Go by analyzing millions of games and practicing against itself.
• During the actual matches (e.g., against world champion Ke Jie), learning was switched off; it only applied the policy it had
already learned.
Why It’s Unique:
Reinforcement Learning is about trial and error, exploring actions to learn the best strategy without explicit instructions,
making it ideal for complex tasks like robotics, games, and autonomous systems.
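
One common way to learn a policy by trial and error is tabular Q-learning, which these slides do not name; it is used here only as an illustration. The sketch assumes a toy corridor of five cells where the agent starts at the left end and is rewarded for reaching the right end. All names and hyperparameter values are assumptions.

```python
import random

def train_corridor(n_states=5, episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a corridor; actions are 0 = left, 1 = right."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        for _step in range(200):  # cap the episode length
            # Explore with probability epsilon, or when both actions look equally good.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            best_next = 0.0 if next_state == goal else max(q[next_state])
            # Move the estimate toward reward + discounted value of the next state.
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
            if state == goal:
                break
    return q

def greedy_policy(q):
    """Best known action in every non-goal state."""
    return ["left" if left > right else "right" for left, right in q[:-1]]

print(greedy_policy(train_corridor()))
```

The agent is never told that "right" is correct. It discovers this because actions that lead toward the reward accumulate higher value estimates.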

Batch and Online Learning
Machine Learning systems can also be classified based on how they handle incoming data, specifically whether they learn
incrementally or require training on the entire dataset at once.
1. Batch Learning (Offline Learning)
•How it works:
• The system is trained on all available data at once.
• After training, it is deployed and no longer learns from new data.
• To update the system with new data, you must retrain it from scratch with the full dataset (old + new data).
•Advantages:
• Simple to implement.
• Can produce highly optimized models.
•Disadvantages:
• Requires significant time and computing resources for retraining.
• Cannot adapt quickly to new or changing data.
• Not practical for scenarios with rapidly changing information or massive datasets.

2. Online Learning
•How it works:
• The system learns incrementally, processing data instances one at a time or in small groups (mini-batches).
• It can adapt to new data on the fly, making it suitable for real-time or continuously changing environments.
•Advantages:
• Efficient and fast for large datasets or streaming data.
• Requires less memory as old data can be discarded after learning.
• Ideal for scenarios like stock price prediction or autonomous systems with limited resources.

•Disadvantages:
• Requires careful tuning of the learning rate:
• High learning rate: Quickly adapts but may forget older data.
• Low learning rate: Retains past knowledge but adapts more slowly.
• Vulnerable to bad data, which can degrade performance over time.
• Needs close monitoring to handle issues like noise, outliers, or malicious data.
Special Considerations:
•Out-of-Core Learning:
• For very large datasets, the system processes data in chunks, training incrementally while handling memory limitations.
•Monitoring:
• Use anomaly detection to monitor incoming data and performance.
• Be prepared to pause or revert to a previous model state if bad data causes issues.
Summary:
•Batch Learning: Best for static systems with stable data that doesn’t change frequently.
•Online Learning: Ideal for dynamic environments with continuously changing or large-scale data streams.
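
The learning-rate trade-off in online learning can be sketched with a one-weight linear model updated one instance at a time by stochastic gradient descent. The data stream, the class name and both learning rates are assumptions chosen to make the effect visible.

```python
class OnlineLinearModel:
    """Learns y ~ weight * x one instance at a time with stochastic gradient descent."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.weight = 0.0

    def learn_one(self, x, y):
        error = y - self.weight * x
        # Step against the gradient of the squared error for this single instance.
        self.weight += self.learning_rate * error * x
        return self.weight

def stream(slope, n):
    """Hypothetical data stream following y = slope * x, with x alternating 1, 2, 1, 2, ..."""
    for i in range(n):
        x = 1.0 + (i % 2)
        yield x, slope * x

fast = OnlineLinearModel(learning_rate=0.2)
slow = OnlineLinearModel(learning_rate=0.01)
for model in (fast, slow):
    for x, y in stream(slope=2.0, n=200):  # first regime
        model.learn_one(x, y)
    for x, y in stream(slope=5.0, n=20):   # the data suddenly changes
        model.learn_one(x, y)
print(round(fast.weight, 2), round(slow.weight, 2))
```

After the change, the high-learning-rate model has essentially forgotten the old slope of 2 and sits near 5, while the low-learning-rate model is still somewhere in between. Each instance is used once and can then be discarded.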

Instance-Based Versus Model-Based Learning

Generalization in Machine Learning refers to the ability of a system to apply what it has learned from training data to new,
unseen examples. The goal is not just to perform well on the training data, but to make accurate predictions on new data as
well. There are two primary approaches to generalization:


1. Instance-based Learning
•How it works:
•The system "learns by heart" and memorizes the training examples.
•To make predictions, the system compares new instances to the training examples using a similarity measure.
•For example, a spam filter might flag an email as spam if it is similar to previously flagged spam emails.
•The system doesn't build a model but instead relies on finding the closest matches in the training data.
•Example:
•If you have a set of known spam emails, the filter will flag a new email as spam if it shares many common words with the
spam emails.
•Advantages:
•Simple and intuitive.
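
A minimal sketch of an instance-based spam filter, assuming word overlap (Jaccard similarity) as the similarity measure and a single nearest neighbour. The stored emails are made up, and `similarity` and `classify` are hypothetical helper names.

```python
def similarity(text_a, text_b):
    """Jaccard similarity: shared words divided by all distinct words."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def classify(email, training_set):
    """Return the label of the most similar stored example (1-nearest neighbour)."""
    _, label = max(training_set, key=lambda example: similarity(email, example[0]))
    return label

# The "model" is just the memorized examples (made up for illustration).
training_set = [
    ("win free cash now", "spam"),
    ("free credit card offer", "spam"),
    ("project meeting at noon", "ham"),
    ("lunch with the team", "ham"),
]
print(classify("claim your free cash", training_set))
print(classify("team meeting moved", training_set))
```

Note that every prediction scans all stored examples, which is one reason this approach can become slow on large training sets.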

2. Model-based Learning
•How it works:
•The system learns a model from the training data, which can then be used to make predictions on new data.
•Instead of memorizing the data, the system tries to generalize by creating a model (like a decision tree, neural
network, or linear regression).
•This model can then be applied to new, unseen data to make predictions.


•Advantages:
•More scalable and efficient for large datasets.
•Builds a more flexible, generalizable model that can handle variations in new data.
•Disadvantages:
•The model may not always generalize well, especially if it's too complex (leading to overfitting) or too simple (leading to
underfitting).

Summary:
•Instance-based Learning: Directly compares new data to training data using similarity, useful for simpler tasks but can be
inefficient and less flexible.
•Model-based Learning: Builds a model from training data and uses it for predictions, more scalable and flexible for complex
tasks.


Main Challenges of Machine Learning

"Bad data" refers to data that is incomplete, incorrect, or poorly structured, which can negatively affect the
performance of a learning algorithm. Examples include:
1.Missing values: Some data points might have missing information.
2.Incorrect labels: In supervised learning, mislabeled data can confuse the model.
3.Noise: Data with a lot of irrelevant or random information can lead to poor results.
4.Outliers: Extreme values that don't represent the majority of the data can distort the model.
These issues make it harder for the algorithm to learn patterns accurately, leading to poor predictions.


Insufficient Quantity of Training Data

• A toddler can quickly learn what an apple is just by being shown one and hearing the word "apple." They can then
recognize apples of different colors and shapes.

• Machine learning, however, needs much more data to learn. For even simple problems, algorithms typically
require thousands of examples to perform well.

• For complex tasks like recognizing images or speech, millions of examples are often needed unless you can use
parts of an already trained model to help.


Nonrepresentative Training Data


• In machine learning, generalization refers to the ability of a model to perform well on new, unseen data.
• Whether you're using instance-based learning (where predictions are made based on specific examples) or model-based
learning (where a model is trained to predict outcomes based on patterns in the data), the goal is to ensure that the system
can make accurate predictions for new cases, not just the data it was trained on.
Example:
•Imagine you train a model to predict the economic status of countries using data from a set of countries.
•If the training data is incomplete (e.g., a few countries are missing), the model might not generalize well to those countries.
Impact of Missing Data:
•Missing countries: If you add the missing countries to the dataset, the model may improve by learning patterns that better
represent all the countries, rather than just those in the original dataset.

By adding missing data or making sure that the training dataset is representative of the variety of new cases you want to
generalize to, you can help the model generalize better to new data, avoiding biases or gaps in its predictions.


When you train a linear model on incomplete or nonrepresentative data, the model may not generalize well to new, unseen
cases. This happens because the line is fitted only to the cases present in the training set, so it reflects their trend rather than the trend of the full population.
Example:
•You train a linear model on data about countries' happiness versus their income.
•In the corresponding plot, the solid line is the model retrained after adding the missing countries, while the dotted line is the old model trained on
incomplete data.
•The new model reveals that:
• Very rich countries don’t seem much happier than moderately rich ones.
• Some poor countries appear happier than many rich countries.
This is a key problem: the old model, based on incomplete data, over-simplifies the relationship between income and happiness,
and fails to capture the complexity of the actual data.


Key Takeaways:
1.Nonrepresentative Data: Using a training set that is missing data or doesn't reflect the diversity of new cases will likely
lead to a model that doesn't make accurate predictions for certain groups (e.g., very rich or very poor countries).
2.Sampling Noise and Sampling Bias: A flawed or incomplete data collection process can make the sample nonrepresentative in two ways:
1. Small sample: Leads to sampling noise, where the data is nonrepresentative purely by chance.
2. Large sample with flawed sampling: Even a very large dataset can be nonrepresentative if the method of sampling is
biased. This is called sampling bias.
3.Representative Training Set: To build a reliable model, the training set must be representative of the full range of
data the model will encounter in the real world. Otherwise, the model may struggle to generalize correctly to new or
unusual cases.

In summary, it's essential to ensure that your training data covers the full spectrum of scenarios, especially when predicting
real-world outcomes that can be influenced by many factors.


Poor-Quality Data

When training a machine learning model, the quality of your training data plays a critical role in the performance of your
model. If the data is noisy, contains errors, or has outliers, the model will have a harder time identifying the true underlying
patterns, and its performance may suffer.
Key Steps to Improve Data Quality:
1.Dealing with Outliers:
1. Outliers are data points that are significantly different from others and may distort the model.
2. If outliers are due to errors, they can often be discarded or corrected manually.
3. If they are genuine, you may need to decide whether they should be kept based on their relevance to the problem.


2. Handling Missing Data:


If some data is missing, such as customer age in a dataset, there are several options:
1. Ignore the attribute: If the feature is not crucial, you may leave it out.
2. Ignore missing instances: You may remove rows with missing data, but this can reduce the dataset size.
3. Impute missing values: Replace missing values with meaningful values, such as the median or mean for numerical
features.
4. Use multiple models: Train one model with the feature and one without it, then compare their performance.
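
Option 3 (imputation) can be sketched as follows, assuming records stored as dictionaries with `None` marking a missing value. The customer data and the `impute_median` helper are made up for illustration.

```python
import statistics

def impute_median(rows, column):
    """Return copies of `rows` with missing (None) values in `column` set to the median."""
    observed = [row[column] for row in rows if row[column] is not None]
    median = statistics.median(observed)
    return [{**row, column: median if row[column] is None else row[column]} for row in rows]

# Hypothetical customer records; one age is missing.
customers = [
    {"name": "A", "age": 25},
    {"name": "B", "age": None},
    {"name": "C", "age": 40},
    {"name": "D", "age": 31},
]
print(impute_median(customers, "age"))
```

The median is computed from the observed values only, and the original records are left unchanged. In practice the same median should also be reused for new data at prediction time.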


3. Cleaning the Data:


1. Data cleaning can be time-consuming, but it is often the most critical part of a data science project.
2. This process involves identifying and correcting errors, dealing with missing values, and ensuring that the
data is as accurate and complete as possible.
Why It Matters:
•Poor-quality data introduces bias, which can cause the model to misinterpret patterns and lead to inaccurate
predictions.
•Clean data helps ensure that the model is trained on meaningful information, making it more likely to generalize
well to new cases.
In short, spending time cleaning your data is often the most effective way to improve model performance.

Irrelevant Features

The saying "garbage in, garbage out" highlights that the quality of the training data directly affects how well a
machine learning model will perform. If the data is full of errors or irrelevant information, the model will struggle to
learn useful patterns.
A critical part of building a successful machine learning system is ensuring that the training data contains the right
information (relevant features). This is where feature engineering comes in, which involves:
1. Feature Selection:
•This means choosing the most useful features from the existing data. For example, in a dataset predicting house
prices, features like square footage and location are useful, while color of the house might not be.
2. Feature Extraction:
•This involves combining or transforming existing features to create more useful ones. For example, you might
combine height and weight into a new feature, body mass index (BMI), which could be more informative for a health
prediction model.


3. Creating New Features:


•Sometimes, it's useful to gather new data to create features that weren't available before. For instance, you might
add seasonality to predict sales, based on whether it's a holiday season or not.
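
The BMI example of feature extraction can be sketched in a few lines. The field names `height_cm` and `weight_kg` are assumptions about how the record is stored.

```python
def add_bmi(person):
    """Return a copy of the record with a derived BMI feature (kg / m^2)."""
    height_m = person["height_cm"] / 100
    return {**person, "bmi": person["weight_kg"] / height_m ** 2}

# Field names are illustrative assumptions.
print(add_bmi({"height_cm": 180, "weight_kg": 81}))  # BMI of about 25
```

The new `bmi` column combines two raw features into one that may relate more directly to a health outcome than either does alone.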

Why It Matters:
•Good features lead to better models, so spending time on feature engineering is crucial.
•If your features are irrelevant or poorly chosen, the model will not learn useful patterns, leading to poor predictions or
performance.
Next, after preparing the data, it’s important to also ensure that the algorithms you use are suitable for the task. Just having
clean and relevant data isn’t enough if the algorithm is flawed.

Overfitting the Training Data

• Overfitting in machine learning is when a model learns the details and noise in the training data too well, but fails to
generalize to new, unseen data.

• It's like memorizing answers for a test without understanding the material — you may do well on the test you studied
for, but struggle with any new questions.
Example:
• Imagine you're trying to predict life satisfaction using a model. If you use a high-degree polynomial (a very complex
model), it might fit the training data very closely, showing perfect results on the data you trained it with.

• But, this can lead to overfitting, as it captures every small fluctuation in the data, including noise or random variations
that don’t reflect real-world patterns.


In the Figure 1-22 example:


•The high-degree polynomial model performs better on the training data than a simple linear model, but it is too
complex.
•Problem: This model might look great for the data it was trained on, but it won't work well on new, unseen data
because it learned the noise in the data, not the general trend.
Why It Matters:
•Overfitting can lead to poor performance when the model is used in real-life scenarios, as it has "memorized" the
training data rather than learning to generalize.
•It's important to limit overfitting by preferring simpler models, applying regularization, or gathering more training data, and to check with techniques such as cross-validation that the model generalizes well to new data.
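
The gap between training error and error on new data can be illustrated with a small experiment. The sketch below is not the Figure 1-22 data; it assumes synthetic points drawn from a noisy straight line, and the names make_data and mse are hypothetical helpers. It fits a straight line and a degree-8 polynomial to the same 15 training points and reports the error of each on the training set and on fresh points.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Assumed ground truth: a straight line plus Gaussian noise.
    x = rng.uniform(0.0, 1.0, n)
    y = 2.0 * x + 1.0 + rng.normal(0.0, 0.2, n)
    return x, y

def mse(coeffs, x, y):
    # Mean squared error of a polynomial (given by its coefficients) on (x, y).
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x_train, y_train = make_data(15)
x_test, y_test = make_data(200)

linear = np.polyfit(x_train, y_train, deg=1)   # simple model
wiggly = np.polyfit(x_train, y_train, deg=8)   # high-degree polynomial

train_linear = mse(linear, x_train, y_train)
train_wiggly = mse(wiggly, x_train, y_train)
test_linear = mse(linear, x_test, y_test)
test_wiggly = mse(wiggly, x_test, y_test)

print("train MSE  linear:", train_linear, " degree 8:", train_wiggly)
print("test  MSE  linear:", test_linear, " degree 8:", test_wiggly)
```

The high-degree fit can never have a larger training error than the line, because a line is a special case of a degree-8 polynomial. What signals overfitting is a test error that is noticeably worse than the training error.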


Underfitting the Training Data

Underfitting occurs when a model is too simple to capture the underlying patterns in the data. It happens when the model doesn't
learn enough from the training data, resulting in poor performance both on the training data and new data.
Example:
If you try to predict life satisfaction using a simple linear model (a straight line), the model may not capture the complexity of
reality, since life satisfaction depends on many factors (e.g., income, health, social relationships). The linear model will give
inaccurate predictions because it’s too simplistic.

Solutions to Fix Underfitting:
1. Use a more powerful model:
   • Choose a more complex model that has more parameters and can capture more intricate patterns in the data. For
   example, a polynomial or decision tree model might better fit the data.
2. Improve the features:
   • Provide better features to the model. This could involve feature engineering, such as combining existing features or
   creating new, more informative ones.
3. Relax model constraints:
   • Reduce the regularization (a technique that prevents overfitting by penalizing complexity). If the regularization is too
   strong, it can overly limit the model's ability to fit the data. Reducing the regularization allows the model to learn more
   from the data.
Why It Matters:
•Underfitting leads to a model that doesn't perform well because it can't learn the true structure of the data. To improve it, you
need to make the model more powerful, provide it with better features, or allow it more flexibility to learn from the data.
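
The effect of relaxing model constraints can be sketched numerically. The example below assumes ridge (L2) regularization as the penalty, which the slides do not name, and synthetic quadratic data; ridge_fit and alpha are hypothetical names. It fits the same model with three regularization strengths and records the training error for each.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed data: a noisy quadratic, with features [1, x, x^2].
x = rng.uniform(-1.0, 1.0, 50)
y = 3.0 * x**2 - 2.0 * x + 1.0 + rng.normal(0.0, 0.1, 50)
X = np.column_stack([np.ones_like(x), x, x**2])

def ridge_fit(X, y, alpha):
    # Closed-form ridge solution: (X^T X + alpha * I)^-1 X^T y.
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def train_mse(w):
    return float(np.mean((X @ w - y) ** 2))

alphas = [1000.0, 10.0, 0.01]          # strong -> weak regularization
errors = [train_mse(ridge_fit(X, y, a)) for a in alphas]

for a, e in zip(alphas, errors):
    print(f"alpha={a:>7}: training MSE = {e:.4f}")
```

With a very large alpha the weights are pushed toward zero and the model underfits even its own training data. Lowering alpha lets the same model fit the data more closely.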


WELL-POSED LEARNING PROBLEMS

Definition: A computer program is said to "learn" from experience (E) with respect to some class of tasks (T) and a
performance measure (P) if its performance at the tasks in T, as measured by P, improves with experience E.


Examples of Machine Learning Applications:


1. Learning to recognize spoken words:
   • Speech recognition systems, like SPHINX, use machine learning to recognize sounds and words, adjusting to different
   speakers, microphones, and environments.
2. Learning to drive an autonomous vehicle:
   • Machine learning helps self-driving vehicles, such as the ALVINN system, learn to steer on their own and adapt to
   different roads and traffic conditions.
3. Learning to classify astronomical structures:
   • NASA uses machine learning to classify celestial objects (like stars or galaxies) from huge image databases. For example,
   decision trees are used to automatically identify objects in the Sky Survey.
4. Learning to play world-class backgammon:
   • Programs like TD-GAMMON learn to play backgammon by playing millions of games against themselves.
   These programs now compete with top human players.


Summary:
In these examples, machine learning helps systems improve by learning from experience, whether it's recognizing
speech, driving a car, analyzing space data, or playing games.


To define a learning problem, we need to specify three key features:


1.Class of Tasks (T): The type of task the system is trying to learn.
2.Performance Measure (P): How the system's success is measured for the task.
3.Source of Experience (E): The experience the system gains while learning.


DESIGNING A LEARNING SYSTEM

To design a checkers-playing program for a world tournament:


1.Task (T): Play checkers and make winning moves.
2.Performance Measure (P): Percent of games won in the tournament.
3.Experience (E): Learn by playing practice games against itself.
The program will improve through reinforcement learning, where it gets feedback (win/loss) and adjusts its strategy
accordingly. Over time, it will refine its tactics to perform well in the tournament.

Choosing the Training Experience

When designing a program to learn to play checkers, there are key decisions to make about the training process:
1. Direct vs. Indirect Feedback:
   • Direct feedback involves learning from specific board states and correct moves.
   • Indirect feedback gives information based on game outcomes, requiring the system to figure out which moves were
   responsible for winning or losing (this is called the "credit assignment problem").
2. Control over Training:
   • The learner can either rely on a teacher to select training examples, or the learner can generate its own examples by
   playing against itself, making the learning process more autonomous.
3. Representing the Distribution of Training Data:
   • The training data should ideally represent the types of situations the system will face during the test (e.g., in the world
   checkers tournament). If the training data is not representative (such as only playing against itself), the system may
   perform poorly against real opponents.
In this case, the system will learn by playing games against itself, generating training data independently, and allowing for
unlimited practice.

Choosing the Target Function


When designing a program to play checkers, the next step is to decide what the program will learn and how it will use that
knowledge.
1.Learning the Best Move (ChooseMove):
The program can learn a function called ChooseMove that selects the best move for any given board state. However, learning this
directly can be difficult.
2.Alternative: Learning an Evaluation Function (V):
A simpler approach is to teach the program an evaluation function (V) that gives a numerical score to each board state, where
higher scores represent better positions. The program can then use this function to evaluate potential moves by checking which
move leads to the best board state.


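3. Defining V:
   1. If the board state is a final state in which the game is won, its score is +100.
   2. If it is a final state in which the game is lost, its score is -100.
   3. If it is a final state in which the game is drawn, its score is 0.
   4. If the game is still ongoing, the score of the current board state is the score of the best final state that can be
   reached from it when both sides play optimally until the end of the game.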
4. Challenge of Efficient Computation:
This evaluation function requires checking all possible future moves, which is computationally expensive. The goal of the
learning process is to find an efficient version of the evaluation function that can be used quickly, which is known as function
approximation.
In summary, the program will learn an evaluation function (V) that helps it choose the best move without needing to compute
the entire game’s outcome.
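
The recursive definition of V, and why it is expensive, can be seen on a game small enough to search completely. The sketch below does not use checkers; it assumes a toy take-away game (players alternately remove 1 to 3 stones, and whoever takes the last stone wins), and the function name ideal_v is hypothetical. It computes the ideal score by looking ahead to every final state.

```python
from functools import lru_cache

WIN, LOSS = 100, -100  # scores for final states, from our program's point of view

@lru_cache(maxsize=None)
def ideal_v(stones, our_turn):
    """Ideal value of a state, found by searching to the end of the game."""
    if stones == 0:
        # The player who just moved took the last stone and won.
        return LOSS if our_turn else WIN
    successors = [ideal_v(stones - take, not our_turn)
                  for take in (1, 2, 3) if take <= stones]
    # We pick the best successor for us; the opponent picks the worst for us.
    return max(successors) if our_turn else min(successors)

for n in range(1, 9):
    print(n, ideal_v(n, True))
```

Even with caching, this search visits every reachable state. For checkers the number of states is far too large, which is why the program learns an approximation of V instead of computing it this way.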

Choosing a Representation for the Target Function


Choosing a Function Approximation Algorithm:


1. This step involves selecting an algorithm to learn the target function V (or c) based on the training data.
Common algorithms include:
1. Linear Regression: For learning linear relationships between features and target values.
2. Gradient Descent: To adjust the weights iteratively to minimize errors in predictions.
3. Decision Trees or Neural Networks: For more complex relationships.
2. The algorithm chosen should be able to effectively approximate the function that predicts the value of board states.
Estimating Training Values:
1. Training values are the target values that the program tries to predict. In the case of the checkers program, these
are the values of board states as defined by the target function V.
2. The program estimates these values based on the training experience, such as playing games against itself and
evaluating the final board states (e.g., win = 100, lose = -100, draw = 0).


Adjusting the Weights:


1. The learning algorithm adjusts the weights w0 to w6 based on the difference between the predicted values and
the actual training values (the correct evaluation of the board state).
2. For example, if the predicted value for a board state is too high or low, the weights are adjusted to bring the prediction
closer to the true value.
3. This is typically done through a process like gradient descent, where the weights are updated in small steps to
minimize errors in predictions.
In summary:
•Choosing an algorithm helps decide how the target function will be learned.
•Estimating training values involves using game data to assign correct target values to board states.
•Adjusting the weights ensures that the program's evaluation function improves over time based on feedback.
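
The weight adjustment described above can be sketched as a least-mean-squares style update. This is an illustration under assumptions, not the slides' exact procedure: it assumes the evaluation function is linear in six board features (which the weights w0 to w6 suggest), and the feature values, the learning rate eta, and the function names are hypothetical.

```python
def predict(weights, features):
    """Linear evaluation: w0 + w1*x1 + ... + w6*x6."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def lms_update(weights, features, target, eta=0.01):
    """Move each weight a small step in the direction that reduces the error."""
    error = target - predict(weights, features)
    updated = [weights[0] + eta * error]             # bias weight w0
    updated += [w + eta * error * x                  # w1..w6, scaled by feature value
                for w, x in zip(weights[1:], features)]
    return updated

# Hypothetical board description (six numeric features) and its training value.
features = [3, 0, 1, 0, 0, 0]
target = 100.0            # e.g. a board from which the game was won

weights = [0.0] * 7
for _ in range(200):
    weights = lms_update(weights, features, target)

print(predict(weights, features))
```

Weights attached to features that are zero for this board are left unchanged, so only the features that actually contributed to the prediction are corrected.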


The Final Design

The checkers learning system consists of four modules:


1.Performance System: Plays the game using the learned function to choose moves.
2.Critic: Reviews the game, providing feedback on the quality of moves and generating training examples.
3.Generalizer: Uses the training examples to learn a function that can evaluate any board state.
4.Experiment Generator: Chooses new practice problems for the system to learn from, helping improve learning
efficiency.

PERSPECTIVES AND ISSUES IN MACHINE LEARNING

• Machine learning can be thought of as searching for the best solution from a large set of possibilities (hypotheses).
• The goal is to find a hypothesis that matches the given training data and any prior knowledge.

Different learning algorithms (like for decision trees, neural networks, or linear functions) explore the hypothesis space using
strategies suited to their structure. This search process is guided by:
1.The size of the hypothesis space.
2.The amount of training data.
3.The confidence that the hypothesis will work well on new, unseen examples.

In summary, machine learning is about finding patterns (hypotheses) in data by systematically searching through possible
solutions.


Concept learning is about understanding general ideas (concepts) from specific examples. For instance, we might learn
the concept of "bird" by looking at animals labeled as either "bird" or "not bird."
Each concept can be seen as:
1.A group of objects (e.g., animals that are birds).
2.A yes-or-no function that says "true" for birds and "false" for everything else.
The goal of concept learning is to figure out this function based on examples. For example, given labeled examples of
birds and non-birds, the system learns the rules that define what makes an animal a bird.

FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS
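
A minimal sketch of FIND-S is given below. It assumes hypotheses are conjunctions of attribute constraints, where "?" accepts any value, and it uses the four EnjoySport training examples from Mitchell's textbook as sample data; both are assumptions about what this section covers, and find_s is a hypothetical name. Starting from the most specific hypothesis, it generalizes only as far as each positive example requires and ignores negative examples.

```python
def find_s(examples):
    """Return the maximally specific hypothesis consistent with the positive examples."""
    h = None  # None stands for the most specific hypothesis (accepts nothing)
    for instance, is_positive in examples:
        if not is_positive:
            continue  # FIND-S ignores negative examples
        if h is None:
            h = list(instance)  # first positive example: copy it exactly
        else:
            # Keep a constraint where it agrees, otherwise generalize it to "?".
            h = [c if c == v else "?" for c, v in zip(h, instance)]
    return tuple(h) if h is not None else None

# Attributes: Sky, AirTemp, Humidity, Wind, Water, Forecast
training = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

hypothesis = find_s(training)
print(hypothesis)
```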

VERSION SPACES AND THE CANDIDATE ELIMINATION ALGORITHM


REMARKS ON VERSION SPACES AND CANDIDATE-ELIMINATION

Will the CANDIDATE-ELIMINATION Algorithm Converge to the Correct Hypothesis?

The CANDIDATE-ELIMINATION algorithm works by refining a set of hypotheses that are consistent with the training
data. As more training examples are observed, the algorithm narrows down the space of possible hypotheses. The version
space, which is the set of all hypotheses that match the training data, converges to the correct target concept when:
1.No errors in the training data: The algorithm assumes the data is correct.
2.The target concept is within the hypothesis space: There is at least one hypothesis in the space that describes the target
concept.


What happens with errors in the training data?


If the training data contains mistakes, such as a positive example incorrectly labeled as negative, the algorithm might
remove the correct target concept from the version space. This happens because it will eliminate any hypotheses
inconsistent with the incorrect training example.
However, if enough additional training data is provided, the algorithm might eventually detect this inconsistency and notice
that the version space is empty, indicating there is no hypothesis consistent with all the training data. This empty space can
signal that there is an error or an issue with the hypothesis space.
Summary:
•If training data is correct and the target concept is in the hypothesis space, the algorithm will find the correct concept.
•With errors in training data, the algorithm might eliminate the correct concept, but eventually, it could recognize this if
enough data is provided.
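
The effect of a mislabeled example can be reproduced on a tiny hypothesis space. The sketch below does not implement the S and G boundary sets of CANDIDATE-ELIMINATION; it assumes a brute-force enumeration of every hypothesis (the LIST-THEN-ELIMINATE idea) over two binary attributes, and all names and data are hypothetical.

```python
from itertools import product

# Every hypothesis is a pair of constraints; "?" accepts either value.
HYPOTHESES = list(product(("?", 0, 1), repeat=2))

def matches(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def version_space(examples):
    """All hypotheses that classify every training example correctly."""
    return [h for h in HYPOTHESES
            if all(matches(h, x) == label for x, label in examples)]

TARGET = (1, "?")  # assumed true concept: first attribute equals 1

clean = [((1, 0), True), ((1, 1), True), ((0, 0), False)]
# Same data, but the positive example (1, 1) is wrongly labeled negative.
noisy = [((1, 0), True), ((1, 1), False), ((0, 0), False)]

print("clean:", version_space(clean))
print("noisy:", version_space(noisy))
# A later, correctly labeled copy of (1, 1) contradicts the earlier error.
print("noisy + more data:", version_space(noisy + [((1, 1), True)]))
```

With clean data only the target concept survives. With the mislabeled example the target is eliminated, and once a contradicting example arrives the version space becomes empty, which is the signal that something is wrong with the data or the hypothesis space.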

What Training Example Should the Learner Request Next?

• In this scenario, the learner can conduct experiments by choosing instances (queries) and then getting feedback
(classification) from an external oracle (like a teacher or nature).

• The goal is to use these queries to reduce the space of possible hypotheses (called the version space) by discriminating
between competing hypotheses.
Query Strategy:
The best query is one that splits the version space roughly in half. This means the learner should choose an instance that is
classified as positive by some hypotheses and negative by others. This way, after the oracle classifies it, the learner can
eliminate half of the hypotheses, quickly narrowing down the target concept.

For example, if the learner asks about an instance like (Sunny, Warm, Normal, Light, Warm, Same), which satisfies half of
the hypotheses, the classification (positive or negative) will help eliminate half of the remaining hypotheses in the version
space.

Optimal Query Strategy:


•The ideal strategy is to select instances that split the hypotheses into two equal groups (those predicting "yes" and those
predicting "no").
•This approach is similar to playing the game "20 Questions", where you ask yes/no questions that split the possibilities
in half.
•If the learner can always generate instances that split the version space in half, the number of queries needed to find the
correct target concept will be minimal: about log2(size of version space).
However, in some cases, it might not be possible to perfectly split the version space, so more queries may be required than
the ideal log2 number.
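
The halving idea can be checked on a concrete version space. The sketch below assumes the six-hypothesis version space from Mitchell's EnjoySport example, which is not listed in these slides, and the helper names are hypothetical. It counts how many hypotheses accept the query instance mentioned above and computes the ideal number of queries.

```python
import math

# Assumed version space (attributes: Sky, AirTemp, Humidity, Wind, Water, Forecast).
VERSION_SPACE = [
    ("Sunny", "Warm", "?", "Strong", "?", "?"),
    ("Sunny", "?", "?", "Strong", "?", "?"),
    ("Sunny", "Warm", "?", "?", "?", "?"),
    ("?", "Warm", "?", "Strong", "?", "?"),
    ("Sunny", "?", "?", "?", "?", "?"),
    ("?", "Warm", "?", "?", "?", "?"),
]

def matches(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def split(version_space, x):
    """Partition the hypotheses by how they would classify instance x."""
    yes = [h for h in version_space if matches(h, x)]
    no = [h for h in version_space if not matches(h, x)]
    return yes, no

query = ("Sunny", "Warm", "Normal", "Light", "Warm", "Same")
yes, no = split(VERSION_SPACE, query)
min_queries = math.ceil(math.log2(len(VERSION_SPACE)))

print(len(yes), "hypotheses say yes,", len(no), "say no")
print("ideal number of queries:", min_queries)
```

Whichever label the oracle returns, three of the six hypotheses are eliminated, so this query is an ideal split for this version space.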
