
SREYAS INSTITUTE OF ENGINEERING AND TECHNOLOGY

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


III B.TECH II SEMESTER
NOTES MACHINE LEARNING
UNIT – I
Chapter-1: Introduction - Well-posed learning problems, designing a learning system, perspectives and issues in machine learning

Designing a Learning System:

To design a well-defined learning system, we must follow the steps below:


1. Choosing the Training Experience
2. Choosing the Target Function
3. Choosing a Representation for the Target Function
4. Choosing a Function Approximation Algorithm
5. The Final Design:
We will address the following four modules in the final design:
• The Performance System,
• The Critic,
• The Generalizer, and
• The Experiment Generator.
A rough sketch of how these modules interact is given below.

UNIT-1: CHAPTER-2
CONCEPT LEARNING AND THE GENERAL-TO-SPECIFIC ORDERING
1. Introduction,
2. A Concept Learning Task,
3. Concept Learning As Search,
4. Find-S: Finding A Maximally Specific Hypothesis,
5. Version Spaces, and The Candidate Elimination Algorithm,
6. Remarks on Version Spaces and Candidate Elimination, Inductive Bias.

INTRODUCTION
1. Concept learning can be formulated as a problem of searching through a
predefined space of potential hypotheses for the hypothesis that best fits the
training examples.
2. In many cases, this search can be efficiently organized by taking advantage of a naturally occurring structure over the hypothesis space: a general-to-specific ordering of hypotheses.
3. We consider the problem of automatically inferring the general definition of
some concept, given examples labeled as members or nonmembers of the
concept.
4. This task is commonly referred to as concept learning or approximating a
Boolean-valued function from examples.
Concept learning: Inferring a boolean-valued function from training examples
of its input and output.

2.2 A CONCEPT LEARNING TASK


Consider the example task of learning the target concept "days on which Aldo enjoys his favorite water sport." Table 2.1 describes a set of example days, each represented by a set of attributes. The attribute EnjoySport indicates whether or not Aldo enjoys his favorite water sport on that day. The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its other attributes.
In particular, let each hypothesis be a vector of six constraints, specifying the values
of the six attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.

For each attribute, the hypothesis will either indicate by a "?" that any value is acceptable for this attribute, specify a single required value (e.g., Warm) for the attribute, or indicate by a "∅" that no value is acceptable.

The most general hypothesis (that every day is a positive example) is represented by (?, ?, ?, ?, ?, ?), and the most specific possible hypothesis (that no day is a positive example) is represented by (∅, ∅, ∅, ∅, ∅, ∅).
To summarize, the EnjoySport concept learning task requires learning the set of days for which EnjoySport = Yes.
2.2.1 Notation
The concept or function to be learned is called the target concept, which we denote by c. In general, c can be any boolean-valued function defined over the instances X; that is, c : X → {0, 1}.
In the current example, the target concept corresponds to the value of the attribute EnjoySport (i.e., c(x) = 1 if EnjoySport = Yes, and c(x) = 0 if EnjoySport = No).
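As a small illustration of the representation just described, the sketch below encodes a hypothesis as a tuple of six constraints, using '?' for "any value is acceptable" and Python's None in place of the empty constraint ∅. The function name satisfies is illustrative, not from any library.

```python
# A hypothesis is a tuple of six constraints over
# (Sky, AirTemp, Humidity, Wind, Water, Forecast).

def satisfies(hypothesis, instance):
    """True if every constraint in the hypothesis is met by the instance."""
    return all(c == '?' or c == v for c, v in zip(hypothesis, instance))

most_general = ('?', '?', '?', '?', '?', '?')   # every day is a positive example
most_specific = (None,) * 6                     # no day is a positive example
day = ('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')

print(satisfies(most_general, day))    # True
print(satisfies(most_specific, day))   # False
```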

2.2.2 The INDUCTIVE LEARNING HYPOTHESIS:
Any hypothesis found to approximate the target function well over a sufficiently
large set of training examples will also approximate the target function well over
other unobserved examples.
2.3. Concept Learning As Search

2.4. Find-S: Finding a Maximally Specific Hypothesis

Introduction:

The Find-S algorithm is a basic concept learning algorithm in machine learning. It finds the most specific hypothesis that fits all the positive examples; note that the algorithm considers only the positive training examples.
The Find-S algorithm starts with the most specific hypothesis and generalizes it each time it fails to classify an observed positive training example.
Hence, the Find-S algorithm moves from the most specific hypothesis toward more general hypotheses.

Important Representation:
1. "?" indicates that any value is acceptable for the attribute.
2. A single required value (e.g., Cold) constrains the attribute to that value.
3. Φ indicates that no value is acceptable.
4. The most general hypothesis is represented by: {?, ?, ?, ?, ?, ?}
5. The most specific hypothesis is represented by: {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}

Steps Involved in Find-S:

Start with the most specific hypothesis.
h = {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}
1. Take the next example; if it is negative, no changes are made to the hypothesis.
2. If the example is positive and the current hypothesis is too specific to cover it, update the current hypothesis to a more general condition.
3. Keep repeating the above steps until all the training examples have been processed.
4. After all the training examples have been processed, we have the final hypothesis, which we can use to classify new examples. A minimal code sketch of these steps is given below.
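The following is a minimal Python sketch of the steps just listed, assuming each training example is a tuple of nominal attribute values paired with a True/False label. The names PHI and find_s are illustrative for this sketch, not from any library.

```python
# A minimal sketch of Find-S: 'phi' stands for the empty constraint ϕ and
# '?' means "any value is acceptable".

PHI = 'phi'

def find_s(examples):
    """Return the maximally specific hypothesis consistent with the positive examples."""
    n = len(examples[0][0])
    h = [PHI] * n                        # start with the most specific hypothesis
    for x, positive in examples:
        if not positive:                 # negative examples are ignored
            continue
        for i, value in enumerate(x):
            if h[i] == PHI:              # first positive example seen
                h[i] = value
            elif h[i] != value:          # mismatch: generalize this constraint
                h[i] = '?'
    return h
```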

Example:
Consider the following data set, which records which particular seeds are poisonous.

First, we initialize the hypothesis to the most specific hypothesis. Hence, our hypothesis is:
h = {ϕ, ϕ, ϕ, ϕ}
Consider example 1:
The data in example 1 is {GREEN, HARD, NO, WRINKLED}.
We see that our initial hypothesis is too specific, so we generalize it to cover this example. Hence, the hypothesis becomes:
h = { GREEN, HARD, NO, WRINKLED }
Consider example 2 :
Here we see that this example has a negative outcome. Hence we neglect this
example and our hypothesis remains the same.
h = { GREEN, HARD, NO, WRINKLED }
Consider example 3 :
Here we see that this example has a negative outcome. Hence we neglect this
example and our hypothesis remains the same.
h = { GREEN, HARD, NO, WRINKLED }

Consider example 4:
The data present in example 4 is {ORANGE, HARD, NO, WRINKLED}. We compare every attribute with the current hypothesis, and wherever there is a mismatch we replace that particular attribute with the general case ("?"). After doing so, the hypothesis becomes:
h = { ?, HARD, NO, WRINKLED }
Consider example 5:
The data present in example 5 is {GREEN, SOFT, YES, SMOOTH}. We compare every attribute with the current hypothesis, and wherever there is a mismatch we replace that particular attribute with the general case ("?"). After doing so, the hypothesis becomes:
h = { ?, ?, ?, ? }
Since we have reached a point where all the attributes in our hypothesis take the general condition, examples 6 and 7 leave the hypothesis unchanged.
h = { ?, ?, ?, ? }
Hence, for the given data the final hypothesis is:
Final Hypothesis: h = { ?, ?, ?, ? }
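The same trace can be reproduced with the find_s sketch given earlier. The attribute values shown below for the negative examples 2 and 3 are placeholders (the original table is not reproduced here); since Find-S ignores negative examples, they do not affect the result.

```python
seed_examples = [
    (('GREEN',  'HARD', 'NO',  'WRINKLED'), True),   # example 1 (positive)
    (('GREEN',  'HARD', 'YES', 'SMOOTH'),   False),  # example 2 (negative, ignored)
    (('ORANGE', 'SOFT', 'YES', 'SMOOTH'),   False),  # example 3 (negative, ignored)
    (('ORANGE', 'HARD', 'NO',  'WRINKLED'), True),   # example 4 (positive)
    (('GREEN',  'SOFT', 'YES', 'SMOOTH'),   True),   # example 5 (positive)
]
print(find_s(seed_examples))   # ['?', '?', '?', '?']
```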

FIND S Algorithm – Maximally Specific Hypothesis Solved Example

FIND S Algorithm is used to find the Maximally Specific Hypothesis. Using the
Find-S algorithm gives a single maximally specific hypothesis for the given set of
training examples.

Find-S Algorithm Machine Learning

1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
       For each attribute constraint ai in h
           If the constraint ai is satisfied by x, then do nothing
           Else replace ai in h by the next more general constraint that is satisfied by x
3. Output the hypothesis h

Solved Numerical Example: Step 1 of Find-S Algorithm (the EnjoySport training examples)

Step 2 of Find-S Algorithm First iteration

h0 = (ø, ø, ø, ø, ø, ø)
X1 = <Sunny, Warm, Normal, Strong, Warm, Same>
h1 = <Sunny, Warm, Normal, Strong, Warm, Same>

Step 2 of Find-S Algorithm Second iteration

h1 = <Sunny, Warm, Normal, Strong, Warm, Same>


X2 = <Sunny, Warm, High, Strong, Warm, Same>
h2 = <Sunny, Warm, ?, Strong, Warm, Same>

Step 2 of Find-S Algorithm Third iteration

h2 = <Sunny, Warm, ?, Strong, Warm, Same>


X3 = <Rainy, Cold, High, Strong, Warm, Change> (Negative)
X3 is a negative example, hence it is ignored.
h3 = <Sunny, Warm, ?, Strong, Warm, Same>

Step 2 of Find-S Algorithm Fourth iteration

h3 = <Sunny, Warm, ?, Strong, Warm, Same>


X4 = <Sunny, Warm, High, Strong, Cool, Change>
h4 = <Sunny, Warm, ?, Strong, ?, ?>

Step 3

The final maximally specific hypothesis is <Sunny, Warm, ?, Strong, ?, ?>
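For comparison, applying the find_s sketch defined earlier in these notes to the same four instances yields the identical final hypothesis:

```python
enjoy_sport = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
print(find_s(enjoy_sport))   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```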

PROPERTIES OF FIND-S:
The Find-S algorithm considers only the positive examples and ignores the negative examples.
For each positive example, the algorithm checks each attribute in the example. If the attribute value is the same as the hypothesis value, the algorithm moves on without making any changes.

There are a few limitations of the Find-S algorithm listed down below:

1. There is no way to determine if the hypothesis is consistent throughout the


data.

2. Inconsistent training sets can actually mislead the Find-S algorithm, since it
ignores the negative examples.
3. Find-S algorithm does not provide a backtracking technique to determine the
best possible changes that could be done to improve the resulting hypothesis.

How Does It Work?

1. The process starts by initializing 'h' with the most specific hypothesis; generally, this is the first positive example in the data set.
2. We then examine each example in turn. If the example is negative, we move on to the next example; if it is positive, we consider it for the next step.
3. We check whether each attribute in the example is equal to the corresponding hypothesis value.
4. If the value matches, no change is made.
5. If the value does not match, the value is changed to '?'.
6. We do this until we reach the last positive example in the data set.

What questions are left unanswered by the Find-S algorithm?


• Has the learner converged to the correct target concept?
• Why prefer the most specific hypothesis?
• Are the training examples consistent?
• What if there are several maximally specific consistent hypotheses?

2.5. Version Spaces, and The Candidate Elimination Algorithm

Version Space
Terms Used:
• Concept learning: the learning task of the machine, i.e., learning the target concept from training data.
• General hypothesis: places no constraints on the attributes; G = {'?', '?', '?', '?', ...}, with one '?' per attribute.
• Specific hypothesis: constrains the attributes to specific values; S = {ϕ, ϕ, ϕ, ...}, with one ϕ per attribute.

Version Space: the set of hypotheses lying between the general hypothesis and the specific hypothesis. It is not a single hypothesis but the set of all hypotheses that are consistent with the target concept on the training data set.

Candidate Elimination Algorithm


• The candidate elimination algorithm incrementally builds the version space given a hypothesis space H and a set E of examples.
• The examples are added one by one; each example possibly shrinks the version space by removing the hypotheses that are inconsistent with the example.
• The candidate elimination algorithm does this by updating the general and specific boundaries for each new example.
• You can consider this an extended form of the Find-S algorithm.
• It considers both positive and negative examples.
• Positive examples are used as in the Find-S algorithm: they generalize the specific boundary.
• Negative examples specialize the general boundary.

Algorithm:
Step 1: Load the data set.
Step 2: Initialize the General Hypothesis and the Specific Hypothesis.
Step 3: For each training example:
Step 4:   If the example is positive:
              If attribute_value == hypothesis_value:
                  Do nothing
              else: replace the attribute value in the specific hypothesis with '?'
              (basically generalizing it)
Step 5:   If the example is negative:
              Make the general hypothesis more specific.
A minimal code sketch of these steps is given below.
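The following is a minimal Python sketch of these steps for conjunctive hypotheses over nominal attributes, in the same spirit as the earlier Find-S sketch. It is an illustration, not a library implementation: '?' stands for "any value", None stands for the empty constraint, attribute domains are read off the training data, and the pruning of S against G in the full algorithm is omitted under the assumption that the training data are consistent.

```python
# Candidate Elimination sketch: the S boundary is kept as a single conjunctive
# hypothesis; the G boundary is a set of maximally general hypotheses.

def covers(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def at_least_as_general(h1, h2):
    """True if h1 is at least as general as h2, attribute by attribute."""
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def candidate_elimination(examples):
    n = len(examples[0][0])
    domains = [sorted({x[i] for x, _ in examples}) for i in range(n)]
    S = [None] * n                    # specific boundary (single hypothesis)
    G = [tuple(['?'] * n)]            # general boundary (set of hypotheses)

    for x, positive in examples:
        if positive:
            # drop general hypotheses that fail to cover the positive example,
            # then minimally generalize S so that it covers the example
            G = [g for g in G if covers(g, x)]
            S = [xi if si is None else (si if si == xi else '?')
                 for si, xi in zip(S, x)]
        else:
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                    continue
                # replace g by its minimal specializations that exclude x
                # and are still at least as general as S
                for i in range(n):
                    if g[i] != '?':
                        continue
                    for v in domains[i]:
                        if v == x[i]:
                            continue
                        spec = list(g)
                        spec[i] = v
                        if at_least_as_general(spec, S):
                            new_G.append(tuple(spec))
            # keep only the maximally general members of the new boundary
            new_G = list(dict.fromkeys(new_G))
            G = [g for g in new_G
                 if not any(h != g and at_least_as_general(h, g) for h in new_G)]
    return S, G
```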

Example:
Consider the dataset given below:

Algorithmic steps:
Initially : G = [[?,?,?,?,?,?],[?,?,?,?,?,?],[?,?,?,?,?,?], [?,?,?,?,?,?],
[?,?,?,?,?,?], [?,?,?,?,?,?]]
S = [Null, Null, Null, Null, Null, Null]
For instance 1: <'sunny','warm','normal','strong','warm','same'> and positive output.
G1 = G
S1 = ['sunny','warm','normal','strong','warm','same']
For instance 2: <'sunny','warm','high','strong','warm','same'> and positive output.
G2 = G
S2 = ['sunny','warm',?,'strong','warm','same']
For instance 3: <'rainy','cold','high','strong','warm','change'> and negative output.
G3 = [['sunny',?,?,?,?,?], [?,'warm',?,?,?,?], [?,?,?,?,?,?], [?,?,?,?,?,?], [?,?,?,?,?,?], [?,?,?,?,?,'same']]
S3 = S2

For instance 4: <'sunny','warm','high','strong','cool','change'> and positive output.

S4 = ['sunny','warm',?,'strong',?,?]
G4 = [['sunny',?,?,?,?,?], [?,'warm',?,?,?,?]]
(The hypothesis [?,?,?,?,?,'same'] is dropped from G because it does not cover this positive instance.)
Finally, by combining G4 and S4, the algorithm produces the output.

Output:
G = [['sunny',?,?,?,?,?], [?,'warm',?,?,?,?]]
S = ['sunny','warm',?,'strong',?,?]
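As a check, running the candidate_elimination sketch given after the algorithm on these four instances reproduces the same boundaries:

```python
examples = [
    (('sunny', 'warm', 'normal', 'strong', 'warm', 'same'),   True),
    (('sunny', 'warm', 'high',   'strong', 'warm', 'same'),   True),
    (('rainy', 'cold', 'high',   'strong', 'warm', 'change'), False),
    (('sunny', 'warm', 'high',   'strong', 'cool', 'change'), True),
]
S, G = candidate_elimination(examples)
print("S =", S)
# S = ['sunny', 'warm', '?', 'strong', '?', '?']
print("G =", G)
# G = [('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?')]
```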

2.6 Remarks on Version Spaces and Candidate Elimination

Version spaces and the CANDIDATE-ELIMINATION algorithm provide a useful conceptual


framework for studying concept learning. However, this learning algorithm is not robust to noisy
data or to situations in which the unknown target concept is not expressible in the provided
hypothesis space.
Remark 1: Will the CANDIDATE-ELIMINATION Algorithm Converge to the Correct Hypothesis?

The version space learned by the CANDIDATE-ELIMINATION algorithm will converge toward the hypothesis that correctly describes the target concept, provided
(1) there are no errors in the training examples, and
(2) there is some hypothesis in H that correctly describes the target concept.
The target concept is exactly learned when the S and G boundary sets converge to a
single, identical, hypothesis.

Remark 2: What Training Example Should the Learner Request Next?

Remark 3: How Can Partially Learned Concepts Be Used?

2.7 Inductive Bias.

Inductive bias, or the inherent bias of a learning algorithm, is the set of assumptions made by the algorithm in order to form a hypothesis, i.e., a generalization beyond the set of training instances, so that it can classify unobserved data. It typically involves a preference for a simpler hypothesis that best fits the data.

• Given:
  – a concept learning algorithm L for a set of instances X
  – a concept c defined over X
  – a set of training examples for c: Dc = {⟨x, c(x)⟩}
  – L(xi, Dc): the classification of xi produced by L after learning from Dc
• Inductive inference (≻):
  (Dc ∧ xi) ≻ L(xi, Dc)
• The inductive bias is defined as a minimal set of assumptions B such that (⊢ denotes deduction)
  ∀ (xi ∈ X) [ (B ∧ Dc ∧ xi) ⊢ L(xi, Dc) ]

Inductive bias of Candidate-Elimination
• Assume L is defined as follows:
  – compute the version space VS_H,D
  – classify a new instance by complete agreement of all the hypotheses in VS_H,D
• Then the inductive bias of Candidate-Elimination is simply
  B ≡ (c ∈ H)
• In fact, by assuming c ∈ H:
  1. c ∈ VS_H,D, since VS_H,D includes all hypotheses in H consistent with D
  2. L(xi, Dc) outputs a classification "by complete agreement"; hence every hypothesis, including c, outputs L(xi, Dc)
[Figure: an inductive system and the equivalent deductive system.]

Each learner has an inductive bias

• Three learners with three different inductive biases:
1. Rote learner: no inductive bias; it just stores examples and is able to classify only previously observed examples.
2. Candidate Elimination: the target concept is in H, i.e., it is a conjunction of constraints.
3. Find-S: the target concept is in H (a conjunction of constraints), plus "all instances are negative unless seen as positive examples" (a stronger bias).
The stronger the bias, the greater the ability to generalize and classify new instances (the greater the inductive leaps).

Chapter-3: Decision Tree Learning
1. Introduction,
2. Decision Tree Representation,
3. Appropriate Problems For Decision Tree Learning,
4. The Basic Decision Tree Learning Algorithm,
5. Hypothesis Space Search In Decision Tree Learning,
6. Inductive Bias In Decision Tree Learning,
7. Issues In Decision Tree Learning.

1. INTRODUCTION
Decision tree learning is a method for approximating discrete-valued
target functions, in which the learned function is represented by a
decision tree. Learned trees can also be re-represented as sets of if-then
rules to improve human readability.

2. DECISION TREE REPRESENTATION

Inductive inference with decision trees

• Decision tree learning is one of the most widely used and practical methods of inductive inference.
• Inductive inference: the process of reaching a general conclusion from specific examples.
• Features:
  – A method for approximating discrete-valued functions (including boolean-valued ones)
  – Learned functions are represented as decision trees (or if-then-else rules)
  – Expressive hypothesis space, including disjunction
  – Robust to noisy data

When to use Decision Trees

• Problem characteristics:
  – Instances can be described by attribute-value pairs
  – The target function is discrete-valued
  – A disjunctive hypothesis may be required
  – The training data may be noisy
  – Robust to errors in the training data
  – Missing attribute values are tolerated
• Typical classification problems:
  – Equipment or medical diagnosis
  – Credit risk analysis
  – Several tasks in natural language processing

A Decision Tree is a flowchart-like structure in which each internal node represents a test on a feature (e.g., whether a coin flip comes up heads or tails), each leaf node represents a class label (the decision taken after evaluating all features), and branches represent conjunctions of features that lead to those class labels. The paths from root to leaf represent classification rules.
[Figure: basic flow of a decision tree for decision making, with leaf labels Rain (Yes) and No Rain (No).]

A decision tree is one of the predictive modelling approaches used in statistics, data mining and machine learning.

Decision trees are constructed via an algorithmic approach that identifies ways to
split a data set based on different conditions. It is one of the most widely used and
practical methods for supervised learning.

Decision Trees are a non-parametric supervised learning method used for


both classification and regression tasks.

Tree models where the target variable can take a discrete set of values are
called classification trees.

Decision trees where the target variable can take continuous values (typically real
numbers) are called regression trees.
Classification and Regression Tree (CART) is the general term for both.

Data Format

Data comes in records of the form (x, Y) = (x1, x2, x3, ..., xk, Y).

The dependent variable, Y, is the target variable that we are trying to understand, classify or generalize. The vector x is composed of the features x1, x2, x3, etc., that are used for that task.

Example
training_data = [
['Green', 3, 'Apple'],
['Yellow', 3, 'Apple'],
['Red', 1, 'Grape'],
['Red', 1, 'Grape'],
['Yellow', 3, 'Lemon'],
]
# Header = ["Color", "diameter", "Label"]
# The last column is the label.
# The first two columns are features.
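As a small illustration of the splitting step described above, the sketch below partitions the training_data list from the example and counts the class labels on each side of the split. The helper names class_counts and partition are illustrative for this sketch, not from a particular library.

```python
from collections import Counter

def class_counts(rows):
    """Count how many rows belong to each label (the last column)."""
    return Counter(row[-1] for row in rows)

def partition(rows, column, value):
    """Split rows into those matching the condition and those that do not."""
    true_rows = [row for row in rows if row[column] == value]
    false_rows = [row for row in rows if row[column] != value]
    return true_rows, false_rows

# Split on the condition Color == 'Green' (column 0 of training_data above).
greens, others = partition(training_data, 0, 'Green')
print(class_counts(greens))   # Counter({'Apple': 1})
print(class_counts(others))   # Counter({'Grape': 2, 'Apple': 1, 'Lemon': 1})
```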

3. APPROPRIATE PROBLEMS FOR DECISION TREE LEARNING

A variety of decision tree learning methods have been developed, with somewhat differing capabilities and requirements. Decision tree learning is generally best suited to problems with the following characteristics:

• Instances are represented by attribute-value pairs. Instances are described by a fixed set of attributes (e.g., Temperature) and their values (e.g., Hot). The easiest situation for decision tree learning is when each attribute takes on a small number of disjoint possible values (e.g., Hot, Mild, Cold). However, extensions to the basic algorithm allow handling real-valued attributes as well (e.g., representing Temperature numerically).
• The target function has discrete output values. The decision tree assigns a boolean classification (e.g., yes or no) to each example. Decision tree methods easily extend to learning functions with more than two possible output values.
• A more substantial extension allows learning target functions with real-valued outputs, though the application of decision trees in this setting is less common.
• Disjunctive descriptions may be required. As noted above, decision trees naturally represent disjunctive expressions.
• The training data may contain errors. Decision tree learning methods are robust to errors, both errors in the classifications of the training examples and errors in the attribute values that describe these examples.
• The training data may contain missing attribute values. Decision tree methods can be used even when some training examples have unknown or missing attribute values.

Note: Remaining Notes will be posted later.

