Machine Learning Notes Unit 1
Designing a Learning System:
UNIT-1: CHAPTER-2
CONCEPT LEARNING AND THE GENERAL-TO-SPECIFIC ORDERING
1. Introduction,
2. A Concept Learning Task,
3. Concept Learning As Search,
4. Find-S: Finding A Maximally Specific Hypothesis,
5. Version Spaces, and The Candidate Elimination Algorithm,
6. Remarks on Version Spaces and Candidate Elimination, Inductive Bias.
INTRODUCTION
1. Concept learning can be formulated as a problem of searching through a
predefined space of potential hypotheses for the hypothesis that best fits the
training examples.
2. In many cases, this search can be efficiently organized by taking advantage of
a naturally occurring structure over the hypothesis space: a general-to-specific
ordering of hypotheses.
3. We consider the problem of automatically inferring the general definition of
some concept, given examples labeled as members or nonmembers of the
concept.
4. This task is commonly referred to as concept learning or approximating a
Boolean-valued function from examples.
Concept learning: Inferring a boolean-valued function from training examples
of its input and output.
For each attribute, the hypothesis will either:
• indicate by a "?" that any value is acceptable for this attribute,
• specify a single required value (e.g., Warm) for the attribute, or
• indicate by a "Φ" that no value is acceptable.
2.2 The INDUCTIVE LEARNING HYPOTHESIS:
Any hypothesis found to approximate the target function well over a sufficiently
large set of training examples will also approximate the target function well over
other unobserved examples.
2.3. Concept Learning As Search
2.4. Find-S: Finding a Maximally Specific Hypothesis
Introduction:
The Find-S algorithm is a basic concept learning algorithm in machine learning. It
finds the most specific hypothesis that fits all the positive examples; note that the
algorithm considers only the positive training examples.
The Find-S algorithm starts with the most specific hypothesis and generalizes this
hypothesis each time it fails to cover an observed positive training example.
Hence, the Find-S algorithm moves from the most specific hypothesis towards the most
general hypothesis.
Important Representation:
1. "?" indicates that any value is acceptable for the attribute.
2. A single required value (e.g., Cold) specifies that exactly that value is acceptable for the attribute.
3. Φ indicates that no value is acceptable.
4. The most general hypothesis is represented by: {?, ?, ?, ?, ?, ?}
5. The most specific hypothesis is represented by: {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}
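As a concrete illustration (a minimal sketch; the list layout and the matches helper are assumptions made for this example, not part of the original notes), such a hypothesis can be stored as a simple list of attribute constraints:

# A hypothesis is a list of attribute constraints:
#   '?'              -> any value is acceptable for the attribute
#   a literal value  -> exactly that value is required (e.g., 'Cold')
#   'Φ'              -> no value is acceptable

most_general  = ['?', '?', '?', '?', '?', '?']    # matches every instance
most_specific = ['Φ', 'Φ', 'Φ', 'Φ', 'Φ', 'Φ']    # matches no instance

def matches(hypothesis, instance):
    """Return True if the hypothesis classifies the instance as positive."""
    return all(h == '?' or h == x for h, x in zip(hypothesis, instance))

print(matches(most_general,  ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']))   # True
print(matches(most_specific, ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']))   # False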
Example :
Consider the following data set, which records which particular seeds are
poisonous.
First, we initialize the hypothesis to the most specific hypothesis. Hence, our
hypothesis is:
h = {ϕ, ϕ, ϕ, ϕ}
Consider example 1 :
The data in example 1 is {GREEN, HARD, NO, WRINKLED }.
We see that our initial hypothesis is the most specific one, so we generalize it to
cover this positive example. Hence, the hypothesis becomes: h = { GREEN, HARD, NO,
WRINKLED }
Consider example 2 :
Here we see that this example has a negative outcome. Hence we neglect this
example and our hypothesis remains the same.
h = { GREEN, HARD, NO, WRINKLED }
Consider example 3 :
Here we see that this example has a negative outcome. Hence we neglect this
example and our hypothesis remains the same.
h = { GREEN, HARD, NO, WRINKLED }
Consider example 4 :
The data present in example 4 is { ORANGE, HARD, NO, WRINKLED }. We
compare every attribute of this example with the current hypothesis, and if any mismatch
is found we replace that particular attribute in the hypothesis with the general case ( "?" ).
After doing this, the hypothesis becomes:
h = { ?, HARD, NO, WRINKLED }
Consider example 5 :
The data present in example 5 is { GREEN, SOFT, YES, SMOOTH }. We
compare every attribute of this example with the current hypothesis, and if any mismatch
is found we replace that particular attribute in the hypothesis with the general case ( "?" ).
After doing this, the hypothesis becomes:
h = { ?, ?, ?, ? }
Since we have reached a point where all the attributes in our hypothesis have the
general condition, examples 6 and 7 would result in the same hypothesis
with all general attributes.
h = { ?, ?, ?, ? }
Hence, for the given data the final hypothesis would be:
Final Hypothesis: h = { ?, ?, ?, ? }
FIND S Algorithm is used to find the Maximally Specific Hypothesis. Using the
Find-S algorithm gives a single maximally specific hypothesis for the given set of
training examples.
Step 2 of Find-S Algorithm (first iteration):
h0 = <ø, ø, ø, ø, ø, ø>
X1 = <Sunny, Warm, Normal, Strong, Warm, Same>
h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
PROPERTIES OF FIND-S:
The Find-S algorithm considers only the positive examples and ignores the
negative examples.
For each positive example, the algorithm checks every attribute of the example:
if the attribute value is the same as the corresponding hypothesis value, the algorithm
moves on without any changes; otherwise the hypothesis value is generalized to '?'.
There are a few limitations of the Find-S algorithm, listed below:
2. Inconsistent training sets can actually mislead the Find-S algorithm, since it
ignores the negative examples.
3. Find-S algorithm does not provide a backtracking technique to determine the
best possible changes that could be done to improve the resulting hypothesis.
1. The process starts with initializing ‘h’ with the most specific hypothesis,
generally, it is the first positive example in the data set.
2. We check each training example. If the example is negative, we move on to the
next example; if it is a positive example, we consider it in the next step.
3. We will check if each attribute in the example is equal to the hypothesis value.
4. If the value matches, then no changes are made.
5. If the value does not match, the value is changed to ‘?’.
6. We do this until we reach the last positive example in the data set (see the sketch below).
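A minimal code sketch of the Find-S procedure described above (the small data set used here is hypothetical and only mirrors the style of the seeds example; the column order and the 'Yes'/'No' labels are assumptions):

def find_s(examples):
    """Find-S: return the maximally specific hypothesis consistent with
    all positive examples. Each example is a pair (attribute_tuple, label)."""
    hypothesis = None
    for attributes, label in examples:
        if label != 'Yes':                 # negative examples are ignored
            continue
        if hypothesis is None:             # initialize with the first positive example
            hypothesis = list(attributes)
            continue
        for i, value in enumerate(attributes):
            if hypothesis[i] != value:     # mismatch -> generalize this attribute to '?'
                hypothesis[i] = '?'
    return hypothesis

# Hypothetical training data ('Yes' = positive example, 'No' = negative example)
data = [
    (('Green',  'Hard', 'No',  'Wrinkled'), 'Yes'),
    (('Orange', 'Hard', 'No',  'Wrinkled'), 'Yes'),
    (('Green',  'Soft', 'Yes', 'Smooth'),   'No'),
]
print(find_s(data))   # ['?', 'Hard', 'No', 'Wrinkled']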
2.5. Version Spaces, and The Candidate Elimination Algorithm
Version Space
Terms Used:
Concept learning: the learning task of inferring the target concept from training data.
General Hypothesis: does not constrain the feature values; every attribute is left unconstrained.
G = {'?', '?', '?', '?', …}: one '?' per attribute.
Specific Hypothesis: constrains the feature values to specific required values.
S = {ϕ, ϕ, ϕ, …}: the number of ϕ entries equals the number of attributes.
Algorithm:
Step 1: Load the data set.
Step 2: Initialize the General Hypothesis G and the Specific Hypothesis S.
Step 3: For each training example:
Step 4: If the example is positive:
            if attribute_value == hypothesis_value:
                do nothing
            else:
                replace the attribute value in S with '?' (basically generalizing it)
Step 5: If the example is negative:
            make the general hypothesis more specific (a minimal code sketch of these steps follows).
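Below is a minimal sketch of the Candidate-Elimination steps above for conjunctive hypotheses (an illustrative implementation, not a library routine: the function name candidate_elimination, the 'Yes'/'No' label encoding, and the use of a single specific boundary S are assumptions of this example):

def candidate_elimination(examples):
    """Maintain the specific boundary S and the general boundary G.
    Each example is a pair (attribute_tuple, label) with label 'Yes' or 'No'."""
    n = len(examples[0][0])
    S = ['Φ'] * n                  # most specific hypothesis
    G = [['?'] * n]                # most general hypothesis

    def covers(h, x):
        # a hypothesis covers an instance if every constraint is '?' or matches
        return all(a in ('?', v) for a, v in zip(h, x))

    for x, label in examples:
        if label == 'Yes':
            # drop general hypotheses that fail to cover the positive example
            G = [g for g in G if covers(g, x)]
            # minimally generalize S so that it covers x
            if S == ['Φ'] * n:
                S = list(x)
            else:
                S = [s if s == v else '?' for s, v in zip(S, x)]
        else:
            # specialize every general hypothesis that wrongly covers the negative example
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                    continue
                for i in range(n):
                    if g[i] == '?' and S[i] != '?' and S[i] != x[i]:
                        h = list(g)
                        h[i] = S[i]    # minimal specialization along attribute i
                        new_G.append(h)
            G = new_G
    return S, G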
Example:
Consider the dataset given below:
Algorithmic steps:
Initially : G = [?, ?, ?, ?, ?, ?]   (the single most general hypothesis)
S = [ϕ, ϕ, ϕ, ϕ, ϕ, ϕ]   (the most specific hypothesis)
For instance 1 : <'sunny','warm','normal','strong','warm','same'> and positive
output.
G1 = G
S1 = ['sunny','warm','normal','strong','warm','same']
For instance 2 : <'sunny','warm','high','strong','warm','same'> and positive output.
G2 = G
S2 = ['sunny','warm',?,'strong','warm','same']
For instance 3 : <'rainy','cold','high','strong','warm','change'> and negative output.
G3 = [['sunny',?,?,?,?,?], [?,'warm',?,?,?,?], [?,?,?,?,?,'same']]
S3 = S2
For instance 4 : <'sunny','warm','high','strong','cool','change'> and positive output.
G4 = [['sunny',?,?,?,?,?], [?,'warm',?,?,?,?]]   (the hypothesis [?,?,?,?,?,'same'] is dropped because it does not cover this positive example)
S4 = ['sunny','warm',?,'strong',?,?]
Output :
G = [['sunny',?,?,?,?,?], [?,'warm',?,?,?,?]]
S = ['sunny', 'warm', ?, 'strong', ?, ?]
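Running the candidate_elimination sketch given earlier on the instances traced above (using the same 'Yes'/'No' label convention assumed in that sketch) should reproduce this output:

enjoy_sport = [
    (('sunny', 'warm', 'normal', 'strong', 'warm', 'same'),   'Yes'),
    (('sunny', 'warm', 'high',   'strong', 'warm', 'same'),   'Yes'),
    (('rainy', 'cold', 'high',   'strong', 'warm', 'change'), 'No'),
    (('sunny', 'warm', 'high',   'strong', 'cool', 'change'), 'Yes'),
]
S, G = candidate_elimination(enjoy_sport)
print(S)   # ['sunny', 'warm', '?', 'strong', '?', '?']
print(G)   # [['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?']]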
2.6 . Remarks on Version Spaces and Candidate Elimination
Inductive bias, or the inherent bias of a learning algorithm, is the set of assumptions
the algorithm makes in order to form a hypothesis, i.e. a generalization beyond the set of
training instances, so that it can classify unobserved data. It typically involves a preference
for a simpler hypothesis that best fits the data.
• Given:
– a concept learning algorithm L for a set of instances X
– a concept c defined over X
– a set of training examples for c: Dc = {⟨x, c(x)⟩}
– L(xi, Dc): the classification of xi output by L after learning from Dc
• Inductive inference ( ≻ ):
(Dc ∧ xi) ≻ L(xi, Dc)
• The inductive bias is defined as a minimal set of assumptions B such that
(⊢ denotes deduction)
(∀ xi ∈ X) [ (B ∧ Dc ∧ xi) ⊢ L(xi, Dc) ]
20
Inductive bias of Candidate- Elimination
Assume L is defined as follows:
compute VS_H,D
classify a new instance by complete agreement of all the hypotheses
in VS_H,D
Then the inductive bias of Candidate-Elimination is simply
B ≡ { c ∈ H }
In fact, by assuming c ∈ H:
1. c ∈ VS_H,D; in fact, VS_H,D includes all hypotheses in H consistent with D
2. L(xi, Dc) outputs a classification "by complete agreement", hence any
hypothesis, including c, outputs L(xi, Dc)
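As a small illustration of the "complete agreement" classification rule above (a minimal sketch; the covers helper and the None convention for an unclassifiable instance are assumptions of this example):

def classify_by_agreement(version_space, x):
    """Classify x only when every hypothesis in the version space agrees;
    otherwise the learner leaves the instance unclassified."""
    def covers(h, instance):
        return all(a in ('?', v) for a, v in zip(h, instance))
    votes = {covers(h, x) for h in version_space}
    return votes.pop() if len(votes) == 1 else None   # None means "cannot classify"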
Inductive system
Chapter-3: Decision Tree Learning
1. Introduction,
2. Decision Tree Representation,
3. Appropriate Problems For Decision Tree Learning,
4. The Basic Decision Tree Learning Algorithm,
5. Hypothesis Space Search In Decision Tree Learning,
6. Inductive Bias In Decision Tree Learning,
7. Issues In Decision Tree Learning.
1. INTRODUCTION
Decision tree learning is a method for approximating discrete-valued
target functions, in which the learned function is represented by a
decision tree. Learned trees can also be re-represented as sets of if-then
rules to improve human readability.
Decision tree learning is one of the most widely used and practical methods of
inductive inference.
Inductive inference: The Process of reaching a general conclusion from
specific examples.
Features
• Method for approximating discrete-valued functions (including Boolean)
• Learned functions are represented as decision trees (or if-then-else rules)
• Expressive hypothesis space, including disjunction
• Robust to noisy data
• Can handle missing attribute values
Different classification problems:
• Equipment or medical diagnosis
• Credit risk analysis
• Several tasks in natural language processing
The decision tree is one of the predictive modelling approaches used in statistics, data
mining and machine learning.
Decision trees are constructed via an algorithmic approach that identifies ways to
split a data set based on different conditions. It is one of the most widely used and
practical methods for supervised learning.
Tree models where the target variable can take a discrete set of values are
called classification trees.
Decision trees where the target variable can take continuous values (typically real
numbers) are called regression trees.
Classification And Regression Tree (CART) is a general term for this.
Data Format
The dependent variable, Y, is the target variable that we are trying to understand,
classify or generalize. The vector x is composed of the features, x1, x2, x3 etc., that
are used for that task.
Example
training_data = [
['Green', 3, 'Apple'],
['Yellow', 3, 'Apple'],
['Red', 1, 'Grape'],
['Red', 1, 'Grape'],
['Yellow', 3, 'Lemon'],
]
# Header = ["Color", "diameter", "Label"]
# The last column is the label.
# The first two columns are features.
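As a small illustration of how an algorithm can evaluate a candidate split of this data (a minimal sketch; the gini and partition helpers below are illustrative assumptions, not part of any particular library):

from collections import Counter

def gini(rows):
    """Gini impurity of a set of rows whose last column is the class label."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

def partition(rows, column, value):
    """Split rows into those that match the (column == value) test and the rest."""
    true_rows = [r for r in rows if r[column] == value]
    false_rows = [r for r in rows if r[column] != value]
    return true_rows, false_rows

print(gini(training_data))                  # impurity of the whole data set (0.64)
true_rows, false_rows = partition(training_data, 0, 'Red')
print(gini(true_rows), gini(false_rows))    # impurity of each branch after splitting on Color == 'Red'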
Instances are represented by attribute-value pairs. Instances are
described by a fixed set of attributes (e.g., Temperature) and their
values (e.g., Hot). The easiest situation for decision tree learning is
when each attribute takes on a small number of disjoint possible
values (e.g., Hot, Mild, Cold). However, extensions to the basic
algorithm allow handling real-valued attributes as well (e.g.,
representing Temperature numerically).
The target function has discrete output values. The decision tree
assigns a Boolean classification (e.g., yes or no) to each example.
Decision tree methods easily extend to learning functions with more
than two possible output values.
A more substantial extension allows learning target functions with
real-valued outputs, though the application of decision trees in this
setting is less common.
Disjunctive descriptions may be required. As noted above, decision
trees naturally represent disjunctive expressions.
The training data may contain errors. Decision tree learning methods
are robust to errors, both errors in classifications of the training
examples and errors in the attribute values that describe these
examples.
The training data may contain missing attribute values. Decision tree
methods can be used even when some training examples have unknown
attribute values.