Decision Trees
Decision Trees are a type of Supervised Machine Learning (that is, the training data specifies both the input and the corresponding output) in which the data is repeatedly split according to a certain parameter. The tree can be explained by two entities, namely decision nodes and leaves. The leaves are the decisions or final outcomes, and the decision nodes are where the data is split.
There are two main types of Decision Trees:
• Classification trees (Yes/No types): the decision variable is categorical.
• Regression trees (continuous data types): the decision or outcome variable is continuous.
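As a rough illustration (not from the slides), the two variants correspond directly to scikit-learn's two tree estimators; the toy feature vectors and targets below are made up purely to show the difference in target type.

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0, 1], [1, 1], [1, 0], [0, 0]]            # toy feature vectors (assumed)

# Classification tree: categorical target (yes/no encoded as 1/0)
clf = DecisionTreeClassifier(criterion="entropy")
clf.fit(X, [1, 1, 0, 0])
print(clf.predict([[1, 1]]))                    # -> a class label

# Regression tree: continuous target
reg = DecisionTreeRegressor()
reg.fit(X, [2.5, 3.1, 0.4, 0.7])
print(reg.predict([[1, 1]]))                    # -> a real-valued prediction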
Decision Trees
A decision tree is a flow-chart-like tree structure:
• An internal node denotes a test on an attribute (feature).
• A branch represents an outcome of the test; all records in a branch have the same value for the tested attribute.
• A leaf node represents a class label or a class-label distribution.
[Figure: decision tree for the play example — outlook at the root; the sunny branch tests humidity, the overcast branch is a P leaf, and the rain branch tests windy; the leaves are labelled P (play) or N (don't play).]
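One way to make this structure concrete is to write the tree above as a nested dictionary, where each inner node names the attribute it tests and each leaf holds a class label. This is only a sketch: the branch values (high/normal for humidity, true/false for windy) are assumed, since the figure shows only the node and leaf labels.

# Decision tree as nested dicts: {attribute: {branch value: subtree or leaf label}}
play_tree = {
    "outlook": {
        "sunny":    {"humidity": {"high": "N", "normal": "P"}},
        "overcast": "P",
        "rain":     {"windy": {"true": "N", "false": "P"}},
    }
}

def classify(tree, instance):
    """Follow branches until a leaf (a plain class label) is reached."""
    if not isinstance(tree, dict):
        return tree                       # leaf node: class label
    attribute = next(iter(tree))          # attribute tested at this node
    value = instance[attribute]           # outcome of the test for this instance
    return classify(tree[attribute][value], instance)

print(classify(play_tree, {"outlook": "sunny", "humidity": "normal", "windy": "false"}))  # -> P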
[Figure: the same outlook tree redrawn, with sunny / overcast / rain branches leading to P and N leaves.]
So a new instance <rainy, hot, normal, true>: ?
Rule1: If (outlook = "sunny") AND (humidity <= 0.75) Then (play = "yes")
Rule2: If (outlook = "rainy") AND (wind > 20) Then (play = "no")
Rule3: If (outlook = "overcast") Then (play = "yes")
...
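Read this way, the tree is just a set of if-then rules. A minimal sketch of the listed rules as code follows; the thresholds 0.75 and 20 come from the slide, and the remaining rules, elided by "..." above, are simply returned as None here.

def play(outlook, humidity, wind):
    if outlook == "sunny" and humidity <= 0.75:   # Rule1
        return "yes"
    if outlook == "rainy" and wind > 20:          # Rule2
        return "no"
    if outlook == "overcast":                     # Rule3
        return "yes"
    return None                                   # remaining rules elided ("...") on the slide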
Note: the ID3 algorithm only deals with categorical attributes, but it can be extended (as in C4.5) to handle continuous attributes.
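A common way to add this extension (the C4.5-style handling of a continuous attribute) is to sort the attribute's values and evaluate a binary test "attribute <= t" at candidate thresholds t, typically the midpoints between adjacent distinct values. The sketch below only enumerates those candidates; each one would then be scored with the information gain defined later.

def candidate_thresholds(values):
    """Midpoints between adjacent distinct values of a continuous attribute."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

# e.g. humidity readings -> thresholds to try for a test "humidity <= t"
print(candidate_thresholds([0.70, 0.85, 0.90, 0.65, 0.70]))   # -> [0.675, 0.775, 0.875]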
Entropy
For a two-class variable with p cases of class P and n cases of class N, the entropy (the expected information needed to classify an object) is

I(p, n) = −(p / (p + n)) log2(p / (p + n)) − (n / (p + n)) log2(n / (p + n))
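A minimal sketch of this two-class entropy in Python (the function name entropy is mine, not from the slides):

import math

def entropy(p, n):
    """I(p, n): expected information for p positive and n negative examples."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:                          # treat 0 * log2(0) as 0
            fraction = count / total
            result -= fraction * math.log2(fraction)
    return result

print(entropy(9, 5))    # ≈ 0.940 for a set with 9 positive and 5 negative examples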
Information Gain
Now assume that, using attribute A, a set S of instances is partitioned into subsets S1, S2, …, Sv, one for each distinct value of attribute A.
If Si contains pi cases of P and ni cases of N, the entropy, or the expected information needed to classify objects in all subtrees Si, is

E(A) = Σ_{i=1..v} ((pi + ni) / (p + n)) · I(pi, ni)

and the information gain of branching on A is

Gain(A) = I(p, n) − E(A)

At any point we want to branch using the attribute that provides the highest information gain.
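Continuing the same sketch, E(A) and Gain(A) can be computed from the per-value counts (pi, ni); the helpers below reuse the entropy function from the earlier sketch, and the example counts match the root split shown in the later figure ([2+, 3−], [4+, 0−], [3+, 2−]).

def expected_info(partitions, p, n):
    """E(A): weighted entropy over the subsets S1..Sv induced by attribute A."""
    return sum((pi + ni) / (p + n) * entropy(pi, ni) for pi, ni in partitions)

def gain(partitions, p, n):
    """Gain(A) = I(p, n) - E(A)."""
    return entropy(p, n) - expected_info(partitions, p, n)

# Splitting 14 examples [9+, 5-] into [2+, 3-], [4+, 0-], [3+, 2-]
print(gain([(2, 3), (4, 0), (3, 2)], 9, 5))    # ≈ 0.246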
[Figure: partial tree after the root split — left branch {D1, D2, D8, D9, D11} with [2+, 3−] still undecided (?), middle branch {D3, D7, D12, D13} with [4+, 0−] labelled yes, right branch {D4, D5, D6, D10, D14} with [3+, 2−] still undecided (?).]
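From the counts in this figure (the full set S contains [9+, 5−]), the gain of the root split, calling its attribute A, can be checked by hand:

I(9, 5)  = −(9/14) log2(9/14) − (5/14) log2(5/14) ≈ 0.940
E(A)     = (5/14)·I(2, 3) + (4/14)·I(4, 0) + (5/14)·I(3, 2)
         ≈ (5/14)(0.971) + (4/14)(0.0) + (5/14)(0.971) ≈ 0.694
Gain(A)  = I(9, 5) − E(A) ≈ 0.940 − 0.694 = 0.246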
Overfitting
• An induced tree may overfit the training data.
• Too many branches: some may reflect anomalies due to noise or outliers.
• Some splits or leaf nodes may be the result of decisions based on very few instances, resulting in poor accuracy for unseen instances.
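As a rough mitigation sketch (not part of the slides), most implementations expose pre-pruning parameters that address exactly these points; in scikit-learn, for example, capping the depth and requiring a minimum number of training instances per leaf prevents splits based on very few instances.

from sklearn.tree import DecisionTreeClassifier

# Pre-pruning: stop splitting early instead of growing the full tree
pruned = DecisionTreeClassifier(
    criterion="entropy",
    max_depth=4,           # cap on tree depth
    min_samples_leaf=5,    # every leaf must cover at least 5 training instances
)
# pruned.fit(X_train, y_train)   # X_train / y_train assumed to exist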