Week 4 Part 1 Classification
Week-4 Part-1
Classification & Prediction
4th Year Semester-1(Elective)
Data mining algorithms: Classification and Prediction
Used to extract models that describe important data classes or to predict future data trends.
- E.g., classification model: built to categorize bank loan applications as either safe or risky.
Credit approval
• A bank wants to classify its customers based on whether they are
expected to pay back their approved loans. The history of past
customers is used to train the classifier.
• The classifier provides rules, which identify potentially reliable
future customers.
Rules
• Credit approval
Classification rule:
If age = “31...40” and income = high then credit_rating = excellent
Model construction:
• Describes a set of predetermined classes (Excellent and Fair) using the training set.
• The model is represented using classification rules.
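A minimal sketch of how a model "represented using classification rules" might look in code. The single rule is the one from the slide; the fallback class `fair`, the dictionary layout, and the sample customers are assumptions made only for illustration.

```python
# A rule-based model: each rule is (conditions, predicted_class).
# The one rule below is taken from the slide. Falling back to "fair" is an
# assumption, based on the slide naming Excellent and Fair as the classes.
RULES = [
    ({"age": "31...40", "income": "high"}, "excellent"),
]
DEFAULT_CLASS = "fair"  # assumption, not stated as a rule on the slide


def classify(customer, rules=RULES, default=DEFAULT_CLASS):
    """Return the class of the first rule whose conditions all match."""
    for conditions, label in rules:
        if all(customer.get(attr) == value for attr, value in conditions.items()):
            return label
    return default


# Hypothetical customers, for illustration only.
print(classify({"age": "31...40", "income": "high"}))    # excellent
print(classify({"age": "31...40", "income": "medium"}))  # fair
```

Keeping the rules as data rather than hard-coded `if` statements means a learning algorithm could, in principle, append new rules without changing `classify`.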
Classification Process (1): Model Construction
• Training data is fed to a classification algorithm, which produces the classifier (model).
• Example rule in the classifier:
IF rank = ‘professor’ OR years > 6
THEN teach = ‘yes’
Classification Process (2): Model Construction
Model construction example
Classification contd… (Example)
Training data (Refund and Marital Status are categorical, Taxable Income is continuous, Cheat is the class):

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Decision tree (model):
MarSt
  Married: NO
  Single, Divorced: Refund
    Yes: NO
    No: TaxInc
      < 80K: NO
      > 80K: YES

There could be more than one tree that fits the same data!
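To check that the MarSt-rooted tree really fits the training data, the tree can be written as nested conditions and run against all ten records. This is only a sketch: the function name and tuple layout are illustrative choices, and sending an income of exactly 80K to the NO leaf is an assumption, since the slide only labels the branches "< 80K" and "> 80K".

```python
# Training records from the slide:
# (Tid, Refund, Marital Status, Taxable Income in thousands, Cheat)
TRAINING_DATA = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]


def predict_cheat(refund, marital_status, income_k):
    """Decision tree rooted at MarSt, as drawn on the slide."""
    if marital_status == "Married":
        return "No"
    # Single or Divorced: test Refund next.
    if refund == "Yes":
        return "No"
    # Refund = No: test Taxable Income.
    # Assumption: exactly 80K goes to the NO leaf (the slide leaves it open).
    return "Yes" if income_k > 80 else "No"


misclassified = [
    tid
    for tid, refund, status, income, cheat in TRAINING_DATA
    if predict_cheat(refund, status, income) != cheat
]
print(misclassified)  # [] : the tree fits all ten training records
```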
Decision Tree Classification Task

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
6    No       Medium   60K      No

The tree induction algorithm learns a model (the decision tree) from the training set.

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
15   No       Large    67K      ?

The model is then applied to the test set.
Apply Model to Test Data

Test Data:
Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Decision tree:
Refund
  Yes: NO
  No: MarSt
    Single, Divorced: TaxInc
      < 80K: NO
      > 80K: YES
    Married: NO

Result: Assign Cheat to “No”.
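Applying the model means walking the tree from the root until a leaf is reached. The sketch below traces that walk for the test record on this slide; the dictionary keys and the function name are illustrative assumptions, and the handling of exactly 80K in the TaxInc test is also an assumption (it is not reached for this record).

```python
def classify_with_path(record):
    """Walk the Refund-rooted tree from the slide; return (label, path)."""
    path = [f"Refund = {record['refund']}"]
    if record["refund"] == "Yes":
        return "No", path

    path.append(f"MarSt = {record['marital_status']}")
    if record["marital_status"] == "Married":
        return "No", path

    # Single or Divorced: test Taxable Income.
    path.append(f"TaxInc = {record['income_k']}K")
    # Assumption: exactly 80K goes to the NO leaf (the slide leaves it open).
    label = "Yes" if record["income_k"] > 80 else "No"
    return label, path


# The test record from the slide: Refund = No, Married, 80K, Cheat = ?
test_record = {"refund": "No", "marital_status": "Married", "income_k": 80}
label, path = classify_with_path(test_record)
print(path)   # ['Refund = No', 'MarSt = Married']
print(label)  # No
```

The record reaches a leaf after two tests, so its taxable income is never examined.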
Typical Applications (Classification)
• credit approval
• target marketing
• medical diagnosis
• treatment effectiveness analysis
• image processing
Classification contd….
Classification—A Two-Step Process
1) First Step: Model Construction
- A model is built describing a pre-determined set of data classes or concepts.
- Each tuple belongs to one pre-determined class, as determined by the class label attribute.
- The model is built by analyzing a training data set.
- Each tuple in the training data set is called a training sample.
- The model is represented in the form of:
o Classification Rules
o Decision Trees
o Mathematical Formulae
- E.g., a database holding customer credit information
o Classification rule: identifies customers with an ‘Excellent credit rating’ or a ‘Fair credit rating’.
Classification contd….
2) Second Step: Using the Model in Prediction
- The model is used for classification
o The predictive accuracy of the model is estimated first. There are several methods for this.
• The holdout method is a simple technique.
• It uses a test set of class-labelled samples.
• These are randomly selected and are independent of the training samples.
• The accuracy of the model on a given test set is the percentage of test set samples that are correctly classified by the model.
• If the accuracy of the model is acceptable, the model can be used to classify future data tuples for which the class label is not known.
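The holdout estimate described above can be sketched in a few lines. Everything named here (`holdout_split`, `train_majority_classifier`, `accuracy`, the 30% test fraction, the seed) is an assumption made for illustration; in particular, the majority-class learner is only a stand-in so that the split and the accuracy calculation have something to evaluate. The samples reuse the Refund / Marital Status / Taxable Income / Cheat records from the earlier training table.

```python
import random
from collections import Counter


def holdout_split(samples, test_fraction=0.3, seed=0):
    """Randomly split labelled samples into disjoint training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train_majority_classifier(training_set):
    """Stand-in learner (an assumption): always predict the most common class."""
    majority = Counter(label for _, label in training_set).most_common(1)[0][0]
    return lambda features: majority


def accuracy(model, test_set):
    """Percentage of test samples that the model classifies correctly."""
    correct = sum(1 for features, label in test_set if model(features) == label)
    return 100.0 * correct / len(test_set)


# (features, class label) pairs: (Refund, Marital Status, Income in K), Cheat
samples = [
    (("Yes", "Single", 125), "No"),
    (("No", "Married", 100), "No"),
    (("No", "Single", 70), "No"),
    (("Yes", "Married", 120), "No"),
    (("No", "Divorced", 95), "Yes"),
    (("No", "Married", 60), "No"),
    (("Yes", "Divorced", 220), "No"),
    (("No", "Single", 85), "Yes"),
    (("No", "Married", 75), "No"),
    (("No", "Single", 90), "Yes"),
]

training_set, test_set = holdout_split(samples, test_fraction=0.3, seed=0)
model = train_majority_classifier(training_set)
print(f"{accuracy(model, test_set):.1f}% of {len(test_set)} test samples correct")
```

The model never sees the test samples during training, which is what makes the resulting percentage an estimate of accuracy on future, unlabelled tuples.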
Machine learning techniques
Supervised vs. Unsupervised Learning
Supervised learning (classification)
o Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
o New data is classified based on the training set
Unsupervised learning
o The class labels of the training data are unknown
o Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
o Ex.: Clustering
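To make the contrast concrete, the sketch below groups unlabelled numbers into clusters without ever seeing a class label. It uses a minimal one-dimensional k-means, which is one common clustering technique chosen here for illustration; the slides do not name a specific algorithm, and the measurements, the function name, and the way the initial centers are picked are all assumptions.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Minimal 1-D k-means (requires k >= 2): group numbers into k clusters."""
    ordered = sorted(values)
    # Initial centers: k values spread evenly over the sorted data (an assumption).
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster with the nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return clusters  # clusters from the final assignment step


# Hypothetical unlabelled measurements, for illustration only.
measurements = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
print(kmeans_1d(measurements, k=2))  # [[1.0, 1.2, 0.8], [5.0, 5.3, 4.9]]
```

No labels are supplied at any point; the two groups emerge only from how close the values are to one another.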
Use cases for supervised learning
Supervised learning use case 2
Supervised learning use case questions
Points to remember (supervised learning)
Unsupervised learning use case
Points to remember (unsupervised learning)
Training sets
• Given a collection of records (instances), each record contains a set of attributes; one of the attributes is the class.
• Find a model for the class attribute as a function of the values of the other attributes.
• Goal: previously unseen records should be assigned a class as accurately as possible.
• A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
Examples of Classification Task
• Predicting tumor cells as benign or malignant