Introduction to Machine Learning (1)
Ashok Rao
Former Head, Network Project
CEDT, IISc, Bangalore
< ashokrao.mys@gmail.com >
Presentation Outline
What is Machine Learning ?
Why Machine Learning ?
Supervised learning
Unsupervised learning
Classifiers
Hybrid Classifiers
Panel of classifiers
What would have happened if there were no learning?
Our dictionary defines “to learn” as …
What is Machine Learning?
“The goal of machine learning is to build computer systems that can adapt and learn from their experience.”
- Tom Dietterich
What is Machine Learning?
The ability to form rules automatically, and to subsequently use them (for decisions), by exposing a system (algorithm, structure, data, sensors, etc.) to input data (information).
Why Machine Learning?
Machine Learning in the Medical Domain
Health care is among the most critical of needs.
While technology has improved (CT scan, MRI, PET, in 3-D too, CA surgery, tele-medicine, etc.), still about 70% of the world’s population does not have quality and reliable health care.
Automating diagnosis and testing, and if possible doing it remotely (more effective if portable), would help.
Rather than fully automating health care, it is better to provide computer-assisted options.
One such example is radiological diagnosis of scan data.
[Diagram: Machine Learning overlaps Data Mining, Computer Vision, and Robotics; pipeline: Sensor → Preprocessing → Feature extraction → Model (Classification, Clustering, Prediction) → Data Analytics.]
Data analytics techniques by task:
• Exploration: Min, Max, Mean, Variance, Histogram, Correlation, Plot, Skewness (numerical summaries)
• Classification and regression: OneR, Frequency, Naïve Bayesian, Decision tree
• Clustering
Classification
• Data: a set of data records (also called examples, instances, or cases) described by
  • k attributes: A1, A2, …, Ak
  • a class: each example is labelled with a pre-defined class
• Goal: to learn a classification model from the data that can be used to predict the classes of new (future, or test) cases/instances.
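To make this concrete, here is a minimal sketch with a decision-tree learner (assuming scikit-learn is available; the records and attribute values are invented for illustration):

```python
# Minimal classification sketch: learn a model from labelled records,
# then predict the class of unseen cases (illustrative data only).
from sklearn.tree import DecisionTreeClassifier

# Training records: k = 2 attributes per example, plus a pre-defined class.
X_train = [[5.0, 1.2], [4.8, 0.9], [6.5, 2.3], [6.1, 2.0]]
y_train = ["A", "A", "B", "B"]

model = DecisionTreeClassifier().fit(X_train, y_train)

# Predict the class of a new (future / test) instance.
print(model.predict([[6.0, 2.1]]))  # e.g. ['B']
```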
Prediction
• Quantitative
  • Causal model: Regression
  • Time series: Moving Average, Exponential Smoothing, ARIMA, Kalman Filter
• Qualitative
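Two of the quantitative time-series methods above are easy to sketch directly; a minimal NumPy version (window size and smoothing factor are illustrative choices):

```python
import numpy as np

def moving_average(x, window=3):
    # Average of each sliding window of `window` consecutive values.
    return np.convolve(x, np.ones(window) / window, mode="valid")

def exponential_smoothing(x, alpha=0.5):
    # s[t] = alpha * x[t] + (1 - alpha) * s[t-1]
    s = np.empty_like(x, dtype=float)
    s[0] = x[0]
    for t in range(1, len(x)):
        s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    return s

series = np.array([3.0, 4.0, 5.0, 4.0, 6.0, 7.0])
print(moving_average(series))        # [4.  4.33  5.  5.67]
print(exponential_smoothing(series))
```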
Clustering
• The goal of clustering is to
  • group data points that are close (or similar) to each other
  • identify such groupings (or clusters) in an unsupervised manner
• Unsupervised: no information is provided to the algorithm on which data points belong to which clusters
• Example: [Scatter plot of unlabelled points: what should the clusters be for these data points?]
Supervised learning
Training data (T is the class label):
  X1  X2  X3  T
  B   T   B   Y
  B   S   S   Y
  B   S   S   Y
  R   A   S   N
Learner → Hypothesis → Prediction
Testing data (predict T):
  X1  X2  X3  T
  B   A   S   ?   (predicted: N)
  Y   C   S   ?   (predicted: Y)
Key issue: generalization
A rich (but not exhaustive) training set is needed; otherwise the model may over-fit (e.g., character recognition over A-Z).
Unsupervised learning
What if there are no output labels?
Supervised vs. unsupervised learning

Supervised:
• Learning is based on a training set where the labelling of instances represents the target (categorization) function.
• Each data item in the dataset analyzed has been classified.
• Needs help from the data.
• Needs a great amount of data.
• Outcome: a classification decision.
• Examples: Neural Networks (NN), Decision Trees, Support Vector Machines (SVM).

Unsupervised:
• Learning is based on un-annotated instances (the training data doesn’t specify what we are trying to learn).
• Each data item in the dataset analyzed is not classified.
• Doesn’t need help from the data.
• A great amount of data is not necessarily needed.
• Outcome: a grouping of objects (instances and groups of instances).
• Examples: Clustering (Mixture Modeling), Self-Organizing Map (SOM).

(Humans are good at creating groups/categories/clusters from data.)
Supervised learning success stories
Face detection
Steering an autonomous car across the US
Detecting credit card fraud
Medical diagnosis
…
Hypothesis spaces
Decision trees
Neural networks
K-nearest neighbors (see the sketch below)
…
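As a concrete example of one hypothesis space, a minimal k-nearest-neighbours sketch in NumPy (Euclidean distance and k = 3 are assumptions, the toy data is invented):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    # Distance from the query point to every training point.
    d = np.linalg.norm(X_train - x, axis=1)
    # Labels of the k closest points; a majority vote decides the class.
    nearest = y_train[np.argsort(d)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array(["no", "no", "yes", "yes"])
print(knn_predict(X, y, np.array([0.8, 0.9])))  # "yes"
```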
Perceptron (single-layer neural net)
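A minimal sketch of the classic perceptron learning rule in NumPy (learning rate, epoch count, and the AND-style toy data are illustrative choices):

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    # y in {-1, +1}; a bias term is added via an appended constant input.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            if yi * np.dot(w, xi) <= 0:   # misclassified (or on the boundary)
                w += lr * yi * xi         # nudge the hyperplane toward xi
    return w

# Linearly separable toy data (AND-like).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w = train_perceptron(X, y)
print(np.sign(np.hstack([X, np.ones((4, 1))]) @ w))  # [-1 -1 -1  1]
```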
Which separating hyperplane?
The best linear separator is the one with the largest margin.
[Figure: two classes separated by a hyperplane; the margin is the distance to the closest points on either side.]
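This is the intuition behind support vector machines; a minimal sketch (assuming scikit-learn; the toy points are invented, and the large C value approximates a hard margin):

```python
# A linear SVM chooses the separating hyperplane with the largest margin.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [2, 2], [2, 3]], dtype=float)
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ~ hard margin
print(clf.coef_, clf.intercept_)             # the max-margin hyperplane
print(clf.support_vectors_)                  # the points that define the margin
```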
What if the data is not linearly separable?
Multilayer Perceptrons (hidden & output layers)
Kernel trick
[Figure: points in (x, y) that are not linearly separable become separable after the quadratic feature map (x, y) → (x², √2·xy, y²) into the space (z1, z2, z3).]
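A small numerical check of the idea (NumPy; the quadratic feature map (x, y) → (x², √2·xy, y²) is the standard textbook example): the kernel applied to the original points equals the dot product of the explicitly mapped features, so the mapping never has to be computed.

```python
import numpy as np

def phi(p):
    # Explicit quadratic feature map: (x, y) -> (x^2, sqrt(2)*x*y, y^2)
    x, y = p
    return np.array([x * x, np.sqrt(2) * x * y, y * y])

def kernel(p, q):
    # Equivalent polynomial kernel: k(p, q) = (p . q)^2
    return np.dot(p, q) ** 2

a, b = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(np.dot(phi(a), phi(b)))  # 16.0
print(kernel(a, b))            # 16.0 -- same value, no explicit mapping needed
```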
Implications of Occam’s Razor
Simplicity is the order of things:
Simple explanation.
Simple model.
Simple structure.
- A combination of “simple”.
Boosting
[Figure: classifying numbered examples as vertebrate vs. invertebrate; successive weak learners focus on the examples that earlier rounds misclassified.]
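A minimal sketch of the boosting idea (assuming scikit-learn, whose AdaBoost uses depth-1 decision “stumps” as its default weak learner; the toy data is invented):

```python
# Boosting: combine many weak learners, each round re-weighting the
# examples the ensemble so far still misclassifies.
from sklearn.ensemble import AdaBoostClassifier

X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 2], [3, 2], [2, 3], [3, 3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

clf = AdaBoostClassifier(n_estimators=10).fit(X, y)
print(clf.predict([[0.5, 0.5], [2.5, 2.5]]))  # e.g. [0 1]
```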
Hierarchical Clustering
[Diagram: five objects a, b, c, d, e. Agglomerative (bottom-up), Step 0 → Step 4: a and b merge into ab, d and e into de, c joins de to form cde, and ab joins cde to form abcde. Divisive (top-down) performs the same splits in reverse, Step 4 → Step 0.]
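A minimal agglomerative sketch (assuming SciPy; the five points are invented to mirror the a..e example):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Five points standing in for a..e; nearby points merge first (agglomerative).
points = np.array([[0.0, 0.0], [0.2, 0.1],                # a, b
                   [2.0, 2.0], [3.0, 3.0], [3.1, 3.2]])   # c, d, e

Z = linkage(points, method="single")  # each row: the two clusters merged + distance
print(Z)
# dendrogram(Z)  # would draw the merge tree shown on the slide
```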
Partitional Clustering
K-Means Clustering:
Step 1: Select k random seeds such that d(ki, kj) > dmin (the initial seeds, if k = 3).
[Slide sequence: initial seeds → assign each point to its nearest seed → compute new centroids → iterate until stability.]
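A sketch of the assign/recompute/iterate loop in NumPy (k, the seeding rule, and the data are illustrative; no empty-cluster handling):

```python
import numpy as np

def kmeans(X, k=3, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct random points as the initial seeds.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids, axis=2), axis=1)
        # Recompute each centroid as the mean of its assigned points.
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):  # stability reached
            break
        centroids = new
    return centroids, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.5, size=(20, 2)) for c in ([0, 0], [5, 5], [0, 5])])
print(kmeans(X)[0])  # three centroids near (0,0), (5,5), (0,5)
```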
ML enabling technologies
Faster computers
More data
The web
New ideas
Kernel trick
Large margins
Boosting
Graphical models
…
Some select references
The web
Kevin Murphy, MIT AI Lab, PPT slides
Avrim Blum, Carnegie Mellon University, PPT slides
Bishop, C. Pattern Recognition and Machine Learning. Springer, 2006.
Principal Component Analysis (PCA)
PCA seeks a projection that best represents the data in a least-squares sense.
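A minimal NumPy sketch of this least-squares projection via the eigendecomposition of the covariance matrix (data shape and component count are illustrative):

```python
import numpy as np

def pca(X, n_components=2):
    # Centre the data, then find the directions of maximum variance:
    # eigenvectors of the covariance matrix, sorted by eigenvalue.
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, order]
    return Xc @ W                                   # least-squares-best projection

X = np.random.randn(100, 5)
print(pca(X).shape)  # (100, 2)
```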
We shall build on this idea next:
Subspace Methods
Training and Classification
What NOT.