Introduction to Machine Learning
Isabelle Guyon
isabelle@clopinet.com
What is Machine Learning?
[Diagram: TRAINING DATA feeds a learning algorithm, which produces a trained machine; the trained machine maps a query to an answer.]
What for?
• Classification
• Time series prediction
• Regression
• Clustering
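A minimal Python sketch contrasting these task types, assuming scikit-learn is installed; the toy data and all names below are made up for illustration:

```python
# Illustrative sketch: the same inputs viewed as different learning tasks.
# Time series prediction can be cast as regression on past values.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans

X = np.array([[0.1, 1.0], [0.4, 0.8], [0.9, 0.2], [1.2, 0.1]])  # 4 patterns, 2 features
y_class = np.array([-1, -1, +1, +1])    # classification: discrete labels
y_reg = np.array([0.3, 0.5, 1.1, 1.4])  # regression: continuous outcome

clf = LogisticRegression().fit(X, y_class)   # classification
reg = LinearRegression().fit(X, y_reg)       # regression
km = KMeans(n_clusters=2, n_init=10).fit(X)  # clustering: no labels needed

print(clf.predict([[0.5, 0.7]]), reg.predict([[0.5, 0.7]]), km.labels_)
```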
Some Learning Machines
• Linear models
• Kernel methods
• Neural networks
• Decision trees
Applications
[Chart: application domains placed on log-log axes of number of training examples (10 to 10^5) vs. number of inputs (10 to 10^5): Market Analysis, Ecology, Machine Vision, Text Categorization, OCR, HWR, System diagnosis, Bioinformatics.]
Banking / Telecom / Retail
• Identify:
– Prospective customers
– Dissatisfied customers
– Good customers
– Bad payers
• Obtain:
– More effective advertising
– Less credit risk
– Less fraud
– Decreased churn rate
Biomedical / Biometrics
• Medicine:
– Screening
– Diagnosis and prognosis
– Drug discovery
• Security:
– Face recognition
– Signature / fingerprint / iris verification
– DNA fingerprinting
Computer / Internet
• Computer interfaces:
– Troubleshooting wizards
– Handwriting and speech
– Brain waves
• Internet
– Hit ranking
– Spam filtering
– Text categorization
– Text translation
– Recommendation
Challenges
[Chart: the challenge datasets placed on the same log-log axes of number of training examples vs. number of inputs: Dexter, Nova, Gisette, Gina, Madelon, Arcene, Dorothea, Hiva.]
Ten Classification Tasks
[Histograms: distribution of test BER (%) across challenge entries for each of the ten datasets: Arcene, Ada, Dexter, Gina, Dorothea, Hiva, Gisette, Nova, Madelon, Sylva.]
Challenge Winning Methods
[Bar chart: normalized test error BER/&lt;BER&gt; of the winning methods, grouped by method family (Linear/Kernel, Neural Nets, Trees/RF, Naïve Bayes), for Gisette (HWR), Gina (HWR), Dexter (Text), Nova (Text), Madelon (Artificial), Arcene (Spectral), Dorothea (Pharma), Hiva (Pharma), Ada (Marketing), Sylva (Ecology).]
Conventions
X = {x_ij}: data matrix (m × n); y = {y_j}: target vector; x_i: one pattern (row of X); w: weight vector.
Learning problem
Data matrix: X
– m lines = patterns (data points, examples): samples, patients, documents, images, …
– n columns = features (attributes, input variables): genes, proteins, words, pixels, …
Unsupervised learning
Is there structure in the data?
Supervised learning
Predict an outcome y.
[Example data: colon cancer, Alon et al., 1999.]
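A minimal sketch of these conventions in code; the sizes and values are illustrative placeholders, not the actual data:

```python
# Conventions sketch: X is (m patterns) x (n features); y holds one outcome per pattern.
import numpy as np

m, n = 5, 3                      # 5 patterns, 3 features (illustrative sizes)
X = np.random.rand(m, n)         # data matrix: rows x_i are patterns, columns are features
y = np.sign(np.random.randn(m))  # supervised learning: an outcome y_j per pattern

x_i = X[0]           # one pattern (one row)
feature_j = X[:, 0]  # one feature (one column)
print(X.shape, y.shape, x_i.shape, feature_j.shape)
```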
Linear Models
• f(x) = w · x + b = Σ_{j=1..n} w_j x_j + b
Linearity is in the parameters, NOT in the input components:
• f(x) = w · Φ(x) + b = Σ_j w_j φ_j(x) + b (Perceptron)
[Diagram: biological neuron analogy: activations of other neurons arrive through synapses and dendrites with weights w_1 … w_n and bias b (constant input 1); the activation function's output leaves along the axon.]
[3-D plot: a separating hyperplane in the space of inputs (x1, x2, x3).]
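A hedged numpy sketch of the linear decision function above; the weights and bias are made-up values:

```python
# Linear model sketch: f(x) = w . x + b; sign(f(x)) gives the class,
# and f(x) = 0 is the separating hyperplane.
import numpy as np

w = np.array([0.5, -1.0, 0.25])  # illustrative weight vector
b = 0.1                          # illustrative bias

def f(x):
    return np.dot(w, x) + b

x = np.array([0.2, 0.4, -0.3])
print(f(x), np.sign(f(x)))  # score and predicted class in {-1, +1}
```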
Perceptron
Rosenblatt, 1957
[Diagram: inputs x_1 … x_n feed feature units φ_1(x) … φ_N(x); their outputs are combined with weights w_1 … w_N and bias b (constant input 1) to form f(x).]
f(x) = w · Φ(x) + b
Non-Linear Decision Boundary
[3-D scatter plot: a non-linear decision boundary separating the classes in the space of three gene-expression features (Hs.7780, Hs.234680, Hs.128749).]
Kernel Method
Potential functions, Aizerman et al 1964
[Diagram: the input x is compared to each training pattern by kernel units k(x_1, x) … k(x_m, x); their outputs are combined with weights α_1 … α_m and bias b.]
f(x) = Σ_i α_i k(x_i, x) + b
k(·, ·) is a similarity measure or “kernel”.
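A minimal sketch of this kernel expansion; a Gaussian kernel is used for concreteness, and the training patterns and coefficients α_i are made up:

```python
# Kernel method sketch: f(x) = sum_i alpha_i * k(x_i, x) + b
import numpy as np

X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # patterns x_i (illustrative)
alpha = np.array([0.7, -0.4, 0.2])                        # coefficients (illustrative)
b = 0.05

def k(s, t, sigma=1.0):
    # Gaussian kernel: exp(-||s - t||^2 / (2 sigma^2))
    return np.exp(-np.sum((s - t) ** 2) / (2 * sigma ** 2))

def f(x):
    return sum(a * k(xi, x) for a, xi in zip(alpha, X_train)) + b

print(f(np.array([0.5, 0.5])))
```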
Hebb’s Rule
w_j ← w_j + y_i x_ij
[Diagram: the activation x_j of another neuron crosses a synapse onto a dendrite with weight w_j; the neuron's output y leaves along the axon.]
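A hedged sketch of Hebb's rule as a training loop over made-up data; this is essentially a single perceptron-style pass:

```python
# Hebb's rule sketch: w_j <- w_j + y_i * x_ij, applied over all patterns.
import numpy as np

X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])  # toy patterns
y = np.array([+1, +1, -1, -1])                                    # toy labels

w = np.zeros(X.shape[1])
for x_i, y_i in zip(X, y):  # one Hebbian pass over the data
    w += y_i * x_i          # strengthen weights where input and output co-fire

print(w, np.sign(X @ w))    # learned weights and resulting predictions
```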
Dual forms
• f(x) = w · Φ(x)
• w = Σ_i α_i Φ(x_i)
• f(x) = Σ_i α_i k(x_i, x)
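A quick numerical check of this equivalence, sketched with a linear kernel k(s, t) = s · t so that Φ is the identity; all numbers are illustrative:

```python
# Dual-form check: w = sum_i alpha_i * phi(x_i) implies
# f(x) = w . phi(x) = sum_i alpha_i * k(x_i, x) when k(s, t) = phi(s) . phi(t).
import numpy as np

X = np.array([[1.0, 2.0], [0.5, -1.0], [-2.0, 0.3]])  # patterns x_i
alpha = np.array([0.3, -0.8, 0.5])                    # dual coefficients
x = np.array([0.7, 0.1])                              # query point

phi = lambda v: v                # identity feature map -> linear kernel
w = (alpha[:, None] * X).sum(0)  # primal weights: w = sum_i alpha_i phi(x_i)

primal = np.dot(w, phi(x))
dual = sum(a * np.dot(xi, x) for a, xi in zip(alpha, X))
print(np.isclose(primal, dual))  # True: the two forms agree
```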
What is a Kernel?
A kernel is:
• a similarity measure
• a dot product in some feature space: k(s, t) = Φ(s) · Φ(t)
Examples:
• k(s, t) = exp(−‖s − t‖² / 2σ²) Gaussian kernel
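To see the "dot product in feature space" reading concretely, here is a hedged check for the polynomial kernel k(s, t) = (s · t)², whose feature map for 2-d inputs is Φ(x) = (x1², √2·x1·x2, x2²); this is a standard textbook identity, not taken from the slides:

```python
# Kernel = dot product in feature space: k(s, t) = (s . t)^2 equals Phi(s) . Phi(t)
# with Phi(x) = (x1^2, sqrt(2) x1 x2, x2^2) for 2-d inputs.
import numpy as np

def phi(v):
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

s, t = np.array([1.0, 2.0]), np.array([-0.5, 3.0])
print(np.isclose(np.dot(s, t) ** 2, np.dot(phi(s), phi(t))))  # True
```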
[Diagram: a neural network in which the inputs x_j feed internal “latent” variables, the “hidden units”.]
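A minimal sketch of such hidden units as learned features, i.e. a one-hidden-layer network computing f(x) = w · Φ(x) + b where Φ is itself parameterized; all weights here are random placeholders, not trained values:

```python
# Hidden-units sketch: the hidden layer h = tanh(W1 x + b1) plays the role of
# the feature map Phi(x); the output is a linear model on top of it.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)  # 2 inputs -> 4 hidden units
w2, b2 = rng.normal(size=4), 0.0               # hidden units -> scalar output

def f(x):
    h = np.tanh(W1 @ x + b1)  # internal "latent" variables
    return w2 @ h + b2

print(f(np.array([0.3, -0.7])))
```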
Chessboard Problem
Tree Classifiers
CART (Breiman, 1984) or C4.5 (Quinlan, 1993)
[Diagram: a tree classifier recursively splits all the data, choosing feature f2 then f1; the (f1, f2) plane is partitioned into axis-aligned regions.]
At each step, choose the feature that “reduces entropy” most. Work towards “node purity”.
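A hedged scikit-learn sketch of such a tree, grown with the entropy criterion on Fisher's Iris data (the dataset of the next slide); the depth limit is an illustrative choice:

```python
# Decision tree sketch: split on the feature that reduces entropy most,
# working towards pure leaves. Assumes scikit-learn is installed.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(criterion="entropy", max_depth=2)
tree.fit(iris.data, iris.target)

print(export_text(tree, feature_names=list(iris.feature_names)))
```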
Iris Data (Fisher, 1936)
Figure from Norbert Jankowski and Krzysztof Grabczewski
[Figure: the three Iris classes (setosa, versicolor, virginica) in the (x1, x2) plane, separated by a linear discriminant (left) and a tree classifier (right).]
Performance evaluation
[Plots: decision surfaces f(x) = 0, f(x) = −1, f(x) = +1 in the (x1, x2) plane, and the ROC axes: positive class success rate (hit rate, sensitivity) on the y-axis vs. 1 − negative class success rate (false alarm rate, 1 − specificity) on the x-axis, from 0 to 100%, with the random-guess diagonal.]
ROC Curve
For a given threshold on f(x), you get a point on the ROC curve.
[Plot: the ideal ROC curve (AUC = 1), an actual ROC curve, and the random-guess diagonal (AUC = 0.5); y-axis: positive class success rate (hit rate, sensitivity); x-axis: 1 − negative class success rate (false alarm rate, 1 − specificity); 0 ≤ AUC ≤ 1.]
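A hedged sketch of building the ROC curve by sweeping the threshold on f(x); the scores and labels are made up, and scikit-learn's roc_curve/auc would compute the same thing:

```python
# ROC sketch: each threshold on the scores f(x) gives one (FPR, TPR) point;
# the AUC is the area under the resulting curve.
import numpy as np

scores = np.array([0.9, 0.8, 0.6, 0.55, 0.4, 0.2])  # f(x), illustrative
labels = np.array([+1, +1, -1, +1, -1, -1])         # true classes, illustrative

points = []
for thr in np.r_[np.inf, np.sort(scores)[::-1]]:  # sweep thresholds high -> low
    pred = scores >= thr
    tpr = pred[labels == +1].mean()               # hit rate (sensitivity)
    fpr = pred[labels == -1].mean()               # false alarm rate
    points.append((fpr, tpr))

fpr, tpr = np.array(points).T
auc = np.trapz(tpr, fpr)         # area under the ROC curve
print(auc, 2 * auc - 1)          # AUC, and Gini = 2 AUC - 1 (next slide)
```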
Lift Curve
[Plot: lift curve; x-axis: fraction of customers selected, from 0 to 100%.]
Gini = 2·AUC − 1, with 0 ≤ Gini ≤ 1
Performance Assessment
Predictions: F(x)
Cost matrix
[Table: cost matrix with columns Class −1, Class +1, Total, Class +1 / Total.]
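As a hedged companion to this slide, a minimal sketch of cross-tabulating predictions F(x) against the true classes; the counts and labels below are illustrative, not from the slides:

```python
# Confusion-matrix sketch: tabulate predictions F(x) against true classes.
import numpy as np

y_true = np.array([-1, -1, +1, +1, +1, -1])  # illustrative
y_pred = np.array([-1, +1, +1, +1, -1, -1])  # F(x), illustrative

tp = np.sum((y_pred == +1) & (y_true == +1))
fp = np.sum((y_pred == +1) & (y_true == -1))
fn = np.sum((y_pred == -1) & (y_true == +1))
tn = np.sum((y_pred == -1) & (y_true == -1))

print("          Class -1  Class +1  Total")
print(f"Pred -1      {tn}         {fn}       {tn + fn}")
print(f"Pred +1      {fp}         {tp}       {fp + tp}")
```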