
Statistical Machine Learning:
The Basic Approach and Current Research Challenges


Shai Ben-David
CS497
February, 2007

A High Level Agenda

"The purpose of science is to find meaningful simplicity
in the midst of disorderly complexity."
- Herbert Simon

Representative learning tasks

Medical research.
Detection of fraudulent activity
(credit card transactions, intrusion detection, stock market manipulation).
Analysis of genome functionality.
Email spam detection.
Spatial prediction of landslide hazards.

Common to all such tasks

We wish to develop algorithms that detect meaningful regularities in large complex data sets.
We focus on data that is too complex for humans to figure out its meaningful regularities.
We consider the task of finding such regularities from random samples of the data population.
We should derive conclusions in a timely manner; computational efficiency is essential.

Different types of learning tasks

Classification prediction:
we wish to classify data points into categories, and we are given already classified samples as our training input.

For example:
Training a spam filter.
Medical diagnosis (patient info → high/low risk).
Stock market prediction (predict tomorrow's market trend from companies' performance data).

Other Learning Tasks

Clustering:
the grouping of data into representative collections
- a fundamental tool for data analysis.

Examples:

Clustering customers for targeted marketing.
Clustering pixels to detect objects in images.
Clustering web pages by content similarity.

Differences from Classical Statistics

We are interested in hypothesis generation rather than hypothesis testing.
We wish to make no prior assumptions about the structure of our data.
We develop algorithms for the automated generation of hypotheses.
We are concerned with computational efficiency.

Learning Theory: The Fundamental Dilemma

Tradeoff between accuracy and simplicity.

[Figure: data points and a fitted curve y = f(x) in the X-Y plane; good models should enable prediction of new data.]

A Fundamental Dilemma of Science:
Model Complexity vs. Prediction Accuracy

[Figure: the tradeoff between accuracy and the complexity of the possible models/representations, given limited data.]

Problem Outline

We are interested in (automated) hypothesis generation, rather than traditional hypothesis testing.

First obstacle: the danger of overfitting.

First solution: consider only a limited set of candidate hypotheses.

The Empirical Risk Minimization Paradigm

Choose a hypothesis class H of subsets of X.
For an input sample S, find some h in H that fits S well.
For a new point x, predict a label according to its membership in h.
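As a toy illustration of the ERM paradigm above (not from the slides), here is a minimal Python sketch: H is taken to be a finite class of threshold classifiers on the real line, and the sample, the candidate thresholds, and the noise level are made-up example values.

```python
# ERM sketch: H = { h_t : h_t(x) = 1 iff x >= t } for a finite grid of thresholds t.
# We pick the h in H with the fewest mistakes on the training sample S.
import numpy as np

def erm_threshold(X, y, thresholds):
    """Return the threshold in `thresholds` with minimal training error."""
    best_t, best_err = None, float("inf")
    for t in thresholds:
        predictions = (X >= t).astype(int)
        err = np.mean(predictions != y)
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

# Toy sample S: points above ~0.6 tend to be labeled 1, with a little label noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=50)
y = (X > 0.6).astype(int)
y[rng.choice(50, size=5, replace=False)] ^= 1

t_hat, train_err = erm_threshold(X, y, thresholds=np.linspace(0, 1, 101))
print(f"chosen threshold {t_hat:.2f}, training error {train_err:.2f}")

# Prediction for a new point x: label it by membership in h = {x : x >= t_hat}.
x_new = 0.7
print("predicted label:", int(x_new >= t_hat))
```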

The Mathematical Justification

Assume both the training sample S and the test point (x, l) are generated i.i.d. by the same distribution over X × {0,1}. Then, if H is not too rich (in some formal sense), for every h in H, the training error of h on the sample S is a good estimate of its probability of success on the new x.
In other words: there is no overfitting.

The Mathematical Justification - Formally

If S is sampled i.i.d. by some probability distribution D over X × {0,1}, then with probability > 1-δ, for all h in H:

$$\underbrace{\Pr_{(x,y)\sim D}\big(h(x)\neq y\big)}_{\text{expected test error}}\;\le\;\underbrace{\frac{\bigl|\{(x,y)\in S:\,h(x)\neq y\}\bigr|}{|S|}}_{\text{training error}}\;+\;\underbrace{c\,\sqrt{\frac{\mathrm{VCdim}(H)+\ln(1/\delta)}{|S|}}}_{\text{complexity term}}$$
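For illustration only, the following sketch plugs hypothetical numbers into the bound above to show how the complexity term shrinks as the sample grows; the constant c, the VC-dimension, the training error, and δ are all assumed example values, not quantities fixed by the slides.

```python
# Evaluate training error + complexity term for a few sample sizes m = |S|.
import numpy as np

def vc_bound(train_err, vc_dim, m, delta, c=1.0):
    """Upper bound on the expected test error (illustrative constants)."""
    return train_err + c * np.sqrt((vc_dim + np.log(1.0 / delta)) / m)

for m in (100, 1_000, 10_000, 100_000):
    print(m, round(vc_bound(train_err=0.05, vc_dim=10, m=m, delta=0.05), 3))
```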

The Types of Errors to be Considered

[Figure: the class H, containing the training error minimizer and the best h (in H) for P, with the best regressor for P outside H; Total error = Approximation error + Estimation error.]

The Model Selection Problem

Expanding H will lower the approximation error,
BUT it will increase the estimation error
(lower statistical soundness).

Yet another problem - Computational Complexity

Once we have a large enough training sample, how much computation is required to search for a good hypothesis?
(That is, an empirically good one.)

The Computational Problem

Given a class H of subsets of R^n:

Input: a finite set S of {0,1}-labeled points in R^n.
Output: some hypothesis function h in H that maximizes the number of correctly labeled points of S.

Hardness-of-Approximation Results

For each of the following classes, approximating the best agreement rate for h in H (on a given input sample S) up to some constant ratio is NP-hard:

Monomials
Constant-width Monotone Monomials
Half-spaces
Balls [BD-Eiron-Long]
Axis-aligned Rectangles
Threshold NNs [Bartlett-BD]

The Types of Errors to be Considered

[Figure: the class H, containing arg min{Er_S(h) : h in H}, the output of the learning algorithm, and arg min{Er(h) : h in H}, with the best regressor for D outside H; Total error = Approximation error + Estimation error + Computational error.]

Our hypothesis set should balance several requirements:

Expressiveness - being able to capture the structure of our learning task.

Statistical compactness - having low combinatorial complexity.

Computational manageability - existence of efficient ERM algorithms.

Concrete learning paradigm - linear separators

The predictor h:
$h(x) = \mathrm{sign}\left(\sum_i w_i x_i + b\right)$
(where w is the weight vector of the hyperplane h, and x = (x_1, ..., x_n) is the example to classify)
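A minimal sketch of this predictor in Python; the weight vector, bias, and input point below are arbitrary example values, not anything prescribed by the slides.

```python
# Direct rendering of h(x) = sign(<w, x> + b).
import numpy as np

def predict(w, b, x):
    return int(np.sign(np.dot(w, x) + b))

w = np.array([2.0, -1.0])   # weight vector of the hyperplane h (example values)
b = 0.5                     # bias term (example value)
x = np.array([1.0, 3.0])    # example to classify
print(predict(w, b, x))     # -1: x falls on the negative side of h
```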

Potential problem: the data may not be linearly separable.

The SVM Paradigm

Choose an embedding of the domain X into some high-dimensional Euclidean space, so that the data sample becomes (almost) linearly separable.

Find a large-margin data-separating hyperplane in this image space, and use it for prediction.

Important gain: when the data is separable, finding such a hyperplane is computationally feasible.

The SVM Idea: an Example

$x \mapsto (x, x^2)$
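A small sketch of this embedding, under the assumed toy labeling rule that a one-dimensional point gets label 1 exactly when |x| > 1: no single threshold on the line separates such a sample, but after x ↦ (x, x²) one hyperplane does.

```python
# After the embedding x -> (x, x^2), the hyperplane x2 - 1 = 0 separates the sample.
import numpy as np

X = np.array([-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0])
y = (np.abs(X) > 1).astype(int)              # toy labels: 1 iff |x| > 1

embedded = np.column_stack([X, X ** 2])      # x -> (x, x^2)
w, b = np.array([0.0, 1.0]), -1.0            # hyperplane x^2 - 1 = 0 in the image space
predictions = (embedded @ w + b > 0).astype(int)
print(np.array_equal(predictions, y))        # True: linearly separable after embedding
```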

Controlling Computational Complexity

Potentially the embeddings may require a very high Euclidean dimension. How can we search for hyperplanes efficiently?

The Kernel Trick: use algorithms that depend only on the inner products of sample points.

Kernel-Based Algorithms

Rather than define the embedding explicitly, define just the matrix of the inner products in the range space:

$$K = \begin{pmatrix} K(x_1, x_1) & K(x_1, x_2) & \cdots & K(x_1, x_m) \\ \vdots & \ddots & & \vdots \\ K(x_m, x_1) & \cdots & & K(x_m, x_m) \end{pmatrix}, \qquad K_{ij} = K(x_i, x_j)$$

Mercer's Theorem: if the matrix is symmetric and positive semi-definite, then it is the inner product matrix with respect to some embedding.
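A sketch of building such a kernel matrix without ever computing the embedding, using a quadratic polynomial kernel as an assumed example, and checking Mercer's two conditions numerically; the data points are arbitrary.

```python
# Build K[i, j] = K(x_i, x_j) from a kernel function alone, then verify that
# K is symmetric and positive semi-definite (Mercer's conditions).
import numpy as np

def quadratic_kernel(a, b):
    return (np.dot(a, b) + 1) ** 2            # implicit embedding into a higher-dim space

X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [1.5, -0.5]])
m = len(X)
K = np.array([[quadratic_kernel(X[i], X[j]) for j in range(m)] for i in range(m)])

print(np.allclose(K, K.T))                     # symmetric
print(np.all(np.linalg.eigvalsh(K) >= -1e-9))  # eigenvalues >= 0 (up to rounding)
```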

Support Vector Machines (SVMs)

Input: a sample (x1, y1), ..., (xm, ym) and a kernel matrix K.
Output: a good separating hyperplane.

A Potential Problem: Generalization

VC-dimension bounds: the VC-dimension of the class of half-spaces in R^n is n+1.
Can we guarantee a low dimension of the embedding's range?

Margin bounds: regardless of the Euclidean dimension, generalization can be bounded as a function of the margins of the hypothesis hyperplane.
Can one guarantee the existence of a large-margin separation?

The Margins of a Sample

$$\max_{\text{separating } h}\ \min_{x_i \in S}\ \bigl|\langle w_h, x_i \rangle\bigr|$$

(where w_h is the weight vector of the hyperplane h)
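A sketch of the inner minimum above for a single candidate hyperplane: with the weight vector normalized to unit length, the margin attained on the sample is the distance of the closest point to the hyperplane. The sample points and the candidate direction are made-up example values.

```python
# Margin attained by one candidate hyperplane (through the origin) on a sample.
import numpy as np

def margin(w, X):
    w = w / np.linalg.norm(w)              # normalize so |<w, x>| is a distance
    return np.min(np.abs(X @ w))

X = np.array([[1.0, 2.0], [2.0, -1.0], [-1.5, -2.0], [-2.0, 1.0]])
w = np.array([1.0, 1.0])                   # candidate separating direction
print(round(margin(w, X), 3))
```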

Summary of SVM learning

1. The user chooses a kernel matrix - a measure of similarity between input points.
2. Upon viewing the training data, the algorithm finds a linear separator that maximizes the margins (in the high-dimensional feature space).
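For concreteness, a minimal usage sketch of these two steps with scikit-learn's SVC (assuming that library is available); the data set and the choice of a Gaussian/RBF kernel are arbitrary examples, not part of the slides.

```python
# Step 1: choose the kernel; Step 2: let the solver find a large-margin separator.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)   # not linearly separable in R^2

clf = SVC(kernel="rbf", C=1.0)   # step 1: kernel choice (example: Gaussian/RBF)
clf.fit(X, y)                    # step 2: maximum-margin separator in feature space
print(clf.score(X, y))           # training accuracy
```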

How are the basic requirements met?

Expressiveness - by allowing all types of kernels there is (potentially) high expressive power.

Statistical compactness - only if we are lucky, and the algorithm found a large-margin good separator.

Computational manageability - it turns out that the search for a large-margin classifier can be done in time polynomial in the input size.
