01 Machine Learning Basics
Xiaogang Wang
xgwang@ee.cuhk.edu.hk
January 5, 2015
$f : \mathbb{R}^D \to \mathbb{R}^M$
$\nabla_w \text{MSE}_{\text{train}} = 0$
$$\text{Performance}_{\text{test}} = \frac{1}{M} \sum_{i=1}^{M} \text{Error}\big(f(x_i^{(\text{test})}),\, y_i^{(\text{test})}\big)$$
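A minimal numpy sketch tying these fragments together: fit a linear map by solving $\nabla_w \text{MSE}_{\text{train}} = 0$ in closed form (the normal equations) and then report the average held-out error. The synthetic data and names (`X_train`, `y_test`, ...) are assumptions for illustration, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set {x_i^(train), y_i^(train)} and test set of size M
X_train = rng.normal(size=(100, 3))                  # N x D design matrix
w_true = np.array([1.0, -2.0, 0.5])
y_train = X_train @ w_true + 0.1 * rng.normal(size=100)
X_test = rng.normal(size=(20, 3))                    # M x D
y_test = X_test @ w_true + 0.1 * rng.normal(size=20)

# Setting the gradient of MSE_train to zero gives the normal equations:
#   X^T X w = X^T y   =>   w = (X^T X)^{-1} X^T y
w = np.linalg.solve(X_train.T @ X_train, X_train.T @ y_train)

# Performance_test = (1/M) * sum_i Error(f(x_i^(test)), y_i^(test)), with squared error
performance_test = np.mean((X_test @ w - y_test) ** 2)
print(performance_test)
```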
$y = w_2 x^2 + w_1 x + b$
The learner cannot find a solution that fits the training examples well (underfitting).
For example, using linear regression to fit training examples $\{(x_i^{(\text{train})}, y_i^{(\text{train})})\}$ where $y_i^{(\text{train})}$ is a quadratic function of $x_i^{(\text{train})}$.
The learner fits the training data well but loses the ability to generalize, i.e., it has a small training error but a large generalization error.
A learner with large capacity tends to overfit
The family of functions is too large (compared with the size of the
training data) and it contains many functions which all fit the
training data well.
Without sufficient data, the learner cannot distinguish which one is
most appropriate and would make an arbitrary choice among
these apparently good solutions
A separate validation set helps to choose a more appropriate one
In most cases, data is contaminated by noise. A learner with large capacity tends to fit the random errors or noise rather than the underlying structure of the data (classes), as the sketch below illustrates.
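A small, self-contained sketch of this trade-off, assuming 1-D inputs, a quadratic data source with Gaussian noise, and polynomial learners of different capacities (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def true_f(x):
    # Quadratic data source, y = w2*x^2 + w1*x + b, plus Gaussian noise below
    return 2.0 * x**2 - 1.0 * x + 0.5

x_train = rng.uniform(-1, 1, size=10)
y_train = true_f(x_train) + 0.1 * rng.normal(size=10)
x_test = rng.uniform(-1, 1, size=1000)
y_test = true_f(x_test) + 0.1 * rng.normal(size=1000)

for degree in (1, 2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)     # least-squares polynomial fit
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # degree 1 underfits (both errors large); degree 9 interpolates the 10 points and
    # typically overfits (tiny training error, larger test error); degree 2 matches the source
    print(degree, train_err, test_err)
```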
(Duda et al. Pattern Classification 2000)
Typical relationship between capacity and both training and generalization (or test)
error. As capacity increases, training error can be reduced, but the optimism
(difference between training and generalization error) increases. At some point, the
increase in optimism is larger than the decrease in training error (typically when the
training error is low and cannot go much lower), and we enter the overfitting regime,
where capacity is too large, above the optimal capacity. Before reaching optimal
capacity, we are in the underfitting regime.
As the number of training examples increases, optimal capacity (bold black) increases (we can afford a bigger and
more flexible model), and the associated generalization error (green bold) would decrease, eventually reaching the
(non-parametric) asymptotic error (green dashed line). If capacity were fixed (parametric setting), increasing the
number of training examples would also decrease generalization error (top red curve), but not as fast, and training
error would slowly increase (bottom red curve), so that both would meet at an asymptotic value (dashed red line)
corresponding to the best achievable solution in some class of learned functions.
In the figure above, the training data (10 black dots) were sampled from a
quadratic function plus Gaussian noise, i.e., $f(x) = w_2 x^2 + w_1 x + b + \epsilon$, where
$p(\epsilon) = N(0, \sigma^2)$. The degree-10 polynomial fits the data perfectly. Which learner
should be chosen in order to better predict new examples: the second-order
function or the 10th-degree function?
If the ten training examples were generated from a 10th-degree polynomial plus
Gaussian noise, which learner should be chosen?
If one million training examples were generated from a quadratic function
plus Gaussian noise, which learner should be chosen?
The more training samples in each cell, the more robust the
classifier
The number of cells grows exponentially with the dimensionality
of the feature space. If each dimension is divided into three
intervals, the number of cells is $N = 3^D$ (see the sketch below).
Some cells are empty when the number of cells is very large!
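A quick back-of-the-envelope sketch of how the cell count $N = 3^D$ outpaces any fixed training-set size; the sample budget below is an assumed value for illustration:

```python
# Each of D feature dimensions split into 3 intervals gives N = 3**D cells.
n_samples = 1_000_000  # hypothetical fixed training-set size

for D in (2, 5, 10, 20):
    n_cells = 3 ** D
    # Average samples per cell shrinks exponentially; for large D most cells are empty
    print(D, n_cells, n_samples / n_cells)
```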
(Duda et al. Pattern Classification 2000)
Examples
The objective function for linear regression becomes
$$\text{MSE}_{\text{train}} + \text{regularization} = \frac{1}{N} \sum_i \big(w^t x_i^{(\text{train})} - y_i^{(\text{train})}\big)^2 + \lambda \|w\|_2^2$$
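Minimizing this regularized objective has a closed form. A minimal numpy sketch, assuming a design matrix `X` whose rows are the $x_i^{(\text{train})}$ and a target vector `y`; the factor $N$ in front of $\lambda$ comes from the $\frac{1}{N}$ on the data term:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize (1/N) * sum_i (w^T x_i - y_i)^2 + lam * ||w||_2^2 in closed form."""
    N, D = X.shape
    # Setting the gradient to zero: (2/N) X^T (X w - y) + 2*lam*w = 0
    #   =>  (X^T X + N*lam*I) w = X^T y
    return np.linalg.solve(X.T @ X + N * lam * np.eye(D), X.T @ y)

# Illustrative usage with synthetic data; larger lam shrinks w toward zero
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=50)
w = ridge_fit(X, y, lam=0.1)
```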
Multi-task learning, transfer learning, dropout, sparsity, pre-training
$\text{bias}(\hat\theta) = E(\hat\theta) - \theta$
where the expectation is over all training sets of size $n$ sampled from the
underlying distribution
An estimator is called unbiased if $E(\hat\theta) = \theta$
Example: Gaussian distribution, $p(x_i; \theta) = \mathcal{N}(\theta, \Sigma)$, with the estimator $\hat\theta = \frac{1}{n} \sum_{i=1}^{n} x_i^{(\text{train})}$:
$$E(\hat\theta) = E\left[\frac{1}{n} \sum_{i=1}^{n} x_i^{(\text{train})}\right] = \frac{1}{n} \sum_{i=1}^{n} E\left[x_i^{(\text{train})}\right] = \frac{1}{n} \sum_{i=1}^{n} \theta = \theta$$
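A short simulation of this fact: averaging $\hat\theta$ over many sampled training sets of size $n$ approaches $\theta$. The true mean, standard deviation, and $n$ below are assumed values:

```python
import numpy as np

rng = np.random.default_rng(3)
theta, sigma, n = 1.5, 2.0, 20          # assumed true mean, std, and training-set size

# Draw many training sets of size n; each row gives one estimate theta_hat = (1/n) sum_i x_i
estimates = rng.normal(theta, sigma, size=(100_000, n)).mean(axis=1)

# E(theta_hat) is approximated by averaging over training sets; it is close to theta,
# i.e. bias(theta_hat) = E(theta_hat) - theta is approximately 0
print(estimates.mean())
```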
$$\hat\theta = \arg\max_{\theta} \sum_{i=1}^{n} \log P\big(y_i^{(\text{train})} \mid x_i^{(\text{train})}, \theta\big) + \log p(\theta)$$
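A hedged numeric sketch of the MAP rule for a simpler (assumed) case than the conditional model above: estimating a Gaussian mean from direct observations with a Gaussian prior, where the grid maximizer of the log-posterior can be checked against the conjugate closed form. All parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed setup: observations from N(theta_true, sigma^2) with a Gaussian prior N(mu0, tau^2)
theta_true, sigma = 1.0, 0.5
mu0, tau = 0.0, 1.0
x = rng.normal(theta_true, sigma, size=10)

# MAP: maximize sum_i log p(x_i | theta) + log p(theta) over a grid of candidate thetas
# (additive constants independent of theta are dropped from the log densities)
grid = np.linspace(-2.0, 3.0, 5001)
log_post = -0.5 * (((x[:, None] - grid) / sigma) ** 2).sum(axis=0) \
           - 0.5 * ((grid - mu0) / tau) ** 2
theta_map = grid[np.argmax(log_post)]

# Conjugate closed form: a precision-weighted combination of the prior mean and the data
theta_closed = (sigma**2 * mu0 + tau**2 * x.sum()) / (sigma**2 + len(x) * tau**2)
print(theta_map, theta_closed)          # the two agree up to the grid resolution
```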
$$\arg\max_{\{e_i\}} \sum_{k=1}^{n} \|\tilde{x}_k - \bar{x}\|^2 \quad \text{or} \quad \arg\min_{\{e_i\}} \sum_{k=1}^{n} \|x_k - \tilde{x}_k\|^2$$
$$J_1(a_{11}, \ldots, a_{n1}, e_1) = \sum_{k=1}^{n} \|(\bar{x} + a_{k1} e_1) - x_k\|^2 = \sum_{k=1}^{n} \|a_{k1} e_1 - (x_k - \bar{x})\|^2$$
$$= \sum_{k=1}^{n} a_{k1}^2 \|e_1\|^2 - 2 \sum_{k=1}^{n} a_{k1} e_1^t (x_k - \bar{x}) + \sum_{k=1}^{n} \|x_k - \bar{x}\|^2$$
Since $e_1$ is a unit vector, $\|e_1\| = 1$. To minimize $J_1$, set $\frac{\partial J_1}{\partial a_{k1}} = 0$, which gives
$$a_{k1} = e_1^t (x_k - \bar{x})$$
We obtain a least-squares solution by projecting the vector $x_k$ onto the line in the direction of $e_1$ passing through the mean.
$$J_1(e_1) = \sum_{k=1}^{n} a_{k1}^2 - 2 \sum_{k=1}^{n} a_{k1}^2 + \sum_{k=1}^{n} \|x_k - \bar{x}\|^2$$
$$= -\sum_{k=1}^{n} \big(e_1^t (x_k - \bar{x})\big)^2 + \sum_{k=1}^{n} \|x_k - \bar{x}\|^2$$
$$= -\sum_{k=1}^{n} e_1^t (x_k - \bar{x})(x_k - \bar{x})^t e_1 + \sum_{k=1}^{n} \|x_k - \bar{x}\|^2$$
$$= -e_1^t S e_1 + \sum_{k=1}^{n} \|x_k - \bar{x}\|^2$$
$S = \sum_{k=1}^{n} (x_k - \bar{x})(x_k - \bar{x})^t$ is the scatter matrix.
$e_1^t S e_1 = \sum_{k=1}^{n} a_{k1}^2$ (with $\sum_{k=1}^{n} a_{k1} = 0$) is the variance of the projected data.
The vector $e_1$ that minimizes $J_1$ also maximizes $e_1^t S e_1$, subject to the constraint that $\|e_1\| = 1$.
Maximizing $e_1^t S e_1$ subject to $\|e_1\| = 1$ (e.g., via a Lagrange multiplier) leads to the eigenvalue problem
$$S e_1 = \lambda e_1$$
so $e_1$ is the eigenvector of the scatter matrix with the largest eigenvalue.
$d'$-dimensional representation: $\tilde{x}_k = \bar{x} + \sum_{i=1}^{d'} a_{ki} e_i$
Mean-squared criterion function:
$$J_{d'} = \sum_{k=1}^{n} \left\| \left(\bar{x} + \sum_{i=1}^{d'} a_{ki} e_i\right) - x_k \right\|^2$$
$$J_d = 0 \;\Rightarrow\; \sum_{k=1}^{n} \|x_k - \bar{x}\|^2 = \sum_{i=1}^{d} \lambda_i$$
$$J_{d'} = \sum_{i=d'+1}^{d} \lambda_i$$
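A compact numpy sketch of this result on assumed synthetic data: build the scatter matrix $S$, keep the top $d'$ eigenvectors, reconstruct, and check that $J_{d'}$ equals the sum of the discarded eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))   # n x d data with correlated features
x_bar = X.mean(axis=0)
Xc = X - x_bar

# Scatter matrix S = sum_k (x_k - x_bar)(x_k - x_bar)^t and its eigen-decomposition
S = Xc.T @ Xc
eigvals, eigvecs = np.linalg.eigh(S)                      # ascending order for symmetric S
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

d_prime = 2
E = eigvecs[:, :d_prime]                                  # columns e_1, ..., e_{d'}
A = Xc @ E                                                # coefficients a_ki = e_i^t (x_k - x_bar)
X_tilde = x_bar + A @ E.T                                 # d'-dimensional reconstruction

# J_{d'} = sum_k ||x_tilde_k - x_k||^2 equals the sum of the discarded eigenvalues
J = np.sum((X_tilde - X) ** 2)
print(J, eigvals[d_prime:].sum())                         # match up to round-off
```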
$$f(x) = b + \sum_{i=1}^{n} \alpha_i K(x, x_i)$$
$K$ is a kernel function, e.g., the Gaussian kernel $K(u, v) = N(u - v;\, 0,\, \sigma^2 I)$.
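A brief sketch of a predictor of this form with a Gaussian kernel. The slides do not specify how $\alpha_i$ and $b$ are chosen, so the sketch assumes a ridge-regularized least-squares fit with $b$ fixed to the mean of the targets; the data and names are illustrative:

```python
import numpy as np

def gaussian_kernel(U, V, sigma=0.5):
    """Gaussian kernel, proportional to N(u - v; 0, sigma^2 I), for all pairs of rows."""
    d2 = ((U[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma**2))

rng = np.random.default_rng(6)
X = rng.uniform(-2, 2, size=(50, 1))
y = np.sin(2.0 * X[:, 0]) + 0.1 * rng.normal(size=50)

# Fit f(x) = b + sum_i alpha_i K(x, x_i): alpha solves the regularized linear system
# (K + lam*I) alpha = y - b, with b fixed to mean(y) for simplicity
K = gaussian_kernel(X, X)
b = y.mean()
lam = 1e-3
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y - b)

X_new = np.linspace(-2, 2, 5)[:, None]
f_new = b + gaussian_kernel(X_new, X) @ alpha             # predictions at new inputs
print(f_new)
```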