Kernel
Julien Mairal
Inria Grenoble
Paradigm 3: Deep Kernel Machines
Building a functional space for CNNs (or similar objects).
Deriving a measure of model complexity.
Learning aspects
A quick zoom on convolutional neural networks
Map data to a Hilbert space (RKHS) $\mathcal{H}$ and work with linear forms: $f(x) = \langle f, \varphi(x)\rangle_{\mathcal{H}}$.
[Figure: the feature map $\varphi$ sends data points (e.g., $x$, $z$) from the input space $X$ to the Hilbert space $\mathcal{H}$, where the model acts as a linear form.]
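As a toy illustration of this paradigm (my own sketch, not from the slides), the snippet below builds a function in the RKHS of a Gaussian kernel via kernel ridge regression, so that $f(x) = \sum_i \alpha_i K(x_i, x) = \langle f, \varphi(x)\rangle_{\mathcal{H}}$; the data, bandwidth, and regularization are placeholders.

import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # pairwise kernel matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma**2))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))                                       # placeholder training points
y = np.sin(X[:, 0])                                                    # placeholder targets
alpha = np.linalg.solve(gaussian_kernel(X, X) + 0.1 * np.eye(50), y)   # kernel ridge regression

def f(x):
    # a linear form in the RKHS: f(x) = <f, phi(x)>_H = sum_i alpha_i K(x_i, x)
    return gaussian_kernel(X, x[None, :]).ravel() @ alpha

print(f(X[0]), y[0])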
$$f(x) = \sigma_k(W_k\,\sigma_{k-1}(W_{k-1}\cdots\sigma_2(W_2\,\sigma_1(W_1 x))\cdots)) = \langle f, \Phi(x)\rangle_{\mathcal{H}}.$$
Why do we care?
Φ(x) is related to the network architecture and is independent
of training data. Is it stable? Does it lose signal information?
f is a predictive model. Can we control its stability?
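To make the nested composition concrete, here is a hypothetical two-layer instance of the formula above (weights and activation are placeholders; this only evaluates the left-hand side, not $\Phi$ itself).

import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((16, 8)), rng.standard_normal((1, 16))
sigma = lambda u: np.maximum(u, 0.0)     # placeholder activation

def f(x):
    # f(x) = sigma_2(W_2 sigma_1(W_1 x)), the k = 2 case of the formula above
    return sigma(W2 @ sigma(W1 @ x))

print(f(rng.standard_normal(8)))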
[Mallat, 2012, Allassonnière, Amit, and Trouvé, 2007, Trouvé and Younes, 2005]...
Definition of stability
Representation $\Phi(\cdot)$ is stable [Mallat, 2012] if
$$\|\Phi(L_\tau x) - \Phi(x)\| \le \big(C_1\,\|\nabla\tau\|_\infty + C_2\,\|\tau\|_\infty\big)\,\|x\|,$$
where $L_\tau x(u) = x(u - \tau(u))$ is the action of a diffeomorphism $\tau$, $\|\nabla\tau\|_\infty$ measures the amount of deformation, and $\|\tau\|_\infty$ the amount of translation.
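As a rough numerical illustration of this definition (my own sketch: $\Phi$ here is a toy representation given by Gaussian pooling of $|x|$, not the multilayer kernel mapping studied in these slides), one can deform a 1-D signal with $L_\tau x(u) = x(u - \tau(u))$ and compare the two sides of the inequality, ignoring the constants.

import numpy as np

def gaussian_pool(x, sigma):
    u = np.arange(-3 * int(sigma), 3 * int(sigma) + 1)
    w = np.exp(-u**2 / (2 * sigma**2))
    return np.convolve(x, w / w.sum(), mode="same")

def phi(x, pool=8.0):
    # toy representation: pointwise modulus followed by Gaussian pooling
    return gaussian_pool(np.abs(x), pool)

rng = np.random.default_rng(0)
n = 512
x = gaussian_pool(rng.standard_normal(n), 4.0)       # a smooth 1-D signal
u = np.arange(n, dtype=float)
tau = 2.0 * np.sin(2 * np.pi * u / n)                # small, smooth displacement field
x_tau = np.interp(u - tau, u, x)                     # L_tau x(u) = x(u - tau(u))

lhs = np.linalg.norm(phi(x_tau) - phi(x))
rhs = (np.abs(np.gradient(tau)).max() + np.abs(tau).max()) * np.linalg.norm(x)
print(lhs, rhs)   # constants omitted; for small, smooth tau, lhs stays well below rhs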
Signal representation
Signal preservation of the multi-layer kernel mapping Φ.
Conditions of non-trivial stability for Φ.
Constructions to achieve group invariance.
On learning
Bounds on the RKHS norm $\|\cdot\|_{\mathcal{H}}$ to control stability and generalization of a predictive model $f$.
One layer of the construction: given a feature map $x_{k-1}: \Omega \to \mathcal{H}_{k-1}$,
patch extraction: $P_k x_{k-1}(u) := (v \in S_k \mapsto x_{k-1}(u+v)) \in \mathcal{P}_k = \mathcal{H}_{k-1}^{S_k}$;
kernel mapping: $M_k P_k x_{k-1}: \Omega \to \mathcal{H}_k$;
linear pooling: $x_k := A_k M_k P_k x_{k-1}: \Omega \to \mathcal{H}_k$.
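A minimal 1-D discrete sketch of these three steps, assuming a placeholder nonlinear feature map in place of the kernel mapping $M_k$ and Gaussian weights for the pooling $A_k$ (illustrative only, not the exact construction of the slides).

import numpy as np

def extract_patches(x, size):
    # P_k: at each position u, the patch (x[u+v])_{v in S_k}, with zero padding
    xp = np.pad(x, ((size // 2, size // 2), (0, 0)))
    return np.stack([xp[u:u + size].ravel() for u in range(len(x))])

def kernel_map(P):
    # M_k: placeholder nonlinear feature map standing in for the kernel mapping
    rng = np.random.default_rng(0)
    W = rng.standard_normal((P.shape[1], 32)) / np.sqrt(P.shape[1])
    return np.cos(P @ W)

def linear_pool(M, sigma=2.0):
    # A_k: Gaussian pooling along positions, applied to each coordinate
    u = np.arange(-3 * int(sigma), 3 * int(sigma) + 1)
    w = np.exp(-u**2 / (2 * sigma**2)); w = w / w.sum()
    return np.stack([np.convolve(M[:, j], w, mode="same") for j in range(M.shape[1])], axis=1)

x_prev = np.random.default_rng(1).standard_normal((64, 4))    # x_{k-1}: 64 positions, values in R^4
x_next = linear_pool(kernel_map(extract_patches(x_prev, 3)))  # x_k = A_k M_k P_k x_{k-1}
print(x_next.shape)                                           # (64, 32)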
Examples
$\kappa_{\exp}(\langle z, z'\rangle) = e^{\langle z, z'\rangle - 1} = e^{-\frac{1}{2}\|z - z'\|^2}$ (if $\|z\| = \|z'\| = 1$).
$\kappa_{\text{inv-poly}}(\langle z, z'\rangle) = \dfrac{1}{2 - \langle z, z'\rangle}.$
[Schoenberg, 1942, Scholkopf, 1997, Smola et al., 2001, Cho and Saul, 2010, Zhang
et al., 2016, 2017, Daniely et al., 2016, Bach, 2017, Mairal, 2016]...
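A quick numerical check of the identity $\kappa_{\exp}(\langle z, z'\rangle) = e^{-\frac{1}{2}\|z - z'\|^2}$ for unit-norm vectors (a small sketch of my own):

import numpy as np

rng = np.random.default_rng(0)
z, zp = rng.standard_normal(10), rng.standard_normal(10)
z, zp = z / np.linalg.norm(z), zp / np.linalg.norm(zp)     # ||z|| = ||z'|| = 1

k_exp_dot = np.exp(z @ zp - 1)                              # e^{<z,z'> - 1}
k_exp_rbf = np.exp(-0.5 * np.linalg.norm(z - zp) ** 2)      # e^{-||z - z'||^2 / 2}
k_inv_poly = 1.0 / (2.0 - z @ zp)
print(np.isclose(k_exp_dot, k_exp_rbf), k_inv_poly)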
Multilayer representation
Prediction layer
e.g., linear $f(x) = \langle w, \Phi_n(x)\rangle$.
“linear kernel” $K(x, x') = \langle \Phi_n(x), \Phi_n(x')\rangle = \int_\Omega \langle x_n(u), x'_n(u)\rangle\, du$.
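On a discrete grid this kernel is simply a sum of pointwise inner products of the final feature maps; a tiny sketch with random placeholder maps:

import numpy as np

rng = np.random.default_rng(0)
xn, xnp = rng.standard_normal((64, 32)), rng.standard_normal((64, 32))  # final maps x_n, x'_n
K = np.sum(xn * xnp)     # K(x, x') = sum_u <x_n(u), x'_n(u)>, discrete version of the integral
print(K)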
[Mallat, 2012, Bruna and Mallat, 2013, Sifre and Mallat, 2013]...
$P x(u) = (x(uv))_{v \in S}.$
$\langle f_w, \bar M_k \bar P_k \bar x_{k-1}(u)\rangle = f_w(\bar P_k \bar x_{k-1}(u)) = \langle w, \bar P_k \bar x_{k-1}(u)\rangle,$
and
$\bar P_k \bar x_{k-1}(u) = \sum_{w \in B} \langle f_w, \bar M_k \bar P_k \bar x_{k-1}(u)\rangle\, w.$
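In practice, such a finite expansion can be obtained with a Nyström-type projection onto the span of a few anchor points; below is a hypothetical sketch for a single patch $z$, with anchors $W$ and the dot-product kernel $\kappa_{\exp}$ (my own illustration, not necessarily the slides' exact scheme).

import numpy as np

kappa = lambda u: np.exp(u - 1.0)                  # dot-product kernel kappa_exp

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 12))                   # 8 anchor points ("filters") in R^12
W /= np.linalg.norm(W, axis=1, keepdims=True)
z = rng.standard_normal(12); z /= np.linalg.norm(z)

# Coordinates of the projection of phi(z) onto span{phi(w_1), ..., phi(w_8)},
# expressed in an orthonormal basis: kappa(W W^T)^{-1/2} kappa(W z).
Kww = kappa(W @ W.T)
evals, evecs = np.linalg.eigh(Kww)
Kww_inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(np.maximum(evals, 1e-12))) @ evecs.T
psi_z = Kww_inv_sqrt @ kappa(W @ z)

# Sanity check: <psi(z), psi(z)> approaches kappa(<z, z>) as the anchors cover the data.
print(psi_z @ psi_z, kappa(z @ z))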
[Figure: one layer of the discrete construction, with labels $\bar x_{k-1}$, $\bar P_k \bar x_{k-1}(u) \in \mathcal{P}_k$, dot-product kernel, linear pooling, downsampling, $\bar A_k$, $\bar x_k$, and deconvolution.]
$f: z \mapsto \|z\|\,\sigma(\langle g, z\rangle / \|z\|).$
Smooth activations: $\sigma(u) = \sum_{j=0}^{\infty} a_j u^j$ with $a_j \ge 0$.
Norm: $\|f\|_{\mathcal{H}_k}^2 \le C_\sigma^2(\|g\|^2) = \sum_{j=0}^{\infty} \frac{a_j^2}{b_j}\,\|g\|^{2j} < \infty.$
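A quick numerical check (with a placeholder smooth activation) that such an $f$ is positively homogeneous and coincides with $z \mapsto \sigma(\langle g, z\rangle)$ on the unit sphere:

import numpy as np

sigma = lambda u: np.exp(u - 1.0)                  # placeholder smooth activation (a_j >= 0)
rng = np.random.default_rng(0)
g, z = rng.standard_normal(6), rng.standard_normal(6)

f = lambda z: np.linalg.norm(z) * sigma(g @ z / np.linalg.norm(z))
lam = 3.7
print(np.isclose(f(lam * z), lam * f(z)))          # positive homogeneity: f(lam z) = lam f(z)
print(np.isclose(f(z / np.linalg.norm(z)), sigma(g @ z / np.linalg.norm(z))))  # agrees with sigma on the sphere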
$$f(x) = \sigma_k(W_k\,\sigma_{k-1}(W_{k-1}\cdots\sigma_2(W_2\,\sigma_1(W_1 x))\cdots)) = \langle f, \Phi(x)\rangle_{\mathcal{H}}.$$
Leads to margin bound $O\!\big(\|\hat f_N\|\,R/(\gamma\sqrt{N})\big)$ for a learned CNN $\hat f_N$
with margin (confidence) $\gamma > 0$.
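For a rough sense of scale, with purely illustrative numbers (not from the slides):
$$\|\hat f_N\| = 10,\quad R = 1,\quad \gamma = 0.1,\quad N = 10^4 \;\Longrightarrow\; \frac{\|\hat f_N\|\,R}{\gamma\sqrt{N}} = \frac{10 \times 1}{0.1 \times 100} = 1.$$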
Related to recent generalization bounds for neural networks based
on product of spectral norms [e.g., Bartlett et al., 2017,
Neyshabur et al., 2018].
Questions:
Better regularization?
How does SGD control capacity in CNNs?
What about networks with no pooling layers? ResNet?
[Williams and Seeger, 2001, Smola and Schölkopf, 2000, Zhang et al., 2008]...
[Figure: approximation of one layer by projection: points $x, x'$ from the map $I_0$ are mapped via the kernel trick to $\varphi_1(x), \varphi_1(x')$ in the Hilbert space $\mathcal{H}_1$, projected onto a finite-dimensional subspace $\mathcal{F}_1$ to give $\psi_1(x), \psi_1(x')$ in the intermediate map $M_1$, and linear pooling yields $I_1$.]
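The cited works approximate a kernel matrix from a subset of its columns (landmark points); a standard Nyström sketch, written from scratch rather than taken from the slides:

import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma**2))

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))
idx = rng.choice(500, size=50, replace=False)        # landmark (anchor) points

K = gaussian_kernel(X, X)                            # exact kernel matrix (for comparison)
C = gaussian_kernel(X, X[idx])                       # n x m block
Wm = gaussian_kernel(X[idx], X[idx])                 # m x m block
K_nystrom = C @ np.linalg.pinv(Wm) @ C.T             # Nystrom approximation of K

print(np.linalg.norm(K - K_nystrom) / np.linalg.norm(K))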
$$\sup_{x, x' \in L^2(\Omega, \mathcal{H}_0)} \frac{\|\Phi(x) - \Phi(x')\|}{\|x - x'\|} = 1.$$