Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure (final)
Salakhutdinov, Ruslan, and Geoffrey E. Hinton. "Learning a nonlinear embedding by preserving class neighbourhood structure." International Conference on Artificial Intelligence and Statistics. 2007.
1. Learning a Nonlinear Embedding by
Preserving Class Neighbourhood Structure
AISTATS '07, San Juan, Puerto Rico
Ruslan Salakhutdinov and Geoffrey E. Hinton
Presenter: WooSung Choi (ws_choi@korea.ac.kr)
DataKnow. Lab, Korea Univ.
4. kNN (k-Nearest Neighbor) Classification
Neighbor | Class
1-NN     | 6
2-NN     | 6
3-NN     | 6
4-NN     | 6
5-NN     | 0

Result of 5-NN classification: 6 (4 of 5 votes, 80%)
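A minimal NumPy sketch of this majority vote (my own illustration; `points` and `labels` stand for the stored training vectors and their classes):

```python
import numpy as np

def knn_classify(query, points, labels, k=5):
    """Classify `query` by majority vote among its k nearest neighbors."""
    dists = np.linalg.norm(points - query, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                 # indices of the k closest
    classes, counts = np.unique(labels[nearest], return_counts=True)
    return classes[np.argmax(counts)]               # majority class

# In the slide's example the 5 nearest neighbors have classes
# [6, 6, 6, 6, 0], so the vote returns 6 with 4/5 = 80% agreement.
```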
5. Motivating Example
• MNIST
  Dimensionality: 28 x 28 = 784
  50,000 training images
  10,000 test images
• Error rate: 2.77%
• Query response time: 108 ms
6. Reality Check
• Curse of dimensionality
  "Poor performance when the number of dimensions is high"
  [Qin Lv et al., "Image Similarity Search with Compact Data Structures," CIKM '04]
  [Roger Weber et al., "A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces," VLDB '98]
7. Locality Sensitive Hashing, Data Sensitive Hashing

Method                     | Curse of Dimensionality | Recall | Considers Data Distribution | Underlying Technique
Scan                       | X (none)                | 1      | △                           | N/A
RTree-based solution       | O (severe)              | 1      | O                           | index: Tree
Locality Sensitive Hashing | △ (less severe)         | < 1    | X                           | Hashing + Mathematics
Data Sensitive Hashing     | △ (less severe)         | < 1    | O                           | Hashing + Machine Learning
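To make the two hashing rows concrete, here is a minimal random-hyperplane LSH sketch (my own illustration, not from the slides). The hyperplanes are drawn independently of the data, which is exactly the "X" in the distribution column; Data Sensitive Hashing would instead learn them from the data:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

def lsh_signature(x, hyperplanes):
    """One bit per random hyperplane: nearby points (small angle)
    tend to agree on most bits and fall into the same bucket."""
    return tuple((hyperplanes @ x) > 0)

d, n_bits = 784, 16
hyperplanes = rng.standard_normal((n_bits, d))  # data-independent

# Index the dataset once; a query then scans only its own bucket,
# which is why recall is < 1 (true neighbors may hash elsewhere).
data = rng.standard_normal((1000, d))
buckets = defaultdict(list)
for i, x in enumerate(data):
    buckets[lsh_signature(x, hyperplanes)].append(i)
```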
9. Abstract
• How to pre-train and fine-tune a multilayer neural network (MNN)
  to learn a nonlinear transformation
  from the input space
  to a low-dimensional feature space
  in which kNN classification performs well
• Performance can be further improved using unlabeled data
14. Related Work: Linear Transformation
• Linear transformation [8, 9, 18]
  f(x_a | W) = W x_a
  Weaknesses
    Limited number of parameters:
      for f: R^784 -> R^30, W is a 30 x 784 matrix (23,520 parameters);
      the network in this paper has 785*500 + 501*500 + 501*2000 + 2001*30
      = 1,705,030 parameters (see the check below)
    Cannot model higher-order correlations
• Deep autoencoder [14], DBN [12]
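A quick check of both counts (the +1 terms are the per-layer bias parameters):

```python
# Linear map R^784 -> R^30: a single 30 x 784 weight matrix.
linear_params = 30 * 784  # 23,520

# The paper's 784-500-500-2000-30 network, one bias per output unit.
layers = [784, 500, 500, 2000, 30]
deep_params = sum((n_in + 1) * n_out for n_in, n_out in zip(layers, layers[1:]))

print(linear_params, deep_params)  # 23520 1705030
```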
15. In This Paper
• Nonlinear transformation
  Overview
    Pre-training: similar to [12, 14]
      Stack of RBMs:
        RBM1: 784-500
        RBM2: 500-500
        RBM3: 500-2000
        RBM4: 2000-30
    Fine-tuning: backpropagation
      to maximize the objective function, i.e., the expected number of
      correctly classified points on the training data
      (a sketch of the resulting encoder follows)
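A sketch of that encoder in PyTorch (my own illustration; the paper trains each layer greedily as an RBM with contrastive divergence before fine-tuning, and I assume logistic hidden units with a linear 30-unit code layer):

```python
import torch.nn as nn

# The 784-500-500-2000-30 encoder f(x | W) that the four RBMs initialize.
encoder = nn.Sequential(
    nn.Linear(784, 500), nn.Sigmoid(),   # RBM1: 784-500
    nn.Linear(500, 500), nn.Sigmoid(),   # RBM2: 500-500
    nn.Linear(500, 2000), nn.Sigmoid(),  # RBM3: 500-2000
    nn.Linear(2000, 30),                 # RBM4: 2000-30 (30-d code)
)
```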
18. Notation

Symbol                                   | Definition
a = 1, 2, ..., N                         | Index
x_a ∈ R^d                                | a-th training vector (d-dimensional data)
c_a ∈ {1, 2, ..., C}                     | Label of the a-th training vector
(x_a, c_a)                               | Labeled training case
f(x_a | W)                               | Output of the multilayer neural network parameterized by W
d_ab = ||f(x_a|W) - f(x_b|W)||^2         | Squared Euclidean distance in the transformed feature space
p_ab = exp(-d_ab) / Σ_{z≠a} exp(-d_az)   | Probability that point a selects its neighbor b in the transformed feature space

Worked example (illustrative distances from point a):

d_aa    d_ab    d_ac    d_ad    d_ae
0       1       3       7       7

e^-d_aa e^-d_ab e^-d_ac e^-d_ad e^-d_ae
1       0.368   0.050   0.001   0.001

p_aa    p_ab    p_ac    p_ad    p_ae
0       0.88    0.11    0       0

p_ab = 0.368 / (0.368 + 0.050 + 0.001 + 0.001) ≈ 0.88
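The worked example can be reproduced directly from the definition of p_ab (a small NumPy check; the exact value of p_ac is ≈ 0.12, which the slide rounds to 0.11):

```python
import numpy as np

# Illustrative squared distances from point a to points a, b, c, d, e.
d_a = np.array([0.0, 1.0, 3.0, 7.0, 7.0])

p = np.exp(-d_a)
p[0] = 0.0                 # z != a: point a never selects itself
p /= p.sum()
print(p.round(2))          # [0.   0.88 0.12 0.   0.  ]
```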
19. Notation

Symbol                                                              | Definition
p_ab = exp(-d_ab) / Σ_{z≠a} exp(-d_az)                              | Probability that point a selects its neighbor b in the transformed feature space
p(c_a = k) = Σ_{b: c_b = k} p_ab                                    | Probability that point a belongs to class k
O_NCA = Σ_{a=1..N} Σ_{b: c_a = c_b} exp(-d_ab) / Σ_{z≠a} exp(-d_az) | Expected number of correctly classified points on the training data

Worked example (continuing the previous slide):

c_a     c_b     c_c     c_d     c_e
N/A     3       3       2       1

p_aa    p_ab    p_ac    p_ad    p_ae
0       0.88    0.11    0       0

p(c_a = 3) = p_ab + p_ac ≈ 0.99
p(c_a = 2) = p_ad ≈ 0
p(c_a = 1) = p_ae ≈ 0
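A direct NumPy transcription of O_NCA from these definitions (a sketch; the paper maximizes it by backpropagating its gradient through f):

```python
import numpy as np

def nca_objective(codes, labels):
    """O_NCA: expected number of correctly classified training points.

    codes: (N, D) network outputs f(x_a | W); labels: (N,) class labels.
    """
    diff = codes[:, None, :] - codes[None, :, :]
    d = (diff ** 2).sum(axis=-1)          # pairwise squared distances d_ab

    p = np.exp(-d)
    np.fill_diagonal(p, 0.0)              # z != a: exclude self-selection
    p /= p.sum(axis=1, keepdims=True)     # p_ab

    same_class = labels[:, None] == labels[None, :]
    return (p * same_class).sum()         # Σ_a Σ_{b: c_a = c_b} p_ab
```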
23. Details
• Pre-training
  Mini-batches, each containing 100 cases
  Epochs: 50
• Fine-tuning
  Method: conjugate gradients on larger mini-batches of 5,000 cases,
  with three line searches performed for each mini-batch
  Epochs: 50
• Dataset
  60,000 training images
  10,000 images held out for validation
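A heavily hedged sketch of that fine-tuning schedule using SciPy's conjugate-gradient routine; `iterate_minibatches`, `neg_onca_and_grad` (the negative objective -O_NCA and its gradient from backpropagation), `train_x`, `train_c`, and `flat_params` are hypothetical helpers, and I approximate "three line searches" with three CG iterations:

```python
from scipy.optimize import minimize

for epoch in range(50):
    for batch in iterate_minibatches(train_x, train_c, batch_size=5000):
        # neg_onca_and_grad returns (-O_NCA, gradient) for this mini-batch;
        # method='CG' performs one line search per iteration.
        result = minimize(neg_onca_and_grad, flat_params, args=(batch,),
                          jac=True, method='CG', options={'maxiter': 3})
        flat_params = result.x
```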