Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
SlideShare a Scribd company logo
Deep 
Learning 
at 
NAVER 
DEVIEW 
2014 
NAVER 
LABS 
김 정 희 수석연구원 
2014. 09. 30
Contents 
2 
1. Deep learning overview 
2. Deep learning at NAVER 
3. Deep learning?
1. 
Deep 
Learning 
Overview
Deep 
Learning 
Overview 
4 
Machine 
learning 
with 
Deep 
neural 
networks
Machine 
Learning 
5 
Observable 
Data 
Modeling 
(Learning) 
Informa?on
Machine 
Learning 
6 
Observable 
Data 
Modeling 
(Learning) 
Informa?on 
Speech 
DEVIEW 
2014 
Recogni?on 
Speech 
Text
Machine 
Learning 
7 
Observable 
Data 
Modeling 
(Learning) 
Informa?on 
One 
Language 
Another 
Language 
Machine 
I 
am 
a 
boy 
Transla?on 
나는 소년이다
Machine 
Learning 
8 
Observable 
Data 
Modeling 
(Learning) 
Informa?on 
Image 
고양이 
Classify 
Image 
Category
Machine 
Learning 
9 
Observable 
Data 
Modeling 
(Learning) 
Informa?on 
HOW 
?
Machine 
Learning 
10 
Observable 
Data 
Modeling 
(Learning) 
Informa?on 
LDA
Machine 
Learning 
11 
Observable 
Data 
Modeling 
(Learning) 
Informa?on 
Gaussian 
Mixture
Machine 
Learning 
12 
Observable 
Data 
Modeling 
(Learning) 
Informa?on 
Non-­‐nega8ve 
Matrix 
Factoriza8on
Machine 
Learning 
13 
Observable 
Data 
Modeling 
(Learning) 
Informa?on 
Deep 
Neural 
Networks
Deep 
Neural 
Networks 
-­‐ 
Learning 
14 
Input 
Data 
Weight 
Matrix 
Probability 
of 
Each 
Class 
Back 
propaga8on 
1980년대 
확립
Why 
Deep 
Learning? 
Yoshua 
Bengio 
2014 
KDDNN 
15 
• NAVER 
Speech 
recogniNon 
§ 20 
% 
error 
rate 
개선
Why 
Deep 
Learning? 
Yoshua 
Bengio 
2014 
KDDNN 
16 
• ImageNet 
ClassificaNon 
2012 
§ Krizhevsky 
et 
al 
(convnet) 
-­‐ 
16.4 
% 
error 
( 
top 
– 
5 
) 
§ Next 
best 
( 
non 
– 
convnet 
) 
-­‐ 
26.2 
% 
error
Why 
Deep 
Learning? 
Yoshua 
Bengio 
2014 
KDDNN 
17
2. 
Deep 
Learning 
at 
NAVER
Rebirth 
of 
Deep 
Learning 
19 
New Algorithms Big Data Hardware
Primary 
Check 
Points 
of 
Deep 
Learning 
20 
New Algorithms Big Data Hardware
ConvoluNonal 
neural 
networks 
2012 
21 
FFNN 
CNN 
Alex 
Krizhevsky 
et 
al. 
2012 
NIPSNN
Image 
classificaNon 
(m.ndrive.naver.com) 
22 
FFNN 
CNN 
RNN
ConvoluNonal 
neural 
networks 
2014 
23 
FFNN 
CNN 
Chris?an 
Szegedy 
et 
al. 
2014 
ILSVRCNN
Deep 
Neural 
Networks 
24 
C_1 
C_2 
ŸŸŸ 
C_n 
FFNN 
CNN 
Non 
Linear 
Ac?va?on 
Sigmoid 
ŸŸŸ 
ŸŸŸ 
ŸŸŸ 
Output 
Layer 
Hidden 
Layer 
Input 
Layer
Back 
-­‐ 
propagaNon 
25 
C_1 
C_2 
ŸŸŸ 
C_n 
FFNN 
CNN 
Non 
Linear 
Ac?va?on 
Sigmoid 
ŸŸŸ 
ŸŸŸ 
ŸŸŸ 
Back-­‐ 
propgata?on 
Error 의 미분값
Exploding 
or 
Vanishing 
26 
FFNN 
CNN
Recurrent 
neural 
networks 
27 
FFNN 
CNN 
RNN
Recurrent 
Neural 
Networks 
28 
• SequenNal 
data 
Analysis 
§ Video 
§ RoboNcs 
§ Language 
model
Recurrent 
neural 
networks 
29 
FFNN 
CNN 
T 
= 
R1N0N0 
?
Recurrent 
Neural 
Networks 
30 
100 
개의 
Hidden 
Layer
Long 
Short-­‐Term 
Memory 
31 
FFNN 
CNN 
T 
= 
R1N0N0 
?
Language 
Model 
32 
• Language 
model 
§ w(1), 
w(2), 
w(3), 
… 
, 
w(n) 
의 단어열이 주어졌을 때, 
§ 확률 p( 
w(1), 
w(2), 
w(3), 
… 
, 
w(m) 
) 위한 확률 모델
LM 
ApplicaNon 
(1) 
33 
• Speech 
recogniNon 
§ 같은 발음 / 다른 철자 
o 위성이 지구 ( 궤도 / 괴도 ) 
를 돌고 있다. 
o (궤도 / 괴도) 루팡을 읽었다.
LM 
ApplicaNon 
(2) 
34 
• Machine 
translaNon 
§ 같은 철자 / 다른 의미 
o The 
town 
lies 
on 
the 
coat. 
o She 
lies 
about 
her 
age.
Recurrent 
Neural 
Networks 
-­‐ 
LM 
35 
• 에측 정확도 향상 
§ 기존 방법 : 
KN-­‐5 
§ 실험 Data 
set 
: 
Penn 
tree 
bank 
§ 척도 : 
perplexity 
§ 20% 향상
Primary 
Check 
Points 
of 
Deep 
Learning 
36 
New Algorithms Big Data Hardware
Neural 
Networks 
37 
• Data-­‐driven 
§ 학습 Data 
에 기초해 해당 Class 
의 확률이 최대가 되도록 
§ Prior 
knowledge 
를 사용하지 않는다. 
! 학습 Data를 그대로 모사하는 것
Neural 
Networks 
38 
• Big 
data 
가 아니면 문제가 생긴다 
§ Why? 
§ 보유한 학습 Data 
가 이 세상의 모든 Data 
라고 할 수 있나?
Before 
– 
Big 
Data 
39 
Real 
World 
Data 
Small 
Learning 
Data
Before 
– 
Big 
Data 
40 
Real 
World 
Data
Before 
– 
Big 
Data 
41 
Real 
World 
Data 
Small 
Learning 
Data
Before 
– 
Big 
Data 
42 
Real 
World 
Data
Before 
– 
Big 
Data 
43 
Real 
World 
Data
Before 
– 
Big 
Data 
44 
Real 
World 
Data 
ERROR 
!!!
Human 
Case 
45 
MiYe 
광고 이미지
Before 
– 
Big 
Data 
46 
Real 
World 
Data 
ERROR 
!!!
Prior 
Know-­‐ 
ledge 
Before 
– 
Big 
Data 
47 
Real 
World 
Data 
Small 
Leaning 
Data 
ERROR 
!!!
GMM 
Before 
– 
Big 
Data 
48 
Real 
World 
Data 
Small 
Speech 
Data 
ERROR 
!!!
Gabor 
Filter 
Before 
– 
Big 
Data 
49 
Real 
World 
Data 
Small 
Image 
Data 
ERROR 
!!!
Image 
classificaNon 
(m.ndrive.naver.com) 
50 
FFNN 
CNN 
RNN
ConvoluNonal 
neural 
networks 
51 
FFNN 
CNN 
Alex 
Krizhevsky 
et 
al. 
2012 
NIPSNN
기존 Approach 
52 
FFNN 
CNN 
RNN 
Image 
Pixels 
Hand-­‐ 
designed 
Feature 
Extractor 
Trainable 
Classifier 
Object 
Class
Deep 
Learning 
53 
FFNN 
CNN 
RNN 
Image 
Pixels 
Trainable 
Feature 
Extractor 
Trainable 
Classifier 
Object 
Class
ConvoluNonal 
neural 
networks 
54 
FFNN 
CNN 
Hierarchical 
Trained 
Filters 
Alex 
Krizhevsky 
et 
al. 
2012 
NIPSNN
RepresentaNon 
Learning 
55 
RNN 
Rob 
Fergus 
2013 
NIPSNN
Prior 
Know-­‐ 
ledge 
Afer 
– 
Big 
Data 
56 
Real 
World 
Data 
Small 
Leaning 
Data 
ERROR 
!!! 
Big 
Learning 
Data
Big 
Data 
57 
• Neural 
networks 
§ Supervised 
learning 
§ 무작정 data 
만 많이 있어서는 안된다 
§ 정답이 있는 data가 많이 있어야 한다
Big 
Data 
58 
• 정답이 있는 DB를 대량으로 구축하기 힘들다면… 
§ Semi-­‐supervised 
learning 
§ Supervised 
Learning 
+ 
Unsupervised 
Learning
Big 
Data 
59 
• Big 
Data 
를 구축하기 힘들다면… 
§ 서비스 개발 진입 장벽
Big 
Data 
60 
• Big 
Data 
를 구축하기 힘들다면… 
§ 일단 만들고 beta 
서비스 하자 !
Transfer 
Learning 
61 
• Deep 
learning 
! 
RepresentaNon 
learning 
§ 유사 Domain 
에서 학습된 내용을 다른 도메인으로 
§ 한국어 음성인식 ! 일본어 음성인식 
§ 중국어 OCR 
! 
일본어 OCR 
§ 이미지 feature 
extractor
Transfer 
Learning 
– 
Image 
62 
FFNN 
CNN 
Supervised 
Trained 
Feature 
Extractor 
Alex 
Krizhevsky 
et 
al. 
2012 
NIPSNN
Transfer 
Learning 
– 
Speech 
RecogniNon 
63
Speech 
recogniNon 
64 
FFNN 
CNN 
RNN 
• Transfer 
learning 
§ 일본어, 영어 인식 엔진 개발 
§ 소량의 데이터 
§ 한국어와 유사한 정확도
Primary 
Check 
Points 
of 
Deep 
Learning 
65 
New Algorithms Big Data Hardware
계산량 
7000 
3000 
… 
3000 
500 
5 
hidden 
layers 
58.5 
M 
parameters
Sparseness 
• 작은 weight 
값들 존재
계산량 감소 Idea 
모든 weight가 필요하지는 않다.
Dimension 
ReducNon 
• PCA와 같은 Dimension 
ReducNon 
§ Layer의 output을 Feature 
§ Linear 
TransformaNon을 이용 
§ 필요없는 Dimension을 줄이는 ReducNon
7000 
R 
3000 
Dimension 
ReducNon 
• 7000 
x 
3000 
>= 
7000 
x 
R 
+ 
3000 
x 
R 
• 2100 
>= 
R
• Weight 
matrix 를 두 개의 matrix 
곱으로 분리하는 matrix 
factorizaNon 
• Weight 
matrix는 low-­‐rank 
matrix 
Dimension 
ReducNon 
[7000 ×3000] =[7000 × R][R×3000]
Dimension 
ReducNon 
• NAVER 
§ Weight 
64% 
감소 
§ 성능 저하 없음
GPU 
• OpNmizaNon 
의 어려움 
§ Deep 
learning 
용 GPU 
library 
들 release 
§ Ex) 
Nvidia 
-­‐ 
CuNet
3. 
Deep 
Learning?
Deep 
Learning 
– 
The 
One 
Ring? 
75 
State 
of 
the 
art 
by 
Abstruse 
Goose
Deep 
Learning 
– 
The 
One 
Ring? 
76 
State 
of 
the 
art 
by 
Abstruse 
Goose
Deep 
Learning 
– 
The 
One 
Ring? 
77 
State 
of 
the 
art 
by 
Abstruse 
Goose
Deep 
Learning 
– 
Current 
Status 
78 
Yoshua 
Bengio 
KDD 
2014
79

More Related Content

[2A4]DeepLearningAtNAVER