Optimization in Deep Learning
Houston Machine Learning Deep Learning Series

Roadmap
• Tour of machine learning algorithms (1 session)
• Feature engineering (1 session)
• Feature selection - Yan
• Supervised learning (4 sessions)
• Regression models - Yan
• SVM and kernel SVM - Yan
• Tree-based models - Dario
• Bayesian method - Xiaoyang
• Ensemble models - Yan
• Unsupervised learning (3 sessions)
• K-means clustering
• DBSCAN - Cheng
• Mean shift
• Agglomerative clustering - Kunal
• Spectral clustering - Yan
• Dimension reduction for data visualization - Yan
• Deep learning
• Neural network - Yan
• Convolutional neural network - Hengyang Lu
• Recurrent neural networks - Yan
• Hands-on session with deep nets - Yan
Slides posted on:
http://www.slideshare.net/xuyangela

More deep learning coming up!
• Optimization in Deep learning (today’s session)
• Behind AlphaGo
• Mastering the game of Go with deep neural networks and tree search
• Attention network
• Application of Deep Learning and showcase

Outline
• Gradient Descent
• Stochastic Gradient Descent (SGD)
• Variants of SGD
• Use “momentum”
• Nesterov’s Accelerated Gradient (NAG)
• Adaptive Gradient (AdaGrad)
• Root Mean Square Propagation (RMSProp)
• Adaptive Moment Estimation (Adam)

Stochastic Gradient Descent (SGD)

SGD recommendation
• Randomly shuffle training samples
• Monitor training and validation error
• Experiment with learning rates using a small sample of the training set
• Leverage sparsity of training samples
• Varying learning rate:
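The schedule shown on the original slide is not recoverable from the text. As a rough sketch of the recommendations above, the following fits a line with per-sample SGD, reshuffling every epoch and shrinking the learning rate over time. The 1/t-style decay, the function name `sgd_linear_fit`, the toy data, and all hyperparameter values are assumptions made for illustration.

```python
import random


def sgd_linear_fit(xs, ys, lr0=0.1, decay=0.001, epochs=200, seed=0):
    """Fit y = w*x + b with plain SGD, using one sample per update."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)  # randomly shuffle training samples each epoch
        for i in order:
            # Assumed schedule: the learning rate decays as 1 / (1 + decay * step).
            lr = lr0 / (1.0 + decay * step)
            err = (w * xs[i] + b) - ys[i]  # prediction error on one sample
            w -= lr * err * xs[i]  # gradient of 0.5 * err**2 with respect to w
            b -= lr * err          # gradient of 0.5 * err**2 with respect to b
            step += 1
    return w, b


# Noiseless toy data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_fit(xs, ys)
```

In practice the training and validation error would be tracked inside the epoch loop, and the schedule tuned on a small subset first.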

Variants of SGD
• Use “momentum”
• Nesterov’s Accelerated Gradient (NAG)
• Adaptive Gradient (AdaGrad)
• Root Mean Square Propagation (RMSProp)
• Adaptive Moment Estimation (Adam)
Ref:
https://moodle2.cs.huji.ac.il/nu15/pluginfile.php/316969/mod_resource/content/1/adam_pres.pdf

Performance comparison
http://www.denizyuret.com/2015/03/alec-radfords-animations-for.html

Long Valley

Saddle point

The momentum method by Dr. Geoffrey Hinton
https://www.youtube.com/watch?v=LdkkZglLZ0Q&list=PLoRl3Ht4JOcdU872GhiYWf6jwrk_SNhz9&index=27

SGD with momentum
Start with a momentum coefficient of 0.5
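The update rule itself is not in the slide text. A minimal sketch of the classical momentum update follows, assuming the common form v = mu*v - lr*grad, w = w + v, with the momentum coefficient mu starting at 0.5. The elongated quadratic (a stand-in for the "long valley"), the function names, and the learning rate are illustrative assumptions.

```python
def grad(p):
    """Gradient of the elongated bowl f(x, y) = 0.5 * (x**2 + 25 * y**2)."""
    x, y = p
    return [x, 25.0 * y]


def sgd_momentum(p0, lr=0.02, mu=0.5, steps=300):
    """Gradient descent with classical momentum; mu=0 gives plain gradient descent."""
    p = list(p0)
    v = [0.0] * len(p)  # velocity, one entry per parameter
    for _ in range(steps):
        g = grad(p)
        for i in range(len(p)):
            v[i] = mu * v[i] - lr * g[i]  # accumulate a decaying sum of past gradients
            p[i] += v[i]                  # move along the velocity
    return p


with_momentum = sgd_momentum([5.0, 1.0], mu=0.5)
plain = sgd_momentum([5.0, 1.0], mu=0.0)
```

With the same learning rate and step budget, the momentum run ends closer to the minimum along the shallow x direction than the plain run.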

NAG (Nesterov’s Accelerated Gradient)
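The slide text gives no formula for NAG. One common formulation, sketched here as an assumption, evaluates the gradient at the look-ahead point w + mu*v rather than at w, then applies the usual momentum update. The test function, names, and hyperparameters are illustrative.

```python
def grad(p):
    """Gradient of the elongated bowl f(x, y) = 0.5 * (x**2 + 25 * y**2)."""
    x, y = p
    return [x, 25.0 * y]


def nag(p0, lr=0.02, mu=0.9, steps=300):
    """Nesterov's accelerated gradient in its look-ahead form."""
    p = list(p0)
    v = [0.0] * len(p)
    for _ in range(steps):
        # Peek at where the current velocity would take us, and take the gradient there.
        lookahead = [p[i] + mu * v[i] for i in range(len(p))]
        g = grad(lookahead)
        for i in range(len(p)):
            v[i] = mu * v[i] - lr * g[i]
            p[i] += v[i]
    return p


p_final = nag([5.0, 1.0])
```

The only difference from classical momentum is where the gradient is evaluated, which lets the method correct its course before the velocity overshoots.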

AdaGrad
Adaptive learning rate:
• Weights that receive high gradients will have their effective learning rate reduced
• Weights that receive small or infrequent updates will have their effective learning rate increased
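The behaviour described above can be sketched with the usual AdaGrad update, assumed here to be cache += grad**2 followed by w -= lr * grad / (sqrt(cache) + eps). The test function, names, and hyperparameters are illustrative and not taken from the slides.

```python
import math


def grad(p):
    """Gradient of f(x, y) = 0.5 * (x**2 + 25 * y**2); y is the steep direction."""
    x, y = p
    return [x, 25.0 * y]


def adagrad(p0, lr=0.5, eps=1e-8, steps=500):
    """AdaGrad: scale each parameter's step by its accumulated squared gradients."""
    p = list(p0)
    cache = [0.0] * len(p)  # running sum of squared gradients, per parameter
    for _ in range(steps):
        g = grad(p)
        for i in range(len(p)):
            cache[i] += g[i] ** 2
            p[i] -= lr * g[i] / (math.sqrt(cache[i]) + eps)
    # Effective per-parameter learning rate at the end of training.
    eff_lr = [lr / (math.sqrt(c) + eps) for c in cache]
    return p, eff_lr


p_final, eff_lr = adagrad([5.0, 1.0])
```

Here `eff_lr[1]` (the steep y direction, which sees large gradients) ends up smaller than `eff_lr[0]`. Because the cache only grows, the effective learning rate never recovers, which is the issue RMSProp addresses.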

Comparisons of Different Optimization Methods

MNIST

CIFAR-10

Summary of learning methods for DL
https://www.youtube.com/watch?v=defQQqkXEfE&list=PLoRl3Ht4JOcdU872GhiYWf6jwrk_SNhz9&index=29 (from 7:33)

Try it out!
From hands-on session: https://www.dropbox.com/s/92sckhnf1hjgjlo/CNN.zip?dl=0
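# Imports assume the Keras 2 API used in the hands-on session.
from keras.models import Sequential
from keras.layers import Activation, Conv2D, Dense, Dropout, Flatten, MaxPooling2D

# input_shape is defined in the hands-on session code (not shown on this slide).
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# ... (further layers elided on the original slide)
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
# The optimizer is selected by name; swap it to compare methods.
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])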
Optimizer options: SGD, RMSprop, Adagrad, Adam, ... (https://keras.io/optimizers/)

Summary
• Full-batch GD
• SGD
• Speed up by momentum: Momentum SGD, NAG
• Adaptive learning rate: AdaGrad, RMSProp, Adam
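RMSProp and Adam are named in the summary, but their update rules do not appear in the slide text. The sketch below assumes the commonly cited formulations: RMSProp keeps an exponential moving average of squared gradients, and Adam adds a moving average of the gradient itself plus bias correction. The test function, names, and hyperparameter values are illustrative assumptions.

```python
import math


def grad(p):
    """Gradient of f(x, y) = 0.5 * (x**2 + 25 * y**2)."""
    x, y = p
    return [x, 25.0 * y]


def rmsprop(p0, lr=0.01, decay=0.9, eps=1e-8, steps=2000):
    """RMSProp: like AdaGrad, but the squared-gradient cache is a moving average."""
    p = list(p0)
    cache = [0.0] * len(p)
    for _ in range(steps):
        g = grad(p)
        for i in range(len(p)):
            cache[i] = decay * cache[i] + (1.0 - decay) * g[i] ** 2
            p[i] -= lr * g[i] / (math.sqrt(cache[i]) + eps)
    return p


def adam(p0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Adam: moving averages of the gradient (m) and its square (v), bias-corrected."""
    p = list(p0)
    m = [0.0] * len(p)
    v = [0.0] * len(p)
    for t in range(1, steps + 1):
        g = grad(p)
        for i in range(len(p)):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] ** 2
            m_hat = m[i] / (1.0 - beta1 ** t)  # correct the bias toward zero at small t
            v_hat = v[i] / (1.0 - beta2 ** t)
            p[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return p


p_rmsprop = rmsprop([5.0, 1.0])
p_adam = adam([5.0, 1.0])
```

With a constant learning rate, RMSProp takes steps of roughly `lr` in size near the minimum, so it hovers around the optimum instead of settling exactly on it.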

More deep learning coming up!
• Optimization in Deep learning (today’s session)
• Behind AlphaGo
• Mastering the game of Go with deep neural networks and tree search
• Attention network
• Application of Deep Learning and showcase
• Any proposal?

Thank you
Slides will be posted at: http://www.slideshare.net/xuyangela
Please leave a group review.
