21. References
• Sebastian Ruder. “An overview of gradient descent optimization algorithms”. http://ruder.io/optimizing-gradient-descent/.
• Sebastian Ruder. “Optimization for Deep Learning Highlights in 2017”. http://ruder.io/deep-learning-optimization-2017/index.html.
• Ian Goodfellow, Yoshua Bengio, and Aaron Courville. “Deep Learning”. http://www.deeplearningbook.org.
• Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … Polosukhin, I. (2017). Attention Is All You Need. In Advances in Neural Information Processing Systems.
• Loshchilov, I., & Hutter, F. (2017). SGDR: Stochastic Gradient Descent with Warm Restarts. In Proceedings of ICLR 2017.
• Loshchilov, I., & Hutter, F. (2017). Fixing Weight Decay Regularization in Adam. arXiv preprint arXiv:1711.05101. Retrieved from http://arxiv.org/abs/1711.05101.
• Zeiler, M. D. (2012). ADADELTA: An Adaptive Learning Rate Method. Retrieved from http://arxiv.org/abs/1212.5701.
• Kingma, D. P., & Ba, J. L. (2015). Adam: A Method for Stochastic Optimization. In International Conference on Learning Representations, 1–13.
• Masaaki Imaizumi. “深層学習による非滑らかな関数の推定” (Estimation of Non-smooth Functions by Deep Learning). SlideShare. https://www.slideshare.net/masaakiimaizumi1/ss-87969960.
• nishio. “勾配降下法の最適化アルゴリズム” (Gradient Descent Optimization Algorithms). SlideShare. https://www.slideshare.net/nishio/ss-66840545.