Stochastic flows and geometric optimization on the orthogonal group

Published: 13 July 2020

Abstract

We present a new class of stochastic, geometrically driven optimization algorithms on the orthogonal group O(d) and on naturally reductive homogeneous manifolds obtained from the action of the rotation group SO(d). We demonstrate, theoretically and experimentally, that our methods apply across machine learning, including deep, convolutional, and recurrent neural networks, reinforcement learning, normalizing flows, and metric learning. We show an intriguing connection between efficient stochastic optimization on the orthogonal group and graph theory (e.g., the matching problem, partition functions over graphs, graph coloring). We leverage the theory of Lie groups to provide theoretical results for the designed class of algorithms. We demonstrate the broad applicability of our methods through strong performance on two seemingly unrelated tasks: learning world models to obtain stable policies for the most difficult Humanoid agent from OpenAI Gym, and improving convolutional neural networks.
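To make the geometric setting concrete, the following is a minimal, hypothetical sketch (not the paper's algorithm) of one retraction-based descent step on the orthogonal group O(d): the Euclidean gradient is projected onto the tangent space at the current point X (directions of the form X·S with S skew-symmetric), and the update is mapped back onto the manifold with a Cayley transform, so orthogonality is preserved exactly rather than drifting as it would under a plain Euclidean step.

```python
import numpy as np

def orthogonal_descent_step(X, euclid_grad, lr=0.1):
    """One retraction-based descent step on O(d) (illustrative sketch).

    X           -- current point, a d x d orthogonal matrix
    euclid_grad -- Euclidean gradient of the loss at X
    lr          -- step size
    """
    # Project the Euclidean gradient onto the tangent space at X:
    # tangent vectors at X have the form X @ S with S skew-symmetric.
    A = X.T @ euclid_grad
    S = 0.5 * (A - A.T)

    # Cayley retraction: (I + (lr/2) S)^{-1} (I - (lr/2) S) is exactly
    # orthogonal for skew-symmetric S, so the iterate never leaves O(d).
    I = np.eye(X.shape[0])
    cayley = np.linalg.solve(I + 0.5 * lr * S, I - 0.5 * lr * S)
    return X @ cayley
```

For small step sizes the Cayley factor agrees with the matrix exponential of -lr·S to first order, but it needs only a linear solve instead of `expm`, which is one common reason to prefer it in practice.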

Supplementary Material

Additional material (3524938.3525117_supp.pdf)
Supplemental material.



Published In

ICML'20: Proceedings of the 37th International Conference on Machine Learning
July 2020
11702 pages

Publisher

JMLR.org


Qualifiers

  • Research-article
  • Research
  • Refereed limited
