Proppo: a message passing framework for customizable and composable learning algorithms
Article No.: 2114, Pages 29152 - 29165
Abstract
While existing automatic differentiation (AD) frameworks allow flexibly composing model architectures, they do not provide the same flexibility for composing learning algorithms: everything has to be implemented in terms of back-propagation. To address this gap, we invent Automatic Propagation (AP) software, which generalizes AD and allows custom and composable construction of complex learning algorithms. The framework allows packaging custom learning algorithms into propagators that automatically implement the necessary computations and can be reused across different computation graphs. We implement Proppo, a prototype AP software package built on top of the PyTorch AD framework. To demonstrate the utility of Proppo, we use it to implement Monte Carlo gradient estimation techniques, such as reparameterization and likelihood ratio gradients, as well as the total propagation algorithm and Gaussian shaping gradients, which were previously used in model-based reinforcement learning but do not have any publicly available implementation. Finally, in minimalistic experiments, we show that these methods allow increasing the gradient estimation accuracy by orders of magnitude, particularly when the machine learning system is at the edge of chaos.
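To make the estimators named in the abstract concrete, the sketch below contrasts the two basic Monte Carlo gradient estimators (reparameterization and likelihood ratio) on a toy Gaussian problem in plain PyTorch, and then combines them by inverse-variance weighting in the spirit of total propagation. This is an illustrative sketch only, not Proppo's API: the objective f, the sample size n, and the helper inverse_variance_combine are hypothetical choices made for this example.

```python
# Minimal sketch (not Proppo's API): estimate d/dtheta E_{x ~ N(theta, sigma^2)}[f(x)]
# with the reparameterization (RP) and likelihood ratio (LR) estimators, then
# combine them by inverse-variance weighting in the spirit of total propagation.
import torch

torch.manual_seed(0)

def f(x):
    # Toy objective chosen for the example; any differentiable function works.
    return torch.sin(3.0 * x)

theta = torch.tensor(0.5, requires_grad=True)
sigma = 0.2
n = 10_000

# Reparameterization (pathwise): x = theta + sigma * eps, so the per-sample
# gradient is f'(x_i) * dx_i/dtheta = f'(x_i) here (dx_i/dtheta = 1).
eps = torch.randn(n)
x = theta + sigma * eps
rp_samples = torch.autograd.grad(f(x).sum(), x)[0]   # f'(x_i) for each sample

# Likelihood ratio (REINFORCE): per-sample gradient is
# f(x_i) * d/dtheta log N(x_i; theta, sigma^2) = f(x_i) * (x_i - theta) / sigma^2.
with torch.no_grad():
    x_lr = theta + sigma * torch.randn(n)
    score = (x_lr - theta) / sigma**2
    lr_samples = f(x_lr) * score

def inverse_variance_combine(a, b):
    # Hypothetical helper: weight each estimator by the inverse of its
    # estimated variance of the mean (a simplified stand-in for total propagation).
    ga, gb = a.mean(), b.mean()
    va, vb = a.var() / a.numel(), b.var() / b.numel()
    return (ga / va + gb / vb) / (1.0 / va + 1.0 / vb)

print("RP estimate:      ", rp_samples.mean().item())
print("LR estimate:      ", lr_samples.mean().item())
print("Combined estimate:", inverse_variance_combine(rp_samples, lr_samples).item())
# Closed form for reference: 3 * cos(3 * theta) * exp(-9 * sigma**2 / 2) ≈ 0.177
```

The combination step only gestures at the idea behind total propagation; the paper's actual algorithms, including the Gaussian shaping gradients, operate over full computation graphs rather than this single-step example.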
Supplementary Material
Supplemental material is available for download (1.57 MB).
Published In
November 2022
39114 pages
ISBN: 9781713871088
Editors: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh
Copyright © 2022 Neural Information Processing Systems Foundation, Inc.
Publisher
Curran Associates Inc.
Red Hook, NY, United States
Publication History
Published: 03 April 2024