Abstract
Mathematical optimization is at the algorithmic core of machine learning. Almost every known algorithm for solving mathematical optimization problems has been applied in machine learning, and the machine learning community itself actively designs and implements new algorithms for specific problems. These implementations have to be made available to machine learning practitioners, which is mostly accomplished by distributing them as standalone software. Successful, well-engineered implementations are collected in machine learning toolboxes that provide more uniform access to the different solvers. A disadvantage of the toolbox approach is its lack of flexibility: toolboxes only provide access to a fixed set of machine learning models that cannot be modified. This can be a problem for the typical machine learning workflow, which iterates the process of modeling, solving, and validating. If a model does not perform well on validation data, it needs to be modified. In most cases, these modifications require a new solver for the resulting optimization problems. Optimization frameworks that combine a modeling language for specifying optimization problems with a solver are better suited to this iterative workflow, since they can address large classes of problems. Here, we provide examples of the use of optimization frameworks in machine learning. We also illustrate the use of one such framework in a case study that follows the typical machine learning workflow.
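To make the contrast between a fixed toolbox and a modeling-language framework concrete, the following minimal sketch uses CVXPY, a Python-embedded convex optimization framework, to specify a lasso regression model and then change it into an elastic net by editing only the objective; the framework's solver handles both problems, so no new solver code has to be written. The data dimensions and regularization parameters are illustrative assumptions, and the example is not the case study from the article.

```python
import numpy as np
import cvxpy as cp

# Illustrative data (assumed shapes, not taken from the article).
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))   # feature matrix
b = rng.standard_normal(100)         # target vector

x = cp.Variable(20)
lam = 0.1                            # assumed regularization strength

# Initial model: lasso regression (least squares with an l1 penalty).
lasso = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b) + lam * cp.norm1(x)))
lasso.solve()

# If validation suggests a different model, the objective is simply edited,
# here into an elastic net with an additional squared l2 penalty.
mu = 0.05                            # assumed second regularization parameter
enet = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)
                              + lam * cp.norm1(x)
                              + mu * cp.sum_squares(x)))
enet.solve()
```

In a fixed toolbox, switching from lasso to elastic net requires that a dedicated solver for the new model is already implemented; in a modeling-language framework, only the problem specification changes.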
Funding source: Deutsche Forschungsgemeinschaft
Award Identifier / Grant number: LA-2971/1-1
Funding source: Deutsche Forschungsgemeinschaft
Award Identifier / Grant number: GI-711/5-1
Funding statement: Sören Laue acknowledges funding from DFG grant LA-2971/1-1 for work on the basic GENO framework. Joachim Giesen, Sören Laue, and Matthias Mitterreiter acknowledge funding from DFG grant GI-711/5-1 for scaling up GENO to be used within parallel and distributed computing environments.
About the authors
Joachim Giesen is a Professor at the Institute of Computer Science at Friedrich Schiller University Jena. After completing his doctorate in computer science at ETH Zurich, he spent a year as a postdoc at Ohio State University in Columbus, OH. This was followed by positions as Senior Researcher at ETH Zurich and the Max Planck Institute for Computer Science in Saarbrücken. Since 2008 Dr. Giesen has been Professor of Theoretical Computer Science at Friedrich Schiller University Jena.
Sören Laue is a senior research scientist at the Institute of Computer Science at Friedrich Schiller University Jena. He received his MSc degree in computer science from Saarland University in 2004. From 2005 until 2008 he was a PhD student at the Max Planck Institute for Computer Science in Saarbrücken. In 2008 he joined the Institute of Computer Science at Friedrich Schiller University Jena. His work includes efficient algorithms for matrix and tensor calculus (www.MatrixCalculus.org) and efficient and generic optimization for machine learning problems (www.geno-project.org).
Matthias Mitterreiter received his MSc degree in computer science from Friedrich Schiller University Jena and is currently working on his PhD thesis. The focus of his thesis work is on scaling up generic optimization for applications in machine learning. His work is funded by the DFG priority program 1736 Algorithms for Big Data.
© 2020 Walter de Gruyter GmbH, Berlin/Boston