DOI: 10.5555/3618408.3619156

Complexity of block coordinate descent with proximal regularization and applications to Wasserstein CP-dictionary learning

Published: 23 July 2023

Abstract

We consider block coordinate descent methods of Gauss-Seidel type with proximal regularization (BCD-PR), a classical method for minimizing general nonconvex objectives under constraints that has a wide range of practical applications. We establish a worst-case complexity bound for this algorithm: for a general nonconvex smooth objective with block-wise constraints, the classical BCD-PR algorithm converges to an ε-stationary point within Õ(ε⁻¹) iterations. Under a mild condition, this result still holds even if each step of the algorithm is executed inexactly. As an application, we propose a provable and efficient algorithm for 'Wasserstein CP-dictionary learning', which seeks a set of elementary probability distributions that can well approximate a given set of d-dimensional joint probability distributions. Our algorithm is a version of BCD-PR that operates in the dual space, where the primal problem is regularized both entropically and proximally.
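The BCD-PR scheme described above can be sketched in a few lines: blocks are updated cyclically (Gauss-Seidel order), and each block update minimizes the objective in that block plus a proximal penalty anchoring it to the previous iterate. The following is a minimal NumPy illustration, not the paper's implementation; it uses nonnegative matrix factorization as a stand-in constrained nonconvex objective, and the function name, parameters, and inner projected-gradient solver are all assumptions chosen for the sketch (the inexact inner solve mirrors the inexact-execution setting the abstract mentions).

```python
import numpy as np

def bcd_pr(X, r=2, lam=1.0, n_outer=50, n_inner=25, seed=0):
    """Two-block BCD with proximal regularization (BCD-PR sketch) for
    nonnegative matrix factorization:  min_{W,H >= 0} ||X - W H||_F^2.

    Each block update approximately solves
        argmin_{B >= 0}  f(..., B, ...) + (lam/2) * ||B - B_prev||_F^2
    via a few projected gradient steps (an inexact inner solve).
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_outer):
        # Block 1: update W with H fixed (Gauss-Seidel order).
        W_prev = W.copy()
        # Lipschitz constant of the proximally regularized block gradient.
        L = 2 * np.linalg.norm(H @ H.T, 2) + lam
        for _ in range(n_inner):
            grad = 2 * (W @ H - X) @ H.T + lam * (W - W_prev)
            W = np.maximum(W - grad / L, 0.0)  # project onto W >= 0
        # Block 2: update H using the freshly updated W.
        H_prev = H.copy()
        L = 2 * np.linalg.norm(W.T @ W, 2) + lam
        for _ in range(n_inner):
            grad = 2 * W.T @ (W @ H - X) + lam * (H - H_prev)
            H = np.maximum(H - grad / L, 0.0)  # project onto H >= 0
    return W, H
```

The proximal term `lam * (B - B_prev)` is what distinguishes BCD-PR from plain block coordinate descent: it keeps each block subproblem well-conditioned even when the joint objective is nonconvex, which is the mechanism behind the Õ(ε⁻¹) stationarity guarantee.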


Published In

ICML'23: Proceedings of the 40th International Conference on Machine Learning, July 2023, 43479 pages

Publisher

JMLR.org

Qualifiers

  • Research-article
  • Research
  • Refereed limited
