DOI: 10.5555/3618408.3619156

Complexity of block coordinate descent with proximal regularization and applications to Wasserstein CP-dictionary learning

Published: 23 July 2023


We consider block coordinate descent methods of Gauss-Seidel type with proximal regularization (BCD-PR), a classical method for minimizing general nonconvex objectives under constraints that has a wide range of practical applications. We establish a worst-case complexity bound for this algorithm. Namely, we show that for a general nonconvex smooth objective with block-wise constraints, the classical BCD-PR algorithm converges to an ε-stationary point within Õ(ε⁻¹) iterations. Under a mild condition, this result still holds even if each step of the algorithm is executed inexactly. As an application, we propose a provable and efficient algorithm for 'Wasserstein CP-dictionary learning', which seeks a set of elementary probability distributions that can closely approximate a given set of d-dimensional joint probability distributions. Our algorithm is a version of BCD-PR that operates in the dual space, where the primal problem is regularized both entropically and proximally.
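BCD-PR cycles through the blocks in Gauss-Seidel order. Each block update solves x_i ← argmin over x_i in C_i of f(x_1, ..., x_m) + (λ/2)‖x_i − x_i^prev‖², with the earlier blocks already updated in the current sweep and the later blocks held at their previous values. The following is a minimal sketch on a toy problem chosen purely for illustration: nonnegative rank-one matrix factorization, min over u ≥ 0, v ≥ 0 of ½‖A − u vᵀ‖²_F. This problem is smooth, nonconvex, and has block-wise constraints. It is not the authors' Wasserstein CP-dictionary learning algorithm. The function names, the proximal weight `lam`, and the toy matrix are all hypothetical. Each block subproblem here is solved exactly, so the objective should be nonincreasing across sweeps.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def objective(A: Matrix, u: Vector, v: Vector) -> float:
    """Toy objective f(u, v) = 0.5 * ||A - u v^T||_F^2 (nonconvex jointly in u, v)."""
    return 0.5 * sum(
        (A[i][j] - u[i] * v[j]) ** 2
        for i in range(len(u))
        for j in range(len(v))
    )


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def prox_block_update(M: Matrix, other: Vector, prev: Vector, lam: float) -> Vector:
    """Exactly solve min_{x >= 0} 0.5*||M - x other^T||^2 + (lam/2)*||x - prev||^2.

    With `other` fixed, the subproblem splits into 1-D convex quadratics that
    share the curvature ||other||^2 + lam, so clipping each unconstrained
    minimizer at 0 yields the exact constrained minimizer.
    """
    curvature = sum(o * o for o in other) + lam
    return [
        max(0.0, (sum(m * o for m, o in zip(row, other)) + lam * p) / curvature)
        for row, p in zip(M, prev)
    ]


def bcd_pr(A: Matrix, u: Vector, v: Vector, lam: float = 0.1,
           sweeps: int = 200) -> Tuple[Vector, Vector, List[float]]:
    """Hypothetical Gauss-Seidel BCD with proximal regularization on blocks (u, v)."""
    At = transpose(A)
    history = [objective(A, u, v)]
    for _ in range(sweeps):
        u = prox_block_update(A, v, u, lam)   # block 1, using the current v
        v = prox_block_update(At, u, v, lam)  # block 2, using the new u (Gauss-Seidel order)
        history.append(objective(A, u, v))
    return u, v, history


# Usage on a toy rank-one nonnegative matrix; the initial factors must be feasible (>= 0).
A = [[2.0, 4.0, 6.0], [1.0, 2.0, 3.0]]
u, v, history = bcd_pr(A, [1.0, 1.0], [1.0, 1.0, 1.0])
print("final objective:", history[-1])
```

The proximal term keeps each block update close to its previous value. Because u_new minimizes the regularized subproblem and the old u is feasible, f(u_new, v) + (λ/2)‖u_new − u‖² ≤ f(u, v). This inequality is the source of the monotone decrease, and it is the kind of sufficient-decrease step that iteration-complexity arguments for BCD-PR typically build on.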




Published In

Guide Proceedings
ICML'23: Proceedings of the 40th International Conference on Machine Learning
July 2023
43479 pages





  • Research-article
  • Research
  • Refereed limited

