Abstract
Tucker decomposition is a popular technique for many data analysis and machine learning applications. Finding a Tucker decomposition is a nonconvex optimization problem. As the scale of such problems increases, local search algorithms such as stochastic gradient descent have become popular in practice. In this paper, we characterize the optimization landscape of the Tucker decomposition problem. In particular, we show that if the tensor admits an exact Tucker decomposition, then for a standard nonconvex objective for Tucker decomposition, all local minima are also globally optimal. We also give a local search algorithm that finds an approximately locally (and hence globally) optimal solution in polynomial time.
Notes
This can be achieved by initializing at 0, or any point with norm O(1).
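The objective discussed in the abstract, together with the small-norm initialization from the note above, can be sketched numerically. The following is a minimal, hedged illustration, not the paper's algorithm verbatim: plain gradient descent on \(f(G,A,B,C) = \tfrac{1}{2}\lVert T - G \times_1 A \times_2 B \times_3 C\rVert_F^2\) for a tensor with an exact Tucker decomposition. The dimensions, step size, and initialization scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 5, 2  # illustrative mode size and multilinear rank

def tucker(G, A, B, C):
    # Multilinear product: X[i,j,k] = sum_{a,b,c} G[a,b,c] A[i,a] B[j,b] C[k,c]
    return np.einsum('abc,ia,jb,kc->ijk', G, A, B, C)

# Ground-truth tensor with an exact Tucker decomposition, normalized.
G0 = rng.standard_normal((r, r, r))
A0, B0, C0 = (rng.standard_normal((n, r)) for _ in range(3))
T = tucker(G0, A0, B0, C0)
T /= np.linalg.norm(T)

# Small-norm random initialization, in the spirit of the note above.
scale = 0.2
G = scale * rng.standard_normal((r, r, r))
A = scale * rng.standard_normal((n, r))
B = scale * rng.standard_normal((n, r))
C = scale * rng.standard_normal((n, r))

lr = 0.1  # illustrative step size, not tuned from the paper
loss_history = []
for _ in range(2000):
    R = tucker(G, A, B, C) - T                     # residual
    loss_history.append(0.5 * np.sum(R ** 2))      # f = 0.5 ||residual||_F^2
    # Analytic gradients of f with respect to each factor and the core.
    gA = np.einsum('ijk,abc,jb,kc->ia', R, G, B, C)
    gB = np.einsum('ijk,abc,ia,kc->jb', R, G, A, C)
    gC = np.einsum('ijk,abc,ia,jb->kc', R, G, A, B)
    gG = np.einsum('ijk,ia,jb,kc->abc', R, A, B, C)
    A, B, C, G = A - lr * gA, B - lr * gB, C - lr * gC, G - lr * gG

print(f"loss: {loss_history[0]:.4f} -> {loss_history[-1]:.6f}")
```

On instances like this, with a small random start, the loss decreases substantially, consistent with the paper's message that the landscape has no spurious local minima; this sketch does not, of course, reproduce the paper's guarantees.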
This work was supported by NSF Awards CCF-1704656 and CCF-1845171 (CAREER); Sloan Fellowship; and Google Faculty Research Award.
Frandsen, A., Ge, R. Optimization landscape of Tucker decomposition. Math. Program. 193, 687–712 (2022). https://doi.org/10.1007/s10107-020-01531-z