Trace Norm Regularization: Reformulations, Algorithms, and Multi-Task Learning

Authors:

Jieping Ye

SIAM Journal on Optimization, Volume 20, Issue 6

Pages 3465 - 3489

https://doi.org/10.1137/090763184

Published: 01 December 2010

Abstract

We consider a recently proposed optimization formulation of multi-task learning based on trace norm regularized least squares. While this problem may be formulated as a semidefinite program (SDP), its size is beyond the reach of general-purpose SDP solvers. Previous solution approaches apply proximal gradient methods to solve the primal problem. We derive new primal and dual reformulations of this problem, including a reduced dual formulation that involves minimizing a convex quadratic function over an operator-norm ball in matrix space. This reduced dual problem may be solved by gradient-projection methods, with each projection involving a singular value decomposition. The dual approach is compared with existing approaches, and its practical effectiveness is illustrated on simulations and an application to gene expression pattern analysis.
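To make the projection step concrete, the following is a minimal sketch in Python with NumPy, not the paper's implementation. It assumes the standard fact that the Frobenius-norm projection onto an operator-norm ball clips the singular values at the radius. The quadratic objective, the matrices `M` and `C`, the fixed step size `1/L`, and all function names are hypothetical stand-ins chosen for illustration; the abstract does not specify the reduced dual objective. The sketch also shows singular value thresholding, the proximal map of the trace norm used by primal proximal gradient methods, which by the Moreau decomposition equals the residual of the same projection.

```python
import numpy as np


def project_operator_norm_ball(X, radius):
    """Frobenius-norm projection of X onto {Z : ||Z||_2 <= radius}.

    Computed by clipping the singular values of X at `radius`.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.minimum(s, radius)) @ Vt


def singular_value_threshold(X, tau):
    """Proximal map of tau * (trace norm): soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def gradient_projection(M, C, radius, steps=500):
    """Gradient projection for a hypothetical convex quadratic:

        minimize 0.5 * trace(X M X^T) - trace(C^T X)
        subject to ||X||_2 <= radius,

    where M is symmetric positive semidefinite. This objective is an
    illustrative assumption, not the reduced dual from the paper.
    """
    L = np.linalg.eigvalsh(M)[-1]      # Lipschitz constant of the gradient
    X = np.zeros_like(C)
    for _ in range(steps):
        grad = X @ M - C               # gradient of the quadratic
        X = project_operator_norm_ball(X - grad / L, radius)
    return X


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 3))
    M = A.T @ A + np.eye(3)            # symmetric positive definite
    C = rng.standard_normal((5, 3))
    X = gradient_projection(M, C, radius=0.5)
    print("operator norm of solution:", np.linalg.norm(X, 2))
```

Each iteration costs one SVD of a matrix the size of the iterate, which is the dominant cost the abstract alludes to. The fixed step size `1/L` is one simple choice; line searches or accelerated variants would be drop-in alternatives.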



Published In

SIAM Journal on Optimization Volume 20, Issue 6

August 2010

822 pages

ISSN: 1052-6234

Publisher

Society for Industrial and Applied Mathematics

United States
