Trace Norm Regularization: Reformulations, Algorithms, and Multi-Task Learning

Published: 01 December 2010

Abstract

We consider a recently proposed optimization formulation of multi-task learning based on trace norm regularized least squares. While this problem may be formulated as a semidefinite program (SDP), its size is beyond general SDP solvers. Previous solution approaches apply proximal gradient methods to solve the primal problem. We derive new primal and dual reformulations of this problem, including a reduced dual formulation that involves minimizing a convex quadratic function over an operator-norm ball in matrix space. This reduced dual problem may be solved by gradient-projection methods, with each projection involving a singular value decomposition. The dual approach is compared with existing approaches and its practical effectiveness is illustrated on simulations and an application to gene expression pattern analysis.
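The gradient-projection idea mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the reduced dual objective, so the quadratic below (built from the hypothetical matrices `A` and `B`) is only a stand-in, and the names `project_operator_norm_ball`, `gradient_projection`, and `radius` are assumptions introduced here. What the sketch does reflect is the structure described above: each iteration takes a gradient step and then projects onto an operator-norm ball, and the projection costs one singular value decomposition, since it amounts to clipping the singular values at the ball radius.

```python
import numpy as np

def project_operator_norm_ball(M, radius):
    """Frobenius-norm projection of M onto {Z : largest singular value <= radius}."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Clip each singular value at the radius and reassemble the matrix.
    return (U * np.minimum(s, radius)) @ Vt

def gradient_projection(grad, lipschitz, radius, X0, iters=300):
    """Projected gradient iterations with constant step size 1 / lipschitz."""
    X = project_operator_norm_ball(X0, radius)
    for _ in range(iters):
        X = project_operator_norm_ball(X - grad(X) / lipschitz, radius)
    return X

# Hypothetical convex quadratic, used only for illustration
# (NOT the paper's reduced dual objective).
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
B = rng.standard_normal((8, 4))
radius = 0.5

def objective(X):
    return 0.5 * np.linalg.norm(A @ X - B, "fro") ** 2

def grad(X):
    return A.T @ (A @ X - B)

# The gradient is Lipschitz with constant equal to the squared spectral norm of A.
lipschitz = np.linalg.norm(A, 2) ** 2
X_hat = gradient_projection(grad, lipschitz, radius, np.zeros((5, 4)))
```

With step size `1 / lipschitz` the objective does not increase from one iterate to the next, and every iterate stays inside the ball. For large matrices, the full SVD in each projection is the dominant cost of an iteration.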

Published In

SIAM Journal on Optimization  Volume 20, Issue 6
August 2010
822 pages

Publisher

Society for Industrial and Applied Mathematics

United States

Author Tags

  1. convex optimization
  2. duality
  3. gene expression pattern analysis
  4. multi-task learning
  5. proximal gradient method
  6. semidefinite programming
  7. trace norm regularization
