research-article

Efficient online learning for multitask feature selection

Published: 02 August 2013
  Abstract

    Learning explanatory features across multiple related tasks, or MultiTask Feature Selection (MTFS), is an important problem in data mining, machine learning, and bioinformatics applications. Previous MTFS methods rely on batch-mode training, which makes them inefficient when data arrive sequentially or when the training set is too large to fit in memory at once. To tackle these problems, we propose a novel online learning framework for solving the MTFS problem. A main advantage of the online algorithm is its efficiency in both time complexity and memory cost. The weights of the MTFS models at each iteration can be updated by closed-form solutions based on the average of previous subgradients. This yields worst-case bounds on the time complexity and memory cost at each iteration, both of order O(d × Q), where d is the number of feature dimensions and Q is the number of tasks. Moreover, we provide a theoretical analysis of the average regret of the online learning algorithms, which also guarantees their convergence rate. Finally, we conduct detailed experiments to show the characteristics and merits of the online learning algorithms in solving several MTFS problems.
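    The closed-form, per-iteration update is what makes the O(d × Q) bounds possible. The Python/NumPy sketch below shows one plausible form of such an update, assuming an l2,1-style regularizer (a joint penalty on each feature's row of weights across the Q tasks) and a dual-averaging update in the spirit of Xiao [44]; the function and parameter names are illustrative, not taken from the paper.

```python
import numpy as np

def da_mtfs_update(G_avg, t, lam, gamma):
    """Closed-form dual-averaging update sketch for multitask feature selection.

    G_avg : (d, Q) running average of the subgradients seen so far
            (one row per feature, one column per task).
    t     : iteration counter, t >= 1.
    lam   : weight of the per-feature group penalty.
    gamma : scaling of the auxiliary strongly convex term, gamma > 0.
    """
    # Joint shrinkage of each feature's row across all Q tasks (group
    # soft-thresholding): a feature is kept or dropped for every task
    # at once, which is what gives shared feature selection.
    row_norms = np.linalg.norm(G_avg, axis=1, keepdims=True)            # (d, 1)
    shrink = np.maximum(0.0, 1.0 - lam / np.maximum(row_norms, 1e-12))
    # Each entry is touched a constant number of times: O(d * Q) work and memory.
    return -(np.sqrt(t) / gamma) * shrink * G_avg                       # (d, Q)
```

    Rows whose averaged subgradient has norm below lam are mapped exactly to zero, which is what yields feature-level sparsity shared across the tasks.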

    References

    [1]
    Aaker, D. A., Kumar, V., and Day, G. S. 2006. Marketing Research 9th Ed. Wiley.
    [2]
    Ando, R. K. and Zhang, T. 2005. A framework for learning predictive structures from multiple tasks and unlabeled data. J. Mach. Learn. Res. 6, 1817--1853.
    [3]
    Argyriou, A., Evgeniou, T., and Pontil, M. 2006. Multi-task feature learning. Adv. Neural Inf. Process. Syst. 19, 41--48.
    [4]
    Argyriou, A., Evgeniou, T., and Pontil, M. 2008. Convex multi-task feature learning. Mach. Learn. 73, 3, 243--272.
    [5]
    Bai, J., Zhou, K., Xue, G.-R., Zha, H., Sun, G., Tseng, B. L., Zheng, Z., and Chang, Y. 2009. Multi-task learning for learning to rank in web search. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM'09). 1549--1552.
    [6]
    Bakker, B. and Heskes, T. 2003. Task clustering and gating for Bayesian multitask learning. J. Mach. Learn. Res. 4, 83--99.
    [7]
    Balakrishnan, S. and Madigan, D. 2008. Algorithms for sparse linear classifiers in the massive data setting. J. Mach. Learn. Res. 9, 313--337.
    [8]
    Ben-David, S. and Borbely, R. S. 2008. A notion of task relatedness yielding provable multiple-task learning guarantees. Mach. Learn. 73, 3, 273--287.
    [9]
    Ben-David, S. and Schuller, R. 2003. Exploiting task relatedness for multiple task learning. In Proceedings of the 16th Annual Conference on Learning Theory (COLT'03). 567--580.
    [10]
    Boyd, S. and Vandenberghe, L. 2004. Convex Optimization. Cambridge University Press.
    [11]
    Caruana, R. 1997. Multitask learning. Mach. Learn. 28, 1, 41--75.
    [12]
    Chen, J., Liu, J., and Ye, J. 2010. Learning incoherent sparse and low-rank patterns from multiple tasks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'10). 1179--1188.
    [13]
    Chen, J., Liu, J., and Ye, J. 2012. Learning incoherent sparse and low-rank patterns from multiple tasks. ACM Trans. Knowl. Discov. Data 5, 4.
    [14]
    Chen, J., Tang, L., Liu, J., and Ye, J. 2009a. A convex formulation for learning shared structures from multiple tasks. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML'09). 18.
    [15]
    Chen, X., Pan, W., Kwok, J. T., and Carbonell, J. G. 2009b. Accelerated gradient method for multi-task sparse learning problem. In Proceedings of the 9th IEEE International Conference on Data Mining (ICDM'09). 746--751.
    [16]
    Dekel, O., Long, P. M., and Singer, Y. 2006. Online multitask learning. In Proceedings of the 19th Annual Conference on Learning Theory (COLT'06). 453--467.
    [17]
    Dhillon, P. S., Foster, D. P., and Ungar, L. H. 2011. Minimum description length penalization for group and multi-task sparse learning. J. Mach. Learn. Res. 12, 525--564.
    [18]
    Dhillon, P. S., Tomasik, B., Foster, D. P., and Ungar, L. H. 2009. Multi-task feature selection using the multiple inclusion criterion (MIC). In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD'09). 276--289.
    [19]
    Duchi, J. and Singer, Y. 2009. Efficient learning using forward-backward splitting. J. Mach. Learn. Res. 10, 2873--2898.
    [20]
    Evgeniou, T., Micchelli, C. A., and Pontil, M. 2005. Learning multiple tasks with kernel methods. J. Mach. Learn. Res. 6, 615--637.
    [21]
    Evgeniou, T. and Pontil, M. 2004. Regularized multi--task learning. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'04). 109--117.
    [22]
    Friedman, J., Hastie, T., and Tibshirani, R. 2010. A note on the group lasso and a sparse group lasso. http://arxiv.org/pdf/1001.0736.pdf.
    [23]
    Han, J. and Kamber, M. 2006. Data Mining: Concepts and Techniques 2nd Ed. Morgan Kaufmann, San Francisco.
    [24]
    Han, Y., Wu, F., Jia, J., Zhuang, Y., and Yu, B. 2010. Multi-task sparse discriminant analysis (MTSDA) with overlapping categories. In Proceedings of the 24th AAAI Conference on Artificial Intelligence.
    [25]
    Hazan, E., Agarwal, A., and Kale, S. 2007. Logarithmic regret algorithms for online convex optimization. Mach. Learn. 69, 2--3, 169--192.
    [26]
    Hu, C., Kwok, J., and Pan, W. 2009. Accelerated gradient methods for stochastic optimization and online learning. Adv. Neural Inf. Process. Syst. 22, 781--789.
    [27]
    Jebara, T. 2004. Multi-task feature and kernel selection for SVMs. In Proceedings of the 21st International Conference on Machine Learning (ICML'04).
    [28]
    Jebara, T. 2011. Multitask sparsity via maximum entropy discrimination. J. Mach. Learn. Res. 12, 75--110.
    [29]
    Langford, J., Li, L., and Zhang, T. 2009. Sparse online learning via truncated gradient. J. Mach. Learn. Res. 10, 777--801.
    [30]
    Lenk, P. J., Desarbo, W. S., Green, P. E., and Young, M. R. 1996. Hierarchical bayes conjoint analysis: Recovery of partworth heterogeneity from reduced experimental designs. Market. Sci. 15, 2, 173--191.
    [31]
    Ling, G., Yang, H., King, I., and Lyu, M. R. 2012. Online learning for collaborative filtering. In Proceedings of the IEEE World Congress on Computational Intelligence (WCCI'12). 1--8.
    [32]
    Liu, H., Palatucci, M., and Zhang, J. 2009a. Blockwise coordinate descent procedures for the multi-task lasso, with applications to neural semantic basis discovery. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML'09). 649--656.
    [33]
    Liu, J., Chen, J., and Ye, J. 2009b. Large-scale sparse logistic regression. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'09). 547--556.
    [34]
    Liu, J., Ji, S., and Ye, J. 2009c. Multi-task feature learning via efficient l2,1-norm minimization. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI'09).
    [35]
    Liu, J., Ji, S., and Ye, J. 2009d. SLEP: Sparse learning with efficient projections. http://www.public.asu.edu/ye02/Software/SLEP.
    [36]
    Nesterov, Y. 2009. Primal-dual subgradient methods for convex problems. Math. Program. 120, 1, 221--259.
    [37]
    Obozinski, G., Taskar, B., and Jordan, M. I. 2009. Joint covariate selection and joint subspace selection for multiple classification problems. Statist. Comput. 20, 2, 231--252.
    [38]
    Pong, T. K., Tseng, P., Ji, S., and Ye, J. 2010. Trace norm regularization: Reformulations, algorithms, and multi-task learning. SIAM J. Optim. 20, 6, 3465--3489.
    [39]
    Quattoni, A., Carreras, X., Collins, M., and Darrell, T. 2009. An efficient projection for l1,∞ regularization. In Proceedings of the 26th International Conference on Machine Learning (ICML'09). 857--864.
    [40]
    Shalev-Shwartz, S. and Singer, Y. 2007. A primal-dual perspective of online learning algorithms. Mach. Learn. 69, 2--3, 115--142.
    [41]
    Shalev-Shwartz, S., Singer, Y., and Srebro, N. 2007. Pegasos: Primal estimated sub-gradient solver for SVM. In Proceedings of the 24th International Conference on Machine Learning (ICML'07). 807--814.
    [42]
    Tibshirani, R. 1996. Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. B58, 1, 267--288.
    [43]
    Vapnik, V. 1999. The Nature of Statistical Learning Theory 2nd Ed. Springer, New York.
    [44]
    Xiao, L. 2010. Dual averaging method for regularized stochastic learning and online optimization. J. Mach. Learn. Res. 11, 2543--2596.
    [45]
    Xu, Z., Jin, R., Yang, H., King, I., and Lyu, M. R. 2010. Simple and efficient multiple kernel learning by group lasso. In Proceedings of the 27th International Conference on Machine Learning (ICML'10). 1175--1182.
    [46]
    Yang, H., King, I., and Lyu, M. R. 2010a. Multi-task learning for one-class classification. In Proceedings of the International Joint Conference on Neural Networks (IJCNN'10). 1--8.
    [47]
    Yang, H., King, I., and Lyu, M. R. 2010b. Online learning for multi-task feature selection. In Proceedings of the 19th ACM Conference on Information and Knowledge Management (CIKM'10). 1693--1696.
    [48]
    Yang, H., King, I., and Lyu, M. R. 2011a. Sparse Learning under Regularization Framework 1st Ed. Lambert Academic Publishing.
    [49]
    Yang, H., Xu, Z., King, I., and Lyu, M. R. 2010c. Online learning for group lasso. In Proceedings of the 27th International Conference on Machine Learning (ICML'10). 1191--1198.
    [50]
    Yang, H., Xu, Z., Ye, J., King, I., and Lyu, M. R. 2011b. Efficient sparse generalized multiple kernel learning. IEEE Trans. Neural Netw. 22, 3, 433--446.
    [51]
    Zhang, Y. 2010. Multi-task active learning with output constraints. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI'10).
    [52]
    Zhang, Y., Yeung, D.-Y., and Xu, Q. 2010. Probabilistic multi-task feature selection. Adv. Neural Inf. Process. Syst. 23, 2559--2567.
    [53]
    Zhao, P., Hoi, S. C. H., and Jin, R. 2011a. Double updating online learning. J. Mach. Learn. Res. 12, 1587--1615.
    [54]
    Zhao, P., Hoi, S. C. H., Jin, R., and Yang, T. 2011b. Online AUC maximization. In Proceedings of the 28th International Conference on Machine Learning (ICML'11). 233--240.
    [55]
    Zhou, Y., Jin, R., and Hoi, S. C. 2010. Exclusive lasso for multi-task feature selection. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS'10).
    [56]
    Zou, H. and Hastie, T. 2005. Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. B67, 301--320.



        Reviews

        Anca Doloc-Mihu

        Web search algorithms such as multitask feature selection (MTFS) aim to efficiently learn explanatory features from multiple related user tasks simultaneously. These algorithms extract shared information across the tasks and determine the relative weights, which are systematically adjusted as more information comes in. While existing MTFS algorithms use batch-mode training (learning the weights all at once), this paper proposes a new online learning method called dual-averaging MTFS (DA-MTFS), which learns the weights sequentially. The paper asserts that this algorithm outperforms the standard MTFS method in both time complexity and memory cost. The DA-MTFS algorithm consists of three steps at each iteration: calculate the subgradient of the loss function on the weights, calculate the average of the subgradients, and update the weights. The weights are updated sequentially by deriving closed-form solutions based on the average of previous subgradients. The time complexity and memory cost at each iteration are at most O(d × Q), where d is the number of feature dimensions and Q is the number of tasks. The authors prove the convergence rate of their algorithm via theoretical analysis and then apply it to real-world data about student ratings of personal computers [1]. The experimental results show that the DA-MTFS algorithm performs close to the corresponding batch-trained algorithms, but at a lower time and memory cost. For those interested in online web-based learning algorithms, this paper is worth reading. Online Computing Reviews Service
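        The three steps the review lists map directly onto a short loop. The sketch below strings them together under the same assumptions as the da_mtfs_update sketch after the abstract, additionally assuming a squared loss and a data stream that yields one (example, label, task index) triple per iteration; the paper itself considers other losses and setups, so this is illustrative only.

```python
import numpy as np

def run_da_mtfs(stream, d, Q, lam=0.1, gamma=1.0):
    """Online loop sketch: (1) subgradient, (2) running average, (3) update."""
    W = np.zeros((d, Q))      # one weight column per task
    G_avg = np.zeros((d, Q))  # average of all subgradients seen so far
    for t, (x, y, q) in enumerate(stream, start=1):
        # (1) subgradient of the squared loss; only task q's column is nonzero
        g = np.zeros((d, Q))
        g[:, q] = (W[:, q] @ x - y) * x
        # (2) fold the new subgradient into the running average
        G_avg += (g - G_avg) / t
        # (3) closed-form weight update (see da_mtfs_update above)
        W = da_mtfs_update(G_avg, t, lam, gamma)
    return W
```

        Only W and G_avg persist between iterations, which is where the O(d × Q) memory cost quoted in the abstract comes from.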



        Published In

        ACM Transactions on Knowledge Discovery from Data  Volume 7, Issue 2
        July 2013
        107 pages
        ISSN:1556-4681
        EISSN:1556-472X
        DOI:10.1145/2499907
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 02 August 2013
        Accepted: 01 November 2012
        Revised: 01 July 2012
        Received: 01 January 2011
        Published in TKDD Volume 7, Issue 2


        Author Tags

        1. Supervised learning
        2. dual averaging method
        3. feature selection
        4. multitask learning
        5. online learning

        Qualifiers

        • Research-article
        • Research
        • Refereed


        Cited By

        • (2023) Lifelong Online Learning from Accumulated Knowledge. ACM Transactions on Knowledge Discovery from Data 17(4), 1-23. https://doi.org/10.1145/3563947. Online publication date: 24-Feb-2023.
        • (2023) A Hybrid Two-Stage Teaching-Learning-Based Optimization Algorithm for Feature Selection in Bioinformatics. IEEE/ACM Transactions on Computational Biology and Bioinformatics 20(3), 1746-1760. https://doi.org/10.1109/TCBB.2022.3215129. Online publication date: 1-May-2023.
        • (2023) MVMAFOL: A Multi-Access Three-Layer Federated Online Learning Algorithm for Internet of Vehicles. 2023 International Joint Conference on Neural Networks (IJCNN), 1-8. https://doi.org/10.1109/IJCNN54540.2023.10191843. Online publication date: 18-Jun-2023.
        • (2023) Feature subset selection for data and feature streams: a review. Artificial Intelligence Review 56(Suppl 1), 1011-1062. https://doi.org/10.1007/s10462-023-10546-9. Online publication date: 13-Jul-2023.
        • (2021) A conversation with Xiaokui Shu. Ubiquity 2021(March), 1-6. https://doi.org/10.1145/3457416. Online publication date: 21-Mar-2021.
        • (2021) Dynamic Selection of Variables When Analyzing the State of Technological Processes. Advances in Automation II, 617-623. https://doi.org/10.1007/978-3-030-71119-1_60. Online publication date: 19-Mar-2021.
        • (2021) Causality-based online streaming feature selection. Concurrency and Computation: Practice and Experience 33(20). https://doi.org/10.1002/cpe.6347. Online publication date: 11-May-2021.
        • (2020) Shapelet-transformed Multi-channel EEG Channel Selection. ACM Transactions on Intelligent Systems and Technology 11(5), 1-27. https://doi.org/10.1145/3397850. Online publication date: 10-Aug-2020.
        • (2020) STARS. ACM Transactions on Intelligent Systems and Technology 11(5), 1-25. https://doi.org/10.1145/3397463. Online publication date: 24-Jul-2020.
        • (2020) Practical Privacy Preserving POI Recommendation. ACM Transactions on Intelligent Systems and Technology 11(5), 1-20. https://doi.org/10.1145/3394138. Online publication date: 5-Jul-2020.
