research-article

Efficient online learning for multitask feature selection

Published: 02 August 2013
  Abstract

    Learning explanatory features across multiple related tasks, or MultiTask Feature Selection (MTFS), is an important problem in data mining, machine learning, and bioinformatics applications. Previous MTFS methods rely on batch-mode training, which makes them inefficient when data arrive sequentially or when the training set is too large to fit in memory at once. To tackle these problems, we propose a novel online learning framework for solving the MTFS problem. A main advantage of the online algorithm is its efficiency in both time complexity and memory cost. The weights of the MTFS models at each iteration can be updated by closed-form solutions based on the average of previous subgradients. This yields worst-case bounds on the time complexity and memory cost at each iteration, both of order O(d × Q), where d is the number of feature dimensions and Q is the number of tasks. Moreover, we provide a theoretical analysis of the average regret of the online learning algorithms, which also guarantees their convergence rate. Finally, we conduct detailed experiments to show the characteristics and merits of the online learning algorithms in solving several MTFS problems.
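    The closed-form, per-iteration update is what makes the O(d × Q) bounds possible. The Python/NumPy sketch below shows one plausible form of such an update, assuming an l2,1-style regularizer (a joint penalty on each feature's row of weights across the Q tasks) and a dual-averaging update in the spirit of Xiao [44]; the function and parameter names are illustrative, not taken from the paper.

```python
import numpy as np

def da_mtfs_update(G_avg, t, lam, gamma):
    """Closed-form dual-averaging update sketch for multitask feature selection.

    G_avg : (d, Q) running average of the subgradients seen so far
            (one row per feature, one column per task).
    t     : iteration counter, t >= 1.
    lam   : weight of the per-feature group penalty.
    gamma : scaling of the auxiliary strongly convex term, gamma > 0.
    """
    # Joint shrinkage of each feature's row across all Q tasks (group
    # soft-thresholding): a feature is kept or dropped for every task
    # at once, which is what gives shared feature selection.
    row_norms = np.linalg.norm(G_avg, axis=1, keepdims=True)            # (d, 1)
    shrink = np.maximum(0.0, 1.0 - lam / np.maximum(row_norms, 1e-12))
    # Each entry is touched a constant number of times: O(d * Q) work and memory.
    return -(np.sqrt(t) / gamma) * shrink * G_avg                       # (d, Q)
```

    Rows whose averaged subgradient has norm below lam are mapped exactly to zero, which is what yields feature-level sparsity shared across the tasks.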

    References

    [1]
    Aaker, D. A., Kumar, V., and Day, G. S. 2006. Marketing Research 9th Ed. Wiley.
    [2]
    Ando, R. K. and Zhang, T. 2005. A framework for learning predictive structures from multiple tasks and unlabeled data. J. Mach. Learn. Res. 6, 1817--1853.
    [3]
    Argyriou, A., Evgeniou, T., and Pontil, M. 2006. Multi-task feature learning. Adv. Neural Inf. Process. Syst. 19, 41--48.
    [4]
    Argyriou, A., Evgeniou, T., and Pontil, M. 2008. Convex multi-task feature learning. Mach. Learn. 73, 3, 243--272.
    [5]
    Bai, J., Zhou, K., Xue, G.-R., Zha, H., Sun, G., Tseng, B. L., Zheng, Z., and Chang, Y. 2009. Multi-task learning for learning to rank in web search. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM'09). 1549--1552.
    [6]
    Bakker, B. and Heskes, T. 2003. Task clustering and gating for Bayesian multitask learning. J. Mach. Learn. Res. 4, 83--99.
    [7]
    Balakrishnan, S. and Madigan, D. 2008. Algorithms for sparse linear classifiers in the massive data setting. J. Mach. Learn. Res. 9, 313--337.
    [8]
    Ben-David, S. and Borbely, R. S. 2008. A notion of task relatedness yielding provable multiple-task learning guarantees. Mach. Learn. 73, 3, 273--287.
    [9]
    Ben-David, S. and Schuller, R. 2003. Exploiting task relatedness for multiple task learning. In Proceedings of the 16th Annual Conference on Learning Theory (COLT'03). 567--580.
    [10]
    Boyd, S. and Vandenberghe, L. 2004. Convex Optimization. Cambridge University Press.
    [11]
    Caruana, R. 1997. Multitask learning. Mach. Learn. 28, 1, 41--75.
    [12]
    Chen, J., Liu, J., and Ye, J. 2010. Learning incoherent sparse and low-rank patterns from multiple tasks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'10). 1179--1188.
    [13]
    Chen, J., Liu, J., and Ye, J. 2012. Learning incoherent sparse and low-rank patterns from multiple tasks. ACM Trans. Knowl. Discov. Data 5, 4.
    [14]
    Chen, J., Tang, L., Liu, J., and Ye, J. 2009a. A convex formulation for learning shared structures from multiple tasks. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML'09). 18.
    [15]
    Chen, X., Pan, W., Kwok, J. T., and Carbonell, J. G. 2009b. Accelerated gradient method for multi-task sparse learning problem. In Proceedings of the 9th IEEE International Conference on Data Mining (ICDM'09). 746--751.
    [16]
    Dekel, O., Long, P. M., and Singer, Y. 2006. Online multitask learning. In Proceedings of the 19th Annual Conference on Learning Theory (COLT'06). 453--467.
    [17]
    Dhillon, P. S., Foster, D. P., and Ungar, L. H. 2011. Minimum description length penalization for group and multi-task sparse learning. J. Mach. Learn. Res. 12, 525--564.
    [18]
    Dhillon, P. S., Tomasik, B., Foster, D. P., and Ungar, L. H. 2009. Multi-task feature selection using the multiple inclusion criterion (MIC). In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD'09). 276--289.
    [19]
    Duchi, J. and Singer, Y. 2009. Efficient learning using forward-backward splitting. J. Mach. Learn. Res. 10, 2873--2898.
    [20]
    Evgeniou, T., Micchelli, C. A., and Pontil, M. 2005. Learning multiple tasks with kernel methods. J. Mach. Learn. Res. 6, 615--637.
    [21]
    Evgeniou, T. and Pontil, M. 2004. Regularized multi--task learning. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'04). 109--117.
    [22]
    Friedman, J., Hastie, T., and Tibshirani, R. 2010. A note on the group lasso and a sparse group lasso. http://arxiv.org/pdf/1001.0736.pdf.
    [23]
    Han, J. and Kamber, M. 2006. Data Mining: Concepts and Techniques 2nd Ed. Morgan Kaufmann, San Francisco.
    [24]
    Han, Y., Wu, F., Jia, J., Zhuang, Y., and Yu, B. 2010. Multi-task sparse discriminant analysis (MTSDA) with overlapping categories. In Proceedings of the 24th AAAI Conference on Artificial Intelligence.
    [25]
    Hazan, E., Agarwal, A., and Kale, S. 2007. Logarithmic regret algorithms for online convex optimization. Mach. Learn. 69, 2--3, 169--192.
    [26]
    Hu, C., Kwok, J., and Pan, W. 2009. Accelerated gradient methods for stochastic optimization and online learning. Adv. Neural Inf. Process. Syst. 22, 781--789.
    [27]
    Jebara, T. 2004. Multi-task feature and kernel selection for SVMs. In Proceedings of the 21st International Conference on Machine Learning (ICML'04).
    [28]
    Jebara, T. 2011. Multitask sparsity via maximum entropy discrimination. J. Mach. Learn. Res. 12, 75--110.
    [29]
    Langford, J., Li, L., and Zhang, T. 2009. Sparse online learning via truncated gradient. J. Mach. Learn. Res. 10, 777--801.
    [30]
    Lenk, P. J., Desarbo, W. S., Green, P. E., and Young, M. R. 1996. Hierarchical bayes conjoint analysis: Recovery of partworth heterogeneity from reduced experimental designs. Market. Sci. 15, 2, 173--191.
    [31]
    Ling, G., Yang, H., King, I., and Lyu, M. R. 2012. Online learning for collaborative filtering. In Proceedings of the IEEE World Congress on Computational Intelligence (WCCI'12). 1--8.
    [32]
    Liu, H., Palatucci, M., and Zhang, J. 2009a. Blockwise coordinate descent procedures for the multi-task lasso, with applications to neural semantic basis discovery. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML'09). 649--656.
    [33]
    Liu, J., Chen, J., and Ye, J. 2009b. Large-scale sparse logistic regression. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'09). 547--556.
    [34]
    Liu, J., Ji, S., and Ye, J. 2009c. Multi-task feature learning via efficient l2,1-norm minimization. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI'09).
    [35]
    Liu, J., Ji, S., and Ye, J. 2009d. SLEP: Sparse learning with efficient projections. http://www.public.asu.edu/ye02/Software/SLEP.
    [36]
    Nesterov, Y. 2009. Primal-dual subgradient methods for convex problems. Math. Program. 120, 1, 221--259.
    [37]
    Obozinski, G., Taskar, B., and Jordan, M. I. 2009. Joint covariate selection and joint subspace selection for multiple classification problems. Statist. Comput. 20, 2, 231--252.
    [38]
    Pong, T. K., Tseng, P., Ji, S., and Ye, J. 2010. Trace norm regularization: Reformulations, algorithms, and multi-task learning. SIAM J. Optim. 20, 6, 3465--3489.
    [39]
    Quattoni, A., Carreras, X., Collins, M., and Darrell, T. 2009. An efficient projection for l1,∞ regularization. In Proceedings of the 26th International Conference on Machine Learning (ICML'09). 857--864.
    [40]
    Shalev-Shwartz, S. and Singer, Y. 2007. A primal-dual perspective of online learning algorithms. Mach. Learn. 69, 2--3, 115--142.
    [41]
    Shalev-Shwartz, S., Singer, Y., and Srebro, N. 2007. Pegasos: Primal estimated sub-gradient solver for SVM. In Proceedings of the 24th International Conference on Machine Learning (ICML'07). 807--814.
    [42]
    Tibshirani, R. 1996. Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. B58, 1, 267--288.
    [43]
    Vapnik, V. 1999. The Nature of Statistical Learning Theory 2nd Ed. Springer, New York.
    [44]
    Xiao, L. 2010. Dual averaging method for regularized stochastic learning and online optimization. J. Mach. Learn. Res. 11, 2543--2596.
    [45]
    Xu, Z., Jin, R., Yang, H., King, I., and Lyu, M. R. 2010. Simple and efficient multiple kernel learning by group lasso. In Proceedings of the 27th International Conference on Machine Learning (ICML'10). 1175--1182.
    [46]
    Yang, H., King, I., and Lyu, M. R. 2010a. Multi-task learning for one-class classification. In Proceedings of the International Joint Conference on Neural Networks (IJCNN'10). 1--8.
    [47]
    Yang, H., King, I., and Lyu, M. R. 2010b. Online learning for multi-task feature selection. In Proceedings of the 19th ACM Conference on Information and Knowledge Management (CIKM'10). 1693--1696.
    [48]
    Yang, H., King, I., and Lyu, M. R. 2011a. Sparse Learning under Regularization Framework 1st Ed. Lambert Academic Publishing.
    [49]
    Yang, H., Xu, Z., King, I., and Lyu, M. R. 2010c. Online learning for group lasso. In Proceedings of the 27th International Conference on Machine Learning (ICML'10). 1191--1198.
    [50]
    Yang, H., Xu, Z., Ye, J., King, I., and Lyu, M. R. 2011b. Efficient sparse generalized multiple kernel learning. IEEE Trans. Neural Netw. 22, 3, 433--446.
    [51]
    Zhang, Y. 2010. Multi-task active learning with output constraints. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI'10).
    [52]
    Zhang, Y., Yeung, D.-Y., and Xu, Q. 2010. Probabilistic multi-task feature selection. Adv. Neural Inf. Process. Syst. 23, 2559--2567.
    [53]
    Zhao, P., Hoi, S. C. H., and Jin, R. 2011a. Double updating online learning. J. Mach. Learn. Res. 12, 1587--1615.
    [54]
    Zhao, P., Hoi, S. C. H., Jin, R., and Yang, T. 2011b. Online AUC maximization. In Proceedings of the 28th International Conference on Machine Learning (ICML'11). 233--240.
    [55]
    Zhou, Y., Jin, R., and Hoi, S. C. 2010. Exclusive lasso for multi-task feature selection. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS'10).
    [56]
    Zou, H. and Hastie, T. 2005. Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. B67, 301--320.



        Reviews

        Anca Doloc-Mihu

        Web search algorithms such as multitask feature selection (MTFS) aim to efficiently learn explanatory features from multiple related user tasks simultaneously. These algorithms extract shared information across the tasks and determine the relative weights, which are systematically adjusted as more information comes in. While existing MTFS algorithms use batch-mode training (learning the weights all at once), this paper proposes a new online learning method called dual-averaging MTFS (DA-MTFS), which learns the weights sequentially. The paper asserts that this algorithm outperforms the standard MTFS method in both time complexity and memory cost. The DA-MTFS algorithm consists of three steps at each iteration: calculate the subgradient of the loss function on the weights, calculate the average of the subgradients, and update the weights. The weights are updated sequentially by deriving closed-form solutions based on the average of previous subgradients. The time complexity and memory cost at each iteration are at most O(d × Q), where d is the number of feature dimensions and Q is the number of tasks. The authors prove the convergence rate of their algorithm via theoretical analysis and then apply it to real-world data about student ratings of personal computers [1]. The experimental results show that the DA-MTFS algorithm performs close to the corresponding batch-trained algorithms, but at a lower time and memory cost. For those interested in online web-based learning algorithms, this paper is worth reading. Online Computing Reviews Service
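        The three steps the review lists map directly onto a short loop. The sketch below strings them together under the same assumptions as the da_mtfs_update sketch after the abstract, additionally assuming a squared loss and a data stream that yields one (example, label, task index) triple per iteration; the paper itself considers other losses and setups, so this is illustrative only.

```python
import numpy as np

def run_da_mtfs(stream, d, Q, lam=0.1, gamma=1.0):
    """Online loop sketch: (1) subgradient, (2) running average, (3) update."""
    W = np.zeros((d, Q))      # one weight column per task
    G_avg = np.zeros((d, Q))  # average of all subgradients seen so far
    for t, (x, y, q) in enumerate(stream, start=1):
        # (1) subgradient of the squared loss; only task q's column is nonzero
        g = np.zeros((d, Q))
        g[:, q] = (W[:, q] @ x - y) * x
        # (2) fold the new subgradient into the running average
        G_avg += (g - G_avg) / t
        # (3) closed-form weight update (see da_mtfs_update above)
        W = da_mtfs_update(G_avg, t, lam, gamma)
    return W
```

        Only W and G_avg persist between iterations, which is where the O(d × Q) memory cost quoted in the abstract comes from.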



        Published In

        ACM Transactions on Knowledge Discovery from Data  Volume 7, Issue 2
        July 2013
        107 pages
        ISSN:1556-4681
        EISSN:1556-472X
        DOI:10.1145/2499907
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 02 August 2013
        Accepted: 01 November 2012
        Revised: 01 July 2012
        Received: 01 January 2011
        Published in TKDD Volume 7, Issue 2


        Author Tags

        1. Supervised learning
        2. dual averaging method
        3. feature selection
        4. multitask learning
        5. online learning

        Qualifiers

        • Research-article
        • Research
        • Refereed


        Cited By

        • (2023) Lifelong Online Learning from Accumulated Knowledge. ACM Transactions on Knowledge Discovery from Data 17(4), 1-23. https://doi.org/10.1145/3563947. Online publication date: 24-Feb-2023.
        • (2023) A Hybrid Two-Stage Teaching-Learning-Based Optimization Algorithm for Feature Selection in Bioinformatics. IEEE/ACM Transactions on Computational Biology and Bioinformatics 20(3), 1746-1760. https://doi.org/10.1109/TCBB.2022.3215129. Online publication date: 1-May-2023.
        • (2023) MVMAFOL: A Multi-Access Three-Layer Federated Online Learning Algorithm for Internet of Vehicles. 2023 International Joint Conference on Neural Networks (IJCNN), 1-8. https://doi.org/10.1109/IJCNN54540.2023.10191843. Online publication date: 18-Jun-2023.
        • (2023) Feature subset selection for data and feature streams: a review. Artificial Intelligence Review 56(Suppl 1), 1011-1062. https://doi.org/10.1007/s10462-023-10546-9. Online publication date: 13-Jul-2023.
        • (2021) A conversation with Xiaokui Shu. Ubiquity 2021(March), 1-6. https://doi.org/10.1145/3457416. Online publication date: 21-Mar-2021.
        • (2021) Dynamic Selection of Variables When Analyzing the State of Technological Processes. Advances in Automation II, 617-623. https://doi.org/10.1007/978-3-030-71119-1_60. Online publication date: 19-Mar-2021.
        • (2021) Causality-based online streaming feature selection. Concurrency and Computation: Practice and Experience 33(20). https://doi.org/10.1002/cpe.6347. Online publication date: 11-May-2021.
        • (2020) Shapelet-transformed Multi-channel EEG Channel Selection. ACM Transactions on Intelligent Systems and Technology 11(5), 1-27. https://doi.org/10.1145/3397850. Online publication date: 10-Aug-2020.
        • (2020) STARS. ACM Transactions on Intelligent Systems and Technology 11(5), 1-25. https://doi.org/10.1145/3397463. Online publication date: 24-Jul-2020.
        • (2020) Practical Privacy Preserving POI Recommendation. ACM Transactions on Intelligent Systems and Technology 11(5), 1-20. https://doi.org/10.1145/3394138. Online publication date: 5-Jul-2020.
