DOI: 10.1145/3442381.3449839
Research Article

Unsupervised Lifelong Learning with Curricula

Published: 03 June 2021

Abstract

Lifelong machine learning (LML) has driven the development of a wide range of web applications, enabling the learning systems deployed on web servers to handle a sequence of tasks in an incremental fashion. Such systems can retain knowledge from learned tasks in a knowledge base and seamlessly apply it to improve future learning. Unfortunately, most existing LML methods require labels in every task, and providing persistent human labeling for all future tasks is costly, onerous, error-prone, and hence impractical. Motivated by this limitation, we propose a new paradigm named unsupervised lifelong learning with curricula (ULLC), where only one task needs to be labeled for initialization and the system then performs lifelong learning on subsequent tasks in an unsupervised fashion. A main challenge in realizing this paradigm lies in negative knowledge transfer, where part of the old knowledge becomes detrimental to learning a given task yet cannot be filtered out by the learner without the help of labels. To overcome this challenge, we draw insights from the learning behaviors of humans. Specifically, when faced with a difficult task that our current knowledge cannot handle well, we usually postpone it and work on easier tasks first, which allows us to grow our knowledge. Once we return to the postponed task, we are more likely to tackle it well because we are more knowledgeable. The key idea of ULLC is similar: at any time, a pool of candidate tasks is organized into a curriculum by their distances to the knowledge base. The learner starts from the closer tasks, accumulates knowledge from learning them, and then moves on to the faraway tasks with a gradually augmented knowledge base. The viability and effectiveness of our proposal are substantiated through extensive empirical studies on both synthetic and real datasets.
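Although the page carries no code, the curriculum loop sketched in the abstract can be illustrated in a few lines. The Python snippet below is a minimal sketch, not the authors' implementation: every name in it (distance_to_kb, curriculum_round, ullc, learn_fn, kb_centroids) is hypothetical, and the centroid-based distance and fixed threshold are assumptions chosen only to make the postpone-and-return behavior concrete.

```python
import numpy as np

def distance_to_kb(task_X: np.ndarray, kb_centroids: np.ndarray) -> float:
    """One plausible proxy for a task's distance to the knowledge base:
    the Euclidean distance from the task's feature centroid to the
    nearest centroid of a previously learned task (an assumption made
    for this sketch, not the paper's distance measure)."""
    c = task_X.mean(axis=0)
    return float(np.linalg.norm(kb_centroids - c, axis=1).min())

def curriculum_round(pool, kb_centroids, learn_fn, threshold):
    """One curriculum pass: learn the near tasks with the current
    knowledge, postpone the far ones until the knowledge base grows."""
    ordered = sorted(pool, key=lambda X: distance_to_kb(X, kb_centroids))
    postponed = []
    for X in ordered:
        if distance_to_kb(X, kb_centroids) <= threshold:
            learn_fn(X, kb_centroids)  # placeholder for the unsupervised transfer step
            kb_centroids = np.vstack([kb_centroids, X.mean(axis=0)])  # augment the KB
        else:
            postponed.append(X)  # too far for now; revisit in a later round
    return postponed, kb_centroids

def ullc(pool, kb_centroids, learn_fn, threshold):
    """Repeat curriculum rounds until every task is learned or no
    remaining task is close enough to the (now larger) knowledge base."""
    while pool:
        pool_next, kb_centroids = curriculum_round(pool, kb_centroids, learn_fn, threshold)
        if len(pool_next) == len(pool):  # no progress this round; stop (or relax threshold)
            break
        pool = pool_next
    return kb_centroids
```

Re-evaluating the distances on every round is what realizes the "postpone, then return" behavior from the abstract: a task that is initially too far from the knowledge base can fall under the threshold once the centroids of easier, intermediate tasks have been added.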

Cited By

  • (2024) Online Learning From Evolving Feature Spaces With Deep Variational Models. IEEE Transactions on Knowledge and Data Engineering 36(8), 4144-4162. DOI: 10.1109/TKDE.2023.3326365
  • (2022) Unified Question Generation with Continual Lifelong Learning. In Proceedings of the ACM Web Conference 2022, 871-881. DOI: 10.1145/3485447.3511930

Published In

WWW '21: Proceedings of the Web Conference 2021
April 2021, 4054 pages
ISBN: 9781450383127
DOI: 10.1145/3442381

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 June 2021

Author Tags

  1. adversarial networks
  2. lifelong learning
  3. transfer learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '21: The Web Conference 2021
April 19-23, 2021
Ljubljana, Slovenia

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
