
Rectification-Based Knowledge Retention for Task Incremental Learning

Published: 01 March 2024

Abstract

In the task incremental learning problem, deep learning models suffer from catastrophic forgetting of previously seen classes/tasks as they are trained on new classes/tasks. The problem becomes even harder when some of the test classes do not belong to the training class set, i.e., in the task incremental generalized zero-shot learning problem. We propose a novel approach that addresses the task incremental learning problem in both the non-zero-shot and zero-shot settings. Our approach, called Rectification-based Knowledge Retention (RKR), applies weight rectifications and affine transformations to adapt the model to any task. During testing, it can use task label information (task-aware) to quickly adapt the network to the given task. We also extend our approach to make it task-agnostic, so that it works even when task labels are unavailable during testing: given a continuum of test data, it predicts the task and quickly adapts the network to the predicted task. We show experimentally that our approach achieves state-of-the-art results on several benchmark datasets for both non-zero-shot and zero-shot task incremental learning.
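The abstract describes the mechanism only at a high level. As a rough illustration of how per-task weight rectifications and affine transformations can be attached to a shared backbone, and how a task might be picked at test time in the task-agnostic setting, here is a minimal PyTorch sketch. Everything in it, the RectifiedLinear class, the low-rank factorization of the rectification, the entropy-based predict_task heuristic, and all shapes and hyperparameters, is an illustrative assumption rather than the authors' implementation.

```python
# Minimal sketch of per-task weight rectification + per-task affine
# scaling, in the spirit of the RKR idea summarized in the abstract.
# All names, the low-rank factorization, and the task-prediction
# heuristic below are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RectifiedLinear(nn.Module):
    """Linear layer with a shared base weight plus per-task corrections.

    For task t the effective weight is W + U[t] @ V[t] (a low-rank
    additive rectification), and the output is transformed by a
    per-task affine map (gamma[t], beta[t]).
    """

    def __init__(self, in_features, out_features, num_tasks, rank=4):
        super().__init__()
        # Shared base weights; typically trained on the first task and
        # then frozen, so later tasks only learn the small corrections.
        self.base = nn.Linear(in_features, out_features)
        self.U = nn.ParameterList(
            [nn.Parameter(torch.zeros(out_features, rank)) for _ in range(num_tasks)]
        )
        self.V = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(rank, in_features)) for _ in range(num_tasks)]
        )
        self.gamma = nn.ParameterList(
            [nn.Parameter(torch.ones(out_features)) for _ in range(num_tasks)]
        )
        self.beta = nn.ParameterList(
            [nn.Parameter(torch.zeros(out_features)) for _ in range(num_tasks)]
        )

    def forward(self, x, task_id):
        # Rectify the shared weight for this task, then apply the
        # task-specific affine transformation to the output features.
        w = self.base.weight + self.U[task_id] @ self.V[task_id]
        out = F.linear(x, w, self.base.bias)
        return self.gamma[task_id] * out + self.beta[task_id]


@torch.no_grad()
def predict_task(layer, x, num_tasks):
    """Task-agnostic inference sketch: adapt the layer to every task and
    pick the task whose outputs are most confident (lowest mean entropy).
    The confidence criterion is a common heuristic, assumed here."""
    entropies = []
    for t in range(num_tasks):
        probs = layer(x, t).softmax(dim=-1)
        ent = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()
        entropies.append(ent)
    return int(torch.stack(entropies).argmin())
```

The appeal of such a structure is that the per-task overhead per layer is only rank * (in_features + out_features) + 2 * out_features parameters, small relative to the base weight, so adapting to a new task is cheap, and switching tasks at test time is a constant-time reindexing rather than a retraining step.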


Published In

IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 46, Issue 3, March 2024, 579 pages

Publisher

IEEE Computer Society, United States
