
Rectification-Based Knowledge Retention for Task Incremental Learning

Published: 01 March 2024

Abstract

In the task incremental learning problem, deep learning models suffer from catastrophic forgetting of previously seen classes/tasks as they are trained on new classes/tasks. The problem becomes even harder when some of the test classes do not belong to the training class set, i.e., in the task incremental generalized zero-shot learning problem. We propose a novel approach that addresses the task incremental learning problem in both the non-zero-shot and zero-shot settings. Our approach, called Rectification-based Knowledge Retention (RKR), applies weight rectifications and affine transformations to adapt the model to any task. During testing, it can use task label information (task-aware) to quickly adapt the network to the given task. We also extend our approach to make it task-agnostic, so that it works even when task labels are unavailable during testing: given a continuum of test data, it predicts the task and quickly adapts the network to the predicted task. We show experimentally that our approach achieves state-of-the-art results on several benchmark datasets for both non-zero-shot and zero-shot task incremental learning.
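The abstract describes the mechanism only at a high level. As a rough illustration of how per-task weight rectifications and affine transformations can be attached to a shared backbone, and how a task might be picked at test time in the task-agnostic setting, here is a minimal PyTorch sketch. Everything in it, the RectifiedLinear class, the low-rank factorization of the rectification, the entropy-based predict_task heuristic, and all shapes and hyperparameters, is an illustrative assumption rather than the authors' implementation.

```python
# Minimal sketch of per-task weight rectification + per-task affine
# scaling, in the spirit of the RKR idea summarized in the abstract.
# All names, the low-rank factorization, and the task-prediction
# heuristic below are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RectifiedLinear(nn.Module):
    """Linear layer with a shared base weight plus per-task corrections.

    For task t the effective weight is W + U[t] @ V[t] (a low-rank
    additive rectification), and the output is transformed by a
    per-task affine map (gamma[t], beta[t]).
    """

    def __init__(self, in_features, out_features, num_tasks, rank=4):
        super().__init__()
        # Shared base weights; typically trained on the first task and
        # then frozen, so later tasks only learn the small corrections.
        self.base = nn.Linear(in_features, out_features)
        self.U = nn.ParameterList(
            [nn.Parameter(torch.zeros(out_features, rank)) for _ in range(num_tasks)]
        )
        self.V = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(rank, in_features)) for _ in range(num_tasks)]
        )
        self.gamma = nn.ParameterList(
            [nn.Parameter(torch.ones(out_features)) for _ in range(num_tasks)]
        )
        self.beta = nn.ParameterList(
            [nn.Parameter(torch.zeros(out_features)) for _ in range(num_tasks)]
        )

    def forward(self, x, task_id):
        # Rectify the shared weight for this task, then apply the
        # task-specific affine transformation to the output features.
        w = self.base.weight + self.U[task_id] @ self.V[task_id]
        out = F.linear(x, w, self.base.bias)
        return self.gamma[task_id] * out + self.beta[task_id]


@torch.no_grad()
def predict_task(layer, x, num_tasks):
    """Task-agnostic inference sketch: adapt the layer to every task and
    pick the task whose outputs are most confident (lowest mean entropy).
    The confidence criterion is a common heuristic, assumed here."""
    entropies = []
    for t in range(num_tasks):
        probs = layer(x, t).softmax(dim=-1)
        ent = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()
        entropies.append(ent)
    return int(torch.stack(entropies).argmin())
```

The appeal of such a structure is that the per-task overhead per layer is only rank * (in_features + out_features) + 2 * out_features parameters, small relative to the base weight, so adapting to a new task is cheap, and switching tasks at test time is a constant-time reindexing rather than a retraining step.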


Published In

IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 46, Issue 3, March 2024, 579 pages

Publisher

IEEE Computer Society, United States
