DOI: 10.5555/3692070.3694430

Layerwise proximal replay: a proximal point method for online continual learning

Published: 21 July 2024

Abstract

In online continual learning, a neural network incrementally learns from a non-i.i.d. data stream. Nearly all online continual learning methods employ experience replay to simultaneously prevent catastrophic forgetting and underfitting on past data. Our work demonstrates a limitation of this approach: neural networks trained with experience replay tend to have unstable optimization trajectories, impeding their overall accuracy. Surprisingly, these instabilities persist even when the replay buffer stores all previous training examples, suggesting that this issue is orthogonal to catastrophic forgetting. We minimize these instabilities through a simple modification of the optimization geometry. Our solution, Layerwise Proximal Replay (LPR), balances learning from new and replay data while only allowing for gradual changes in the hidden activations of past data. We demonstrate that LPR consistently improves replay-based online continual learning methods across multiple problem settings, regardless of the amount of available replay memory.
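The abstract describes LPR only at a high level, so the following is a minimal sketch, in PyTorch, of one way the layerwise idea could be realized: each layer's weight gradient is preconditioned by the damped inverse covariance of that layer's replay-data inputs, which discourages abrupt changes to past data's hidden activations. The preconditioner form, the `omega` damping term, and all helper names are illustrative assumptions, not the paper's exact algorithm.

```python
import torch
import torch.nn as nn


def layer_preconditioner(A, omega=1.0):
    """P = (A^T A / n + omega * I)^{-1}, built from the replay-data inputs A
    (n x d) to one layer. Multiplying a gradient by P damps updates along
    directions spanned by past activations, so those activations change only
    gradually. (Assumed form; the paper's exact operator may differ.)"""
    n, d = A.shape
    return torch.linalg.inv(A.T @ A / n + omega * torch.eye(d))


def lpr_sgd_step(linears, replay_inputs, loss, omega=1.0, lr=0.1):
    """One manual SGD step with layerwise gradient preconditioning.
    `linears[i]` is an nn.Linear; `replay_inputs[i]` holds the inputs that
    replay-buffer examples feed into that layer (shape: n x in_features)."""
    loss.backward()
    with torch.no_grad():
        for layer, A in zip(linears, replay_inputs):
            P = layer_preconditioner(A, omega)
            layer.weight -= lr * (layer.weight.grad @ P)  # (out, in) @ (in, in)
            if layer.bias is not None:
                layer.bias -= lr * layer.bias.grad
            for p in layer.parameters():
                p.grad = None


# Toy usage: two-layer MLP. In practice the loss would combine new-data and
# replay-data terms (experience replay); the preconditioning is the point here.
torch.manual_seed(0)
net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
linears = [net[0], net[2]]

x_new, y_new = torch.randn(32, 8), torch.randint(0, 4, (32,))
x_replay = torch.randn(64, 8)  # stand-in for a replay buffer
with torch.no_grad():
    replay_inputs = [x_replay, torch.relu(net[0](x_replay))]

loss = nn.functional.cross_entropy(net(x_new), y_new)
lpr_sgd_step(linears, replay_inputs, loss)
```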


Published In

ICML'24: Proceedings of the 41st International Conference on Machine Learning
July 2024
63010 pages

Publisher

JMLR.org

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%
