DOI: 10.5555/3618408.3619963 · Research article

Adaptive compositional continual meta-learning

Published: 23 July 2023

Abstract

This paper focuses on continual meta-learning, where few-shot tasks are heterogeneous and arrive sequentially. Recent works model meta-knowledge with a mixture model to handle this heterogeneity. However, these methods suffer from parameter inefficiency for two reasons: (1) the underlying assumption of mutual exclusiveness among mixture components prevents sharing meta-knowledge across heterogeneous tasks, and (2) they can only add mixture components and cannot adaptively filter out redundant ones. In this paper, we propose an Adaptive Compositional Continual Meta-Learning (ACML) algorithm, which employs a compositional premise to associate each task with a subset of mixture components, allowing meta-knowledge to be shared among heterogeneous tasks. Moreover, to adaptively adjust the number of mixture components, we propose a component sparsification method based on evidential theory that filters out redundant components. Experimental results show that ACML outperforms strong baselines, demonstrating the effectiveness of our compositional meta-knowledge and confirming that ACML can adaptively learn meta-knowledge.
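
The abstract gives no implementation details, so the following is only a minimal sketch of the compositional idea it describes: task-specific parameters are composed from a sparse subset of shared mixture components. The pruning rule below (dropping components whose weight falls under the uniform 1/K baseline, loosely in the spirit of evidential sparsification) is a hypothetical stand-in for the paper's actual evidence-based procedure, and all names here are illustrative, not the authors' API.

```python
# Hypothetical sketch, not the authors' implementation: compose task
# parameters from a sparse subset of shared mixture components.
import numpy as np

def sparse_component_weights(logits):
    """Softmax over component logits, then zero out components whose
    weight falls below the uniform 1/K baseline (an illustrative
    stand-in for the paper's evidential sparsification)."""
    k = logits.shape[0]
    w = np.exp(logits - logits.max())   # numerically stable softmax
    w /= w.sum()
    mask = w >= 1.0 / k                 # keep above-baseline components
    w = np.where(mask, w, 0.0)          # max weight is always >= 1/k,
    return w / w.sum(), mask            # so renormalization is safe

def compose_task_params(components, weights):
    """Task-specific parameters as a weighted sum of shared components."""
    return weights @ components         # (K,) @ (K, D) -> (D,)

rng = np.random.default_rng(0)
K, D = 6, 4                             # 6 shared components, 4-dim params
components = rng.normal(size=(K, D))
task_logits = rng.normal(size=K)        # in ACML this would come from
                                        # task-level inference, not noise
weights, active = sparse_component_weights(task_logits)
print("active components:", np.flatnonzero(active))
print("task parameters:", compose_task_params(components, weights))
```

Because only the components that clear the baseline survive, different tasks can still overlap in the components they keep, which is the parameter-efficiency argument the abstract makes against mutually exclusive mixtures.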

Published In

ICML'23: Proceedings of the 40th International Conference on Machine Learning, July 2023, 43479 pages.

Publisher: JMLR.org
