Not All Tasks Are Equal: A Parameter-Efficient Task Reweighting Method for Few-Shot Learning

  • Conference paper
  • In: Machine Learning and Knowledge Discovery in Databases: Research Track (ECML PKDD 2023)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14170)

Abstract

Meta-learning has emerged as an effective and popular approach for few-shot learning (FSL) due to its fast adaptation to novel tasks. However, such methods assume that the meta-training and testing tasks come from the same task distribution and assign equal weights to all tasks during meta-training. This assumption limits their ability to perform well in real-world scenarios where some meta-training tasks contribute more to the testing tasks than others. To address this issue, we propose a parameter-efficient task reweighting (PETR) method, which assigns proper weights to meta-training tasks according to their contribution to the testing tasks while using few parameters. Specifically, we formulate a bi-level optimization problem to jointly learn the few-shot learning model and the task weights. In the inner loop, the meta-parameters of the few-shot learning model are updated based on a weighted training loss. In the outer loop, the task weight parameters are updated with the implicit gradient. Additionally, to address the challenge of a large number of task weight parameters, we introduce a hypothesis that significantly reduces the required parameters by considering the factors that influence the importance of each meta-training task. Empirical evaluation results on both traditional FSL and FSL with out-of-distribution (OOD) tasks show that our PETR method outperforms state-of-the-art meta-learning-based FSL methods by assigning proper weights to different meta-training tasks.
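The bi-level scheme described in the abstract (a weighted meta-training loss in the inner loop, task-weight updates driven by held-out tasks in the outer loop) can be illustrated with a small toy example. The following is a minimal PyTorch sketch, not the authors' implementation: the linear regression model, the synthetic tasks, the single unrolled inner gradient step, and backpropagation through that step (standing in for the implicit gradient used in the paper) are all simplifying assumptions, and it keeps one free weight per training task rather than PETR's reduced parameterization.

```python
# Toy illustration of bi-level task reweighting (not the authors' code).
# Assumptions: linear regression tasks, one unrolled inner gradient step in
# place of full inner-loop training, backprop through that step in place of
# the implicit gradient, and one weight parameter per meta-training task.
import torch

torch.manual_seed(0)
n_train_tasks, n_val_tasks, n_feat = 8, 2, 5

def make_task(shift=0.0):
    """Synthetic regression task: inputs x, targets y from a shifted weight vector."""
    w_true = torch.randn(n_feat) + shift
    x = torch.randn(20, n_feat)
    y = x @ w_true + 0.1 * torch.randn(20)
    return x, y

# The first few meta-training tasks are deliberately off-distribution (shift=2),
# so a useful reweighting should down-weight them.
train_tasks = [make_task(2.0 if i < 3 else 0.0) for i in range(n_train_tasks)]
val_tasks = [make_task(0.0) for _ in range(n_val_tasks)]  # stand-ins for testing tasks

theta = torch.zeros(n_feat, requires_grad=True)                 # meta-parameters
weight_logits = torch.zeros(n_train_tasks, requires_grad=True)  # task-weight parameters
inner_lr, outer_lr, meta_lr = 0.1, 0.5, 0.05

def task_loss(params, task):
    x, y = task
    return ((x @ params - y) ** 2).mean()

for step in range(200):
    # Inner loop: one gradient step on the weighted meta-training loss.
    w = torch.softmax(weight_logits, dim=0)
    weighted_loss = sum(w[i] * task_loss(theta, t) for i, t in enumerate(train_tasks))
    (grad_theta,) = torch.autograd.grad(weighted_loss, theta, create_graph=True)
    theta_adapted = theta - inner_lr * grad_theta

    # Outer loop: update the task weights so that the adapted parameters do well
    # on the held-out tasks; the gradient flows back through the inner step.
    val_loss = sum(task_loss(theta_adapted, t) for t in val_tasks) / n_val_tasks
    grad_logits, grad_meta = torch.autograd.grad(val_loss, (weight_logits, theta))
    with torch.no_grad():
        weight_logits -= outer_lr * grad_logits
        theta -= meta_lr * grad_meta

print("learned task weights:", torch.softmax(weight_logits, dim=0))
```

If the reweighting behaves as intended, the softmax weights of the shifted (off-distribution) training tasks should shrink relative to those of the in-distribution tasks, which is the qualitative behaviour the method aims for.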

Acknowledgments

This work was partly supported by the Fundamental Research Funds for the Central Universities (2019JBZ110); the Beijing Natural Science Foundation under Grant L211016; the National Natural Science Foundation of China under Grant 62176020; the National Key Research and Development Program (2020AAA0106800); and Chinese Academy of Sciences (OEIP-O-202004).

Author information

Corresponding author

Correspondence to Liping Jing.

Ethics declarations

Ethical Statement

The authors declare that they have no conflict of interest. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and national research committees. This article does not contain any studies with animals performed by any of the authors. Informed consent was obtained from all individual participants included in the study.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, X., Lyu, Y., Jing, L., Zeng, T., Yu, J. (2023). Not All Tasks Are Equal: A Parameter-Efficient Task Reweighting Method for Few-Shot Learning. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol. 14170. Springer, Cham. https://doi.org/10.1007/978-3-031-43415-0_25

  • DOI: https://doi.org/10.1007/978-3-031-43415-0_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43414-3

  • Online ISBN: 978-3-031-43415-0

  • eBook Packages: Computer Science, Computer Science (R0)
