
Generalizing from a Few Examples: A Survey on Few-shot Learning

Published: 12 June 2020

Abstract

Machine learning has been highly successful in data-intensive applications but is often hampered when the data set is small. Recently, Few-shot Learning (FSL) has been proposed to tackle this problem. Using prior knowledge, FSL can rapidly generalize to new tasks containing only a few samples with supervised information. In this article, we conduct a thorough survey to fully understand FSL. Starting from a formal definition of FSL, we distinguish FSL from several relevant machine learning problems. We then point out that the core issue in FSL is that the empirical risk minimizer is unreliable. Based on how prior knowledge can be used to handle this core issue, we categorize FSL methods from three perspectives: (i) data, which uses prior knowledge to augment the supervised experience; (ii) model, which uses prior knowledge to reduce the size of the hypothesis space; and (iii) algorithm, which uses prior knowledge to alter the search for the best hypothesis in the given hypothesis space. With this taxonomy, we review and discuss the pros and cons of each category. Promising directions in terms of FSL problem setups, techniques, applications, and theory are also proposed to provide insights for future research.
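The abstract's central claim, that with only a few supervised samples the empirical risk minimizer becomes unreliable, can be made concrete with the classical error decomposition of Bottou and Bousquet, which analyses of this kind build on. The LaTeX sketch below restates that decomposition; the symbols ($R$, $R_I$, $\hat{h}$, $h^{\ast}$, $h_I$, $\mathcal{H}$, $I$) are chosen here for illustration and may differ from the paper's exact notation.

% Expected risk of a hypothesis h under loss \ell, and its empirical estimate
% from I labeled samples (illustrative notation, not quoted from the paper).
\begin{align*}
  R(h)     &= \int \ell\big(h(x), y\big)\, \mathrm{d}p(x, y), &
  R_I(h)   &= \frac{1}{I} \sum_{i=1}^{I} \ell\big(h(x_i), y_i\big), \\
  \hat{h}  &= \arg\min_{h} R(h), &
  h^{\ast} &= \arg\min_{h \in \mathcal{H}} R(h), \quad
  h_I       = \arg\min_{h \in \mathcal{H}} R_I(h).
\end{align*}
% The excess error of the empirical risk minimizer h_I splits into an
% approximation term (how well \mathcal{H} can do at all) and an estimation
% term that grows as the number of samples I shrinks:
\begin{equation*}
  \mathbb{E}\big[R(h_I) - R(\hat{h})\big]
  = \underbrace{\mathbb{E}\big[R(h^{\ast}) - R(\hat{h})\big]}_{\text{approximation error}}
  + \underbrace{\mathbb{E}\big[R(h_I) - R(h^{\ast})\big]}_{\text{estimation error}}.
\end{equation*}

With few samples the estimation term dominates, which is why the three categories above inject prior knowledge to augment the data, shrink the hypothesis space $\mathcal{H}$, or steer the search for $h_I$.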




      Information & Contributors

      Information

      Published In

      ACM Computing Surveys  Volume 53, Issue 3
      May 2021
      787 pages
      ISSN:0360-0300
      EISSN:1557-7341
      DOI:10.1145/3403423
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 12 June 2020
      Online AM: 07 May 2020
      Accepted: 01 March 2020
      Revised: 01 January 2020
      Received: 01 May 2019
      Published in CSUR Volume 53, Issue 3


      Author Tags

      1. Few-shot learning
      2. low-shot learning
      3. meta-learning
      4. one-shot learning
      5. prior knowledge
      6. small sample learning

      Qualifiers

      • Survey
      • Research
      • Refereed

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 4,503
      • Downloads (Last 6 weeks): 584
      Reflects downloads up to 13 Nov 2024


      Citations

      Cited By

      • (2025) - Approx: A novel Resnet-Inception-based Fast -approximation method for face recognition. Neurocomputing 613, 128708. DOI: 10.1016/j.neucom.2024.128708. Online publication date: Jan-2025.
      • (2025) FedCCL: Federated dual-clustered feature contrast under domain heterogeneity. Information Fusion 113, 102645. DOI: 10.1016/j.inffus.2024.102645. Online publication date: Jan-2025.
      • (2025) Semantic Guided prototype learning for Cross-Domain Few-Shot hyperspectral image classification. Expert Systems with Applications 260, 125453. DOI: 10.1016/j.eswa.2024.125453. Online publication date: Jan-2025.
      • (2024) Enhanced Prompt Learning for Few-shot Text Classification Method. Scientific Insights and Discoveries Review, 10.59782/sidr.v4i1.784(27-41). Online publication date: 14-Oct-2024.
      • (2024) Weighted Asymmetric Loss for Multi-Label Text Classification on Imbalanced Data. Journal of Natural Language Processing 31, 3 (1166-1192). DOI: 10.5715/jnlp.31.1166. Online publication date: 2024.
      • (2024) Elevating Cloud Security via Serverless Computing: An in Depth Exploration. International Journal of Advanced Research in Science, Communication and Technology, 10.48175/IJARSCT-17618(108-114). Online publication date: 24-Apr-2024.
      • (2024) Understanding AI and Machine Learning Concepts. AI and Machine Learning Applications in Supply Chains and Marketing (371-392). DOI: 10.4018/979-8-3693-6760-5.ch015. Online publication date: 18-Oct-2024.
      • (2024) Exploring Few-Shot Learning Approaches for Bioinformatics Advancements. Applying Machine Learning Techniques to Bioinformatics (303-316). DOI: 10.4018/979-8-3693-1822-5.ch016. Online publication date: 5-Apr-2024.
      • (2024) Image data augmentation techniques based on deep learning: A survey. Mathematical Biosciences and Engineering 21, 6 (6190-6224). DOI: 10.3934/mbe.2024272. Online publication date: 2024.
      • (2024) Cognitive diagnostic assessment: A Q-matrix constraint-based neural network method. Behavior Research Methods 56, 7 (6981-7004). DOI: 10.3758/s13428-024-02404-5. Online publication date: 30-Apr-2024.
