A Survey on Stability of Learning with Limited Labelled Data and its Sensitivity to the Effects of Randomness

Published: 07 October 2024

Abstract

Learning with limited labelled data, such as prompting, in-context learning, fine-tuning, meta-learning, or few-shot learning, aims to effectively train a model using only a small number of labelled samples. However, these approaches have been observed to be excessively sensitive to the effects of uncontrolled randomness caused by non-determinism in the training process. This randomness negatively affects the stability of the models, leading to large variance in results across training runs. When such sensitivity is disregarded, it can unintentionally, but unfortunately also intentionally, create a false impression of research progress. Recently, this area has started to attract research attention, and the number of relevant studies is continuously growing. In this survey, we provide a comprehensive overview of 415 papers addressing the effects of randomness on the stability of learning with limited labelled data. We distinguish between four main tasks addressed in these papers (investigate/evaluate, determine, mitigate, and benchmark/compare/report randomness effects), providing findings for each one. Furthermore, we identify and discuss seven challenges and open problems, together with possible directions to facilitate further research. The ultimate goal of this survey is to emphasise the importance of this growing research area, which so far has not received an appropriate level of attention, and to reveal impactful directions for future research.
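
To make the phenomenon concrete, the following is a minimal, hypothetical sketch (not taken from the survey; the dataset, model, sample size of 50, and number of seeds are illustrative choices using scikit-learn) of how runs that differ only in the random selection of a small labelled set can produce a wide spread of results — the kind of run-to-run variance the surveyed papers investigate, report, and mitigate:

    # Illustrative sketch only (not from the survey): train the same model on
    # differently sampled small labelled sets and report the spread of results.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=0.5, random_state=0
    )

    scores = []
    for seed in range(10):  # each run uses a different random seed
        rng = np.random.default_rng(seed)
        # pick 50 labelled samples at random, simulating limited labelled data
        idx = rng.choice(len(X_pool), size=50, replace=False)
        model = LogisticRegression(max_iter=1000)
        model.fit(X_pool[idx], y_pool[idx])
        scores.append(model.score(X_test, y_test))

    # Reporting only the best run would overstate progress; the spread matters.
    print(f"mean={np.mean(scores):.3f} std={np.std(scores):.3f} "
          f"min={np.min(scores):.3f} max={np.max(scores):.3f}")

In the settings covered by the survey, the same pattern extends to other randomness factors, such as sample order, model initialisation, or prompt format.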

Supplemental Material

Supplementary Material: A Survey on Stability of Learning with Limited Labelled Data and its Sensitivity to the Effects of Randomness
In Section A, we provide a link to the digital appendix that includes the detailed categorisation of all 415 papers analysed in this survey, along with more information about what can be found in this digital appendix.

In Section B, we provide a basic idea and high-level description of the different machine learning approaches that define the scope of the survey: 1) meta-learning; 2) language model fine-tuning; 3) prompting/in-context learning; 4) prompt-based learning; and 5) parameter-efficient fine-tuning.

In Section C, we describe the survey methodology in detail, including how the search terms were formed, how the scope was defined, which libraries were used to discover the papers, and how the relevant papers were identified through search and further filtering (following the PRISMA methodology).


        Published In

        ACM Computing Surveys, Volume 57, Issue 1
        January 2025, 984 pages
        EISSN: 1557-7341
        DOI: 10.1145/3696794
        Editors: David Atienza, Michela Milano

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 07 October 2024
        Online AM: 02 September 2024
        Accepted: 22 August 2024
        Revised: 14 August 2024
        Received: 24 February 2023
        Published in CSUR Volume 57, Issue 1


        Author Tags

        1. Randomness
        2. stability
        3. sensitivity
        4. meta-learning
        5. large language models
        6. fine-tuning
        7. prompting
        8. in-context learning
        9. instruction-tuning
        10. prompt-based learning
        11. PEFT
        12. literature survey

        Qualifiers

        • Survey

        Funding Sources

        • EU Horizon 2020 research and innovation programme
        • European Union under the Horizon Europe
