A Survey on Stability of Learning with Limited Labelled Data and its Sensitivity to the Effects of Randomness

Published: 07 October 2024

Abstract

Learning with limited labelled data, such as prompting, in-context learning, fine-tuning, meta-learning, or few-shot learning, aims to effectively train a model using only a small number of labelled samples. However, these approaches have been observed to be excessively sensitive to the effects of uncontrolled randomness caused by non-determinism in the training process. This randomness negatively affects the stability of the models, leading to large variance in results across training runs. When such sensitivity is disregarded, it can unintentionally, but unfortunately also intentionally, create a false impression of research progress. Recently, this area has started to attract research attention, and the number of relevant studies is continuously growing. In this survey, we provide a comprehensive overview of 415 papers addressing the effects of randomness on the stability of learning with limited labelled data. We distinguish between four main tasks addressed in these papers (investigate/evaluate, determine, mitigate, and benchmark/compare/report randomness effects), providing findings for each one. Furthermore, we identify and discuss seven challenges and open problems, together with possible directions to facilitate further research. The ultimate goal of this survey is to emphasise the importance of this growing research area, which so far has not received an appropriate level of attention, and to reveal impactful directions for future research.
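
To make the phenomenon concrete, the following is a minimal, hypothetical sketch (not taken from the survey; the dataset, model, sample size of 50, and number of seeds are illustrative choices using scikit-learn) of how runs that differ only in the random selection of a small labelled set can produce a wide spread of results — the kind of run-to-run variance the surveyed papers investigate, report, and mitigate:

    # Illustrative sketch only (not from the survey): train the same model on
    # differently sampled small labelled sets and report the spread of results.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=0.5, random_state=0
    )

    scores = []
    for seed in range(10):  # each run uses a different random seed
        rng = np.random.default_rng(seed)
        # pick 50 labelled samples at random, simulating limited labelled data
        idx = rng.choice(len(X_pool), size=50, replace=False)
        model = LogisticRegression(max_iter=1000)
        model.fit(X_pool[idx], y_pool[idx])
        scores.append(model.score(X_test, y_test))

    # Reporting only the best run would overstate progress; the spread matters.
    print(f"mean={np.mean(scores):.3f} std={np.std(scores):.3f} "
          f"min={np.min(scores):.3f} max={np.max(scores):.3f}")

In the settings covered by the survey, the same pattern extends to other randomness factors, such as sample order, model initialisation, or prompt format.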

Supplemental Material

Supplementary Material: A Survey on Stability of Learning with Limited Labelled Data and its Sensitivity to the Effects of Randomness
In Section A, we provide a link to the digital appendix that includes the detailed categorisation of all 415 papers analysed in this survey, along with more information about what can be found in this digital appendix.

In Section B, we provide a basic idea and high-level description of the different machine learning approaches that define the scope of the survey: 1) meta-learning; 2) language model fine-tuning; 3) prompting/in-context learning; 4) prompt-based learning; and 5) parameter-efficient fine-tuning.

In Section C, we describe the survey methodology in detail, including how the search terms were formed, how the scope was defined, which libraries were used to discover the papers, and how the relevant papers were identified through search and further filtering (following the PRISMA methodology).


        Published In

        ACM Computing Surveys, Volume 57, Issue 1
        January 2025, 984 pages
        EISSN: 1557-7341
        DOI: 10.1145/3696794
        Editors: David Atienza, Michela Milano

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 07 October 2024
        Online AM: 02 September 2024
        Accepted: 22 August 2024
        Revised: 14 August 2024
        Received: 24 February 2023
        Published in CSUR Volume 57, Issue 1


        Author Tags

        1. Randomness
        2. stability
        3. sensitivity
        4. meta-learning
        5. large language models
        6. fine-tuning
        7. prompting
        8. in-context learning
        9. instruction-tuning
        10. prompt-based learning
        11. PEFT
        12. literature survey

        Qualifiers

        • Survey

        Funding Sources

        • EU Horizon 2020 research and innovation programme
        • European Union under the Horizon Europe
