
Which LLM to Play? Convergence-Aware Online Model Selection with Time-Increasing Bandits

Published: 13 May 2024

Abstract

Web-based applications such as chatbots, search engines, and news recommendations continue to grow in scale and complexity with the recent surge in the adoption of large language models (LLMs). Online model selection has thus garnered increasing attention due to the need to choose the best model among a diverse set while balancing task reward and exploration cost. Organizations face decisions such as whether to employ a costly API-based LLM or a locally finetuned small LLM, weighing cost against performance. Traditional selection methods often evaluate every candidate model before choosing one, which is becoming impractical given the rising costs of training and finetuning LLMs. Moreover, it is undesirable to allocate excessive resources to exploring poor-performing models. While some recent works leverage online bandit algorithms to manage this exploration-exploitation trade-off in model selection, they tend to overlook the increasing-then-converging trend in model performance as models are iteratively finetuned, leading to less accurate predictions and suboptimal model selections.
In this paper, we propose TI-UCB, a time-increasing bandit algorithm that effectively predicts the increase in model performance due to training or finetuning and efficiently balances exploration and exploitation in model selection. To further capture the converging points of models, we develop a change-detection mechanism that compares consecutive increase predictions. We theoretically prove that our algorithm achieves a lower regret upper bound, improving the polynomial regret of prior works to logarithmic regret in a similar setting. The advantage of our method is also empirically validated through extensive experiments on classification model selection and online selection of LLMs. Our results highlight the importance of exploiting the increasing-then-converging pattern for more efficient and economical model selection in the deployment of LLMs.
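The idea sketched in the abstract — predict each arm's rising reward trend, add a UCB exploration bonus, and detect convergence by comparing consecutive increase predictions — can be illustrated with a minimal bandit simulation. This is a sketch of the general rising-bandit pattern, not the authors' TI-UCB: the trend predictor (a least-squares line over a sliding window), the convergence test, and the parameters `window`, `slope_eps`, and `c` are all illustrative assumptions.

```python
import math
import random

class RisingUCB:
    """Illustrative UCB-style bandit for increasing-then-converging arms.

    Sketch only (not the paper's TI-UCB): each arm's next reward is
    predicted by a least-squares trend over its recent observations, a
    UCB bonus drives exploration, and an arm is flagged as converged
    when consecutive slope estimates flatten out. `window`, `slope_eps`,
    and `c` are hypothetical tuning parameters.
    """

    def __init__(self, n_arms, window=10, slope_eps=1e-3, c=1.0):
        self.n_arms = n_arms
        self.window = window        # observations used for trend fitting
        self.slope_eps = slope_eps  # slope threshold for convergence
        self.c = c                  # exploration weight
        self.rewards = [[] for _ in range(n_arms)]
        self.converged = [False] * n_arms

    def _slope_and_pred(self, obs):
        # Fit y = a + b*x on the last `window` rewards; extrapolate one step.
        ys = obs[-self.window:]
        n = len(ys)
        xbar, ybar = (n - 1) / 2, sum(ys) / n
        denom = sum((x - xbar) ** 2 for x in range(n))
        slope = (sum((x - xbar) * (y - ybar) for x, y in enumerate(ys)) / denom
                 if denom else 0.0)
        return slope, ybar + slope * (n - xbar)

    def select(self, t):
        best, best_score = 0, -math.inf
        for a in range(self.n_arms):
            obs = self.rewards[a]
            if len(obs) < 2:
                return a  # pull every arm at least twice first
            if self.converged[a]:
                ys = obs[-self.window:]
                pred = sum(ys) / len(ys)  # plateau estimate, no extrapolation
            else:
                _, pred = self._slope_and_pred(obs)
            score = pred + self.c * math.sqrt(2 * math.log(t + 1) / len(obs))
            if score > best_score:
                best, best_score = a, score
        return best

    def update(self, arm, reward):
        self.rewards[arm].append(reward)
        obs = self.rewards[arm]
        if len(obs) >= 2 * self.window and not self.converged[arm]:
            # Change detection: compare consecutive increase predictions.
            prev_slope, _ = self._slope_and_pred(obs[:-self.window])
            cur_slope, _ = self._slope_and_pred(obs)
            if (abs(cur_slope) < self.slope_eps
                    and abs(prev_slope - cur_slope) < self.slope_eps):
                self.converged[arm] = True

# Toy usage: two "models" whose accuracy rises with finetuning rounds and
# converges to different plateaus; the bandit should favor the better one.
def simulated_accuracy(arm, n_pulls):
    plateau = [0.6, 0.9][arm]
    return plateau * (1 - math.exp(-0.2 * n_pulls)) + random.gauss(0, 0.01)

random.seed(0)
bandit = RisingUCB(n_arms=2)
for t in range(300):
    a = bandit.select(t)
    bandit.update(a, simulated_accuracy(a, len(bandit.rewards[a])))
```

In this toy run the bandit concentrates its pulls on the arm with the higher plateau once both trends flatten, which mirrors the abstract's point: ignoring the increasing-then-converging shape would either abandon a slow-starting model too early or keep over-exploring a converged one.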

    Supplemental Material

    MP4 File
    Supplemental video


    Published In

    WWW '24: Proceedings of the ACM on Web Conference 2024
    May 2024
    4826 pages
    ISBN:9798400701719
    DOI:10.1145/3589334

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. large language model
    2. model selection
    3. multi-armed bandit
    4. online learning

    Qualifiers

    • Research-article


    Conference

WWW '24: The ACM Web Conference 2024
May 13-17, 2024
Singapore, Singapore

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
