
Which LLM to Play? Convergence-Aware Online Model Selection with Time-Increasing Bandits

Published: 13 May 2024

Abstract

Web-based applications such as chatbots, search engines, and news recommendations continue to grow in scale and complexity with the recent surge in the adoption of large language models (LLMs). Online model selection has thus garnered increasing attention due to the need to choose the best model among a diverse set while balancing task reward and exploration cost. Organizations face decisions such as whether to employ a costly API-based LLM or a locally finetuned small LLM, weighing cost against performance. Traditional selection methods often evaluate every candidate model before choosing one, which is becoming impractical given the rising costs of training and finetuning LLMs. Moreover, it is undesirable to allocate excessive resources to exploring poor-performing models. While some recent works leverage online bandit algorithms to manage this exploration-exploitation trade-off in model selection, they tend to overlook the increasing-then-converging trend in model performance as models are iteratively finetuned, leading to less accurate predictions and suboptimal model selections.
In this paper, we propose TI-UCB, a time-increasing bandit algorithm that effectively predicts the increase in model performance due to training or finetuning and efficiently balances exploration and exploitation in model selection. To further capture the converging points of models, we develop a change-detection mechanism that compares consecutive increase predictions. We theoretically prove that our algorithm achieves a lower regret upper bound, improving the polynomial regret of prior works to logarithmic regret in a similar setting. The advantage of our method is also empirically validated through extensive experiments on classification model selection and online selection of LLMs. Our results highlight the importance of exploiting the increasing-then-converging pattern for more efficient and economical model selection in the deployment of LLMs.
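The idea sketched in the abstract — predict each arm's rising reward trend, add a UCB exploration bonus, and detect convergence by comparing consecutive increase predictions — can be illustrated with a minimal bandit simulation. This is a sketch of the general rising-bandit pattern, not the authors' TI-UCB: the trend predictor (a least-squares line over a sliding window), the convergence test, and the parameters `window`, `slope_eps`, and `c` are all illustrative assumptions.

```python
import math
import random

class RisingUCB:
    """Illustrative UCB-style bandit for increasing-then-converging arms.

    Sketch only (not the paper's TI-UCB): each arm's next reward is
    predicted by a least-squares trend over its recent observations, a
    UCB bonus drives exploration, and an arm is flagged as converged
    when consecutive slope estimates flatten out. `window`, `slope_eps`,
    and `c` are hypothetical tuning parameters.
    """

    def __init__(self, n_arms, window=10, slope_eps=1e-3, c=1.0):
        self.n_arms = n_arms
        self.window = window        # observations used for trend fitting
        self.slope_eps = slope_eps  # slope threshold for convergence
        self.c = c                  # exploration weight
        self.rewards = [[] for _ in range(n_arms)]
        self.converged = [False] * n_arms

    def _slope_and_pred(self, obs):
        # Fit y = a + b*x on the last `window` rewards; extrapolate one step.
        ys = obs[-self.window:]
        n = len(ys)
        xbar, ybar = (n - 1) / 2, sum(ys) / n
        denom = sum((x - xbar) ** 2 for x in range(n))
        slope = (sum((x - xbar) * (y - ybar) for x, y in enumerate(ys)) / denom
                 if denom else 0.0)
        return slope, ybar + slope * (n - xbar)

    def select(self, t):
        best, best_score = 0, -math.inf
        for a in range(self.n_arms):
            obs = self.rewards[a]
            if len(obs) < 2:
                return a  # pull every arm at least twice first
            if self.converged[a]:
                ys = obs[-self.window:]
                pred = sum(ys) / len(ys)  # plateau estimate, no extrapolation
            else:
                _, pred = self._slope_and_pred(obs)
            score = pred + self.c * math.sqrt(2 * math.log(t + 1) / len(obs))
            if score > best_score:
                best, best_score = a, score
        return best

    def update(self, arm, reward):
        self.rewards[arm].append(reward)
        obs = self.rewards[arm]
        if len(obs) >= 2 * self.window and not self.converged[arm]:
            # Change detection: compare consecutive increase predictions.
            prev_slope, _ = self._slope_and_pred(obs[:-self.window])
            cur_slope, _ = self._slope_and_pred(obs)
            if (abs(cur_slope) < self.slope_eps
                    and abs(prev_slope - cur_slope) < self.slope_eps):
                self.converged[arm] = True

# Toy usage: two "models" whose accuracy rises with finetuning rounds and
# converges to different plateaus; the bandit should favor the better one.
def simulated_accuracy(arm, n_pulls):
    plateau = [0.6, 0.9][arm]
    return plateau * (1 - math.exp(-0.2 * n_pulls)) + random.gauss(0, 0.01)

random.seed(0)
bandit = RisingUCB(n_arms=2)
for t in range(300):
    a = bandit.select(t)
    bandit.update(a, simulated_accuracy(a, len(bandit.rewards[a])))
```

In this toy run the bandit concentrates its pulls on the arm with the higher plateau once both trends flatten, which mirrors the abstract's point: ignoring the increasing-then-converging shape would either abandon a slow-starting model too early or keep over-exploring a converged one.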

    Supplemental Material

    MP4 File
    Supplemental video


    Published In

    WWW '24: Proceedings of the ACM on Web Conference 2024
    May 2024
    4826 pages
    ISBN:9798400701719
    DOI:10.1145/3589334

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. large language model
    2. model selection
    3. multi-armed bandit
    4. online learning

    Qualifiers

    • Research-article


    Conference

WWW '24: The ACM Web Conference 2024
May 13-17, 2024
Singapore, Singapore

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
