Using Large Language Models for Hyperparameter Optimization

Zhang, Michael R.; Desai, Nishkrit; Bae, Juhan; Lorraine, Jonathan; Ba, Jimmy

arXiv:2312.04528 (cs)

[Submitted on 7 Dec 2023 (v1), last revised 11 Nov 2024 (this version, v2)]

Abstract: This paper explores the use of foundational large language models (LLMs) in hyperparameter optimization (HPO). Hyperparameters are critical in determining the effectiveness of machine learning models, yet in limited-budget settings their optimization often relies on manual approaches. By prompting LLMs with dataset and model descriptions, we develop a methodology in which LLMs suggest hyperparameter configurations that are iteratively refined based on model performance. Our empirical evaluations on standard benchmarks show that, within constrained search budgets, LLMs can match or outperform traditional HPO methods such as Bayesian optimization across different models. Furthermore, we propose treating the code that specifies our model as a hyperparameter that the LLM outputs, which affords greater flexibility than existing HPO approaches.
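
The iterative loop described in the abstract (describe the task to an LLM, let it propose a configuration, evaluate it, feed the result back) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `fake_llm` is a hypothetical offline stand-in for a real LLM call, and the JSON prompt/reply format, the hyperparameter names `log10_lr` and `log10_wd`, and `toy_validation_loss` (a synthetic quadratic used in place of actually training a model) are all assumptions made for illustration.

```python
import json
from typing import Callable, Dict, List

Config = Dict[str, float]

def build_prompt(description: str, history: List[dict]) -> str:
    # Prompt = task description + instructions + every (config, loss) pair seen so far.
    return (
        f"{description}\n"
        "Propose the next configuration as JSON with keys log10_lr and log10_wd.\n"
        f"HISTORY:{json.dumps(history)}"
    )

def fake_llm(prompt: str) -> str:
    # HYPOTHETICAL stand-in for a real LLM call. It reads the history back out of
    # the prompt and nudges the best configuration so far by +/-0.5 along one axis.
    history = json.loads(prompt.split("HISTORY:", 1)[1])
    if not history:
        return json.dumps({"log10_lr": -1.0, "log10_wd": -2.0})  # initial guess
    best = min(history, key=lambda h: h["loss"])["config"]
    n = len(history)
    key = ["log10_lr", "log10_wd"][n % 2]          # alternate between the two axes
    sign = 1.0 if (n // 2) % 2 == 0 else -1.0      # alternate the step direction
    proposal = dict(best)
    proposal[key] = best[key] + sign * 0.5
    return json.dumps(proposal)

def toy_validation_loss(config: Config) -> float:
    # HYPOTHETICAL objective standing in for "train the model, report validation loss";
    # minimized at learning rate 1e-3 and weight decay 1e-4.
    return (config["log10_lr"] + 3.0) ** 2 + (config["log10_wd"] + 4.0) ** 2

def llm_hpo(llm: Callable[[str], str], objective: Callable[[Config], float],
            description: str, budget: int) -> dict:
    history: List[dict] = []
    for _ in range(budget):
        config = json.loads(llm(build_prompt(description, history)))  # LLM proposes
        loss = objective(config)                                      # evaluate proposal
        history.append({"config": config, "loss": loss})              # feed back next round
    return min(history, key=lambda h: h["loss"])

best = llm_hpo(fake_llm, toy_validation_loss,
               "Dataset: hypothetical tabular task. Model: hypothetical MLP.", budget=10)
# best == {"config": {"log10_lr": -2.0, "log10_wd": -3.0}, "loss": 2.0}
```

Replacing `fake_llm` with a function that queries a real chat model, and `toy_validation_loss` with an actual training run, gives the general shape of the loop; the prompt wording and reply parsing used in the paper are not reproduced here.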

Comments:	28 pages
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2312.04528 [cs.LG]
	(or arXiv:2312.04528v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2312.04528

Submission history

From: Michael Zhang
[v1] Thu, 7 Dec 2023 18:46:50 UTC (3,684 KB)
[v2] Mon, 11 Nov 2024 17:30:55 UTC (3,878 KB)