Hybrid Heterogeneous Clusters Can Lower the Energy Consumption of LLM Inference Workloads

Wilkins, Grant; Keshav, Srinivasan; Mortier, Richard


arXiv:2407.00010 (cs)

[Submitted on 25 Apr 2024]




Abstract: Both the training and use of Large Language Models (LLMs) require large amounts of energy. Their increasing popularity therefore raises critical concerns regarding the energy efficiency and sustainability of the data centers that host them. This paper addresses the challenge of reducing energy consumption in data centers running LLMs. We propose a hybrid data center model that uses a cost-based scheduling framework to dynamically allocate LLM tasks across hardware accelerators that differ in their energy efficiencies and computational capabilities. Specifically, our workload-aware strategy determines whether tasks are processed on energy-efficient processors or high-performance GPUs based on the number of input and output tokens in a query. Our analysis of a representative LLM dataset finds that this hybrid strategy can reduce CPU+GPU energy consumption by 7.5% compared to a workload-unaware baseline.
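The abstract does not state the cost function itself, so the following is only a minimal sketch of what token-based, cost-driven routing could look like. It assumes a linear energy model per accelerator; the class `Accelerator`, the function `route`, and every coefficient are hypothetical placeholders for illustration, not values or names from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Accelerator:
    """Hypothetical linear energy model for one class of hardware."""
    name: str
    fixed_joules: float             # assumed per-query overhead
    joules_per_input_token: float   # assumed cost to process one input token
    joules_per_output_token: float  # assumed cost to generate one output token

    def energy_cost(self, n_in: int, n_out: int) -> float:
        # Estimated energy for a query with n_in input and n_out output tokens.
        return (self.fixed_joules
                + self.joules_per_input_token * n_in
                + self.joules_per_output_token * n_out)


def route(n_in: int, n_out: int, pool: Sequence[Accelerator]) -> Accelerator:
    """Pick the accelerator with the lowest estimated energy for this query."""
    return min(pool, key=lambda acc: acc.energy_cost(n_in, n_out))


# Made-up coefficients: low overhead but costly per token, versus
# high overhead but cheap per token.
EFFICIENT = Accelerator("efficient", fixed_joules=5.0,
                        joules_per_input_token=0.5,
                        joules_per_output_token=1.0)
GPU = Accelerator("gpu", fixed_joules=200.0,
                  joules_per_input_token=0.1,
                  joules_per_output_token=0.4)

if __name__ == "__main__":
    for n_in, n_out in [(10, 20), (1000, 1000)]:
        chosen = route(n_in, n_out, [EFFICIENT, GPU])
        print(f"{n_in} in / {n_out} out -> {chosen.name}")
```

Under these assumed coefficients, short queries go to the low-overhead device and long queries go to the GPU. A workload-unaware baseline would instead send every query to the same device regardless of token counts.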

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2407.00010 [cs.DC]
	(or arXiv:2407.00010v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2407.00010

Submission history

From: Grant Wilkins
[v1] Thu, 25 Apr 2024 11:24:08 UTC (164 KB)
