Heet: Accelerating Elastic Training in Heterogeneous Deep Learning Clusters
Information & Contributors
Information
Published In
- General Chairs:
- Nael Abu-Ghazaleh,
- Rajiv Gupta,
- Program Chairs:
- Madan Musuvathi,
- Dan Tsafrir
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- Research-article
Funding Sources
- the Science and Technology Development Fund of Macau
Bibliometrics & Citations
Bibliometrics
Article Metrics
- Total Citations: 0
- Total Downloads: 822
- Downloads (Last 12 months): 822
- Downloads (Last 6 weeks): 179