RunAI Platform Vs Kubernetes
Run:AI abstracts workloads from the underlying GPU and compute infrastructure by creating a
shared pool of resources that can be dynamically provisioned, enabling full utilization of GPU
compute across the distributed teams within an enterprise.
Data science and IT teams gain control and real-time visibility, including the run-time, queueing
status, and GPU utilization of each job. In addition to real-time visibility, the platform displays
historical metrics of cluster resources, allowing the enterprise to make more informed, analytical
decisions.
The following capabilities, which are critical for high GPU utilization and business-unit delivery, are
unique to Run:AI and are not supported by the default Kubernetes scheduler:
● Guaranteed quotas
With the Run:AI scheduler and guaranteed quotas, the platform ensures that each department can,
at minimum, utilize a defined number of GPU resources. However, the scheduler also allows
departments to exceed their quota and consume additional idle resources within the cluster,
greatly increasing GPU utilization. The default Kubernetes scheduler only allows provisioning of
resources that are statically assigned to the department's namespace.
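The contrast can be illustrated with a minimal Python sketch. The team names, quota numbers, and function names below are hypothetical, and this is not Run:AI's actual scheduling logic; it only shows the difference between a static per-namespace limit and a guaranteed quota that can borrow idle cluster capacity.

```python
# Illustrative sketch: static quotas vs. guaranteed quotas with borrowing.
# All names and numbers are hypothetical; not Run:AI's actual algorithm.

CLUSTER_GPUS = 16
QUOTAS = {"team-a": 8, "team-b": 8}      # per-department GPU quotas
allocated = {"team-a": 0, "team-b": 0}   # GPUs currently in use per department

def request_static(team: str, gpus: int) -> bool:
    """Default-Kubernetes-style static quota: a request beyond the
    namespace quota is rejected even if the rest of the cluster is idle."""
    if allocated[team] + gpus <= QUOTAS[team]:
        allocated[team] += gpus
        return True
    return False

def request_guaranteed(team: str, gpus: int) -> bool:
    """Guaranteed quota with borrowing: a department may exceed its quota
    by consuming GPUs that are currently idle cluster-wide."""
    idle = CLUSTER_GPUS - sum(allocated.values())
    if gpus <= idle:
        allocated[team] += gpus
        return True
    return False
```

With 16 cluster GPUs and team-b idle, a 10-GPU request from team-a fails under the static policy (its quota is 8) but succeeds under the borrowing policy, putting the idle capacity to work.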
● Automatic queueing/de-queueing
The Run:AI scheduler enables data scientists to easily queue many jobs at once; jobs are
assigned automatically to available GPU resources based on quotas, priorities, policies, and
scheduling algorithms. Once a job has run to completion, the workload is detached from the GPU,
which is then made available to the next job for scheduling. Automatic queueing/de-queueing
allows administrators to take a hands-off approach to resource allocation and management while
ensuring efficient sharing of resources.