LoRAX allows serving multiple fine-tuned models optimized on the same endpoint by dynamically loading and switching LoRA adapters.
The following command deploys Mistral 7B Instruct as a base model via a service:
dstack apply -f examples/deployment/lorax/.dstack.yml
See the configuration at .dstack.yml.
For more details, refer to services.