This example shows how to deploy DeepSeek-R1-Distill-Llama 8B and 70B using [SGLang](https://github.com/sgl-project/sglang) :material-arrow-top-right-thin:{ .external }{:target="_blank"} and `dstack`.
??? info "Prerequisites"
    Once `dstack` is installed, go ahead and clone the repo, and run `dstack init`.

    <div class="termy">

    ```shell
    $ git clone https://github.com/dstackai/dstack
    $ cd dstack
    $ dstack init
    ```

    </div>
Here's an example of a service that deploys DeepSeek-R1-Distill-Llama 8B and 70B using SGLang.
=== "AMD"
<div editor-title="examples/deployment/sglang/amd/.dstack.yml">
```yaml
type: service
name: deepseek-r1-amd
image: lmsysorg/sglang:v0.4.1.post4-rocm620
env:
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-70B
commands:
- python3 -m sglang.launch_server
--model-path $MODEL_ID
--port 8000
--trust-remote-code
port: 8000
model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
resources:
gpu: MI300x
disk: 300GB
```
</div>
=== "NVIDIA"
<div editor-title="examples/deployment/sglang/nvidia/.dstack.yml">
```yaml
type: service
name: deepseek-r1-nvidia
image: lmsysorg/sglang:latest
env:
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
commands:
- python3 -m sglang.launch_server
--model-path $MODEL_ID
--port 8000
--trust-remote-code
port: 8000
model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
resources:
gpu: 24GB
```
</div>
To run a configuration, use the `dstack apply` command.
<div class="termy">

```shell
$ dstack apply -f examples/llms/deepseek/sglang/amd/.dstack.yml

 #  BACKEND  REGION   RESOURCES                        SPOT  PRICE
 1  runpod   EU-RO-1  24xCPU, 283GB, 1xMI300X (192GB)  no    $2.49

Submit the run deepseek-r1-amd? [y/n]: y

Provisioning...
---> 100%
```

</div>
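The NVIDIA configuration can be deployed the same way, pointing `dstack apply` at the NVIDIA file shown above:

<div class="termy">

```shell
$ dstack apply -f examples/llms/deepseek/sglang/nvidia/.dstack.yml
```

</div>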
Once the service is up, the model will be available via the OpenAI-compatible endpoint at `<dstack server URL>/proxy/models/<project name>/`. The example below assumes the `dstack` server runs locally on its default port `3000` and the project is named `main`.
<div class="termy">

```shell
$ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
    -X POST \
    -H 'Authorization: Bearer <dstack token>' \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant."
        },
        {
          "role": "user",
          "content": "What is Deep Learning?"
        }
      ],
      "stream": true,
      "max_tokens": 512
    }'
```

</div>
When a gateway is configured, the OpenAI-compatible endpoint is available at `https://gateway.<gateway domain>/`.
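For instance, here's a sketch of the same request sent through a gateway, assuming a hypothetical gateway domain of `example.com` (the standard OpenAI `chat/completions` path is assumed to apply):

<div class="termy">

```shell
$ curl https://gateway.example.com/chat/completions \
    -X POST \
    -H 'Authorization: Bearer <dstack token>' \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
      "messages": [
        {
          "role": "user",
          "content": "What is Deep Learning?"
        }
      ],
      "stream": true,
      "max_tokens": 512
    }'
```

</div>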
The source code of this example can be found in [`examples/llms/deepseek/sglang`](https://github.com/dstackai/dstack/blob/master/examples/llms/deepseek/sglang) :material-arrow-top-right-thin:{ .external }{:target="_blank"}.